[ https://issues.apache.org/jira/browse/HBASE-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
terry zhang updated HBASE-6695: ------------------------------- Description: When we ware testing Replication failover feature we found if we kill a regionserver during it transferqueue ,we found only part of the hlog znode copy to the right path because failover process is interrupted. Log: 2012-08-29 12:20:05,660 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Moving dw92.kgb.sqa.cm4,60020,1346210789716's hlogs to my queue 2012-08-29 12:20:05,765 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C13462107 89716.1346213720708 with data 210508162 2012-08-29 12:20:05,850 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C13462107 89716.1346213886800 with data 2012-08-29 12:20:05,938 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213830559 with data 2012-08-29 12:20:06,055 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213775146 with data 2012-08-29 12:20:06,277 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=.ME TA.,,1.1028785192, hostname=dw93.kgb.sqa.cm4, port=60020 java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at ...... {color:red} This server is down ..... {color} ZK node status: [zk: 10.232.98.77:2181(CONNECTED) 6] ls /hbase-test3-repl/replication/rs/dw92.kgb.sqa.cm4,60020,1346210789716 [lock, 1, 1-dw89.kgb.sqa.cm4,60020,1346202436268] {color:red} dw92 is down , but Node dw92.kgb.sqa.cm4,60020,1346210789716 can't be deleted {color} was: When we ware testing Replication failover feature we found if we kill a regionserver during it transferqueue ,we found only part of the hlog znode copy to the right path because failover process is interrupted. Log: 2012-08-29 12:20:05,660 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Moving dw92.kgb.sqa.cm4,60020,1346210789716's hlogs to my queue 2012-08-29 12:20:05,765 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C13462107 89716.1346213720708 with data 210508162 2012-08-29 12:20:05,850 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C13462107 89716.1346213886800 with data 2012-08-29 12:20:05,938 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213830559 with data 2012-08-29 12:20:06,055 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213775146 with data 2012-08-29 12:20:06,277 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=.ME TA.,,1.1028785192, hostname=dw93.kgb.sqa.cm4, port=60020 java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at ...... {color:red} This server is down ..... {color} ZK node status: [zk: 10.232.98.77:2181(CONNECTED) 6] ls /hbase-test3-repl/replication/rs/dw92.kgb.sqa.cm4,60020,1346210789716 [lock, 1, 1-dw89.kgb.sqa.cm4,60020,1346202436268] {color:red} dw92 is down , but Node dw92.kgb.sqa.cm4,60020,1346210789716 can't be deleted {color} > [Replication] Data will lose if RegionServer down during transferqueue > ---------------------------------------------------------------------- > > Key: HBASE-6695 > URL: https://issues.apache.org/jira/browse/HBASE-6695 > Project: HBase > Issue Type: Bug > Components: replication > Affects Versions: 0.94.1 > Reporter: terry zhang > Priority: Critical > > When we ware testing Replication failover feature we found if we kill a > regionserver during it transferqueue ,we found only part of the hlog znode > copy to the right path because failover process is interrupted. > Log: > 2012-08-29 12:20:05,660 INFO > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: > Moving dw92.kgb.sqa.cm4,60020,1346210789716's hlogs to my queue > 2012-08-29 12:20:05,765 DEBUG > org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating > dw92.kgb.sqa.cm4%2C60020%2C13462107 89716.1346213720708 with data 210508162 > 2012-08-29 12:20:05,850 DEBUG > org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating > dw92.kgb.sqa.cm4%2C60020%2C13462107 89716.1346213886800 with data > 2012-08-29 12:20:05,938 DEBUG > org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating > dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213830559 with data > 2012-08-29 12:20:06,055 DEBUG > org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating > dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213775146 with data > 2012-08-29 12:20:06,277 WARN > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: > Failed all from region=.ME > TA.,,1.1028785192, hostname=dw93.kgb.sqa.cm4, port=60020 > java.util.concurrent.ExecutionException: java.net.ConnectException: > Connection refused > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at > ...... > {color:red} > This server is down ..... > {color} > ZK node status: > [zk: 10.232.98.77:2181(CONNECTED) 6] ls > /hbase-test3-repl/replication/rs/dw92.kgb.sqa.cm4,60020,1346210789716 > [lock, 1, 1-dw89.kgb.sqa.cm4,60020,1346202436268] > {color:red} > dw92 is down , but Node dw92.kgb.sqa.cm4,60020,1346210789716 can't be deleted > {color} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira