[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-9387:
-------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to 0.96 and trunk so resolving.

> Region could get lost during assignment
> ---------------------------------------
>
> Key: HBASE-9387
> URL: https://issues.apache.org/jira/browse/HBASE-9387
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 0.95.2
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Critical
> Fix For: 0.98.0, 0.96.0
>
> Attachments: 9387-v1.txt, 9387-v3.txt, 9387-v4.2.txt, 9387-v4.3.txt, 9387-v4.4.txt, 9387-v4.txt, 9387-v5.txt, 9387-v6.txt, 9387-v7.txt, 9387-v8.txt, 9387-v9.txt, hbase-9387.patch, org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed test timeout running against hadoop 2.1.0 with distributed log replay turned on.
> Looks like region state for 1588230740 became inconsistent between master and the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO [AM.ZK.Worker-pool2-t4] master.RegionStates(299): Onlined 1588230740 on kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 35 failed; retrying after sleep of 302 because: org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being opened: 1588230740
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
> at org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Fix Version/s: 0.96.0

Integrated to 0.96 as well.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Fix Version/s: 0.98.0
Hadoop Flags: Reviewed

Integrated to trunk. Thanks for the reviews.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v9.txt

Patch v9 uses different messages for znode version mismatch and znode disappearance.
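The distinction that patch v9 draws between a znode version mismatch and a znode disappearance could be sketched roughly as below. This is only an illustration and not the code in 9387-v9.txt: the class `TransitionFailureMessageSketch`, the method `describeFailure`, and the message strings are all hypothetical, and an empty `OptionalInt` is assumed to stand for "the znode no longer exists".

```java
import java.util.OptionalInt;

/** Hypothetical sketch; none of these names or messages come from the HBase code base. */
final class TransitionFailureMessageSketch {

    /**
     * Picks a log message for a failed OPENING -> OPENED transition.
     *
     * @param encodedRegionName encoded region name, e.g. "1588230740"
     * @param expectedVersion   znode version the region server expected to update
     * @param actualVersion     current znode version, or empty if the znode is gone (assumption)
     */
    static String describeFailure(String encodedRegionName,
                                  int expectedVersion,
                                  OptionalInt actualVersion) {
        if (!actualVersion.isPresent()) {
            // The znode disappeared while the region was being opened.
            return "znode for region " + encodedRegionName + " disappeared";
        }
        if (actualVersion.getAsInt() != expectedVersion) {
            // Someone else modified the znode since this server last touched it.
            return "znode version mismatch for region " + encodedRegionName
                + ": expected " + expectedVersion + ", found " + actualVersion.getAsInt();
        }
        // Neither of the two known causes applies.
        return "transition failed for region " + encodedRegionName + " for an unknown reason";
    }

    public static void main(String[] args) {
        System.out.println(describeFailure("1588230740", 2, OptionalInt.empty()));
        System.out.println(describeFailure("1588230740", 2, OptionalInt.of(3)));
    }
}
```

Keeping the two causes apart in the log makes it easier to tell whether the master removed the znode or another actor raced on it.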
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v8.txt

Patch v8 moves the znode existence check and subsequent abortion to transitionToOpened(). This is to avoid unnecessary region server abortion.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v7.txt

Patch v7 adds testRegionServerAbortionDueToFailureTransitioningToOpened in TestOpenRegionHandler, which simulates the scenario described in this JIRA.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v6.txt
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v4.4.txt

Patch v4.4 removes an unnecessary change. MockRegionServer#abort() calls stop(); HRegionServer#abort() does the same. Added a comment in TestOpenRegionHandler#testYankingRegionFromUnderIt() explaining why region server abortion is expected.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v4.3.txt

Reused TestOpenRegionHandler#testYankingRegionFromUnderIt() for verification of region server abortion.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v4.2.txt

The abort() call in tryTransitionFromOfflineToFailedOpen() made TestRegionServerNoMaster fail.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v5.txt

Patch v5 checks whether the znode exists. If the znode doesn't exist, the region server is aborted; otherwise a warning is logged.
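The rule described for patch v5 can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the code from 9387-v5.txt: the class `OpenedTransitionSketch`, the enum `Action`, and the method `onTransitionResult` are hypothetical names, and the two booleans stand in for the real ZooKeeper calls that report whether the transition succeeded and whether the znode is still present.

```java
/** Hypothetical sketch; none of these names are HBase APIs. */
final class OpenedTransitionSketch {

    /** What the region server does after attempting OPENING -> OPENED. */
    enum Action { NONE, LOG_WARNING, ABORT_SERVER }

    /**
     * @param transitioned whether the OPENING -> OPENED znode update succeeded
     * @param znodeExists  whether the region's znode is still present afterwards
     */
    static Action onTransitionResult(boolean transitioned, boolean znodeExists) {
        if (transitioned) {
            // Normal case: master and region server agree on the region state.
            return Action.NONE;
        }
        // Failed transition: a missing znode means the states have diverged,
        // so the server aborts; otherwise the failure is only logged.
        return znodeExists ? Action.LOG_WARNING : Action.ABORT_SERVER;
    }

    public static void main(String[] args) {
        System.out.println(onTransitionResult(false, false));
        System.out.println(onTransitionResult(false, true));
    }
}
```

Aborting only when the znode is gone limits the drastic action to the case where the server has no way left to report the region's state.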
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------
Attachment: 9387-v4.txt

Patch v4 removes the LOG.warn() statements. Let me see if I can write a test.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-9387:
-------------------------
Priority: Critical (was: Major)

Suggested workaround would be to review all transitions and, on edges such as this one (going from OPENING to OPENED), do a radical abort if the transition fails (not just for the meta region). Then, in another issue, step back and revisit our system for managing region manipulation. It is way too complex, consuming way too many hours of eng. time, and there are holes.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-9387:
Attachment: 9387-v3.txt

Thanks for the comments. Patch v3 also handles the failure scenario in tryTransitionFromOfflineToFailedOpen(). Overnight I looped TestFullLogReconstruction 100 times with patch v1, on the same machine where this issue was first produced. All runs passed. A follow-on JIRA can be filed to make region transition handling better.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-9387:
Attachment: (was: hbase-9387.patch)
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-9387:
Attachment: hbase-9387.patch
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-9387:
Attachment: (was: hbase-9387.patch)
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-9387:
Attachment: hbase-9387.patch
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-9387:
Attachment: hbase-9387.patch

[~te...@apache.org] I posted a patch that minimizes the situations in which the RS aborts.
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-9387:
Attachment: 9387-v1.txt

First attempt at fixing the bug. If OpenRegionHandler#tryTransitionFromOpeningToFailedOpen() cannot transition the region to FAILED_OPEN, the region server aborts. In the test, one more region server is added.
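The behavior described for patch v1 can be sketched roughly as follows. This is an illustration only, not the attached patch: the class `OpenFailureSketch`, the interfaces `Coordinator` and `Abortable`, and the method `reportFailedOpen` are hypothetical stand-ins and are not HBase APIs. The sketch assumes the intent is that a region server which cannot record FAILED_OPEN should abort rather than keep running with a region state the master cannot see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are HBase classes.
final class OpenFailureSketch {

    /** Hypothetical stand-in for the step that marks a region FAILED_OPEN. */
    interface Coordinator {
        boolean transitionOpeningToFailedOpen(String encodedRegionName);
    }

    /** Hypothetical stand-in for the region server's abort hook. */
    interface Abortable {
        void abort(String reason);
    }

    /**
     * Reports a failed region open.
     * Returns true if FAILED_OPEN was recorded, false if the server was aborted.
     */
    static boolean reportFailedOpen(String region, Coordinator zk, Abortable server) {
        if (zk.transitionOpeningToFailedOpen(region)) {
            return true; // the failure is visible, so the region is not lost
        }
        // The failure could not be recorded: abort instead of continuing
        // with a region state that disagrees with the master's view.
        server.abort("Unable to mark region " + region + " as FAILED_OPEN");
        return false;
    }

    public static void main(String[] args) {
        List<String> aborts = new ArrayList<>();
        // Simulate a coordinator that always fails the transition.
        boolean recorded = reportFailedOpen("1588230740", r -> false, aborts::add);
        System.out.println("recorded=" + recorded + ", aborts=" + aborts);
    }
}
```

The abort reason is collected in a list here only so the sketch can be run without a cluster; a real region server would shut down at that point.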
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-9387:
Assignee: Ted Yu
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-9387:
Component/s: Region Assignment
[jira] [Updated] (HBASE-9387) Region could get lost during assignment
[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-9387:
Summary: Region could get lost during assignment (was: TestFullLogReconstruction#testReconstruction occasionally fails when distributed log replay is turned on)