[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-12166:
    Attachment: 12166.txt

A bit of cleanup while in here.

TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

    Key: HBASE-12166
    URL: https://issues.apache.org/jira/browse/HBASE-12166
    Project: HBase
    Issue Type: Bug
    Components: test
    Reporter: stack
    Assignee: stack
    Fix For: 2.0.0, 0.99.1
    Attachments: 12166.txt, log.txt

See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/

The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception:
{code}
2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526
org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
        at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Thread.java:744)
{code}

See how we had finished log splitting a long time previous:
{code}
2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms
{code}

If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one:

2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery.
2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery.
2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
2014-10-03 01:57:48,128 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.

This would seem to indicate that we successfully wrote zk that we are recovering:
{code}
2014-10-03 01:57:47,672 DEBUG [MASTER_SERVER_OPERATIONS-asf900:37113-0]
[jira] [Commented] (HBASE-12151) Make dev scripts executable
[ https://issues.apache.org/jira/browse/HBASE-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157718#comment-14157718 ] Hudson commented on HBASE-12151: FAILURE: Integrated in HBase-TRUNK #5613 (See [https://builds.apache.org/job/HBase-TRUNK/5613/]) HBASE-12151 Set mode to 755 on executable scripts in dev-support directory (mstanleyjones: rev 7219471081ab5f65ad7ae3b2deeb3c1659922102) * dev-support/jdiffHBasePublicAPI_common.sh * dev-support/publish_hbase_website.sh * dev-support/jenkinsEnv.sh * dev-support/hbase_docker.sh Make dev scripts executable --- Key: HBASE-12151 URL: https://issues.apache.org/jira/browse/HBASE-12151 Project: HBase Issue Type: Bug Components: scripts Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Priority: Minor Fix For: 2.0.0, 0.99.1 Attachments: HBASE-12151.patch Is there any reason not to make dev-support/*.sh executable? It would make it possible to sym-link to them from a directory in the executable path for easier execution of the definitive scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11907: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to 0.98+ Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157717#comment-14157717 ]

Hudson commented on HBASE-12164:

FAILURE: Integrated in HBase-TRUNK #5613 (See [https://builds.apache.org/job/HBase-TRUNK/5613/])
HBASE-12164 Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate (tedyu: rev a17614d5b27936c64af47d90408df007b1112d89)
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java

Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate

    Key: HBASE-12164
    URL: https://issues.apache.org/jira/browse/HBASE-12164
    Project: HBase
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: Ted Yu
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 12164-v1.txt, 12164-v1.txt

Here is the code:
{code}
if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) {
{code}
In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This would make secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
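The failure mode above can be illustrated with a minimal, self-contained sketch. The stand-in `TokenProto` type and helper names below are hypothetical; the real check operates on the protobuf delegation-token fields carried in the request. The point is simply that when either field is unset the conjunction is false, the user token stays null, and the guarded bulk-load body never executes.

```java
// Minimal sketch of the guard pattern described above.
// TokenProto stands in for the protobuf token; names are hypothetical.
public class GuardSketch {
    static final class TokenProto {
        final byte[] identifier; // null models protobuf hasIdentifier() == false
        final byte[] password;
        TokenProto(byte[] identifier, byte[] password) {
            this.identifier = identifier;
            this.password = password;
        }
        boolean hasIdentifier() { return identifier != null; }
        boolean hasPassword()   { return password != null; }
    }

    // Mirrors the conjunction in the snippet: a token is only built when BOTH
    // fields are present; otherwise it stays null and the secured body is skipped.
    static byte[] buildUserToken(TokenProto t) {
        if (t.hasIdentifier() && t.hasPassword()) {
            return t.identifier; // placeholder for real token construction
        }
        return null;
    }

    public static void main(String[] args) {
        TokenProto noId = new TokenProto(null, "pw".getBytes());
        TokenProto full = new TokenProto("id".getBytes(), "pw".getBytes());
        System.out.println(buildUserToken(noId) == null);  // true: body skipped
        System.out.println(buildUserToken(full) != null);  // true
    }
}
```

This is why the test case observed a skipped secureBulkLoadHFiles() body: one unset field is enough to make the whole conjunction false.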
[jira] [Commented] (HBASE-10153) improve VerifyReplication to compute BADROWS more accurately
[ https://issues.apache.org/jira/browse/HBASE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157719#comment-14157719 ]

Hudson commented on HBASE-10153:

FAILURE: Integrated in HBase-TRUNK #5613 (See [https://builds.apache.org/job/HBase-TRUNK/5613/])
HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev 8dbf7b22381dab18f9af13318c16181c42824d46)
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java

improve VerifyReplication to compute BADROWS more accurately

    Key: HBASE-10153
    URL: https://issues.apache.org/jira/browse/HBASE-10153
    Project: HBase
    Issue Type: Improvement
    Components: Operability, Replication
    Affects Versions: 0.94.14
    Reporter: cuijianwei
    Assignee: cuijianwei
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 10153-0.98.txt, 10153-v2-trunk.txt, HBASE-10153-0.94-v1.patch, HBASE-10153-trunk.patch

VerifyReplication compares the source table with its peer table and computes BADROWS. However, the current BADROWS computation might not be accurate enough. For example, if the source table contains rows {r1, r2, r3, r4} and the peer table contains rows {r1, r3, r4}, BADROWS will be 3, because 'r2' in the source table makes all the later row comparisons fail. Would it be better if BADROWS were computed as 1 in this situation? Maybe we can compute BADROWS more accurately with a merge comparison?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
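The merge comparison suggested in the description can be sketched as a two-pointer walk over the sorted row keys of both tables. This is a hedged illustration only: plain string lists stand in for the scanner results VerifyReplication actually compares. A key present on only one side counts as a single bad row and only that side's pointer advances, so {r1, r2, r3, r4} versus {r1, r3, r4} yields BADROWS = 1 rather than 3.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of merge-style BADROWS counting over sorted row keys.
public class BadRowsSketch {
    static int countBadRows(List<String> source, List<String> peer) {
        int i = 0, j = 0, badRows = 0;
        while (i < source.size() && j < peer.size()) {
            int cmp = source.get(i).compareTo(peer.get(j));
            if (cmp == 0) {            // row present on both sides
                i++; j++;
            } else if (cmp < 0) {      // row only in source: one bad row
                badRows++; i++;
            } else {                   // row only in peer: one bad row
                badRows++; j++;
            }
        }
        // leftover rows exist on one side only
        badRows += (source.size() - i) + (peer.size() - j);
        return badRows;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("r1", "r2", "r3", "r4");
        List<String> peer = Arrays.asList("r1", "r3", "r4");
        System.out.println(countBadRows(source, peer)); // 1
    }
}
```

The key design choice is advancing only the pointer of the smaller key, so a single missing row cannot poison every subsequent comparison the way a naive pairwise walk does.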
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157726#comment-14157726 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #538 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/538/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev 579ce7a0d610352a7bcff5527ce24b04e8b2292a) * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java * hbase-client/pom.xml * pom.xml * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157752#comment-14157752 ]

Hudson commented on HBASE-12164:

SUCCESS: Integrated in HBase-0.98 #566 (See [https://builds.apache.org/job/HBase-0.98/566/])
HBASE-12164 Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate (tedyu: rev 0409d22a15d6656d0368b6343b7b3349d22bdd77)
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java

Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate

    Key: HBASE-12164
    URL: https://issues.apache.org/jira/browse/HBASE-12164
    Project: HBase
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: Ted Yu
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 12164-v1.txt, 12164-v1.txt

Here is the code:
{code}
if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) {
{code}
In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This would make secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12165) TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails
[ https://issues.apache.org/jira/browse/HBASE-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157754#comment-14157754 ] Hudson commented on HBASE-12165: FAILURE: Integrated in HBase-TRUNK #5614 (See [https://builds.apache.org/job/HBase-TRUNK/5614/]) HBASE-12165 TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails -- DEBUGGING STRINGS (stack: rev da9f2434b2ad9e85a7f726bb5334568ac772ec90) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails --- Key: HBASE-12165 URL: https://issues.apache.org/jira/browse/HBASE-12165 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12165.debug.txt Test fails but exhibited fail reason is complaining about an NPE. java.lang.NullPointerException: null at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionIsInMeta(TestEndToEndSplitTransaction.java:474) at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionSplit(TestEndToEndSplitTransaction.java:451) Looks like we are timing out waiting on split but NPE obscures actual reason for failure. Failed here https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157781#comment-14157781 ]

Hudson commented on HBASE-12164:

SUCCESS: Integrated in HBase-1.0 #267 (See [https://builds.apache.org/job/HBase-1.0/267/])
HBASE-12164 Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate (tedyu: rev 660f909a58986151f300ebf6c7fbbea963cb3cf3)
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java

Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate

    Key: HBASE-12164
    URL: https://issues.apache.org/jira/browse/HBASE-12164
    Project: HBase
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: Ted Yu
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 12164-v1.txt, 12164-v1.txt

Here is the code:
{code}
if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) {
{code}
In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This would make secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12165) TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails
[ https://issues.apache.org/jira/browse/HBASE-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157780#comment-14157780 ] Hudson commented on HBASE-12165: SUCCESS: Integrated in HBase-1.0 #267 (See [https://builds.apache.org/job/HBase-1.0/267/]) HBASE-12165 TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails -- DEBUGGING STRINGS (stack: rev 1dd70307018f9c259b42289ca615ac2d50c30565) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails --- Key: HBASE-12165 URL: https://issues.apache.org/jira/browse/HBASE-12165 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12165.debug.txt Test fails but exhibited fail reason is complaining about an NPE. java.lang.NullPointerException: null at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionIsInMeta(TestEndToEndSplitTransaction.java:474) at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionSplit(TestEndToEndSplitTransaction.java:451) Looks like we are timing out waiting on split but NPE obscures actual reason for failure. Failed here https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157784#comment-14157784 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-1.0 #268 (See [https://builds.apache.org/job/HBase-1.0/268/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev 5881eed36ebac0939daaa431000fd73fcf796c33) * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java * pom.xml * hbase-client/pom.xml * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157807#comment-14157807 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-0.98 #567 (See [https://builds.apache.org/job/HBase-0.98/567/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev 579ce7a0d610352a7bcff5527ce24b04e8b2292a) * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-client/pom.xml * pom.xml Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157820#comment-14157820 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-TRUNK #5615 (See [https://builds.apache.org/job/HBase-TRUNK/5615/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev d8a7b67d798ab5fec399d4a0b97a025d5bff531c) * pom.xml * hbase-client/pom.xml * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11216) PerformanceEvaluation should provide an option to modify the value length.
[ https://issues.apache.org/jira/browse/HBASE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158030#comment-14158030 ] Jean-Marc Spaggiari commented on HBASE-11216: - Hum. Sound like I need to rebase this :( PerformanceEvaluation should provide an option to modify the value length. -- Key: HBASE-11216 URL: https://issues.apache.org/jira/browse/HBASE-11216 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-11216-v0-trunk.patch, HBASE-11216-v1-trunk.patch, HBASE-11216-v2-trunk.patch All in the title. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11216) PerformanceEvaluation should provide an option to modify the value length.
[ https://issues.apache.org/jira/browse/HBASE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-11216: Resolution: Duplicate Status: Resolved (was: Patch Available) Duplicate of HBASE-11350 PerformanceEvaluation should provide an option to modify the value length. -- Key: HBASE-11216 URL: https://issues.apache.org/jira/browse/HBASE-11216 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-11216-v0-trunk.patch, HBASE-11216-v1-trunk.patch, HBASE-11216-v2-trunk.patch All in the title. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11350) [PE] Allow random value size
[ https://issues.apache.org/jira/browse/HBASE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158038#comment-14158038 ] Jean-Marc Spaggiari commented on HBASE-11350: - [~lhofhansl] you might want this to be backported to 0.94. [PE] Allow random value size Key: HBASE-11350 URL: https://issues.apache.org/jira/browse/HBASE-11350 Project: HBase Issue Type: Improvement Components: Performance Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 11348.txt Allow PE to write random value sizes. Helpful mimic'ing 'real' sizings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158041#comment-14158041 ]

Ted Yu commented on HBASE-11907:

From https://builds.apache.org/job/HBase-1.0/268/console:
{code}
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[26,49] error: package org.apache.hadoop.hbase.testclassification does not exist
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[27,49] error: package org.apache.hadoop.hbase.testclassification does not exist
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[32,11] error: cannot find symbol
[ERROR] symbol: class FilterTests
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[32,30] error: cannot find symbol
[ERROR] symbol: class SmallTests
{code}
Addendum for 1.0 handles the above.

Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator

    Key: HBASE-11907
    URL: https://issues.apache.org/jira/browse/HBASE-11907
    Project: HBase
    Issue Type: Improvement
    Reporter: Andrew Purtell
    Assignee: Andrew Purtell
    Priority: Minor
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch

The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is:
- MIT licensed
- Designed to work with byte[] arguments instead of String
- Capable of handling UTF8 encoding
- Regex syntax compatible
- Interruptible
- *About twice as fast as j.u.regex*
- Has JRuby's jcodings library as a dependency, also MIT licensed

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-11907: --- Attachment: 11907-1.0.addendum Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 11907-1.0.addendum, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158077#comment-14158077 ]

Yuliang Jin commented on HBASE-11625:

Thanks for your reply. We are currently using
{noformat}
java version 1.6.0_37
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
{noformat}
and
{noformat}
Hadoop 2.0.0-cdh4.3.0
HBase 0.94.6-cdh4.3.0
{noformat}

Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum

    Key: HBASE-11625
    URL: https://issues.apache.org/jira/browse/HBASE-11625
    Project: HBase
    Issue Type: Bug
    Components: HFile
    Affects Versions: 0.94.21, 0.98.4, 0.98.5
    Reporter: qian wang
    Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz

When HBase checksums are in use, readBlockDataInternal() in HFileBlock.java can encounter file corruption, but it can only switch to the HDFS checksum input stream once validateBlockChecksum() runs. If the data block's header is corrupted when b = new HFileBlock() is constructed, it throws the Invalid HFile block magic exception first and the RPC call fails.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
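For context on the "Invalid HFile block magic" failure described above: each HFile block begins with an 8-byte magic identifying its type, and the exception fires when the header bytes match no known magic. Below is a rough, self-contained sketch of that kind of header validation, assuming the data-block magic DATABLK* and hypothetical helper names; the real logic in HFileBlock is considerably more involved and covers every block type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: validate the 8-byte block magic before trusting the header.
public class BlockMagicSketch {
    // "DATABLK*" is HBase's HFile data-block magic; other block types
    // carry their own 8-byte magics.
    static final byte[] DATA_MAGIC = "DATABLK*".getBytes(StandardCharsets.US_ASCII);

    static boolean hasValidDataMagic(byte[] block) {
        if (block.length < DATA_MAGIC.length) {
            return false; // too short to even hold a header magic
        }
        return Arrays.equals(Arrays.copyOf(block, DATA_MAGIC.length), DATA_MAGIC);
    }

    public static void main(String[] args) {
        byte[] good = "DATABLK*...payload...".getBytes(StandardCharsets.US_ASCII);
        byte[] corrupt = "XATABLK*...payload...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hasValidDataMagic(good));    // true
        System.out.println(hasValidDataMagic(corrupt)); // false
    }
}
```

The issue is that the magic check fails (by throwing) while constructing the block, before the checksum-verification step that would have triggered the fall-back to HDFS checksums, so the corruption surfaces as a failed RPC instead of a retried read.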
[jira] [Updated] (HBASE-12122) Try not to assign user regions to master all the time
[ https://issues.apache.org/jira/browse/HBASE-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12122: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Try not to assign user regions to master all the time - Key: HBASE-12122 URL: https://issues.apache.org/jira/browse/HBASE-12122 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 2.0.0, 0.99.1 Attachments: hbase-12122.patch, hbase-12122_v2.patch The load balancer does a good job of not assigning regions of tables not configured to be put on the active master. However, if there is no other region server, it still assigns user regions to the master. This happens when all normal region servers have crashed and are recovering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
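The behavior being fixed can be pictured as a candidate-filtering step in assignment. This is a hedged sketch with hypothetical names, not the actual load balancer code: the active master is removed from the candidate list for user regions, and an empty result means "wait for a region server to come back" rather than "fall back to the master".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: filter the active master out of the candidate servers for
// user-region assignment. An empty result means assignment should wait,
// not fall back to the master.
public class CandidateFilterSketch {
    static List<String> candidatesForUserRegions(List<String> liveServers, String master) {
        List<String> candidates = new ArrayList<>();
        for (String s : liveServers) {
            if (!s.equals(master)) {
                candidates.add(s);
            }
        }
        return candidates; // may be empty: caller retries later
    }

    public static void main(String[] args) {
        String master = "master,16000";
        System.out.println(candidatesForUserRegions(
            Arrays.asList("master,16000", "rs1,16020"), master)); // [rs1,16020]
        System.out.println(candidatesForUserRegions(
            Arrays.asList("master,16000"), master)); // []
    }
}
```

The scenario in the description, all normal region servers crashed and recovering, is exactly the case where the unfiltered list collapses to just the master, which is why the old behavior still landed user regions there.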
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158144#comment-14158144 ] Andrew Purtell commented on HBASE-11907: +1 Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 11907-1.0.addendum, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12167) NPE in AssignmentManager
Jimmy Xiang created HBASE-12167:

    Summary: NPE in AssignmentManager
    Key: HBASE-12167
    URL: https://issues.apache.org/jira/browse/HBASE-12167
    Project: HBase
    Issue Type: Bug
    Reporter: Jimmy Xiang
    Assignee: Jimmy Xiang

If we can't find a region plan, we should check for null.
{noformat}
2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
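The fix direction stated in the description amounts to a null guard around the region plan before it is dereferenced. A minimal sketch under assumed, hypothetical names follows; the real code lives in AssignmentManager and involves retries and region-state bookkeeping not shown here.

```java
import java.util.logging.Logger;

// Sketch of the null check the issue calls for: when no region plan can
// be computed (e.g. no destination server is available), skip the
// assignment instead of dereferencing a null plan and throwing an NPE.
public class RegionPlanGuardSketch {
    private static final Logger LOG = Logger.getLogger("assignment");

    static final class RegionPlan { // stand-in for the real RegionPlan
        final String destination;
        RegionPlan(String destination) { this.destination = destination; }
    }

    // Returns the destination when a plan exists, null otherwise -- no NPE.
    static String assign(String region, RegionPlan plan) {
        if (plan == null) {
            LOG.warning("Unable to determine a plan to assign " + region);
            return null; // caller can retry once servers are available
        }
        return plan.destination;
    }

    public static void main(String[] args) {
        System.out.println(assign("r1", null));                        // null
        System.out.println(assign("r1", new RegionPlan("rs1,16020"))); // rs1,16020
    }
}
```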
[jira] [Created] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
Jerry He created HBASE-12168: Summary: Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12167: Attachment: hbase-12167.patch NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12167: Fix Version/s: 0.99.1 2.0.0 Status: Patch Available (was: Open) NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
[ https://issues.apache.org/jira/browse/HBASE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158164#comment-14158164 ] Jerry He commented on HBASE-12168: -- The configuration steps are probably similar to or overlap with the set up for Rest security and impersonation support. But it is not clear. Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
[ https://issues.apache.org/jira/browse/HBASE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158175#comment-14158175 ] Jimmy Xiang commented on HBASE-12168: - Have you checked the refguide, section 8.1.6, 8.1.7? http://hbase.apache.org/book/security.html#hbase.secure.configuration How should we improve it? Thanks. Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158183#comment-14158183 ] stack commented on HBASE-12167: --- Go for it. When the IOE comes up, who catches it? What happens? NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
[ https://issues.apache.org/jira/browse/HBASE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158194#comment-14158194 ] Jerry He commented on HBASE-12168: -- Hi, [~jxiang] Yes. We can improve the sections, and/or add a section to mention SPNEGO-based authentication for client. I am trying to set it up. After that, I probably will have more input. Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158199#comment-14158199 ] Jimmy Xiang commented on HBASE-12167: - SSH catches it and reprocess the dead server. In SSH, we tried to wait for an extra regionserver. But it is not reliable since the extra regionserver could die after SSH thinks the extra server is there. So it is possible for this NPE and it is rare. Reprocessing the dead server should help. NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
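The recovery path described in this comment, where the shutdown handler catches the failure and reprocesses the dead server, can be modeled as a simple resubmit loop. This is a toy model under the assumption that the handler just re-queues itself until a region plan exists; none of these names are the real HBase classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model of "catch the failure and reprocess the dead server":
// when no region plan exists, processing fails and the server is re-queued
// instead of the worker thread dying on an NPE.
public class ShutdownRetrySketch {
    static int attempts = 0;

    // Fails until a plan becomes available on the third attempt,
    // standing in for an extra regionserver eventually checking in.
    static boolean processDeadServer(String server) {
        attempts++;
        return attempts >= 3;
    }

    static int drain(String server) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add(server);
        int rounds = 0;
        while (!queue.isEmpty()) {
            String s = queue.poll();
            rounds++;
            if (!processDeadServer(s)) {
                queue.add(s); // resubmit the dead server for reprocessing
            }
        }
        return rounds;
    }

    public static void main(String[] args) {
        if (drain("rs-1") != 3) throw new AssertionError();
        System.out.println("recovered after " + attempts + " attempts");
    }
}
```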
[jira] [Commented] (HBASE-12152) TestLoadIncrementalHFiles shows up as zombie test
[ https://issues.apache.org/jira/browse/HBASE-12152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158217#comment-14158217 ] Elliott Clark commented on HBASE-12152: --- Please take other comments/more findings to a new jira. This one has been rendered un-usable. TestLoadIncrementalHFiles shows up as zombie test - Key: HBASE-12152 URL: https://issues.apache.org/jira/browse/HBASE-12152 Project: HBase Issue Type: Test Reporter: Ted Yu Attachments: TestSecureLoadIncrementalHFilesSplitRecovery-fix.txt TestLoadIncrementalHFiles and TestLoadIncrementalHFilesSplitRecovery frequently show up as zombie tests (from 0.98 to master branch). e.g. https://builds.apache.org/job/hbase-0.98/558/console Here is snippet of stack trace for TestLoadIncrementalHFilesSplitRecovery : {code} main prio=10 tid=0x7f8670008000 nid=0x1105 waiting on condition [0x7f8674b57000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x00078d4c3ba0 (a java.util.concurrent.FutureTask$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:248) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:382) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:324) at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery.testGroupOrSplitWhenRegionHoleExistsInMeta(TestLoadIncrementalHFilesSplitRecovery. 
java:470) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158238#comment-14158238 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-1.0 #269 (See [https://builds.apache.org/job/HBase-1.0/269/]) HBASE-11907 Addendum fixes test category import for TestRegexComparator (tedyu: rev 566686d9e97a79143d7661ce34587456eed235ff) * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 11907-1.0.addendum, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158261#comment-14158261 ] Hadoop QA commented on HBASE-12167: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672784/hbase-12167.patch against trunk revision . ATTACHMENT ID: 12672784 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster org.apache.hadoop.hbase.master.TestMasterFailover org.apache.hadoop.hbase.TestZooKeeper org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//console This message is automatically generated. 
NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158275#comment-14158275 ] stack commented on HBASE-12166: --- [~jxiang] Problem here is that master is carrying hbase:namespace but when recovering we act as though it only is hosting hbase:meta. We mark hbase:meta as recovering and do its log splitting. For hbase:namespace, we find it along w/ other regions and mark it as recovering only because it was on the master -- and master no longer has associated WALs because of above meta processing -- it just stays stuck in recovering mode. If you have suggestion I'm all ears else I'll hack something in. TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. 
is recovering 4943 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058) 4944 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086) 4945 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072) 4946 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014) 4947 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988) 4948 at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690) 4949 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418) 4950 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) 4951 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) 4952 at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) 4953 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) 4954 at java.lang.Thread.run(Thread.java:744) {code} See how we've finished log splitting long time previous: {code} 2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms {code} If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one: 2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery. 2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery. 
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery. 2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery. 2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery. 2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery. 2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66):
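The failure mode stack describes above can be captured in a few lines: a region whose recovering flag is set, but for which no WAL-splitting work item exists, never gets its flag cleared. This is a toy in-memory model; the real bookkeeping lives in SplitLogManager and the /hbase/recovering-regions znodes, and all names below are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the recovering-regions bookkeeping: a region's flag is only
// cleared when WAL-splitting work completes for it. A region marked
// recovering with no associated WAL work stays stuck -- the symptom this
// issue reports for hbase:namespace.
public class RecoveringRegionsModel {
    final Map<String, Boolean> recovering = new HashMap<>();
    final Set<String> walWork = new HashSet<>();

    void markRecovering(String region) { recovering.put(region, true); }
    void addWalWork(String region) { walWork.add(region); }

    void finishLogSplitting() {
        // Only regions with actual WAL work have their flag cleared
        // (modeling the znode delete that completes recovery).
        for (String region : walWork) {
            recovering.put(region, false);
        }
    }

    boolean isStuck(String region) {
        return Boolean.TRUE.equals(recovering.get(region));
    }

    public static void main(String[] args) {
        RecoveringRegionsModel m = new RecoveringRegionsModel();
        m.markRecovering("hbase:meta");
        m.addWalWork("hbase:meta");          // meta's WALs get split
        m.markRecovering("hbase:namespace"); // marked, but its WALs were
                                             // already consumed under meta
        m.finishLogSplitting();
        if (m.isStuck("hbase:meta")) throw new AssertionError();
        if (!m.isStuck("hbase:namespace")) throw new AssertionError();
        System.out.println("namespace region stays 'recovering'");
    }
}
```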
[jira] [Commented] (HBASE-12104) Some optimization and bugfix for HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-12104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158290#comment-14158290 ] Yi Deng commented on HBASE-12104: - [~eclark] I think the failing tests are not related to my code. Some optimization and bugfix for HTableMultiplexer -- Key: HBASE-12104 URL: https://issues.apache.org/jira/browse/HBASE-12104 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 2.0.0 Reporter: Yi Deng Assignee: Yi Deng Labels: multiplexer Fix For: 2.0.0 Attachments: 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public.patch Make HTableMultiplexerStatus public Delay before resubmit. Fix some missing counting on total failure. Use ScheduledExecutorService to simplify the code. Other refactoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158306#comment-14158306 ] stack commented on HBASE-12167: --- +1 The test failures are not yours. They are 'classics' NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12104) Some optimization and bugfix for HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-12104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158309#comment-14158309 ] stack commented on HBASE-12104: --- Those failures are not related. You committing [~eclark] or I can if you'd like. Some optimization and bugfix for HTableMultiplexer -- Key: HBASE-12104 URL: https://issues.apache.org/jira/browse/HBASE-12104 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 2.0.0 Reporter: Yi Deng Assignee: Yi Deng Labels: multiplexer Fix For: 2.0.0 Attachments: 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public.patch Make HTableMultiplexerStatus public Delay before resubmit. Fix some missing counting on total failure. Use ScheduledExecutorService to simplify the code. Other refactoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158319#comment-14158319 ] Jimmy Xiang commented on HBASE-12166: - Good investigation. Table namespace is handled just the same as any other user tables. Let me take look. TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. 
is recovering 4943 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058) 4944 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086) 4945 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072) 4946 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014) 4947 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988) 4948 at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690) 4949 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418) 4950 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) 4951 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) 4952 at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) 4953 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) 4954 at java.lang.Thread.run(Thread.java:744) {code} See how we've finished log splitting long time previous: {code} 2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms {code} If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one: 2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery. 2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery. 
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery. 2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery. 2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery. 2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery. 2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. Region: a4337ad2874ee7e599ca2344fce21583 completes recovery. 2014-10-03 01:57:48,128 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery. This would seem to indicate that we successfully wrote zk that
[jira] [Updated] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-12075: --- Status: Open (was: Patch Available) Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.98.6.1, 0.99.0, 2.0.0 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi threaded clients, we use a feature developed on 0.89-fb branch called Preemptive Fast Fail. This allows the client threads which would potentially fail, fail fast. The idea behind this feature is that we allow, among the hundreds of client threads, one thread to try and establish connection with the regionserver and if that succeeds, we mark it as a live node again. Meanwhile, other threads which are trying to establish connection to the same server would ideally go into the timeouts which is effectively unfruitful. We can in those cases return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
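The scheme the issue describes, letting exactly one client thread probe the dead server while the rest fail immediately, can be sketched with an AtomicBoolean guard. This is a simplified single-server model, not the actual HBase client implementation; the exception name echoes the feature but is defined locally here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of preemptive fast fail: among many client threads
// hitting a server believed dead, only one wins the right to probe it;
// the others throw immediately instead of burning a full connect timeout.
public class FastFailSketch {
    static class PreemptiveFastFailException extends RuntimeException {}

    final AtomicBoolean probeInFlight = new AtomicBoolean(false);
    volatile boolean serverDead = true;

    String request() {
        if (serverDead) {
            if (probeInFlight.compareAndSet(false, true)) {
                try {
                    // Only this thread pays the (possibly long) probe cost.
                    if (probeServer()) {
                        serverDead = false; // mark the node live again
                    }
                } finally {
                    probeInFlight.set(false);
                }
            } else {
                // Everyone else fails fast rather than waiting on timeouts.
                throw new PreemptiveFastFailException();
            }
        }
        return serverDead ? "probe-failed" : "ok";
    }

    boolean probeServer() { return true; } // pretend the server came back

    public static void main(String[] args) {
        FastFailSketch s = new FastFailSketch();
        if (!"ok".equals(s.request())) throw new AssertionError();
        System.out.println("server marked live again");
    }
}
```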
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158335#comment-14158335 ] Manukranth Kolloju commented on HBASE-12075: Are there any more comments related to the patch?
[jira] [Updated] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-12075: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158340#comment-14158340 ] stack commented on HBASE-12166: --- [~jxiang] A simple 'fix' would be to not host meta:namespace on master? I don't mind finishing this one (I can make it fail reliably). Was just looking for input on how you'd like it solved.
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158375#comment-14158375 ] Jimmy Xiang commented on HBASE-12166: - You are right. Not hosting namespace on master can solve the issue. Your fix is fine with me. I'd like to look into it further to find out the root cause. Thanks.
[jira] [Updated] (HBASE-11394) Replication can have data loss if peer id contains hyphen -
[ https://issues.apache.org/jira/browse/HBASE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-11394: -- Labels: beginner (was: ) Replication can have data loss if peer id contains hyphen - - Key: HBASE-11394 URL: https://issues.apache.org/jira/browse/HBASE-11394 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Talat UYARER Labels: beginner Fix For: 2.0.0, 0.99.1 This is an extension to HBASE-8207. It seems that there is no check on the format of the peer id string (the short name for the replication peer). So if a peer id contains '-', it will silently cause data loss on server failure. I did not verify the claim via testing, though; this is purely from reading the code.
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158395#comment-14158395 ] Enis Soztutar commented on HBASE-11625: --- Nick pointed me to this issue. I have been trying to nail down a test failure on Windows (TestHFileBlock#testConcurrentReading) which fails with the same stack trace. I can repro the failure only on Windows, but with jdk6, jdk7u45 and jdk7u67 alike, and with Hadoop versions 2.2.0, 2.4.0, 2.5.0 and 2.6.0-SNAPSHOT. The test writes a file containing random HFileBlocks and does concurrent reads from multiple threads. Once in a while, what we seek() + read() does not match what is in the file (I've verified multiple times, from the offsets down to the actual file). I think there is a rare edge case that we are hitting. The other interesting bit is that the test only starts failing after 0.98.3. I was not able to get previous versions to fail, but I was also not able to bisect to the commit because reproducing the failure is not easy. Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz When using HBase checksum, readBlockDataInternal() in HFileBlock.java can encounter file corruption, but it can only switch to the HDFS checksum input stream at validateBlockChecksum(). If the data block's header is corrupted when b = new HFileBlock(), it throws "Invalid HFile block magic" and the RPC call fails.
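One classic source of "seek() + read() returns the wrong bytes" under concurrency is that seek-then-read on a shared stream is two steps with shared mutable position state, whereas a positional read takes the offset as an argument. This is only a generic illustration of that distinction, not the HBase code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: a positional FileChannel.read(buf, pos) never touches the channel's
// shared file position, so it is safe to call from many reader threads at once.
// seek()+read() on one shared stream, by contrast, can interleave between
// threads so that a read lands at another thread's offset.
public class PositionalReadDemo {
  public static String readAt(FileChannel ch, long pos, int len) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(len);
    // Loop until the buffer is full or EOF; pos is passed explicitly each time.
    while (buf.hasRemaining() && ch.read(buf, pos + buf.position()) > 0) { }
    return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path p = Files.createTempFile("blocks", ".bin");
    Files.write(p, "AAAABBBBCCCC".getBytes(StandardCharsets.UTF_8));
    try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
      System.out.println(readAt(ch, 4, 4)); // prints BBBB
    }
    Files.delete(p);
  }
}
```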
[jira] [Commented] (HBASE-12104) Some optimization and bugfix for HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-12104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158401#comment-14158401 ] Elliott Clark commented on HBASE-12104: --- If you want to commit that would be awesome. If not I can get it in a couple of hours if you haven't yet. Some optimization and bugfix for HTableMultiplexer -- Key: HBASE-12104 URL: https://issues.apache.org/jira/browse/HBASE-12104 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 2.0.0 Reporter: Yi Deng Assignee: Yi Deng Labels: multiplexer Fix For: 2.0.0 Attachments: 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public.patch Make HTableMultiplexerStatus public Delay before resubmit. Fix some missing counting on total failure. Use ScheduledExecutorService to simplify the code. Other refactoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
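The "delay before resubmit" plus "use ScheduledExecutorService to simplify the code" items above can be sketched as follows. This is a minimal illustration of the pattern, with hypothetical names, not the actual HTableMultiplexer patch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: instead of hand-rolled sleep/retry loops, a failed batch is handed
// back to a ScheduledExecutorService with a delay, so resubmission timing and
// threading are managed in one place.
public class ResubmitSketch {
  private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

  /** Re-queue a failed batch to run again after delayMs milliseconds. */
  public ScheduledFuture<?> resubmitLater(Runnable batch, long delayMs) {
    return scheduler.schedule(batch, delayMs, TimeUnit.MILLISECONDS);
  }

  public void shutdown() {
    scheduler.shutdown();
  }
}
```

A caller would invoke resubmitLater() from the failure path, possibly growing delayMs on successive failures for backoff.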
[jira] [Commented] (HBASE-12039) Lower log level for TableNotFoundException log message when throwing
[ https://issues.apache.org/jira/browse/HBASE-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158403#comment-14158403 ] Lars Hofhansl commented on HBASE-12039: --- Nope... Let's just remove that stupid log. We'll get a message as soon as we try to connect in earnest (after we've preloaded the cache). Lower log level for TableNotFoundException log message when throwing Key: HBASE-12039 URL: https://issues.apache.org/jira/browse/HBASE-12039 Project: HBase Issue Type: Bug Reporter: James Taylor Assignee: stack Priority: Minor Fix For: 0.98.7, 0.94.25 Attachments: 12039-0.94.txt, 12039.txt Our HBase client tries to get the HTable descriptor for a table that may or may not exist. We catch and ignore the TableNotFoundException if it occurs, but the log message appears regardless, which confuses our users. Would it be possible to lower the log level of this message, since the exception is already being thrown (making it up to the caller how they want to handle it)? 14/09/20 20:01:54 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. 
for table: _IDX_TEST.TESTING, row=_IDX_TEST.TESTING,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:151) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1059) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1121) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:243) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12164: --- Attachment: 12164.addendum There was a test failure in: https://builds.apache.org/job/PreCommit-HBASE-Build/11207/console Proposed addendum which logs the exception. Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate Key: HBASE-12164 URL: https://issues.apache.org/jira/browse/HBASE-12164 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 12164-v1.txt, 12164-v1.txt, 12164.addendum Here is the code: {code} if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) { {code} In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This makes secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158432#comment-14158432 ] Ted Yu commented on HBASE-12075: I manually triggered a QA run. See: https://builds.apache.org/job/PreCommit-HBASE-Build/11208/console Not sure why the QA bot didn't pick up the latest patch.
[jira] [Updated] (HBASE-12156) TableName cache isn't used for one of valueOf methods.
[ https://issues.apache.org/jira/browse/HBASE-12156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12156: --- Summary: TableName cache isn't used for one of valueOf methods. (was: TableName cache doesn't used for once of valueOf methods.) TableName cache isn't used for one of valueOf methods. -- Key: HBASE-12156 URL: https://issues.apache.org/jira/browse/HBASE-12156 Project: HBase Issue Type: Bug Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12156-addendum-0.98.patch, HBASE-12156.patch There is a wrong comparison: the copy-pasted code compares the namespace with the qualifier instead of with the namespace.
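The bug class described above (a copy-pasted field comparison that defeats a valueOf-style cache) can be illustrated with a minimal sketch. Names and structure here are hypothetical, not the actual TableName code:

```java
import java.util.concurrent.CopyOnWriteArraySet;

// Sketch: a valueOf cache must compare namespace to namespace AND qualifier to
// qualifier. If one field is compared twice (the copy-paste bug), cached
// entries never match and a fresh object is allocated on every call.
public class TableNameCacheSketch {
  static final class Name {
    final String namespace, qualifier;
    Name(String ns, String q) { namespace = ns; qualifier = q; }
  }

  private final CopyOnWriteArraySet<Name> cache = new CopyOnWriteArraySet<>();

  public Name valueOf(String ns, String q) {
    for (Name n : cache) {
      // correct: each field is compared against its own counterpart
      if (n.namespace.equals(ns) && n.qualifier.equals(q)) {
        return n; // cache hit: reuse the existing instance
      }
    }
    Name created = new Name(ns, q);
    cache.add(created);
    return created;
  }
}
```

With the buggy comparison (e.g. n.namespace.equals(q)), repeated calls with the same arguments would keep allocating new instances instead of hitting the cache.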
[jira] [Moved] (HBASE-12169) Document IPC binding options
[ https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey moved ACCUMULO-3195 to HBASE-12169: --- Component/s: (was: docs) Affects Version/s: (was: 0.99.0) (was: 0.94.7) (was: 0.98.0) 0.98.0 0.94.7 0.99.0 Workflow: no-reopen-closed, patch-avail (was: patch-available, re-open possible) Key: HBASE-12169 (was: ACCUMULO-3195) Project: HBase (was: Accumulo) Document IPC binding options Key: HBASE-12169 URL: https://issues.apache.org/jira/browse/HBASE-12169 Project: HBase Issue Type: Task Affects Versions: 0.99.0, 0.94.7, 0.98.0 Reporter: Sean Busbey Priority: Minor HBASE-8148 added options to change binding component services, but there aren't any docs for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158444#comment-14158444 ] Hudson commented on HBASE-12167: FAILURE: Integrated in HBase-1.0 #270 (See [https://builds.apache.org/job/HBase-1.0/270/]) HBASE-12167 NPE in AssignmentManager (jxiang: rev c452942f57daa0ac8075556ed5d03940a0a13571) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12073) Shell command user_permission fails on the table created by user if he is not global admin.
[ https://issues.apache.org/jira/browse/HBASE-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158448#comment-14158448 ] Hadoop QA commented on HBASE-12073: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12671988/HBASE-12073.patch against trunk revision . ATTACHMENT ID: 12671988 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + verifyDenied(listTablesRestrictedAction, USER_CREATE, USER_RW, USER_RO, USER_NONE, TABLE_ADMIN); + LOG.error(error during call of AccessControlClient.getUserPermissions. + e.getStackTrace()); {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestSecureLoadIncrementalHFilesSplitRecovery org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//console This message is automatically generated. Shell command user_permission fails on the table created by user if he is not global admin. 
-- Key: HBASE-12073 URL: https://issues.apache.org/jira/browse/HBASE-12073 Project: HBase Issue Type: Bug Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Priority: Minor Attachments: HBASE-12073.patch The command fails as the changes introduced by HBASE-10892 requires user (because of newly introduced call to getTableDescriptors) to have global admin permission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158455#comment-14158455 ] stack commented on HBASE-12166: --- The problem is that the meta region has no edits in it so when we list the filesystem to find crashed servers, we see this: 476 2014-10-03 10:21:42,470 INFO [ActiveMasterManager] master.ServerManager(918): Finished waiting for region servers count to settle; checked in 6, slept for 1012 ms, expecting minimum of 5, maximum of 6, master is running 477 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61565,1412356895892 belongs to an existing region server 478 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(249): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952 doesn't belong to a known region server, splitting 479 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61576,1412356896091 belongs to an existing region server 480 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61579,1412356896131 belongs to an existing region server 481 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61582,1412356896169 belongs to an existing region server 482 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61586,1412356896205 belongs to an existing region server 483 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61591,1412356896245 belongs to an 
existing region server, i.e. all servers but the dead master, which in this case is localhost,61562,1412356895859. When we go to online hbase:meta, it complains that there is no WAL file:
501 2014-10-03 10:21:42,478 INFO [ActiveMasterManager] master.MasterFileSystem(325): Log dir for server localhost,61562,1412356895859 does not exist
502 2014-10-03 10:21:42,479 DEBUG [ActiveMasterManager] master.MasterFileSystem(323): Renamed region directory: hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952-splitting
503 2014-10-03 10:21:42,479 INFO [ActiveMasterManager] master.SplitLogManager(536): dead splitlog workers [localhost,61562,1412356895859, localhost,61572,1412356895952]
504 2014-10-03 10:21:42,480 INFO [ActiveMasterManager] master.SplitLogManager(172): hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952-splitting is empty dir, no logs to split
505 2014-10-03 10:21:42,480 DEBUG [ActiveMasterManager] master.SplitLogManager(235): Scheduling batch of logs to split
506 2014-10-03 10:21:42,480 INFO [ActiveMasterManager] master.SplitLogManager(237): started splitting 0 logs in [hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952-splitting]
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158461#comment-14158461 ] Hudson commented on HBASE-12167: FAILURE: Integrated in HBase-TRUNK #5616 (See [https://builds.apache.org/job/HBase-TRUNK/5616/]) HBASE-12167 NPE in AssignmentManager (jxiang: rev 5375ff07bcb6451e45c09f23f010a4d051968896) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
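The NPE above comes from using a region plan without checking whether one was found. The shape of the guard can be sketched as follows; the class and method names here are hypothetical illustrations, not the actual AssignmentManager fix:

```java
import java.util.function.Function;

// Hypothetical sketch of the failure mode: a balancer-style lookup that can
// return null must be guarded before the result is dereferenced. This mirrors
// the kind of check the fix adds; it is not the actual HBase code.
final class RegionPlanSketch {
    /** Returns the destination server for a region, or null when the
     *  balancer has no plan (e.g. no servers available). */
    static String destinationFor(String region, Function<String, String> balancer) {
        String plan = balancer.apply(region);
        if (plan == null) {
            // Without this guard, plan.trim() below would throw the
            // NullPointerException seen in the stack trace above.
            return null;
        }
        return plan.trim();
    }
}
```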
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158470#comment-14158470 ] Ted Yu commented on HBASE-12075: Jenkins machine got rebooted. Here is the new run: https://builds.apache.org/job/PreCommit-HBASE-Build/11209/console Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.99.0, 2.0.0, 0.98.6.1 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi-threaded clients, we use a feature developed on the 0.89-fb branch called Preemptive Fast Fail. This allows client threads that would potentially fail to fail fast. The idea behind this feature is that, among the hundreds of client threads, we allow one thread to try to establish a connection with the regionserver, and if that succeeds, we mark the server as a live node again. Meanwhile, other threads trying to establish a connection to the same server would otherwise sit in timeouts, which is effectively unfruitful. In those cases we can return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
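The single-prober idea in the description above can be sketched with a tiny registry: when a server is marked as failing, exactly one thread wins the right to retry it, and every other thread fails fast instead of waiting on a timeout. All names here are illustrative, not the HBase client API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of preemptive fast fail: one probe flag per failing server.
// The first thread to flip the flag probes the server; all others fail fast.
final class FastFailRegistry {
    private final ConcurrentHashMap<String, AtomicBoolean> probing =
            new ConcurrentHashMap<>();

    /** Returns true if the calling thread should attempt the connection,
     *  false if it should fail fast because another thread is probing. */
    boolean tryAcquireProbe(String server) {
        AtomicBoolean flag =
                probing.computeIfAbsent(server, s -> new AtomicBoolean(false));
        return flag.compareAndSet(false, true);
    }

    /** Called by the probing thread once the server responded: clears the
     *  failure state so all threads may use the server again. */
    void markAlive(String server) {
        probing.remove(server);
    }
}
```

A real implementation would also track when the server entered the failing state and expire stale entries; this sketch only shows the one-prober handoff.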
[jira] [Updated] (HBASE-11940) Add utility scripts for snapshotting / restoring all tables in cluster
[ https://issues.apache.org/jira/browse/HBASE-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-11940: --- Attachment: 11940-v1.txt Add utility scripts for snapshotting / restoring all tables in cluster -- Key: HBASE-11940 URL: https://issues.apache.org/jira/browse/HBASE-11940 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Attachments: 11940-v1.txt, snapshot-all.sh, snapshot_restore.sh This JIRA is to provide script that snapshot all the tables in a cluster. Another script is to restore all the tables in cluster. Use cases include table backup prior to upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12136: -- Attachment: HBASE-12136-0.98.patch Thanks [~tedyu] for review. Attached is the patch for 0.98 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between the client creating the tableCF znode and the server triggering TableCFsTracker. If the server wins, it won't be able to read the data set on the tableCF znode and replication will be misconfigured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
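The race in the description above follows from publishing a znode in two steps: the watcher can fire after the node is created but before its data is set. A toy in-memory model (not ZooKeeper itself, and not the actual patch; all names are illustrative) shows the window and why a single-step publish closes it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the tableCF race: a two-step publish leaves a window in which
// a reader observes the node with empty data; creating the node together
// with its data in one step makes that state unobservable.
final class TableCFRaceSketch {
    private final Map<String, byte[]> znodes = new ConcurrentHashMap<>();

    /** Racy publish: after step one the node exists but carries no data,
     *  which is exactly what the tracker reads if it wins the race. */
    void createThenSetData(String path, byte[] data) {
        znodes.put(path, new byte[0]); // step 1: node created, empty
        // <-- a watcher firing here reads zero-length data
        znodes.put(path, data);        // step 2: data arrives later
    }

    /** Atomic publish: the node is never observable without its data. */
    void createWithData(String path, byte[] data) {
        znodes.put(path, data);
    }

    byte[] read(String path) {
        return znodes.get(path);
    }
}
```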
[jira] [Assigned] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-12166: --- Assignee: Jimmy Xiang (was: stack)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158516#comment-14158516 ] Hadoop QA commented on HBASE-12136: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672836/HBASE-12136-0.98.patch against trunk revision . ATTACHMENT ID: 12672836 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11210//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158515#comment-14158515 ] Jimmy Xiang commented on HBASE-12166: - I think I found out the cause. In ZKSplitLogManagerCoordination#removeRecoveringRegions: {noformat} listSize = failedServers.size(); for (int j = 0; j < listSize; j++) { {noformat} The listSize is redefined. That's not a bug, it is a hidden bomb :)
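The "hidden bomb" called out in the comment above is one shared bound variable reused across loops over different lists. A sketch of the safer shape, with one final bound per loop, follows; the names are illustrative, not the actual ZKSplitLogManagerCoordination code:

```java
import java.util.List;

// Reusing a single listSize variable for two lists works only while the loop
// that follows still iterates the list the variable was last sized from.
// Giving each loop its own (final) bound removes the trap.
final class LoopBoundsSketch {
    static int countAll(List<String> regions, List<String> failedServers) {
        int total = 0;
        final int regionCount = regions.size();       // one bound per list,
        for (int i = 0; i < regionCount; i++) {
            total++;
        }
        final int serverCount = failedServers.size(); // never reassigned
        for (int j = 0; j < serverCount; j++) {
            total++;
        }
        return total;
    }
}
```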
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Attachment: hbase-12166.patch
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Status: Patch Available (was: Open) Attached a simple patch. The test is ok locally now. Let's see what the jenkins says. Hope this is the last DLR bug.
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158526#comment-14158526 ] Paul Fleetwood commented on HBASE-11625: java version 1.6.0_65 Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609) Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode) 0.98.5-hadoop1, rUnknown, Mon Aug 4 23:39:24 PDT 2014 Hadoop 1.2.1 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152 Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013 From source with checksum 6923c86528809c4e7e6f493b6b413a9a Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz When using HBase checksums, a call to readBlockDataInternal() in HFileBlock.java can encounter file corruption, but it can only switch to the HDFS checksum input stream once validateBlockChecksum() runs. If the data block's header is corrupted when b = new HFileBlock() is constructed, it throws the exception Invalid HFile block magic and the RPC call fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
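The ordering problem in the HBASE-11625 description is that header parsing can throw before the checksum-fallback path ever runs. A minimal sketch of putting both the magic check and the fallback re-read inside one retry step follows; it is illustrative only, not the actual HFileBlock code, and the byte-level format is invented for the example:

```java
import java.util.function.Supplier;

// If header parsing fails before checksum validation runs, the reader never
// gets a chance to retry via the alternative (HDFS) checksum path. Folding
// the header check into the same retry loop restores the fallback.
final class BlockReadSketch {
    static final byte[] MAGIC = {'D', 'A', 'T', 'A'};

    static boolean hasValidMagic(byte[] block) {
        if (block.length < MAGIC.length) return false;
        for (int i = 0; i < MAGIC.length; i++) {
            if (block[i] != MAGIC[i]) return false;
        }
        return true;
    }

    /** Returns the first attempt if its header is valid; otherwise retries
     *  once via the fallback supplier (e.g. a re-read with HDFS checksums). */
    static byte[] readWithFallback(byte[] firstAttempt, Supplier<byte[]> fallback) {
        if (hasValidMagic(firstAttempt)) return firstAttempt;
        byte[] retried = fallback.get();
        if (hasValidMagic(retried)) return retried;
        throw new IllegalStateException("Invalid HFile block magic on both attempts");
    }
}
```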
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Component/s: wal
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158528#comment-14158528 ] Andrew Purtell commented on HBASE-11625: bq. The other interesting bit is that the test only starts failing after 0.98.3. 0.98.3 was released June 7. Here are commits touching o.a.h.h.io.hfile in hbase-server in 0.98 since that date, excluding changes to LRU given the above description: HBASE-8, HBASE-11437, HBASE 11586, HBASE-11331, HBASE-11845, HBASE-12059, HBASE-12076, HBASE-12123. I think we can suspect less later changes like HBASE-11331 and beyond if this is observable in releases like 0.98.4. Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz when using hbase checksum,call readBlockDataInternal() in hfileblock.java, it could happen file corruption but it only can switch to hdfs checksum inputstream till validateBlockChecksum(). If the datablock's header corrupted when b = new HFileBlock(),it throws the exception Invalid HFile block magic and the rpc call fail -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158528#comment-14158528 ] Andrew Purtell edited comment on HBASE-11625 at 10/3/14 9:12 PM: - bq. The other interesting bit is that the test only starts failing after 0.98.3. 0.98.3 was released June 7. Here are commits touching o.a.h.h.io.hfile in hbase-server in 0.98 since that date, excluding changes to block cache given the above description: HBASE-8, HBASE-11437, HBASE-11586, HBASE-11331, HBASE-11845, HBASE-12059, HBASE-12076, HBASE-12123. I think we can suspect less later changes like HBASE-11331 and beyond if this is observable in releases like 0.98.4. was (Author: apurtell): bq. The other interesting bit is that the test only starts failing after 0.98.3. 0.98.3 was released June 7. Here are commits touching o.a.h.h.io.hfile in hbase-server in 0.98 since that date, excluding changes to LRU given the above description: HBASE-8, HBASE-11437, HBASE 11586, HBASE-11331, HBASE-11845, HBASE-12059, HBASE-12076, HBASE-12123. I think we can suspect less later changes like HBASE-11331 and beyond if this is observable in releases like 0.98.4. Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz when using hbase checksum,call readBlockDataInternal() in hfileblock.java, it could happen file corruption but it only can switch to hdfs checksum inputstream till validateBlockChecksum(). If the datablock's header corrupted when b = new HFileBlock(),it throws the exception Invalid HFile block magic and the rpc call fail -- This message was sent by Atlassian JIRA (v6.3.4#6332)
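The fallback the HBASE-11625 report asks for can be sketched abstractly: treat any failure to parse the hbase-checksummed read, a corrupt block magic included, as grounds to retry through the HDFS-checksum path, rather than only switching after validateBlockChecksum() fails. This is a toy model under assumed names, not the real readBlockDataInternal():

```java
// Hedged sketch: the first read skips HDFS checksum verification (HBase does
// its own inline checksums); if that read yields bytes that do not even parse
// as a block, re-read through the HDFS-checksum-verified path instead of
// surfacing "Invalid HFile block magic" to the RPC caller.
class BlockReader {
    static final String MAGIC = "DATABLK*";  // HFile data block magic, 8 bytes

    // Simulates constructing an HFileBlock: throws if the magic is wrong.
    static String parseBlock(String raw) {
        if (!raw.startsWith(MAGIC)) {
            throw new IllegalStateException("Invalid HFile block magic");
        }
        return raw.substring(MAGIC.length());
    }

    static String readWithFallback(String hbaseChecksumRead, String hdfsChecksumRead) {
        try {
            return parseBlock(hbaseChecksumRead);
        } catch (IllegalStateException corrupt) {
            // Fall back on ANY parse failure, not only when the checksum
            // comparison itself mismatches.
            return parseBlock(hdfsChecksumRead);
        }
    }
}
```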
[jira] [Updated] (HBASE-12137) Alter table add cf doesn't do compression test
[ https://issues.apache.org/jira/browse/HBASE-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12137: -- Attachment: HBASE-12137.patch HBASE-12137-0.98.patch Thanks [~jmspaggi] for review. Changed name to columnDescriptor. Also attached patch for 0.98 Alter table add cf doesn't do compression test -- Key: HBASE-12137 URL: https://issues.apache.org/jira/browse/HBASE-12137 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Attachments: HBASE-12137-0.98.patch, HBASE-12137.patch, HBASE-12137.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12137) Alter table add cf doesn't do compression test
[ https://issues.apache.org/jira/browse/HBASE-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12137: -- Fix Version/s: 0.99.1 0.98.7 2.0.0 Alter table add cf doesn't do compression test -- Key: HBASE-12137 URL: https://issues.apache.org/jira/browse/HBASE-12137 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12137-0.98.patch, HBASE-12137.patch, HBASE-12137.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12136: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
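The race in this issue is easiest to see in a toy model: a server-side watcher that fires as soon as the znode exists can read the node before the client's follow-up setData lands, whereas creating the znode already carrying its data closes the window. Names below are hypothetical, not the actual ReplicationPeersZKImpl or ZooKeeper API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Model of the HBASE-12136 race: watchers fire at node creation time, so the
// data they observe depends on whether creation and setData were one step.
class TableCfRace {
    private String data;  // null = znode absent
    private final List<Consumer<String>> watchers = new ArrayList<>();

    void watchCreation(Consumer<String> w) { watchers.add(w); }

    private void createNode(String initial) {
        data = initial;
        for (Consumer<String> w : watchers) w.accept(data);  // watcher fires now
    }

    void setData(String d) { data = d; }

    // Racy client: the node exists (and the watcher fires) before data is set,
    // so a fast server sees "" and misconfigures replication.
    void createThenSet(String tableCFs) { createNode(""); setData(tableCFs); }

    // Fixed client: the node is born carrying its data.
    void createWithData(String tableCFs) { createNode(tableCFs); }
}
```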
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158556#comment-14158556 ] Ted Yu commented on HBASE-12136: There was a conflict in hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java which I resolved. Integrated to 0.98, branch-1 and master Thanks for the contribution, Virag Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12126) Region server coprocessor endpoint
[ https://issues.apache.org/jira/browse/HBASE-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12126: -- Attachment: HBASE-12126-0.98_1.patch Updated the 98 patch with SingletonCoprocessorService interface Region server coprocessor endpoint -- Key: HBASE-12126 URL: https://issues.apache.org/jira/browse/HBASE-12126 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Attachments: HBASE-12126-0.98.patch, HBASE-12126-0.98_1.patch Utility to make endpoint calls against region server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158575#comment-14158575 ] Hudson commented on HBASE-12136: FAILURE: Integrated in HBase-1.0 #271 (See [https://builds.apache.org/job/HBase-1.0/271/]) HBASE-12136 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker (Virag Kothari) (tedyu: rev 6b95b4a8a4a49dc7877271118c36d5e916d336ab) * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeerZKImpl.java * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-12136: Virag: Any idea of the test failure in branch-1 ? I couldn't reproduce locally. Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158584#comment-14158584 ] Hadoop QA commented on HBASE-12075:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12671882/0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch against trunk revision .
ATTACHMENT ID: 12671882
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpointNoMaster
org.apache.hadoop.hbase.master.TestDistributedLogSplitting
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.master.TestMasterNoCluster.testNotPullingDeadRegionServerFromZK(TestMasterNoCluster.java:306)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//console
This message is automatically generated. 
Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.99.0, 2.0.0, 0.98.6.1 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi threaded clients, we use a feature developed on 0.89-fb branch called Preemptive Fast Fail. This allows the client threads which would potentially fail, fail fast. The idea behind this feature is that we allow, among the hundreds of client threads, one thread to try and establish connection with the regionserver and if that succeeds, we mark it as a live node again. Meanwhile, other threads which are trying to establish connection to the same server would ideally go into the timeouts which is effectively unfruitful. We can in those cases return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA
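The scheme the description outlines, one thread probes the suspect server while the rest fail fast instead of burning connect timeouts, reduces to a per-server compare-and-set. This is a hedged sketch under invented names, not the 0.89-fb or HBASE-12075 implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// For a server suspected dead, exactly one caller wins the right to probe it;
// every other caller gets 'false' and should throw a fast-fail exception to
// its client rather than wait out a timeout.
class FastFailRegistry {
    private final ConcurrentHashMap<String, AtomicBoolean> probing = new ConcurrentHashMap<>();

    // True: this caller should attempt the real connection.
    // False: fail fast without touching the network.
    boolean tryAcquireProbe(String server) {
        AtomicBoolean flag = probing.computeIfAbsent(server, s -> new AtomicBoolean(false));
        return flag.compareAndSet(false, true);
    }

    // Probe succeeded: the server is live again, clear its failing state.
    void markAlive(String server) {
        probing.remove(server);
    }

    // Probe failed: let the next caller become the prober.
    void probeFailed(String server) {
        AtomicBoolean flag = probing.get(server);
        if (flag != null) flag.set(false);
    }
}
```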
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158590#comment-14158590 ] Virag Kothari commented on HBASE-12136: --- In https://builds.apache.org/job/HBase-1.0/271, org.apache.hadoop.hbase.util.TestTableName.testValueOf fails. That might be related to HBASE-12156 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-12136. Resolution: Fixed Oops, TestTableName is not related to the change in this JIRA. Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12167: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into branch 1 and master. Thanks. NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
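The fix amounts to a null check before dereferencing the region plan, since no plan may exist when no live server can host the region. A minimal sketch with hypothetical names, not the real AssignmentManager code:

```java
import java.util.List;

class Assigner {
    static final class RegionPlan {
        final String destination;
        RegionPlan(String destination) { this.destination = destination; }
    }

    // Returns null when no live server is available -- the case the
    // ServerShutdownHandler stack trace above runs into.
    static RegionPlan getRegionPlan(List<String> liveServers) {
        return liveServers.isEmpty() ? null : new RegionPlan(liveServers.get(0));
    }

    static String assign(List<String> liveServers) {
        RegionPlan plan = getRegionPlan(liveServers);
        if (plan == null) {
            // The guard the patch adds: skip instead of dereferencing null.
            return "no plan available; skipping assignment";
        }
        return "assigning to " + plan.destination;
    }
}
```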
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158598#comment-14158598 ] Jimmy Xiang commented on HBASE-12167: - Checked in an addendum to fix TestMasterObserver. NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158610#comment-14158610 ] Hudson commented on HBASE-12136: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #539 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/539/]) HBASE-12136 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker (Virag Kothari) (tedyu: rev a9138d7f96910f09e52b226248ccb169c98d6bd4) * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158613#comment-14158613 ] Manukranth Kolloju commented on HBASE-12075: Is 900 for surefire.timeout too low? Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.99.0, 2.0.0, 0.98.6.1 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi threaded clients, we use a feature developed on 0.89-fb branch called Preemptive Fast Fail. This allows the client threads which would potentially fail, fail fast. The idea behind this feature is that we allow, among the hundreds of client threads, one thread to try and establish connection with the regionserver and if that succeeds, we mark it as a live node again. Meanwhile, other threads which are trying to establish connection to the same server would ideally go into the timeouts which is effectively unfruitful. We can in those cases return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158621#comment-14158621 ] Hadoop QA commented on HBASE-12166:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12672838/hbase-12166.patch against trunk revision .
ATTACHMENT ID: 12672838
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.coprocessor.TestMasterObserver
org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//console
This message is automatically generated. 
TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test, wal Reporter: stack Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, hbase-12166.patch, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering 4943 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058) 4944 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086) 4945 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072) 4946
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Attachment: HBASE-11764-0.98.patch Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Status: Patch Available (was: Open) Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Attachment: HBASE-11764.patch Updated patches for master and 0.98 that adjust implementation of cell TTLs to avoid changes to ColumnTrackers (HBASE-11763, moved out) Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158641#comment-14158641 ] Jimmy Xiang commented on HBASE-12166: - TestMasterObserver should be fixed by the addendum of HBASE-12167. TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test, wal Reporter: stack Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, hbase-12166.patch, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. 
is recovering
	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
	at java.lang.Thread.run(Thread.java:744)
{code}
See how we've finished log splitting long time previous:
{code}
2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms
{code}
If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one:
2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery.
2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery.
2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
2014-10-03 01:57:48,128 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
This would seem to indicate that we successfully wrote zk that we are
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158645#comment-14158645 ] Jimmy Xiang commented on HBASE-12166: - [~stack], [~jeffreyz], could you take a look at the patch? Thanks.
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158649#comment-14158649 ] Jimmy Xiang commented on HBASE-12166: - TestRegionReplicaReplicationEndpoint is ok locally. I can increase the timeout a little at checkin (from 1000 to 6000?).
[jira] [Commented] (HBASE-12137) Alter table add cf doesn't do compression test
[ https://issues.apache.org/jira/browse/HBASE-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158655#comment-14158655 ] Hadoop QA commented on HBASE-12137: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12672842/HBASE-12137.patch
against trunk revision .
ATTACHMENT ID: 12672842
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.util.TestHBaseFsck
org.apache.hadoop.hbase.master.TestDistributedLogSplitting
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//console
This message is automatically generated.
Alter table add cf doesn't do compression test -- Key: HBASE-12137 URL: https://issues.apache.org/jira/browse/HBASE-12137 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12137-0.98.patch, HBASE-12137.patch, HBASE-12137.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158515#comment-14158515 ] Jimmy Xiang edited comment on HBASE-12166 at 10/3/14 11:08 PM: ---
I think I found out the cause. In ZKSplitLogManagerCoordination#removeRecoveringRegions:
{noformat}
listSize = failedServers.size();
for (int j = 0; j < listSize; j++) {
{noformat}
The listSize is redefined.

was (Author: jxiang):
I think I found out the cause. In ZKSplitLogManagerCoordination#removeRecoveringRegions:
{noformat}
listSize = failedServers.size();
for (int j = 0; j < listSize; j++) {
{noformat}
The listSize is redefined. That's not a bug, it is a hidden bomb :)
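The reused `listSize` can be modeled with a short, self-contained Java sketch. This is purely illustrative, not the actual removeRecoveringRegions code: `regionsVisited`, the list contents, and the loop bodies are hypothetical stand-ins. It shows the class of hazard: when an inner loop reassigns the variable that bounds the outer loop, the outer loop over recovering regions can terminate early and silently skip regions, which would leave their znodes undeleted.

```java
import java.util.Arrays;
import java.util.List;

public class ListSizeClobber {

    // Toy model of the pitfall: the same 'listSize' variable bounds the
    // outer loop but is reassigned inside it, so the outer loop can stop
    // early when 'failedServers' is shorter than 'regions'.
    static int regionsVisited(List<String> regions, List<String> failedServers) {
        int visited = 0;
        int listSize = regions.size();       // intended bound for the outer loop
        for (int i = 0; i < listSize; i++) {
            visited++;                       // "process" regions.get(i)
            listSize = failedServers.size(); // BUG: clobbers the outer bound
            for (int j = 0; j < listSize; j++) {
                // "process" failedServers.get(j) for this region
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("ns-region", "r1", "r2");
        List<String> failedServers = Arrays.asList("server-a");
        // Only 1 of the 3 regions is visited: after the first iteration the
        // shared bound becomes failedServers.size() == 1.
        System.out.println(regionsVisited(regions, failedServers));
    }
}
```

Giving each loop its own bound variable (or calling size() directly in the loop condition) removes the hazard entirely, which is why renaming the second variable is suggested below.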
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Status: Open (was: Patch Available) Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158659#comment-14158659 ] Hudson commented on HBASE-12136: FAILURE: Integrated in HBase-TRUNK #5617 (See [https://builds.apache.org/job/HBase-TRUNK/5617/]) HBASE-12136 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker (Virag Kothari) (tedyu: rev efe0787c87ca03e548bec13d8ae24200f582b438) * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeerZKImpl.java * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
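The race described in HBASE-12136 — the server-side TableCFsTracker firing on znode creation before the client has written the table-CF data — can be sketched with a toy in-memory model. This is not the real ZooKeeper API: the map stands in for the znode tree, and the immediate read stands in for the watcher callback winning the race.

```java
import java.util.HashMap;
import java.util.Map;

public class TableCFsRace {

    // Toy model: create the znode, let the "watcher" read it immediately,
    // then set the data. If the watcher wins the race, it sees null and the
    // peer's table-CF configuration is effectively empty.
    static String watcherSeesOnCreate(Map<String, String> znodes, String path) {
        znodes.put(path, null);          // 1) client creates the znode, no data yet
        String seen = znodes.get(path);  // 2) server's tracker fires and reads it
        znodes.put(path, "t1:cf1");      // 3) client sets the data -- too late
        return seen;
    }

    public static void main(String[] args) {
        String seen = watcherSeesOnCreate(new HashMap<>(),
                "/hbase/replication/peers/1/tableCFs");
        System.out.println(seen);        // null: replication would be misconfigured
    }
}
```

The usual cure is to make the write atomic with the create (create the znode with its data in one call) or have the tracker re-read on the subsequent data-changed event rather than trusting the creation-time snapshot.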
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158660#comment-14158660 ] Hudson commented on HBASE-12167: FAILURE: Integrated in HBase-TRUNK #5617 (See [https://builds.apache.org/job/HBase-TRUNK/5617/]) HBASE-12167 addendum; fix TestMasterObserver (jxiang: rev dbef2bdafe5500c0abc8fc61d3539d3b7a2132b9) * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
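The NPE above comes from dereferencing a region plan that was never found ("If we can't find a region plan, we should check"). A minimal, hypothetical guard of the kind such a fix adds — the names and map-based lookup are illustrative only, not the actual AssignmentManager code:

```java
import java.util.HashMap;
import java.util.Map;

public class RegionPlanGuard {

    // Illustrative only: look up a destination for a region and check for
    // null before using it, instead of assuming a plan always exists.
    static String destinationFor(Map<String, String> plans, String region) {
        String plan = plans.get(region); // may be null when no plan was computed
        if (plan == null) {
            return null;                 // bail out (or retry) instead of an NPE
        }
        return plan.trim();
    }

    public static void main(String[] args) {
        Map<String, String> plans = new HashMap<>();
        plans.put("region-a", " server-1 ");
        System.out.println(destinationFor(plans, "region-a"));
        System.out.println(destinationFor(plans, "region-b")); // null, not an NPE
    }
}
```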
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158663#comment-14158663 ] stack commented on HBASE-12166: --- [~jxiang] Yeah, that's it. I just ran into it (Didn't believe it...). Test passed for me when I made the change, +1 and +1 to upping timeout (Am checking other uses of 'listSize' -- smile).
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158667#comment-14158667 ] Jeffrey Zhong commented on HBASE-12166: --- [~jxiang] Good catch! Looks good to me (+1). Better change the variable name listSize2 to tmpFailedServerSize though.
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158673#comment-14158673 ] stack commented on HBASE-12166: --- There is another bomb in that same class [~jxiang]
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158679#comment-14158679 ] Jimmy Xiang commented on HBASE-12166: - [~stack], good catch! Unbelievable!
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Attachment: HBASE-11764-0.98.patch Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Attachment: hbase-12166_v2.patch Attached v2 that fixed the issue Stack found.
[jira] [Commented] (HBASE-11350) [PE] Allow random value size
[ https://issues.apache.org/jira/browse/HBASE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158697#comment-14158697 ] Lars Hofhansl commented on HBASE-11350: --- I agree. This'd be useful in 0.94, 0.98, and 1.0 as well. [~apurtell], [~enis], I assume you have no objections. This is PE only. [PE] Allow random value size Key: HBASE-11350 URL: https://issues.apache.org/jira/browse/HBASE-11350 Project: HBase Issue Type: Improvement Components: Performance Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 11348.txt Allow PE to write random value sizes. Helpful for mimicking 'real' sizings.
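Writing values with randomized sizes, as HBASE-11350 proposes for PerformanceEvaluation, could look something like the sketch below. The distribution (uniform within roughly half to one-and-a-half times a target size) and the names here are assumptions for illustration, not what PE actually implements.

{code}
import java.util.Random;

// Hypothetical sketch of random value sizing for a load generator.
// Not the actual PerformanceEvaluation code.
public class RandomValueSizes {
    // Return a random-content value whose length is drawn uniformly
    // from [targetSize/2, targetSize/2 + targetSize).
    static byte[] randomValue(Random rnd, int targetSize) {
        int size = targetSize / 2 + rnd.nextInt(targetSize);
        byte[] value = new byte[size];
        rnd.nextBytes(value);
        return value;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for repeatable runs
        for (int i = 0; i < 3; i++) {
            System.out.println(randomValue(rnd, 1024).length);
        }
    }
}
{code}

Varying the value length per write avoids the unrealistically uniform row sizes a fixed-length generator produces, which is the point of the improvement discussed above.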