[jira] [Commented] (HBASE-10788) Add 99th percentile of latency in PE
[ https://issues.apache.org/jira/browse/HBASE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940227#comment-13940227 ] Liu Shaohui commented on HBASE-10788: - [~ndimiduk] Yes, I would like to add the percentiles in the base class Test, so that all tests report them. Sorry for not noticing the percentile code in the randomRead test. I will redo the patch based on HBASE-10007. Add 99th percentile of latency in PE Key: HBASE-10788 URL: https://issues.apache.org/jira/browse/HBASE-10788 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-10788-trunk-v1.diff In a production environment, the 99th percentile of latency is more important than the average: it helps measure the influence of GC pauses and slow HDFS reads/writes. -- This message was sent by Atlassian JIRA (v6.2#6252)
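As background for the thread above, a 99th-percentile figure can be computed with the simple nearest-rank method over collected latencies. This is only an illustrative sketch (class and method names are hypothetical, and the actual PE patch may use a metrics library instead):

```java
import java.util.Arrays;

// Minimal sketch of nearest-rank percentile computation over recorded
// latencies; NOT the PerformanceEvaluation code, just an illustration.
public class Percentile {
    // Returns the p-th percentile (0 < p <= 100) of the latencies,
    // using the nearest-rank method on a sorted copy of the input.
    public static double percentile(double[] latencies, double p) {
        double[] sorted = Arrays.copyOf(latencies, latencies.length);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

As the later comments in this thread note, allocating and sorting full arrays of doubles gets expensive for long runs, which is one argument for a streaming histogram from a metrics library instead.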
[jira] [Updated] (HBASE-9740) A corrupt HFile could cause endless attempts to assign the region without a chance of success
[ https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ping updated HBASE-9740: Attachment: HBase-9740_0.94_v4.patch Hi Hofhansl, thanks for your suggestions. I changed the HashMap to a ConcurrentHashMap and also replaced the Integer values with AtomicInteger. Please review. A corrupt HFile could cause endless attempts to assign the region without a chance of success - Key: HBASE-9740 URL: https://issues.apache.org/jira/browse/HBASE-9740 Project: HBase Issue Type: Bug Affects Versions: 0.94.16 Reporter: Aditya Kishore Assignee: Ping Fix For: 0.94.19 Attachments: HBase-9740_0.94_v4.patch, HBase-9749_0.94_v2.patch, HBase-9749_0.94_v3.patch, patch-9740_0.94.txt As described in HBASE-9737, a corrupt HFile in a region can lead to an assignment storm in the cluster, since the Master will keep trying to assign the region to one region server after another, and obviously none will succeed. The region server, upon detecting such a scenario, should mark the region as RS_ZK_REGION_FAILED_ERROR (or something to that effect) in ZooKeeper, which should tell the Master to stop assigning the region until the error has been resolved (via an HBase shell command, probably assign?) -- This message was sent by Atlassian JIRA (v6.2#6252)
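The ConcurrentHashMap-with-AtomicInteger change described in the comment above is a standard thread-safe counter pattern. A sketch of the idea, with hypothetical class and method names (this is not the actual patch):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a per-region failure counter safe for concurrent updates.
// Names are illustrative; the real patch tracks assignment failures.
public class FailureCounter {
    private final ConcurrentMap<String, AtomicInteger> failures =
        new ConcurrentHashMap<String, AtomicInteger>();

    // Atomically increments and returns the failure count for a region.
    public int recordFailure(String regionName) {
        AtomicInteger count = failures.get(regionName);
        if (count == null) {
            AtomicInteger fresh = new AtomicInteger(0);
            // putIfAbsent returns the existing value if another thread
            // won the race, or null if our fresh counter was installed.
            count = failures.putIfAbsent(regionName, fresh);
            if (count == null) {
                count = fresh;
            }
        }
        return count.incrementAndGet();
    }
}
```

The putIfAbsent dance avoids the lost-update race that a plain HashMap with boxed Integer values has when two threads increment the same key at once.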
[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long
[ https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940247#comment-13940247 ] Anoop Sam John commented on HBASE-10787: Looks good. TestHCM#testConnection* take too long - Key: HBASE-10787 URL: https://issues.apache.org/jira/browse/HBASE-10787 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10787-v1.txt TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins. The test can be shortened when the retry count is lowered. On my Mac, for TestHCM#testConnection* (two tests) without patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec {code} with patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10781: -- Attachment: 10690v2.txt Send folks to 0.96 and 0.98 doc if building those releases (A note in the doc under build does that). Make this doc change about building 99 and 1.0. Changed the make-rc.sh script to do 1.0. Let me test a bit more then will commit if it works. Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
[ https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940264#comment-13940264 ] Hadoop QA commented on HBASE-10531: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635491/HBASE-10531_7.patch against trunk revision . ATTACHMENT ID: 12635491 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100. {color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9041//console This message is automatically generated. 
Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo Key: HBASE-10531 URL: https://issues.apache.org/jira/browse/HBASE-10531 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.99.0 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch Currently the byte[] key passed to HFileScanner.seekTo and HFileScanner.reseekTo is a combination of row, cf, qual, type and ts, and the caller forms it using kv.getBuffer, which is actually deprecated. So we should see how this can be achieved once kv.getBuffer is removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-9740) A corrupt HFile could cause endless attempts to assign the region without a chance of success
[ https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940273#comment-13940273 ] Hadoop QA commented on HBASE-9740: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635500/HBase-9740_0.94_v4.patch against trunk revision . ATTACHMENT ID: 12635500 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9042//console This message is automatically generated. A corrupt HFile could cause endless attempts to assign the region without a chance of success - Key: HBASE-9740 URL: https://issues.apache.org/jira/browse/HBASE-9740 Project: HBase Issue Type: Bug Affects Versions: 0.94.16 Reporter: Aditya Kishore Assignee: Ping Fix For: 0.94.19 Attachments: HBase-9740_0.94_v4.patch, HBase-9749_0.94_v2.patch, HBase-9749_0.94_v3.patch, patch-9740_0.94.txt As described in HBASE-9737, a corrupt HFile in a region can lead to an assignment storm in the cluster, since the Master will keep trying to assign the region to one region server after another, and obviously none will succeed. The region server, upon detecting such a scenario, should mark the region as RS_ZK_REGION_FAILED_ERROR (or something to that effect) in ZooKeeper, which should tell the Master to stop assigning the region until the error has been resolved (via an HBase shell command, probably assign?) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10648) Pluggable Memstore
[ https://issues.apache.org/jira/browse/HBASE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-10648: -- Attachment: HBASE-10648-0.94_v3.patch Thanks [~lhofhansl] for the review. I have replaced the Cell-related comments with KeyValue in the new patch. Pluggable Memstore -- Key: HBASE-10648 URL: https://issues.apache.org/jira/browse/HBASE-10648 Project: HBase Issue Type: Sub-task Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.99.0 Attachments: HBASE-10648-0.94_v1.patch, HBASE-10648-0.94_v2.patch, HBASE-10648-0.94_v3.patch, HBASE-10648.patch, HBASE-10648_V2.patch, HBASE-10648_V3.patch, HBASE-10648_V4.patch, HBASE-10648_V5.patch, HBASE-10648_V6.patch Make Memstore an interface, and make it pluggable by configuring the FQCN of the implementation. This will allow different implementations and optimizations in the Memstore data structure while leaving the upper layers untouched. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10648) Pluggable Memstore
[ https://issues.apache.org/jira/browse/HBASE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940317#comment-13940317 ] haosdent commented on HBASE-10648: -- Well done, [~carp84]! Pluggable Memstore -- Key: HBASE-10648 URL: https://issues.apache.org/jira/browse/HBASE-10648 Project: HBase Issue Type: Sub-task Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.99.0 Attachments: HBASE-10648-0.94_v1.patch, HBASE-10648-0.94_v2.patch, HBASE-10648-0.94_v3.patch, HBASE-10648.patch, HBASE-10648_V2.patch, HBASE-10648_V3.patch, HBASE-10648_V4.patch, HBASE-10648_V5.patch, HBASE-10648_V6.patch Make Memstore an interface, and make it pluggable by configuring the FQCN of the implementation. This will allow different implementations and optimizations in the Memstore data structure while leaving the upper layers untouched. -- This message was sent by Atlassian JIRA (v6.2#6252)
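The FQCN-based pluggability described in the issue boils down to loading an implementation class named in configuration and instantiating it reflectively. A minimal sketch of that mechanism (helper names are illustrative, not the actual HBASE-10648 code):

```java
// Sketch of FQCN-based plugin loading: resolve a fully-qualified class
// name from configuration and instantiate it behind an interface.
// This mirrors the general pattern, not the actual Memstore patch.
public class PluginLoader {
    // Loads className, instantiates it via its no-arg constructor,
    // and casts the result to the expected interface type.
    public static <T> T newInstance(String className, Class<T> iface)
            throws Exception {
        Class<?> clazz = Class.forName(className);
        return iface.cast(clazz.getDeclaredConstructor().newInstance());
    }
}
```

In practice the FQCN would come from a configuration key with a default implementation as fallback, so existing deployments see no behavior change unless they opt in.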
[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940359#comment-13940359 ] Hadoop QA commented on HBASE-10781: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635507/10690v2.txt against trunk revision . ATTACHMENT ID: 12635507 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 24 new or modified tests. {color:red}-1 hadoop1.0{color}. The patch failed to compile against the hadoop 1.0 profile. Here is snippet of errors: {code}{code} {color:red}-1 hadoop1.1{color}. The patch failed to compile against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +These instructions are for building HBase 1.0.x. For building earlier versions, the process is different. See this section +paraNow, build the src tarball. This tarball is hadoop version independent. It is just the pure src code and documentation without a particular hadoop taint, etc. +Add the varname-Prelease/varname profile when building; it checks files for licenses and will fail the build if unlicensed files present. 
+notetitlePoint Release Only/titleparaThe following step that creates a new tag can be skipped since you've already created the point release tag/para/note +The last command above copies all artifacts up to a temporary staging apache mvn repo in an 'open' state. + paraThe script filenamedev-support/make_rc.sh/filename automates alot of the above listed release steps. + staging repository up in apache maven (human intervention is needed here), the checking of + the produced artifacts to ensure they are 'good' -- e.g. undoing the produced tarballs, eyeballing them to make + sure they look right then starting and checking all is running properly -- and then the signing and pushing of +paraNow lets get back to what is up in maven. Our artifacts should be up in maven repository in the staging area {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html 
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9043//console This message is automatically generated. Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10789) Add NumberComparator
haosdent created HBASE-10789: Summary: Add NumberComparator Key: HBASE-10789 URL: https://issues.apache.org/jira/browse/HBASE-10789 Project: HBase Issue Type: Improvement Components: Filters Reporter: haosdent Assignee: haosdent Sometimes a user may want to filter out values less than a positive number, but the result can still end up containing negative numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8963) Add configuration option to skip HFile archiving
[ https://issues.apache.org/jira/browse/HBASE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940431#comment-13940431 ] Jean-Marc Spaggiari commented on HBASE-8963: I agree with Lars. It makes more sense to me too to have a parameter on drop table that allows skipping archiving, instead of setting that at every required level. Add configuration option to skip HFile archiving Key: HBASE-8963 URL: https://issues.apache.org/jira/browse/HBASE-8963 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: bharath v Fix For: 0.99.0 Attachments: HBASE-8963.trunk.v1.patch, HBASE-8963.trunk.v2.patch, HBASE-8963.trunk.v3.patch, HBASE-8963.trunk.v4.patch, HBASE-8963.trunk.v5.patch, HBASE-8963.trunk.v6.patch, HBASE-8963.trunk.v7.patch Currently HFileArchiver is always called when a table is dropped. A configuration option (either global or per table) should be provided so that archiving can be skipped when a table is deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8963) Add configuration option to skip HFile archiving
[ https://issues.apache.org/jira/browse/HBASE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940449#comment-13940449 ] Matteo Bertozzi commented on HBASE-8963: I don't think this should be a drop-table argument, since compaction will archive files anyway. I like the global-level property as a general setting, but we should also have a table-level option, something like create 'testtb' {SKIP_ARCHIVE = true}, that will work on both compaction and delete. Add configuration option to skip HFile archiving Key: HBASE-8963 URL: https://issues.apache.org/jira/browse/HBASE-8963 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: bharath v Fix For: 0.99.0 Attachments: HBASE-8963.trunk.v1.patch, HBASE-8963.trunk.v2.patch, HBASE-8963.trunk.v3.patch, HBASE-8963.trunk.v4.patch, HBASE-8963.trunk.v5.patch, HBASE-8963.trunk.v6.patch, HBASE-8963.trunk.v7.patch Currently HFileArchiver is always called when a table is dropped. A configuration option (either global or per table) should be provided so that archiving can be skipped when a table is deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10790) make assembly:single as default in pom.xml
Liu Shaohui created HBASE-10790: --- Summary: make assembly:single as default in pom.xml Key: HBASE-10790 URL: https://issues.apache.org/jira/browse/HBASE-10790 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor Currently, to build an HBase tar release package, we have to use the command: {code} mvn clean package assembly:single {code} which is not convenient. We can make assembly:single the default by running the assembly plugin in the Maven package phase. Then we can just use {code} mvn clean package {code} to get a release package. Other suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10790) make assembly:single as default in pom.xml
[ https://issues.apache.org/jira/browse/HBASE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated HBASE-10790: Attachment: HBASE-10790-trunk-v1.diff make assembly:single as default in pom.xml -- Key: HBASE-10790 URL: https://issues.apache.org/jira/browse/HBASE-10790 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor Attachments: HBASE-10790-trunk-v1.diff Currently, to build an HBase tar release package, we have to use the command: {code} mvn clean package assembly:single {code} which is not convenient. We can make assembly:single the default by running the assembly plugin in the Maven package phase. Then we can just use {code} mvn clean package {code} to get a release package. Other suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.2#6252)
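Binding the assembly plugin's single goal to the package lifecycle phase, as the issue proposes, might look like the following pom.xml fragment. This is a sketch of the standard Maven mechanism; the execution id is illustrative and this is not taken from the actual HBase pom or the attached diff:

```xml
<!-- Sketch: bind assembly:single to the package phase so that
     `mvn clean package` alone also produces the assembly artifact.
     The execution id below is illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With such a binding, the tarball is built on every package invocation, which is the convenience being requested, at the cost of slower default builds.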
[jira] [Created] (HBASE-10791) Add integration test to demonstrate performance improvement
Nick Dimiduk created HBASE-10791: Summary: Add integration test to demonstrate performance improvement Key: HBASE-10791 URL: https://issues.apache.org/jira/browse/HBASE-10791 Project: HBase Issue Type: Sub-task Components: Performance, test Affects Versions: hbase-10070 Reporter: Nick Dimiduk Assignee: Nick Dimiduk It would be good to demonstrate that use of region replicas reduces read latency. PerformanceEvaluation can be used manually for this purpose, but it's not able to use ChaosMonkey. An integration test can set up the monkey actions and automate execution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10791) Add integration test to demonstrate performance improvement
[ https://issues.apache.org/jira/browse/HBASE-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10791: - Attachment: HBASE-10791.00.patch Here's a sketch of a patch; it requires HBASE-10548, HBASE-10419, and HBASE-10592 to be brought over to the branch. Assuming this direction looks good, I'll bring those tickets onto the feature branch. Testing this has revealed some issues. I spoke with [~stack] yesterday about one that exists on trunk, and will catch up with [~enis] and [~devaraj] about the other today. New tickets to follow. Add integration test to demonstrate performance improvement --- Key: HBASE-10791 URL: https://issues.apache.org/jira/browse/HBASE-10791 Project: HBase Issue Type: Sub-task Components: Performance, test Affects Versions: hbase-10070 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10791.00.patch It would be good to demonstrate that use of region replicas reduces read latency. PerformanceEvaluation can be used manually for this purpose, but it's not able to use ChaosMonkey. An integration test can set up the monkey actions and automate execution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10789) Add NumberComparator
[ https://issues.apache.org/jira/browse/HBASE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940742#comment-13940742 ] Lars Hofhansl commented on HBASE-10789: --- I think this is a bit more tricky. You may want to sort things correctly too, and in that case you'd need to change the encoding. We can add a one-off comparator now. [~ndimiduk], FYI. Add NumberComparator Key: HBASE-10789 URL: https://issues.apache.org/jira/browse/HBASE-10789 Project: HBase Issue Type: Improvement Components: Filters Reporter: haosdent Assignee: haosdent Sometimes a user may want to filter out values less than a positive number, but the result can still end up containing negative numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10788) Add 99th percentile of latency in PE
[ https://issues.apache.org/jira/browse/HBASE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940794#comment-13940794 ] Nick Dimiduk commented on HBASE-10788: -- It could be that your use of a real metrics library is the right way to go. My version allocates arrays of doubles, which can become expensive. I'd also like to add a mixed-workload test, in which case it'll be good to isolate read from write metrics, etc. Maybe your use of the yammer metrics library will support this, and also help minimize memory footprint while maintaining statistical significance of the results. If you're adding a new dependency, be sure to include the jar in the mapreduce job. Good on you [~liushaohui]. Add 99th percentile of latency in PE Key: HBASE-10788 URL: https://issues.apache.org/jira/browse/HBASE-10788 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-10788-trunk-v1.diff In a production environment, the 99th percentile of latency is more important than the average: it helps measure the influence of GC pauses and slow HDFS reads/writes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10789) Add NumberComparator
[ https://issues.apache.org/jira/browse/HBASE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940797#comment-13940797 ] Nick Dimiduk commented on HBASE-10789: -- DataType-aware filters are also on my todo list, though I wanted to get a little further along down that road before speculating about them. The advantage of using an order-preserving encoding (like {{OrderedBytes}}) is that the data is ordered this way by HBase, and these filters can efficiently skip over swaths of data (depending on the use-case). There's definitely more to explore here. For the time being, it makes sense to have filters that support the different encoding formats produced by {{Bytes}}, but I think this is the wrong level of abstraction for the long run. Add NumberComparator Key: HBASE-10789 URL: https://issues.apache.org/jira/browse/HBASE-10789 Project: HBase Issue Type: Improvement Components: Filters Reporter: haosdent Assignee: haosdent Sometimes a user may want to filter out values less than a positive number, but the result can still end up containing negative numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
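The ordering problem behind this issue, and the order-preserving encoding idea discussed above, can be shown with a small sketch. Plain big-endian encoding of a two's-complement int (as Bytes.toBytes produces) puts negative values after positives under unsigned byte comparison; flipping the sign bit fixes that. The helper names below are illustrative, and this is neither OrderedBytes nor the proposed NumberComparator:

```java
// Sketch: why unsigned lexicographic comparison misorders signed ints,
// and the classic sign-bit-flip fix used by order-preserving encodings.
public class OrderedInt {
    // Plain big-endian encoding: negative values have the high bit set,
    // so they sort AFTER positives under unsigned byte comparison.
    public static byte[] encodeRaw(int v) {
        return new byte[] {
            (byte) (v >>> 24), (byte) (v >>> 16),
            (byte) (v >>> 8), (byte) v };
    }

    // XOR with Integer.MIN_VALUE flips the sign bit, making unsigned
    // byte order of the encoding match signed int order.
    public static byte[] encode(int v) {
        return encodeRaw(v ^ Integer.MIN_VALUE);
    }

    // Unsigned lexicographic comparison, as byte-array comparators do.
    public static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

This is why a filter comparing raw Bytes-encoded values "less than a positive number" can let negative numbers through: the raw bytes of -5 compare greater than those of 3.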
[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940808#comment-13940808 ] Hadoop QA commented on HBASE-10786: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635570/10786-v2.txt against trunk revision . ATTACHMENT ID: 12635570 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100. {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9044//console This message is automatically generated. 
If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure -- Key: HBASE-10786 URL: https://issues.apache.org/jira/browse/HBASE-10786 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10786-v1.txt, 10786-v2.txt I was trying to find cause for test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ : {code} org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at
[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long
[ https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940823#comment-13940823 ] Andrew Purtell commented on HBASE-10787: Scratch that, just trunk. Earlier branches don't have this test. TestHCM#testConnection* take too long - Key: HBASE-10787 URL: https://issues.apache.org/jira/browse/HBASE-10787 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10787-v1.txt TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins. The test can be shortened when the retry count is lowered. On my Mac, for TestHCM#testConnection* (two tests) without patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec {code} with patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long
[ https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940814#comment-13940814 ] Andrew Purtell commented on HBASE-10787: Wow, the previous idea of retry a lot was excessive. Going to commit this to 0.96+ in a few minutes unless objection. TestHCM#testConnection* take too long - Key: HBASE-10787 URL: https://issues.apache.org/jira/browse/HBASE-10787 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10787-v1.txt TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins. The test can be shortened when retry count is lowered. On my Mac, for TestHCM#testConnection* (two tests) without patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec {code} with patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
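The runtime saving comes from the way client retries sleep with escalating backoff, so total worst-case wait grows quickly with the retry count. A minimal, self-contained sketch of the effect — the multiplier table and the 100 ms base pause are assumptions modeled on HBase's client retry backoff, not a quote of its code:

```java
// Sketch: cumulative client retry sleep. BACKOFF and BASE_PAUSE_MS are
// assumed values modeled on HBase's client retry backoff table.
public class RetryBackoffSketch {
    static final long BASE_PAUSE_MS = 100; // assumed base pause
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Total milliseconds slept across {@code retries} attempts. */
    static long totalSleepMs(int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            int idx = Math.min(i, BACKOFF.length - 1); // last multiplier repeats
            total += BASE_PAUSE_MS * BACKOFF[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("2 retries:  " + totalSleepMs(2) + " ms");
        System.out.println("35 retries: " + totalSleepMs(35) + " ms");
    }
}
```

With these assumed values, a couple of retries cost a few hundred milliseconds while dozens of retries cost minutes, which is why lowering the retry count in a test that deliberately provokes failures cuts the wall time so sharply.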
[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940842#comment-13940842 ] Elliott Clark commented on HBASE-10781: --- Are we expecting that hadoop will stop requiring us to do this kind of shimming (Hadoop 3.0 whenever it becomes a thing) ? or should we consider keeping the hadoop targeted build scripts ? Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
[ https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940845#comment-13940845 ] Andrew Purtell commented on HBASE-10531: +1 I reviewed the patch as a transitional change and the refactoring looks good to me. Followed along on reviewboard where Stack gave this a good look. What are the follow-up JIRAs? Maybe put a comment here leading to them. Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo Key: HBASE-10531 URL: https://issues.apache.org/jira/browse/HBASE-10531 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.99.0 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch Currently the byte[] key passed to HFileScanner.seekTo and HFileScanner.reseekTo is a combination of row, cf, qual, type and ts. The caller forms this using kv.getBuffer, which is deprecated. So see how this can be achieved once kv.getBuffer is removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10793) AuthFailed as a valid zookeeper state
Demai Ni created HBASE-10793: Summary: AuthFailed as a valid zookeeper state Key: HBASE-10793 URL: https://issues.apache.org/jira/browse/HBASE-10793 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 0.96.2, 0.98.2 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by ZooKeeper, and the following are the valid ZooKeeper events: case 0: return KeeperState.Disconnected; case 3: return KeeperState.SyncConnected; case 4: return KeeperState.AuthFailed; case 5: return KeeperState.ConnectedReadOnly; case 6: return KeeperState.SaslAuthenticated; case -112: return KeeperState.Expired; Based on the above, ZooKeeperWatcher should not throw an exception for the AuthFailed event as if it were invalid. For this kind of event, ZooKeeper already logs a warning and proceeds with a non-SASL connection. 
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid} hbase(main):006:0 list TABLE 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher java.lang.IllegalStateException: Received event is not valid: AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) BIMonitoring BIMonitoringSummary BIMonitoringSummary180 BIMonitoringSummary900 LogMetadata LogRecords Mtable t1 t2 9 row(s) in 0.4040 seconds = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2] {code} the patch will be similar as HBase-8757 -- This message was sent by Atlassian JIRA (v6.2#6252)
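The direction of the proposed fix is to treat AuthFailed as a known, tolerable state in the connection-event handler instead of throwing IllegalStateException. A minimal self-contained sketch of that behavior — the enum below mirrors the KeeperState codes listed in the description, but this is not the actual ZooKeeperWatcher implementation:

```java
// Sketch: tolerate AuthFailed instead of treating it as an invalid event.
// KeeperState here is a local stand-in mirroring the codes quoted in the
// issue description, not org.apache.zookeeper's class.
public class WatcherSketch {
    enum KeeperState {
        DISCONNECTED(0), SYNC_CONNECTED(3), AUTH_FAILED(4),
        CONNECTED_READ_ONLY(5), SASL_AUTHENTICATED(6), EXPIRED(-112);
        final int code;
        KeeperState(int code) { this.code = code; }
        static KeeperState fromCode(int code) {
            for (KeeperState s : values()) if (s.code == code) return s;
            throw new IllegalStateException("Received event is not valid: " + code);
        }
    }

    /** Returns a log message rather than throwing for AuthFailed. */
    static String connectionEvent(KeeperState state) {
        switch (state) {
            case SYNC_CONNECTED:
                return "connected";
            case AUTH_FAILED:
                // SASL auth failed, but non-SASL znodes may still be usable:
                // warn and proceed with a non-SASL connection.
                return "warn: SASL auth failed, proceeding without SASL";
            case DISCONNECTED:
            case EXPIRED:
                return "connection lost: " + state;
            default:
                return "ignored: " + state;
        }
    }

    public static void main(String[] args) {
        System.out.println(connectionEvent(KeeperState.fromCode(4)));
    }
}
```

The key point is that code 4 maps to a handled branch, so the watcher logs and continues rather than killing the client with "Received event is not valid: AuthFailed".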
[jira] [Created] (HBASE-10792) RingBufferTruck does not release its payload
Nick Dimiduk created HBASE-10792: Summary: RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTruck instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
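The second remedy described — having each truck drop its reference once the consumer takes the payload — can be sketched as follows. This is a hypothetical class illustrating the "clear on unload" pattern, not the actual RingBufferTruck API:

```java
// Sketch of "release on unload": once the consumer takes the payload, the
// truck no longer pins it, so a ring-buffer slot cannot retain a KeyValue
// graph until the slot is eventually overwritten. Hypothetical class.
public class TruckSketch {
    static final class Truck<T> {
        private T payload;

        void load(T payload) { this.payload = payload; }

        boolean hasPayload() { return payload != null; }

        /** Returns the payload and clears the reference so it can be GC'd. */
        T unload() {
            T p = payload;
            payload = null;
            return p;
        }
    }

    public static void main(String[] args) {
        Truck<String> truck = new Truck<>();
        truck.load("edit");
        String edit = truck.unload(); // consumer takes ownership
        System.out.println(edit + ", truck still loaded: " + truck.hasPayload());
    }
}
```

Without the nulling in unload(), a large disruptor ring (the event count setting above) keeps every slot's last payload strongly reachable, which matches the large retained sizes seen in the heap dump.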
[jira] [Comment Edited] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
[ https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940845#comment-13940845 ] Andrew Purtell edited comment on HBASE-10531 at 3/19/14 7:00 PM: - +1 I reviewed the patch as a transitional change and the refactoring looks good to me. Also followed along up on reviewboard where Stack gave this a good look. What are the follow up JIRAs? Maybe put a comment here leading to them. was (Author: apurtell): +1 I reviewed the patch as a transitional change and the refactoring looks good to me. Followed long up on reviewboard where Stack gave this a good look. What are the follow on JIRAs? Maybe put a comment here leading to them. Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo Key: HBASE-10531 URL: https://issues.apache.org/jira/browse/HBASE-10531 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.99.0 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch Currently the byte[] key passed to HFileScanner.seekTo and HFileScanner.reseekTo, is a combination of row, cf, qual, type and ts. And the caller forms this by using kv.getBuffer, which is actually deprecated. So see how this can be achieved considering kv.getBuffer is removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10788) Add 99th percentile of latency in PE
[ https://issues.apache.org/jira/browse/HBASE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940826#comment-13940826 ] Andrew Purtell commented on HBASE-10788: bq. Usually I find min, avg, 95th, 99th, and 99.9th percentiles, and max useful. Certainly average, max, and 95th are useful information in addition to higher percentiles, +1 Add 99th percentile of latency in PE Key: HBASE-10788 URL: https://issues.apache.org/jira/browse/HBASE-10788 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-10788-trunk-v1.diff In production env, 99th percentile of latency is more important than the avg. The 99th percentile is helpful to measure the influence of GC, slow read/write of HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
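For the min/avg/95th/99th/99.9th/max reporting discussed above, a nearest-rank percentile over the recorded latencies is sufficient. A minimal illustrative sketch — not the histogram code PE itself may use:

```java
import java.util.Arrays;

// Nearest-rank percentile over a latency sample, the kind of summary statistic
// (95th/99th/99.9th) discussed for PE output. Illustrative only.
public class PercentileSketch {
    static double percentile(double[] latencies, double p) {
        double[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // nearest rank: smallest value with at least p% of samples at or below it
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] lat = new double[1000];
        for (int i = 0; i < lat.length; i++) lat[i] = i + 1; // 1..1000 ms
        System.out.println("p95   = " + percentile(lat, 95.0));
        System.out.println("p99   = " + percentile(lat, 99.0));
        System.out.println("p99.9 = " + percentile(lat, 99.9));
    }
}
```

Unlike the average, the 99th percentile surfaces the tail caused by GC pauses and slow HDFS reads/writes, which is the motivation stated in the issue description.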
[jira] [Commented] (HBASE-10774) Restore TestMultiTableInputFormat
[ https://issues.apache.org/jira/browse/HBASE-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940830#comment-13940830 ] Andrew Purtell commented on HBASE-10774: 286 seconds is borderline, 557 was too long. Based on the above comment, looks like we can go with patch v2. We still have to care about test running time on Jenkins because if the suite runs while the underlying system is particularly loaded, we will get a spurious test timeout and build failure. Restore TestMultiTableInputFormat - Key: HBASE-10774 URL: https://issues.apache.org/jira/browse/HBASE-10774 Project: HBase Issue Type: Test Affects Versions: 0.99.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-10774-trunk-v2.diff, HBASE-10774-v1.diff TestMultiTableInputFormat was removed in HBASE-9009 because this test made the CI fail. But in HBASE-10692 we need to add a new test, TestSecureMultiTableInputFormat, which depends on it. So we try to restore it in this issue. I reran the test several times and it passed. {code} Running org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 314.163 sec {code} [~stack] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10786: --- Attachment: 10786-v2.txt Patch v2 logs the regions whose snapshot directory cannot be found. [~mbertozzi]: What do you think ? If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure -- Key: HBASE-10786 URL: https://issues.apache.org/jira/browse/HBASE-10786 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10786-v1.txt, 10786-v2.txt I was trying to find cause for test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ : {code} org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. 
Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8 at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332) ... 11 more {code} However, it is not clear which region caused the verification to fail. I searched for log from balancer but found none. 
The exception message should include region name which caused the verification to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
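Making the 'Regions moved' message actionable amounts to diffing the expected region set against the snapshotted one and naming the leftovers. A sketch of that idea with a hypothetical helper — not the actual patch code:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the improvement: instead of only "expected=9 snapshotted=8", name
// the regions whose snapshot directory is missing. Hypothetical helper.
public class SnapshotErrorSketch {
    static String regionsMovedMessage(Set<String> expected, Set<String> snapshotted) {
        Set<String> missing = new LinkedHashSet<>(expected);
        missing.removeAll(snapshotted); // regions with no snapshot directory
        return "Regions moved during the snapshot. expected=" + expected.size()
            + " snapshotted=" + snapshotted.size() + " missing=" + missing;
    }

    public static void main(String[] args) {
        Set<String> expected = new LinkedHashSet<>(Arrays.asList("r1", "r2", "r3"));
        Set<String> snapshotted = new LinkedHashSet<>(Arrays.asList("r1", "r3"));
        System.out.println(regionsMovedMessage(expected, snapshotted));
    }
}
```

With the missing set in the message, a failure like the one above would point directly at the moved region instead of forcing a search through balancer logs.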
[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10792: - Status: Patch Available (was: Open) RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Attachments: HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTrunk instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10792: - Attachment: HBASE-10792.00.patch Here's a patch that changes RBT a little. Payload content can now be inspected and references are removed at unload time. I don't know how this impacts failure cases, I need to read up on the disruptor a bit more. (cc [~fenghh], [~stack]) RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Attachments: HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTrunk instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10787) TestHCM#testConnection* take too long
[ https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10787: --- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) TestHCM#testConnection* take too long - Key: HBASE-10787 URL: https://issues.apache.org/jira/browse/HBASE-10787 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.99.0 Attachments: 10787-v1.txt TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins. The test can be shortened when retry count is lowered. On my Mac, for TestHCM#testConnection* (two tests) without patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec {code} with patch: {code} Running org.apache.hadoop.hbase.client.TestHCM 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from SCDynamicStore Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-10793: - Attachment: HBASE-10793-trunk-v0.patch AuthFailed as a valid zookeeper state -- Key: HBASE-10793 URL: https://issues.apache.org/jira/browse/HBASE-10793 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 0.96.2, 0.99.0, 0.98.2 Attachments: HBASE-10793-trunk-v0.patch In kerberos mode, Zookeeper accepts SASL authentication. The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by Zookeeper, and following are valid Zookeeper events: case0: return KeeperState.Disconnected; case3: return KeeperState.SyncConnected; case4: return KeeperState.AuthFailed; case5: return KeeperState.ConnectedReadOnly; case6: return KeeperState.SaslAuthenticated; case -112: return KeeperState.Expired; Based on above, ZooKeeperWatcher should not throw exception for AuthFailed event as an invalid event. For this kind of event, Zookeeper already logs it as a warning and proceed with non-SASL connection. 
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid} hbase(main):006:0 list TABLE 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher java.lang.IllegalStateException: Received event is not valid: AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) BIMonitoring BIMonitoringSummary BIMonitoringSummary180 BIMonitoringSummary900 LogMetadata LogRecords Mtable t1 t2 9 row(s) in 0.4040 seconds = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2] {code} the patch will be similar as HBase-8757 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-10793: - Fix Version/s: 0.99.0 Status: Patch Available (was: Open) AuthFailed as a valid zookeeper state -- Key: HBASE-10793 URL: https://issues.apache.org/jira/browse/HBASE-10793 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 0.96.2, 0.99.0, 0.98.2 Attachments: HBASE-10793-trunk-v0.patch In kerberos mode, Zookeeper accepts SASL authentication. The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by Zookeeper, and following are valid Zookeeper events: case0: return KeeperState.Disconnected; case3: return KeeperState.SyncConnected; case4: return KeeperState.AuthFailed; case5: return KeeperState.ConnectedReadOnly; case6: return KeeperState.SaslAuthenticated; case -112: return KeeperState.Expired; Based on above, ZooKeeperWatcher should not throw exception for AuthFailed event as an invalid event. For this kind of event, Zookeeper already logs it as a warning and proceed with non-SASL connection. 
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid} hbase(main):006:0 list TABLE 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher java.lang.IllegalStateException: Received event is not valid: AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) BIMonitoring BIMonitoringSummary BIMonitoringSummary180 BIMonitoringSummary900 LogMetadata LogRecords Mtable t1 t2 9 row(s) in 0.4040 seconds = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2] {code} the patch will be similar as HBase-8757 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940898#comment-13940898 ] Matteo Bertozzi commented on HBASE-10786: - +1, maybe just rename that msg to errorMsg or something to make clear what we are checking when throwing the exception If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure -- Key: HBASE-10786 URL: https://issues.apache.org/jira/browse/HBASE-10786 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10786-v1.txt, 10786-v2.txt I was trying to find cause for test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ : {code} org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. 
Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8 at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332) ... 11 more {code} However, it is not clear which region caused the verification to fail. I searched for log from balancer but found none. 
The exception message should include region name which caused the verification to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10786: --- Attachment: 10786-v3.txt Patch v3 addresses Matteo's comments. If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure -- Key: HBASE-10786 URL: https://issues.apache.org/jira/browse/HBASE-10786 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt I was trying to find cause for test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ : {code} org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8 at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332) ... 11 more {code} However, it is not clear which region caused the verification to fail. I searched for log from balancer but found none. The exception message should include region name which caused the verification to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10786: --- Status: Open (was: Patch Available) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure -- Key: HBASE-10786 URL: https://issues.apache.org/jira/browse/HBASE-10786 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt I was trying to find cause for test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ : {code} org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking 
snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8 at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332) ... 11 more {code} However, it is not clear which region caused the verification to fail. I searched for log from balancer but found none. The exception message should include region name which caused the verification to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10786: --- Fix Version/s: 0.98.2 0.99.0 Hadoop Flags: Reviewed If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure -- Key: HBASE-10786 URL: https://issues.apache.org/jira/browse/HBASE-10786 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.99.0, 0.98.2 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt I was trying to find cause for test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ : {code} org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8 at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332) ... 11 more {code} However, it is not clear which region caused the verification to fail. I searched for log from balancer but found none. The exception message should include region name which caused the verification to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
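The improvement the issue asks for amounts to a set difference over region names: subtract the snapshotted regions from the expected ones and put the remainder in the message. A minimal sketch of that idea, with hypothetical class and method names (this is not the actual 10786-v3.txt patch):

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: name the region(s) missed by the snapshot so the
// "Regions moved" message is actionable. Names are made up for this example.
class MissingRegionReporter {
    static String describeMissing(Set<String> expectedRegions, Set<String> snapshottedRegions) {
        // TreeSet gives a stable, sorted listing in the message.
        Set<String> missing = new TreeSet<>(expectedRegions);
        missing.removeAll(snapshottedRegions);
        return "Regions moved during the snapshot. expected=" + expectedRegions.size()
            + " snapshotted=" + snapshottedRegions.size()
            + " missing=" + missing;
    }
}
```

With expected=9 and snapshotted=8 as in the stack trace above, the single missing region name would appear directly in the exception text.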
[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940910#comment-13940910 ] stack commented on HBASE-10792: --- +1 It's great. Thanks for keeping on w/ the strained metaphor! RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Attachments: HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTruck instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
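The "release the payload" fix can be pictured with a tiny stand-in for the truck: entries in a disruptor-style ring buffer are pre-allocated and reused, so a slot must null its reference once the consumer unloads it, or every payload it ever carried stays reachable and the heap fills. Class and method names here are illustrative, not the actual HBASE-10792 patch:

```java
// Minimal sketch of a reusable ring-buffer slot ("truck") that drops
// its payload reference on unload so the GC can reclaim the payload.
class Truck<T> {
    private T payload;

    void load(T p) {
        payload = p;
    }

    // Hand the payload to the consumer and clear the slot; without the
    // null-out, the pre-allocated ring would pin the object forever.
    T unload() {
        T p = payload;
        payload = null;
        return p;
    }

    boolean isEmpty() {
        return payload == null;
    }
}
```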
[jira] [Resolved] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-10786. Resolution: Fixed Thanks for the review, Matteo. If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure -- Key: HBASE-10786 URL: https://issues.apache.org/jira/browse/HBASE-10786 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.99.0, 0.98.2 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt I was trying to find cause for test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ : {code} org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8 at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332) ... 11 more {code} However, it is not clear which region caused the verification to fail. I searched for log from balancer but found none. The exception message should include region name which caused the verification to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-7847) Use zookeeper multi to clear znodes
[ https://issues.apache.org/jira/browse/HBASE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7847: -- Attachment: (was: 7847_v6.patch) Use zookeeper multi to clear znodes --- Key: HBASE-7847 URL: https://issues.apache.org/jira/browse/HBASE-7847 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Attachments: 7847-v1.txt, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847_v4.patch, HBASE-7847_v5.patch, HBASE-7847_v6.patch In ZKProcedureUtil, clearChildZNodes() and clearZNodes(String procedureName) should utilize zookeeper multi so that they're atomic -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-7847) Use zookeeper multi to clear znodes
[ https://issues.apache.org/jira/browse/HBASE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7847: -- Attachment: 7847_v6.patch Use zookeeper multi to clear znodes --- Key: HBASE-7847 URL: https://issues.apache.org/jira/browse/HBASE-7847 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Attachments: 7847-v1.txt, 7847_v6.patch, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847_v4.patch, HBASE-7847_v5.patch, HBASE-7847_v6.patch In ZKProcedureUtil, clearChildZNodes() and clearZNodes(String procedureName) should utilize zookeeper multi so that they're atomic -- This message was sent by Atlassian JIRA (v6.2#6252)
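The point of switching to zookeeper multi is its all-or-nothing contract: the whole batch of deletes either succeeds or leaves the znodes untouched, so a failure cannot leave a half-cleared procedure tree. A toy in-memory model of that contract (this is deliberately not ZooKeeper client code; the store and names are made up for illustration):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of an atomic batch delete over "znodes" held in a set,
// mirroring multi's all-or-nothing semantics.
class AtomicDelete {
    static boolean deleteAll(Set<String> znodes, List<String> toDelete) {
        // Validate the entire batch first: if any path is missing,
        // fail without mutating anything (no partial clear).
        if (!znodes.containsAll(toDelete)) {
            return false;
        }
        toDelete.forEach(znodes::remove);
        return true;
    }
}
```

Sequential per-znode deletes, by contrast, can fail midway and strand some children, which is exactly what the issue wants to avoid in clearChildZNodes() and clearZNodes().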
[jira] [Created] (HBASE-10794) multi-get should handle missing replica location from cache
Sergey Shelukhin created HBASE-10794: Summary: multi-get should handle missing replica location from cache Key: HBASE-10794 URL: https://issues.apache.org/jira/browse/HBASE-10794 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-10070 Currently the way cache works is that the meta row is stored together for all replicas of a region, so if some replicas are in recovery, getting locations for a region will still go to cache only and return null locations for these. Multi-get currently ignores such replicas. It should instead try to get location again from meta if any replica is null. -- This message was sent by Atlassian JIRA (v6.2#6252)
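The proposed behavior can be sketched as: if any cached replica location for the region is null, go back to meta for a fresh set of locations and refresh the cache, instead of silently skipping that replica. A hypothetical, simplified version of that logic (not the real HBase client code; names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: per-region replica locations with a fallback to a fresh
// "meta" lookup whenever the cached entry is absent or has a null slot
// (e.g. a replica that was in recovery when the row was cached).
class ReplicaLocator {
    static List<String> locate(Map<String, List<String>> cache,
                               String region,
                               Function<String, List<String>> metaLookup) {
        List<String> cached = cache.get(region);
        if (cached == null || cached.contains(null)) {
            List<String> fresh = metaLookup.apply(region); // re-read meta
            cache.put(region, fresh);                      // refresh the cache
            return fresh;
        }
        return cached;
    }
}
```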
[jira] [Commented] (HBASE-10794) multi-get should handle missing replica location from cache
[ https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940966#comment-13940966 ] Sergey Shelukhin commented on HBASE-10794: -- [~enis] [~devaraj] fyi multi-get should handle missing replica location from cache --- Key: HBASE-10794 URL: https://issues.apache.org/jira/browse/HBASE-10794 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-10070 Currently the way cache works is that the meta row is stored together for all replicas of a region, so if some replicas are in recovery, getting locations for a region will still go to cache only and return null locations for these. Multi-get currently ignores such replicas. It should instead try to get location again from meta if any replica is null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10634) Multiget doesn't fully work
[ https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-10634: Attachment: 10634-1.1.txt Patch that has been tested. Include Sergey's last patch and some fixes on top for getting the locations of regions when there is a server crash. Also assumes HBASE-10701's last patch. Multiget doesn't fully work --- Key: HBASE-10634 URL: https://issues.apache.org/jira/browse/HBASE-10634 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Sergey Shelukhin Fix For: hbase-10070 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10794) multi-get should handle missing replica location from cache
[ https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10794: - Issue Type: Sub-task (was: Improvement) Parent: HBASE-10070 multi-get should handle missing replica location from cache --- Key: HBASE-10794 URL: https://issues.apache.org/jira/browse/HBASE-10794 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-10070 Currently the way cache works is that the meta row is stored together for all replicas of a region, so if some replicas are in recovery, getting locations for a region will still go to cache only and return null locations for these. Multi-get currently ignores such replicas. It should instead try to get location again from meta if any replica is null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940974#comment-13940974 ] Hadoop QA commented on HBASE-10792: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635618/HBASE-10792.00.patch against trunk revision . ATTACHMENT ID: 12635618 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestImportExport.testImport94Table(TestImportExport.java:230) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9045//console This message is automatically generated. 
RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Attachments: HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTrunk instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940987#comment-13940987 ] Hadoop QA commented on HBASE-10793: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635623/HBASE-10793-trunk-v0.patch against trunk revision . ATTACHMENT ID: 12635623 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9046//console This message is automatically generated. AuthFailed as a valid zookeeper state -- Key: HBASE-10793 URL: https://issues.apache.org/jira/browse/HBASE-10793 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 0.96.2, 0.99.0, 0.98.2 Attachments: HBASE-10793-trunk-v0.patch In kerberos mode, Zookeeper accepts SASL authentication. 
The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by Zookeeper, and following are valid Zookeeper events: case0: return KeeperState.Disconnected; case3: return KeeperState.SyncConnected; case4: return KeeperState.AuthFailed; case5: return KeeperState.ConnectedReadOnly; case6: return KeeperState.SaslAuthenticated; case -112: return KeeperState.Expired; Based on above, ZooKeeperWatcher should not throw exception for AuthFailed event as an invalid event. For this kind of event, Zookeeper already logs it as a warning and proceed with non-SASL connection. {code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid} hbase(main):006:0 list TABLE
[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940995#comment-13940995 ] Ted Yu commented on HBASE-10793: +1 AuthFailed as a valid zookeeper state -- Key: HBASE-10793 URL: https://issues.apache.org/jira/browse/HBASE-10793 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 0.96.2, 0.99.0, 0.98.2 Attachments: HBASE-10793-trunk-v0.patch In kerberos mode, Zookeeper accepts SASL authentication. The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by Zookeeper, and following are valid Zookeeper events: case0: return KeeperState.Disconnected; case3: return KeeperState.SyncConnected; case4: return KeeperState.AuthFailed; case5: return KeeperState.ConnectedReadOnly; case6: return KeeperState.SaslAuthenticated; case -112: return KeeperState.Expired; Based on above, ZooKeeperWatcher should not throw exception for AuthFailed event as an invalid event. For this kind of event, Zookeeper already logs it as a warning and proceed with non-SASL connection. 
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid} hbase(main):006:0 list TABLE 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher java.lang.IllegalStateException: Received event is not valid: AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) BIMonitoring BIMonitoringSummary BIMonitoringSummary180 BIMonitoringSummary900 LogMetadata LogRecords Mtable t1 t2 9 row(s) in 0.4040 seconds = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2] {code} the patch will be similar as HBase-8757 -- This message was sent by Atlassian JIRA (v6.2#6252)
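The switch quoted in the description maps to handling like the following sketch, where AuthFailed (value 4) is recognized as a valid but non-fatal state so the watcher logs and proceeds with a non-SASL connection rather than throwing IllegalStateException. The state codes mirror the quoted switch; the handler itself is illustrative, not the HBASE-10793 patch:

```java
// Sketch of a watcher's state classification: all six KeeperState
// values are valid; only a genuinely unknown code is treated as fatal.
class ZkStates {
    static String stateName(int code) {
        switch (code) {
            case 0:    return "Disconnected";
            case 3:    return "SyncConnected";
            case 4:    return "AuthFailed";
            case 5:    return "ConnectedReadOnly";
            case 6:    return "SaslAuthenticated";
            case -112: return "Expired";
            default:   return null; // unknown -> the caller may throw
        }
    }

    // AuthFailed is valid but non-fatal: log a warning and continue
    // without SASL, as ZooKeeper itself does.
    static boolean isFatal(int code) {
        return stateName(code) == null;
    }
}
```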
[jira] [Updated] (HBASE-10794) multi-get should handle missing replica location from cache
[ https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10794: - Attachment: HBASE-10794.patch This patch is on top of two blocking patches multi-get should handle missing replica location from cache --- Key: HBASE-10794 URL: https://issues.apache.org/jira/browse/HBASE-10794 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-10070 Attachments: HBASE-10794.patch Currently the way cache works is that the meta row is stored together for all replicas of a region, so if some replicas are in recovery, getting locations for a region will still go to cache only and return null locations for these. Multi-get currently ignores such replicas. It should instead try to get location again from meta if any replica is null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10634) Multiget doesn't fully work
[ https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941005#comment-13941005 ] Sergey Shelukhin commented on HBASE-10634: -- +1 for combined patch. Also improves some confusing logging... Multiget doesn't fully work --- Key: HBASE-10634 URL: https://issues.apache.org/jira/browse/HBASE-10634 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Sergey Shelukhin Fix For: hbase-10070 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk reassigned HBASE-10792: Assignee: Nick Dimiduk RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTrunk instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10634) Multiget doesn't fully work
[ https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941014#comment-13941014 ] Sergey Shelukhin commented on HBASE-10634: -- Wait, this is not the combined patch Multiget doesn't fully work --- Key: HBASE-10634 URL: https://issues.apache.org/jira/browse/HBASE-10634 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Sergey Shelukhin Fix For: hbase-10070 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10792: -- Attachment: HBASE-10792.00.patch Retry RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTrunk instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
[ https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941036#comment-13941036 ] stack commented on HBASE-10531: --- +1 Add followup jiras here as per Andrew. Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo Key: HBASE-10531 URL: https://issues.apache.org/jira/browse/HBASE-10531 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.99.0 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch Currently the byte[] key passed to HFileScanner.seekTo and HFileScanner.reseekTo, is a combination of row, cf, qual, type and ts. And the caller forms this by using kv.getBuffer, which is actually deprecated. So see how this can be achieved considering kv.getBuffer is removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
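The key in question is a flat concatenation of row, family, qualifier, timestamp and type, which callers historically sliced out of the deprecated kv.getBuffer(). A rough sketch of composing such a key from its components instead; note the real HFile key layout also carries length prefixes that this illustration omits, so treat it as a picture of the concatenation, not the on-disk format:

```java
import java.nio.ByteBuffer;

// Illustrative composition of a seek key from its parts, avoiding any
// dependency on the backing buffer of a KeyValue.
class SeekKey {
    static byte[] compose(byte[] row, byte[] family, byte[] qualifier, long ts, byte type) {
        ByteBuffer b = ByteBuffer.allocate(row.length + family.length + qualifier.length + 8 + 1);
        b.put(row).put(family).put(qualifier).putLong(ts).put(type);
        return b.array();
    }
}
```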
[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941051#comment-13941051 ] Himanshu Vashishtha commented on HBASE-10792: - +1 RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTrunk instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941052#comment-13941052 ] stack commented on HBASE-10781: --- [~eclark] The hadoop3 patch doesn't include a hadoop3-compat-module, HBASE-6581, as yet. We discussed adding one though IIRC because it's all about changes in the sync API. So I don't see us giving up your little compat 'system' yet. I got rid of the little build script because while it served a purpose, it is ugly. We can revive it if we need such a beast going forward (maven will be 'fixed' the next time we need this kind of facility -- smile). I'm testing out my little make-rc.sh changes to run against trunk. The built tarball has some CLASSPATH issues. Trying to fix before commit. Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10634) Multiget doesn't fully work
[ https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941072#comment-13941072 ] Devaraj Das commented on HBASE-10634: - [~sershe], you got confused :-) The combined patch will be the one with HBASE-10794. Multiget doesn't fully work --- Key: HBASE-10634 URL: https://issues.apache.org/jira/browse/HBASE-10634 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Sergey Shelukhin Fix For: hbase-10070 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10776) Separate HConnectionManager into several parts
[ https://issues.apache.org/jira/browse/HBASE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941075#comment-13941075 ] Yi Deng commented on HBASE-10776: - Cool. Another refactoring job I'm thinking is to remove the unnecessary inheritance whenever possible, using composition instead. Hopefully the class tree could become much flatter. Separate HConnectionManager into several parts -- Key: HBASE-10776 URL: https://issues.apache.org/jira/browse/HBASE-10776 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.89-fb Reporter: Yi Deng Priority: Minor Fix For: 0.89-fb HConnectionManager is too large to effectively maintain. This Jira records some refactoring jobs: 1. Move TableServers out as a standalone class 2. Move region-locating code as a class -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941078#comment-13941078 ] Andrew Purtell commented on HBASE-10793: lgtm Going to commit to 0.96+ shortly Ping [~stack] AuthFailed as a valid zookeeper state -- Key: HBASE-10793 URL: https://issues.apache.org/jira/browse/HBASE-10793 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 0.96.2, 0.99.0, 0.98.2 Attachments: HBASE-10793-trunk-v0.patch In kerberos mode, Zookeeper accepts SASL authentication. The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by Zookeeper, and following are valid Zookeeper events: case0: return KeeperState.Disconnected; case3: return KeeperState.SyncConnected; case4: return KeeperState.AuthFailed; case5: return KeeperState.ConnectedReadOnly; case6: return KeeperState.SaslAuthenticated; case -112: return KeeperState.Expired; Based on above, ZooKeeperWatcher should not throw exception for AuthFailed event as an invalid event. For this kind of event, Zookeeper already logs it as a warning and proceed with non-SASL connection. 
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid} hbase(main):006:0> list TABLE 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher java.lang.IllegalStateException: Received event is not valid: AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) BIMonitoring BIMonitoringSummary BIMonitoringSummary180 BIMonitoringSummary900 LogMetadata LogRecords Mtable t1 t2 9 row(s) in 0.4040 seconds = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2] {code} The patch will be similar to HBASE-8757. -- This message was sent by Atlassian JIRA (v6.2#6252)
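The proposed behavior can be sketched in plain Java. This is a minimal self-contained model, not the actual ZooKeeperWatcher code: it maps the integer state codes listed in the issue description to their KeeperState names and treats AuthFailed as a state to warn about rather than an invalid event to throw on.

```java
// Minimal sketch (not the real HBase ZooKeeperWatcher) of the HBASE-10793 idea:
// map ZooKeeper's integer state codes to names; AuthFailed is valid, so only
// truly unknown codes raise IllegalStateException.
import java.util.HashMap;
import java.util.Map;

public class KeeperStateSketch {
    private static final Map<Integer, String> STATES = new HashMap<>();
    static {
        // Codes exactly as listed in the issue description.
        STATES.put(0, "Disconnected");
        STATES.put(3, "SyncConnected");
        STATES.put(4, "AuthFailed");
        STATES.put(5, "ConnectedReadOnly");
        STATES.put(6, "SaslAuthenticated");
        STATES.put(-112, "Expired");
    }

    /** Returns the state name, or null for an unknown code. */
    public static String fromCode(int code) {
        return STATES.get(code);
    }

    /** Proposed handling: warn on AuthFailed instead of throwing. */
    public static String handleEvent(int code) {
        String state = fromCode(code);
        if (state == null) {
            throw new IllegalStateException("Received event is not valid: " + code);
        }
        if (state.equals("AuthFailed")) {
            // ZooKeeper already logged a warning; proceed with a non-SASL connection.
            return "warned:" + state;
        }
        return "ok:" + state;
    }

    public static void main(String[] args) {
        System.out.println(handleEvent(4)); // no exception for AuthFailed
    }
}
```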
[jira] [Updated] (HBASE-10794) multi-get should handle missing replica location from cache
[ https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10794: - Attachment: HBASE-10794.patch Includes other changes that are not part of HBASE-10634 multi-get should handle missing replica location from cache --- Key: HBASE-10794 URL: https://issues.apache.org/jira/browse/HBASE-10794 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-10070 Attachments: HBASE-10794.patch, HBASE-10794.patch Currently the cache stores the meta row for all replicas of a region together, so if some replicas are in recovery, getting locations for the region will still go to the cache only and return null locations for those replicas. Multi-get currently ignores such replicas. It should instead try to get the locations again from meta if any replica location is null. -- This message was sent by Atlassian JIRA (v6.2#6252)
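The retry-on-null-replica idea can be illustrated with a toy model. This is plain Java, not the HBase client API: a list of location strings (with nulls for replicas in recovery) stands in for cached region locations, and a supplier stands in for a fresh read of hbase:meta.

```java
// Toy model of the fix described above (not the real HBase client API):
// if any replica location from the cache is null, re-read meta for a fresh
// set of locations instead of silently skipping that replica.
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class ReplicaLookupSketch {
    /**
     * @param cached     locations from the client cache; entries may be null
     *                   while a replica is in recovery
     * @param metaLookup a (possibly expensive) re-read of hbase:meta
     */
    public static List<String> locate(List<String> cached,
                                      Supplier<List<String>> metaLookup) {
        boolean anyNull = cached.stream().anyMatch(loc -> loc == null);
        // The old behavior just dropped the null replicas; instead, refresh.
        return anyNull ? metaLookup.get() : cached;
    }

    public static void main(String[] args) {
        List<String> stale = Arrays.asList("rs1:16020", null);
        List<String> fresh = Arrays.asList("rs1:16020", "rs3:16020");
        System.out.println(locate(stale, () -> fresh));
    }
}
```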
[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941099#comment-13941099 ] Enis Soztutar commented on HBASE-10781: --- Yeah, we can keep the hadoop-compat modules, since I think some hadoop-2.x might also require a shim of its own even before 3.0. Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941101#comment-13941101 ] stack commented on HBASE-10793: --- ok AuthFailed as a valid zookeeper state -- Key: HBASE-10793 URL: https://issues.apache.org/jira/browse/HBASE-10793 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.96.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 0.96.2, 0.99.0, 0.98.2 Attachments: HBASE-10793-trunk-v0.patch In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied, and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by ZooKeeper; the following are the valid ZooKeeper events: case 0: return KeeperState.Disconnected; case 3: return KeeperState.SyncConnected; case 4: return KeeperState.AuthFailed; case 5: return KeeperState.ConnectedReadOnly; case 6: return KeeperState.SaslAuthenticated; case -112: return KeeperState.Expired; Based on the above, ZooKeeperWatcher should not throw an exception for the AuthFailed event as if it were invalid. For this kind of event, ZooKeeper already logs a warning and proceeds with a non-SASL connection. 
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid} hbase(main):006:0> list TABLE 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher java.lang.IllegalStateException: Received event is not valid: AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) BIMonitoring BIMonitoringSummary BIMonitoringSummary180 BIMonitoringSummary900 LogMetadata LogRecords Mtable t1 t2 9 row(s) in 0.4040 seconds = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2] {code} The patch will be similar to HBASE-8757. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-7847) Use zookeeper multi to clear znodes
[ https://issues.apache.org/jira/browse/HBASE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941106#comment-13941106 ] Hadoop QA commented on HBASE-7847: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635627/7847_v6.patch against trunk revision . ATTACHMENT ID: 12635627 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestTableMapReduceBase.testMultiRegionTable(TestTableMapReduceBase.java:96) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9047//console This message is automatically generated. 
Use zookeeper multi to clear znodes --- Key: HBASE-7847 URL: https://issues.apache.org/jira/browse/HBASE-7847 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Attachments: 7847-v1.txt, 7847_v6.patch, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847_v4.patch, HBASE-7847_v5.patch, HBASE-7847_v6.patch In ZKProcedureUtil, clearChildZNodes() and clearZNodes(String procedureName) should utilize zookeeper multi so that they're atomic -- This message was sent by Atlassian JIRA (v6.2#6252)
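The atomicity goal can be illustrated with a stand-alone sketch. The real fix would build ZooKeeper Op.delete operations and submit them via zk.multi(ops); here a plain map plays the role of the znode store, purely to show the all-or-nothing contract that multi() provides.

```java
// Stand-alone illustration of the all-or-nothing contract ZooKeeper multi()
// gives (the real patch would build Op.delete ops and call zk.multi(ops));
// a HashMap stands in for the znode store here.
import java.util.List;
import java.util.Map;

public class MultiDeleteSketch {
    /**
     * Delete every path in one shot: if any path is missing, nothing is
     * deleted, mirroring how one failed op aborts the whole multi() batch.
     */
    public static boolean deleteAll(Map<String, byte[]> store, List<String> paths) {
        for (String p : paths) {
            if (!store.containsKey(p)) {
                return false; // whole batch aborts, store left untouched
            }
        }
        for (String p : paths) {
            store.remove(p);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new java.util.HashMap<>();
        store.put("/proc/snap", new byte[0]);
        store.put("/proc/snap/rs1", new byte[0]);
        // Children first, then the parent, in a single atomic batch.
        System.out.println(deleteAll(store, List.of("/proc/snap/rs1", "/proc/snap")));
    }
}
```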
[jira] [Updated] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10781: -- Attachment: 10781v3.txt This works for me. What I'll commit unless objection. Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt, 10781v3.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10782) Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in job conf
[ https://issues.apache.org/jira/browse/HBASE-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941119#comment-13941119 ] stack commented on HBASE-10782: --- +1 Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in job conf Key: HBASE-10782 URL: https://issues.apache.org/jira/browse/HBASE-10782 Project: HBase Issue Type: Test Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-10782-trunk-v1.diff Hadoop2 MR tests fail occasionally with output like this: {code} --- Test set: org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1 --- Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 347.57 sec FAILURE! testScanEmptyToAPP(org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1) Time elapsed: 50.047 sec ERROR! java.io.IOException: java.net.ConnectException: Call From liushaohui-OptiPlex-990/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:524) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) ... {code} The reason is that while the MR job is running, the job client pulls the job status from the AppMaster. When the job completes, the AppMaster exits. If at that point the job client has not yet received the job-completed event from the AppMaster, it switches to getting the job report from the history server. But in HBaseTestingUtility#startMiniMapReduceCluster, the config mapreduce.jobhistory.address is not copied to TestUtil's config. CRUNCH-249 reported the same problem. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
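The fix amounts to propagating the history-server address into the test utility's configuration. A toy sketch of that copy step (plain HashMaps stand in for Hadoop Configuration objects; the localhost:10020 value is illustrative, taken from the MR default port seen in the stack trace above):

```java
// Toy sketch of the HBASE-10782 fix: copy mapreduce.jobhistory.address from
// the mini MR cluster's conf into the test util's conf (HashMaps stand in
// for Hadoop Configuration objects here).
import java.util.HashMap;
import java.util.Map;

public class ConfCopySketch {
    static void copyJobHistoryAddress(Map<String, String> from, Map<String, String> to) {
        String key = "mapreduce.jobhistory.address";
        String addr = from.get(key);
        if (addr != null) {
            // Without this, the client falls back to 0.0.0.0:10020 and the
            // connection is refused, as in the failure above.
            to.put(key, addr);
        }
    }

    public static void main(String[] args) {
        Map<String, String> miniClusterConf = new HashMap<>();
        miniClusterConf.put("mapreduce.jobhistory.address", "localhost:10020");
        Map<String, String> testConf = new HashMap<>();
        copyJobHistoryAddress(miniClusterConf, testConf);
        System.out.println(testConf.get("mapreduce.jobhistory.address"));
    }
}
```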
[jira] [Commented] (HBASE-10790) make assembly:single as default in pom.xml
[ https://issues.apache.org/jira/browse/HBASE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941130#comment-13941130 ] Enis Soztutar commented on HBASE-10790: --- I am not in favor of this: since install requires package, it would mean that every mvn install would build the tarball. And even on my SSD MBP, it takes 40 sec to build the tarball. Building the tarball is a much less frequent operation than mvn install (at least in my daily development). make assembly:single as default in pom.xml -- Key: HBASE-10790 URL: https://issues.apache.org/jira/browse/HBASE-10790 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor Attachments: HBASE-10790-trunk-v1.diff Now, to build an HBase tar release package, we have to use the cmd: {code} mvn clean package assembly:single {code} which is not convenient. We can make assembly:single the default by running the assembly plugin in the Maven package phase. Then we can just use the cmd {code} mvn clean package {code} to get a release package. Other suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.2#6252)
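Binding the assembly plugin to the package phase would look roughly like this in pom.xml. This is a sketch only: the descriptor path is a hypothetical placeholder, and HBase's actual assembly configuration may differ.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptors>
      <!-- hypothetical descriptor path; HBase's real descriptor may differ -->
      <descriptor>src/main/assembly/src.xml</descriptor>
    </descriptors>
  </configuration>
  <executions>
    <execution>
      <id>tarball</id>
      <!-- binds the single goal so plain "mvn package" builds the tarball -->
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

This binding is exactly what Enis objects to above: because install runs package, every mvn install would then pay the tarball cost.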
[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941131#comment-13941131 ] Enis Soztutar commented on HBASE-10781: --- lgtm. Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt, 10781v3.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10781: -- Resolution: Fixed Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Status: Resolved (was: Patch Available) Committed since got +1 from the RM. Remove hadoop-one-compat module and all references to hadoop1 - Key: HBASE-10781 URL: https://issues.apache.org/jira/browse/HBASE-10781 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 10690.txt, 10690v2.txt, 10781v3.txt Clean out hadoop1 references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10782) Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in job conf
[ https://issues.apache.org/jira/browse/HBASE-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941139#comment-13941139 ] Nick Dimiduk commented on HBASE-10782: -- +1 Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in job conf Key: HBASE-10782 URL: https://issues.apache.org/jira/browse/HBASE-10782 Project: HBase Issue Type: Test Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-10782-trunk-v1.diff Hadoop2 MR tests fail occasionally with output like this: {code} --- Test set: org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1 --- Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 347.57 sec FAILURE! testScanEmptyToAPP(org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1) Time elapsed: 50.047 sec ERROR! java.io.IOException: java.net.ConnectException: Call From liushaohui-OptiPlex-990/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:524) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) ... {code} The reason is that while the MR job is running, the job client pulls the job status from the AppMaster. When the job completes, the AppMaster exits. If at that point the job client has not yet received the job-completed event from the AppMaster, it switches to getting the job report from the history server. But in HBaseTestingUtility#startMiniMapReduceCluster, the config mapreduce.jobhistory.address is not copied to TestUtil's config. 
CRUNCH-249 reported the same problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8963) Add configuration option to skip HFile archiving
[ https://issues.apache.org/jira/browse/HBASE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941142#comment-13941142 ] Enis Soztutar commented on HBASE-8963: -- I don't imagine anyone wanting to run with skip-archive as a global config in production. I think we should not do the global config at all, but allow drop table to have an option to skip. Don't snapshots refer to files in the archive? If we do SKIP_ARCHIVE as a table property, previous snapshots will be broken by compactions, I guess. I think we should do an rm -rf kind of thing in drop table. If the files are not referred to, they are not moved to the archive but deleted instead. Add configuration option to skip HFile archiving Key: HBASE-8963 URL: https://issues.apache.org/jira/browse/HBASE-8963 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: bharath v Fix For: 0.99.0 Attachments: HBASE-8963.trunk.v1.patch, HBASE-8963.trunk.v2.patch, HBASE-8963.trunk.v3.patch, HBASE-8963.trunk.v4.patch, HBASE-8963.trunk.v5.patch, HBASE-8963.trunk.v6.patch, HBASE-8963.trunk.v7.patch Currently HFileArchiver is always called when a table is dropped. A configuration option (either global or per table) should be provided so that archiving can be skipped when a table is deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-10690) Drop Hadoop-1 support
[ https://issues.apache.org/jira/browse/HBASE-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-10690. --- Resolution: Fixed Assignee: stack Release Note: Trunk no longer has support for hadoop1. You cannot build against hadoop1. We now generate one artifact only and our artifact naming no longer includes the hadoop version we were built against since we only build against one version, hadoop2: e.g. the hbase 1.0.0 release will be named hbase-1.0.0, not hbase-1.0.0-hadoop1 (or hbase-1.0.0-hadoop2). Hadoop Flags: Incompatible change Resolving this umbrella issue. All sub tasks are done. Documentation is in the refguide, the assembly does not include hadoop1, and the build is straight-forward w/ no need to make a hadoop1 or hadoop2 artifact -- it is all hadoop2 all the time from here on out -- and doc'd in the refguide. Drop Hadoop-1 support - Key: HBASE-10690 URL: https://issues.apache.org/jira/browse/HBASE-10690 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: stack Priority: Critical Fix For: 0.99.0 As per thread: http://mail-archives.apache.org/mod_mbox/hbase-dev/201403.mbox/%3ccamuu0w93mgp7zbbxgccov+be3etmkvn5atzowvzqd_gegdk...@mail.gmail.com%3E It seems that the consensus is that supporting Hadoop-1 in HBase-1.x will be costly, so we should drop the support. In this issue: - We'll document that Hadoop-1 support is deprecated in HBase-0.98. And users should switch to hadoop-2.2+ anyway. - Document that upcoming HBase-0.99 and HBase-1.0 releases will not have Hadoop-1 support. - Document that there is no rolling upgrade support for going between Hadoop-1 and Hadoop-2 (using HBase-0.96 or 0.98). - Release artifacts won't contain HBase build with Hadoop-1. - We may keep the profile, jenkins job etc if we want. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
Ted Yu created HBASE-10795: -- Summary: TestHBaseFsck#testHBaseFsck() should drop the table it creates Key: HBASE-10795 URL: https://issues.apache.org/jira/browse/HBASE-10795 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Attachments: 10795-v1.txt When investigating TestHBaseFsck test failures, I often saw the following (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/): {code} Number of Tables: 3 Table: tableBadMetaAssign rw families: 1 Table: hbase:namespace rw families: 1 Table: testSplitdaughtersNotInMeta rw families: 1 {code} TestHBaseFsck#testHBaseFsck() should drop the table it creates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
[ https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10795: --- Attachment: 10795-v1.txt Patch v1 adds a finally clause to drop the table. TestHBaseFsck#testHBaseFsck() should drop the table it creates -- Key: HBASE-10795 URL: https://issues.apache.org/jira/browse/HBASE-10795 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Attachments: 10795-v1.txt When investigating TestHBaseFsck test failures, I often saw the following (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/): {code} Number of Tables: 3 Table: tableBadMetaAssign rw families: 1 Table: hbase:namespace rw families: 1 Table: testSplitdaughtersNotInMeta rw families: 1 {code} TestHBaseFsck#testHBaseFsck() should drop the table it creates. -- This message was sent by Atlassian JIRA (v6.2#6252)
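The shape of the fix is a try/finally cleanup. This is a sketch of the pattern, not the actual patch or the real TestHBaseFsck code: the test body runs in try and the created table is dropped in finally, so a failing assertion can no longer leak the table into later tests.

```java
// Sketch of the cleanup pattern the patch adds (not the actual test code):
// drop the created table in finally so it cannot leak into later tests.
import java.util.HashSet;
import java.util.Set;

public class CleanupSketch {
    static final Set<String> TABLES = new HashSet<>();

    static void createTable(String name) { TABLES.add(name); }
    static void deleteTable(String name) { TABLES.remove(name); }

    static void runTest(String table, Runnable body) {
        createTable(table);
        try {
            body.run();
        } finally {
            deleteTable(table); // always runs, pass or fail
        }
    }

    public static void main(String[] args) {
        try {
            runTest("tableBadMetaAssign",
                    () -> { throw new AssertionError("fsck check failed"); });
        } catch (AssertionError expected) {
            // the test failed, but the table was still dropped
        }
        System.out.println(TABLES.isEmpty());
    }
}
```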
[jira] [Updated] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
[ https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10795: --- Status: Patch Available (was: Open) TestHBaseFsck#testHBaseFsck() should drop the table it creates -- Key: HBASE-10795 URL: https://issues.apache.org/jira/browse/HBASE-10795 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Attachments: 10795-v1.txt When investigating TestHBaseFsck test failures, I often saw the following (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/): {code} Number of Tables: 3 Table: tableBadMetaAssign rw families: 1 Table: hbase:namespace rw families: 1 Table: testSplitdaughtersNotInMeta rw families: 1 {code} TestHBaseFsck#testHBaseFsck() should drop the table it creates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10791) Add integration test to demonstrate performance improvement
[ https://issues.apache.org/jira/browse/HBASE-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941160#comment-13941160 ] Nick Dimiduk commented on HBASE-10791: -- Looks like I posted an outdated stash from yesterday, instead of the current patch, so some of these details don't make sense (like the PerfEvalCallable constructor args). Updating patch momentarily. Add integration test to demonstrate performance improvement --- Key: HBASE-10791 URL: https://issues.apache.org/jira/browse/HBASE-10791 Project: HBase Issue Type: Sub-task Components: Performance, test Affects Versions: hbase-10070 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10791.00.patch It would be good to demonstrate that use of region replicas reduces read latency. PerformanceEvaluation can be used manually for this purpose, but it's not able to use ChaosMonkey. An integration test can set up the monkey actions and automate execution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941162#comment-13941162 ] Hadoop QA commented on HBASE-10792: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635650/HBASE-10792.00.patch against trunk revision . ATTACHMENT ID: 12635650 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9048//console This message is automatically generated. RingBufferTruck does not release its payload Key: HBASE-10792 URL: https://issues.apache.org/jira/browse/HBASE-10792 Project: HBase Issue Type: Bug Components: Performance, wal Affects Versions: 0.99.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. 
Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTruck instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RingBufferTruck instances need to release their payloads after consumers retrieve them. -- This message was sent by Atlassian JIRA (v6.2#6252)
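The second remedy can be sketched with a plain-Java stand-in for the Disruptor event slot (this is not the real RingBufferTruck class): the consumer takes the payload and the slot drops its own reference, so the pre-allocated ring of events no longer pins old KeyValues in the heap.

```java
// Stand-in for the Disruptor event slot (not the real RingBufferTruck):
// unload() hands the payload to the consumer AND nulls the slot's field,
// so a pre-allocated ring of these events cannot retain old payloads.
public class TruckSketch {
    private Object payload;

    public void load(Object p) { payload = p; }

    /** Return the payload and release the reference held by this slot. */
    public Object unload() {
        Object p = payload;
        payload = null; // without this line the ring buffer pins the object
        return p;
    }

    public boolean isEmpty() { return payload == null; }

    public static void main(String[] args) {
        TruckSketch truck = new TruckSketch();
        truck.load("wal-edit-1");
        Object got = truck.unload();
        System.out.println(got + " " + truck.isEmpty());
    }
}
```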
[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
[ https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941172#comment-13941172 ] stack commented on HBASE-10795: --- Why? Doesn't the cluster get shut down irrespective and then the dirs removed? Why do more work? TestHBaseFsck#testHBaseFsck() should drop the table it creates -- Key: HBASE-10795 URL: https://issues.apache.org/jira/browse/HBASE-10795 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Attachments: 10795-v1.txt When investigating TestHBaseFsck test failures, I often saw the following (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/): {code} Number of Tables: 3 Table: tableBadMetaAssign rw families: 1 Table: hbase:namespace rw families: 1 Table: testSplitdaughtersNotInMeta rw families: 1 {code} TestHBaseFsck#testHBaseFsck() should drop the table it creates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
[ https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10795: -- Priority: Trivial (was: Major) TestHBaseFsck#testHBaseFsck() should drop the table it creates -- Key: HBASE-10795 URL: https://issues.apache.org/jira/browse/HBASE-10795 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Trivial Attachments: 10795-v1.txt When investigating TestHBaseFsck test failures, I often saw the following (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/): {code} Number of Tables: 3 Table: tableBadMetaAssign rw families: 1 Table: hbase:namespace rw families: 1 Table: testSplitdaughtersNotInMeta rw families: 1 {code} TestHBaseFsck#testHBaseFsck() should drop the table it creates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941174#comment-13941174 ] Hudson commented on HBASE-10786:

ABORTED: Integrated in HBase-0.98-on-Hadoop-1.1 #226 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/226/])
HBASE-10786 If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure (tedyu: rev 1579373)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/MasterSnapshotVerifier.java

If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
--
Key: HBASE-10786
URL: https://issues.apache.org/jira/browse/HBASE-10786
Project: HBase
Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
Fix For: 0.99.0, 0.98.2
Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt

I was trying to find the cause of the test failure in https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/ :
{code}
org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an error. Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
  at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
  at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
  at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
  at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
  at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
  at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
  at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
  ... 11 more
{code}
However, it is not clear which region caused the verification to fail. I searched for a log entry from the balancer but found none. The exception message should include the name of the region that caused the verification to fail.
-- This message was sent by Atlassian JIRA (v6.2#6252)
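The improvement the report asks for amounts to diffing the expected region set against the snapshotted set and naming the difference in the message, instead of only reporting counts. A minimal sketch of that idea follows; it is illustrative Java, not the actual MasterSnapshotVerifier code, and `missingRegionsMessage` plus the region-name sets are hypothetical:

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: build a verification-failure message that names the regions
// missing from the snapshot, rather than only expected/snapshotted counts.
public class RegionDiffMessage {
    static String missingRegionsMessage(Set<String> expected, Set<String> snapshotted) {
        // TreeSet gives a stable, sorted listing in the message.
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(snapshotted);
        return "Regions moved during the snapshot. expected=" + expected.size()
            + " snapshotted=" + snapshotted.size() + " missing=" + missing;
    }
}
```

With three expected regions and two snapshotted, the message names the one missing region rather than leaving the reader to search balancer logs.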
[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941176#comment-13941176 ] Hudson commented on HBASE-10786:

ABORTED: Integrated in HBase-TRUNK #5024 (See [https://builds.apache.org/job/HBase-TRUNK/5024/])
HBASE-10786 If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure (tedyu: rev 1579374)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/MasterSnapshotVerifier.java
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long
[ https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941175#comment-13941175 ] Hudson commented on HBASE-10787:

ABORTED: Integrated in HBase-TRUNK #5024 (See [https://builds.apache.org/job/HBase-TRUNK/5024/])
HBASE-10787 TestHCM#testConnection* takes too long (Ted Yu) (apurtell: rev 1579358)
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java

TestHCM#testConnection* take too long
--
Key: HBASE-10787
URL: https://issues.apache.org/jira/browse/HBASE-10787
Project: HBase
Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
Fix For: 0.99.0
Attachments: 10787-v1.txt

TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins. The test can be shortened when the retry count is lowered. On my Mac, for TestHCM#testConnection* (two tests), without the patch:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from SCDynamicStore
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec
{code}
with the patch:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from SCDynamicStore
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
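The speedup comes from shrinking the client's worst-case retry budget (the knob is {{hbase.client.retries.number}}): each extra retry adds a backoff-scaled pause, so total test time grows quickly with the retry count. The sketch below shows the arithmetic only; the multiplier table is hypothetical, not HBase's actual RETRY_BACKOFF constants:

```java
// Illustrative only: why lowering the client retry count shortens TestHCM.
// BACKOFF is a hypothetical multiplier table, not HBase's real constants;
// it just demonstrates that worst-case wait grows fast with retry count.
public class RetryBudget {
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40};  // hypothetical multipliers

    // Worst-case total sleep (ms) across `retries` attempts at a base pause.
    static long worstCaseWaitMs(long pauseMs, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            // Clamp to the last multiplier once the table is exhausted.
            total += pauseMs * BACKOFF[Math.min(i, BACKOFF.length - 1)];
        }
        return total;
    }
}
```

Under these assumed multipliers, 3 retries at a 100 ms pause cost at most 0.6 s of sleep, while 7 retries cost 8.1 s; a connection-failure test that exercises the full budget repeatedly shrinks proportionally.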
[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure
[ https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941173#comment-13941173 ] Hudson commented on HBASE-10786:

ABORTED: Integrated in HBase-0.98 #242 (See [https://builds.apache.org/job/HBase-0.98/242/])
HBASE-10786 If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure (tedyu: rev 1579373)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/MasterSnapshotVerifier.java
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
[ https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941178#comment-13941178 ] Ted Yu commented on HBASE-10795:

Looking at other tests in TestHBaseFsck, such as testHBaseFsckClean(), the pattern is:
{code}
try {
  HBaseFsck hbck = doFsck(conf, false);
  assertNoErrors(hbck);
  setupTable(table);
  ...
} finally {
  deleteTable(table);
}
{code}
TestHBaseFsck#testHBaseFsck() should be consistent with the other tests.

TestHBaseFsck#testHBaseFsck() should drop the table it creates
--
Key: HBASE-10795
URL: https://issues.apache.org/jira/browse/HBASE-10795
Project: HBase
Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Trivial
Attachments: 10795-v1.txt

When investigating TestHBaseFsck test failures, I often saw the following (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
{code}
Number of Tables: 3
Table: tableBadMetaAssign rw families: 1
Table: hbase:namespace rw families: 1
Table: testSplitdaughtersNotInMeta rw families: 1
{code}
TestHBaseFsck#testHBaseFsck() should drop the table it creates.
-- This message was sent by Atlassian JIRA (v6.2#6252)
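The quoted pattern reduces to: create the table, run assertions, and unconditionally drop it in {{finally}} so later tests see a clean namespace. A self-contained sketch of that shape; the set-based bookkeeping below stands in for the real setupTable/deleteTable helpers and is not TestHBaseFsck code:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the cleanup pattern: `finally` drops the table even when the
// test body throws, so no stray table leaks into subsequent tests.
public class CleanupPattern {
    static final Set<String> TABLES = new HashSet<>();  // stand-in for the cluster's tables

    static void setupTable(String t) { TABLES.add(t); }
    static void deleteTable(String t) { TABLES.remove(t); }

    static void runTest(String table) {
        setupTable(table);
        try {
            // ... assertions against the table would go here ...
            throw new AssertionError("simulated test failure");
        } finally {
            deleteTable(table);  // runs even when the test body throws
        }
    }
}
```

The key property is that cleanup is not skipped on failure, which is exactly what distinguishes the consistent tests from testHBaseFsck() as originally written.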
[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941181#comment-13941181 ] Demai Ni commented on HBASE-10793:

[~yuzhih...@gmail.com], [~andrew.purt...@gmail.com], [~stack], thanks a lot for the review. Demai

AuthFailed as a valid zookeeper state
--
Key: HBASE-10793
URL: https://issues.apache.org/jira/browse/HBASE-10793
Project: HBase
Issue Type: Bug
Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
Fix For: 0.96.2, 0.99.0, 0.98.2
Attachments: HBASE-10793-trunk-v0.patch

In kerberos mode, Zookeeper accepts SASL authentication. The AuthFailed message indicates the client could not be authenticated, but it should proceed anyway, because only access to znodes that require SASL authentication will be denied, and this client may never need to access them. Furthermore, AuthFailed is a valid event supported by Zookeeper; the following are valid Zookeeper events:
{code}
case 0: return KeeperState.Disconnected;
case 3: return KeeperState.SyncConnected;
case 4: return KeeperState.AuthFailed;
case 5: return KeeperState.ConnectedReadOnly;
case 6: return KeeperState.SaslAuthenticated;
case -112: return KeeperState.Expired;
{code}
Based on the above, ZooKeeperWatcher should not treat an AuthFailed event as invalid and throw an exception. For this kind of event, Zookeeper already logs a warning and proceeds with a non-SASL connection.
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid}
hbase(main):006:0 list
TABLE
14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
java.lang.IllegalStateException: Received event is not valid: AuthFailed
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
BIMonitoring
BIMonitoringSummary
BIMonitoringSummary180
BIMonitoringSummary900
LogMetadata
LogRecords
Mtable
t1
t2
9 row(s) in 0.4040 seconds
= [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
{code}
The patch will be similar to HBASE-8757.
-- This message was sent by Atlassian JIRA (v6.2#6252)
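The tolerant handling argued for above can be sketched as a switch that treats AuthFailed as a known, non-fatal connection event instead of throwing IllegalStateException. This is an illustration only, not the actual ZooKeeperWatcher.connectionEvent code; the enum and `handle` helper are hypothetical:

```java
// Sketch: AuthFailed is a legal ZooKeeper KeeperState, so a watcher should
// let the client proceed unauthenticated rather than throw. Only access to
// SASL-protected znodes would be denied later.
public class WatcherSketch {
    enum KeeperState { Disconnected, SyncConnected, AuthFailed, ConnectedReadOnly, SaslAuthenticated, Expired }

    static String handle(KeeperState state) {
        switch (state) {
            case SyncConnected:
                return "connected";
            case AuthFailed:
                // SASL auth failed; proceed without it. ZooKeeper itself
                // already logged a warning for this case.
                return "proceed-without-sasl";
            case Expired:
                return "session-expired";
            default:
                return "ignored";  // transient states: no action needed
        }
    }
}
```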
[jira] [Updated] (HBASE-10793) AuthFailed as a valid zookeeper state
[ https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10793:

Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to 0.96-trunk.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10796) Set default log level as INFO
stack created HBASE-10796:

Summary: Set default log level as INFO
Key: HBASE-10796
URL: https://issues.apache.org/jira/browse/HBASE-10796
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack

When we roll out 1.0, the log level should be INFO by default, not DEBUG. Proposed on the mailing list here: http://search-hadoop.com/m/33P7E1GL08b/hbase+1.0subj=DISCUSSION+1+0+0 ; it got at least one other +1 with no objection.
-- This message was sent by Atlassian JIRA (v6.2#6252)
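For reference, the DEBUG-to-INFO switch is essentially a one-line change in conf/log4j.properties. The keys below are a sketch based on the shape of HBase's shipped log4j.properties of that era; verify the exact key names against your branch before relying on them:

```properties
# Default logger for HBase daemons: level INFO (was DEBUG), console appender.
hbase.root.logger=INFO,console
log4j.rootLogger=${hbase.root.logger}
```

Operators can still override per-daemon at startup (e.g. via HBASE_ROOT_LOGGER in the environment), so the change only affects the out-of-the-box default.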
[jira] [Resolved] (HBASE-3014) Change UnknownScannerException log level to WARN
[ https://issues.apache.org/jira/browse/HBASE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-3014.

Resolution: Cannot Reproduce

Marking as 'can not repro'. I think this issue is actually fixed, after doing a survey. Nowhere do we log this exception explicitly at the ERROR level (not any more, at least). It is all INFO-level that I can see.

Change UnknownScannerException log level to WARN
--
Key: HBASE-3014
URL: https://issues.apache.org/jira/browse/HBASE-3014
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.20.6
Reporter: Ken Weiner
Priority: Trivial
Attachments: hbase-3014.patch

I see a lot of UnknownScannerException messages in the log at ERROR level when I'm running a MapReduce job that scans an HBase table. These messages are logged under normal conditions and, according to [~jdcryans], should probably be logged at a less severe level like WARN. Example error message:
{code}
2010-09-16 09:20:52,398 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -8711007779313115048
  at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
  at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
{code}
Reference to the HBase users mailing list thread where this was originally discussed: http://markmail.org/thread/ttzbi6c7et6mrq6o
This is a simple change, so I didn't include a formal patch. If one is required, I will gladly create and attach one.
-- This message was sent by Atlassian JIRA (v6.2#6252)
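The severity rule the issue asks for is simple: an expected, client-recoverable condition (a scanner lease that already expired, routine under long MapReduce scans) maps to WARN, not ERROR. The sketch below uses java.util.logging levels for a self-contained example; HBase used commons-logging at the time, and `levelFor` is a hypothetical helper, not real HBase code:

```java
import java.util.logging.Level;

// Sketch of the proposed policy: UnknownScannerException is routine (the
// client simply re-opens the scanner), so it gets WARNING; unexpected
// exceptions keep SEVERE (java.util.logging's ERROR-equivalent level).
public class ScannerLogLevel {
    static Level levelFor(String exceptionSimpleName) {
        return "UnknownScannerException".equals(exceptionSimpleName)
            ? Level.WARNING
            : Level.SEVERE;
    }
}
```

The point of the distinction is operational: ERROR should mean "a human needs to look at this", which a routine scanner-lease expiry does not.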
[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
[ https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941198#comment-13941198 ] stack commented on HBASE-10795:

Is this responsible for the test failure?
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates
[ https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941203#comment-13941203 ] Ted Yu commented on HBASE-10795:

I mentioned the test failure since the Standard Output led me to TestHBaseFsck#testHBaseFsck(). This JIRA aligns TestHBaseFsck#testHBaseFsck() with the rest of the tests. Investigation of the test failure of TestHBaseFsck#testSplitDaughtersNotInMeta is ongoing.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10792:

Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks for the reviews.

RingBufferTruck does not release its payload
--
Key: HBASE-10792
URL: https://issues.apache.org/jira/browse/HBASE-10792
Project: HBase
Issue Type: Bug
Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch

Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox and watch as HBase eventually dies with an OOM: heap space. Examining the heap dump shows an extremely large retained size of KeyValue and RingBufferTruck instances. By my eye, the default value of {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small default heap size, or the RBT instances need to release their payloads after consumers retrieve them.
-- This message was sent by Atlassian JIRA (v6.2#6252)
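The leak mechanics are worth spelling out: disruptor-style ring-buffer slots are pre-allocated and reused, so a slot that keeps a reference to its payload pins that payload until the slot itself is overwritten, which with a large event count can retain a great deal of heap. A sketch of the release-on-unload fix; the names here (Truck, load/unload) are illustrative, not the actual RingBufferTruck API:

```java
// Sketch of the fix direction: when the consumer takes the payload out of
// a reusable ring-buffer entry, the entry drops its own reference so the
// (possibly large) WAL edit becomes collectable immediately.
public class Truck {
    private Object payload;

    void load(Object p) { payload = p; }

    // Hand the payload to the consumer AND null our reference.
    Object unload() {
        Object p = payload;
        payload = null;  // the fix: release the reference on retrieval
        return p;
    }

    boolean holdsPayload() { return payload != null; }
}
```

Without the `payload = null` line, every one of the ring's pre-allocated entries keeps its last edit reachable, which matches the heap-dump symptom described above.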
[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10792:

Fix Version/s: 0.99.0
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output
[ https://issues.apache.org/jira/browse/HBASE-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10797:

Attachment: 10797.txt

Small patch.

Add support for -h and --help to rolling_restart.sh and fix the usage string output
--
Key: HBASE-10797
URL: https://issues.apache.org/jira/browse/HBASE-10797
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Priority: Trivial
Attachments: 10797.txt

Messing with rolling restart, when you pass -h or --help, you get a mess for output w/ an odd 'bad argument' complaint. The usage string printed also was incomplete, with curlies in it.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output
stack created HBASE-10797:

Summary: Add support for -h and --help to rolling_restart.sh and fix the usage string output
Key: HBASE-10797
URL: https://issues.apache.org/jira/browse/HBASE-10797
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Priority: Trivial
Attachments: 10797.txt
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output
[ https://issues.apache.org/jira/browse/HBASE-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-10797.

Resolution: Fixed
Fix Version/s: 0.99.0, 0.98.2, 0.96.3

Committed trivial patch to 0.96-0.99.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output
[ https://issues.apache.org/jira/browse/HBASE-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941215#comment-13941215 ] Hudson commented on HBASE-10797:

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10797 Add support for -h and --help to rolling_restart.sh and fix the usage string output (stack: rev 1579477)
* /hbase/trunk/bin/rolling-restart.sh
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long
[ https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941213#comment-13941213 ] Hudson commented on HBASE-10787:

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10787 TestHCM#testConnection* takes too long (Ted Yu) (apurtell: rev 1579358)
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1
[ https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941212#comment-13941212 ] Hudson commented on HBASE-10781:

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10781 Remove hadoop-one-compat module and all references to hadoop1 (stack: rev 1579449)
* /hbase/trunk/dev-support/generate-hadoopX-poms.sh
* /hbase/trunk/dev-support/make_rc.sh
* /hbase/trunk/hbase-assembly/src/main/assembly/components.xml
* /hbase/trunk/hbase-assembly/src/main/assembly/hadoop-one-compat.xml
* /hbase/trunk/hbase-hadoop1-compat
* /hbase/trunk/pom.xml
* /hbase/trunk/src/main/docbkx/developer.xml

Remove hadoop-one-compat module and all references to hadoop1
--
Key: HBASE-10781
URL: https://issues.apache.org/jira/browse/HBASE-10781
Project: HBase
Issue Type: Sub-task
Reporter: stack
Assignee: stack
Fix For: 0.99.0
Attachments: 10690.txt, 10690v2.txt, 10781v3.txt

Clean out hadoop1 references.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload
[ https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941211#comment-13941211 ] Hudson commented on HBASE-10792:

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10792 RingBufferTruck does not release its payload (ndimiduk: rev 1579475)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/RingBufferTruck.java
-- This message was sent by Atlassian JIRA (v6.2#6252)