[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460253#comment-13460253 ]

Lars Hofhansl commented on HBASE-6852:
--------------------------------------

Interesting. Thanks Cheng. I wonder what causes the performance problem then. Is it the get/putIfAbsent of the ConcurrentMap we store the metrics in?

I'd probably feel better if you set the threshold to 100 (instead of 2000) - you'd still reduce the time spent there by 99%.

Also, looking at the places where updateOnCacheHit is called... we also increment an AtomicLong (cacheHits), which is never read. We should remove that counter while we're at it (even if AtomicLongs are not the problem).

SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
------------------------------------------------------------------------------------------------
                Key: HBASE-6852
                URL: https://issues.apache.org/jira/browse/HBASE-6852
            Project: HBase
         Issue Type: Improvement
         Components: metrics
   Affects Versions: 0.94.0
           Reporter: Cheng Hao
           Priority: Minor
             Labels: performance
            Fix For: 0.94.2, 0.96.0
        Attachments: onhitcache-trunk.patch

SchemaMetrics.updateOnCacheHit costs too much while I am doing a full table scan.
Here are the top 5 hotspots within the regionserver while full scanning a table (sorry for the poor formatting):

CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 500

samples  %        image name  symbol name
-------------------------------------------------------------------------------
98447    13.4324  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
  98447  100.000  14033.jo    void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
-------------------------------------------------------------------------------
45814    6.2510   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
  45814  100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
-------------------------------------------------------------------------------
43523    5.9384   14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
  43523  100.000  14033.jo    boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
-------------------------------------------------------------------------------
42548    5.8054   14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
  42548  100.000  14033.jo    int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
-------------------------------------------------------------------------------
40572    5.5358   14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
  40572  100.000  14033.jo    int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
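Lars' threshold suggestion above can be sketched as follows. This is a hypothetical illustration, not the actual onhitcache-trunk.patch: each thread accumulates hits in a thread-local counter and only touches the shared AtomicLong once every THRESHOLD hits, so contention on the hot path drops by roughly the threshold factor. All names here (BatchedCacheHitCounter, THRESHOLD) are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: batch per-thread cache-hit counts and flush them to a
// shared AtomicLong only once per THRESHOLD hits, instead of one contended
// atomic increment per hit.
class BatchedCacheHitCounter {
    private static final int THRESHOLD = 100; // per Lars' suggestion

    private final AtomicLong sharedHits = new AtomicLong();
    // one-element array used as a cheap mutable thread-local cell
    private final ThreadLocal<long[]> localHits =
        ThreadLocal.withInitial(() -> new long[1]);

    void onCacheHit() {
        long[] local = localHits.get();
        if (++local[0] >= THRESHOLD) {
            sharedHits.addAndGet(local[0]); // one contended update per THRESHOLD hits
            local[0] = 0;
        }
    }

    // flush this thread's remainder and read the total (e.g. at metrics-report time)
    long flushAndGet() {
        long[] local = localHits.get();
        sharedHits.addAndGet(local[0]);
        local[0] = 0;
        return sharedHits.get();
    }
}
```

The trade-off is that the shared counter lags by up to THRESHOLD-1 hits per thread between flushes, which is usually acceptable for monitoring metrics.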
[jira] [Commented] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups
[ https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460254#comment-13460254 ]

Lars Hofhansl commented on HBASE-6841:
--------------------------------------

Haven't been able to track down that test failure yet. It shouldn't happen, but somehow it does.
@J-D: Since this is (presumably) a long-standing condition, how do you feel about moving this to 0.94.3?

Meta prefetching is slower than doing multiple meta lookups
-----------------------------------------------------------
                Key: HBASE-6841
                URL: https://issues.apache.org/jira/browse/HBASE-6841
            Project: HBase
         Issue Type: Improvement
           Reporter: Jean-Daniel Cryans
           Assignee: Lars Hofhansl
           Priority: Critical
            Fix For: 0.94.2
        Attachments: 6841-0.94.txt, 6841-0.96.txt

I got myself into a situation where I needed to truncate a massive table while it was getting hits, and surprisingly the clients were not recovering. What I see in the logs is that every time we prefetch .META. we set up a new HConnection, because we close it on the way out. It's awfully slow. We should either turn it off or make it useful. jstacks coming up.
[jira] [Commented] (HBASE-6806) HBASE-4658 breaks backward compatibility / example scripts
[ https://issues.apache.org/jira/browse/HBASE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460258#comment-13460258 ]

Hudson commented on HBASE-6806:
-------------------------------

Integrated in HBase-TRUNK #3363 (See [https://builds.apache.org/job/HBase-TRUNK/3363/])
HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts (Revision 1388318)

Result = FAILURE
stack :
Files :
* /hbase/trunk/examples/thrift/DemoClient.cpp
* /hbase/trunk/examples/thrift/DemoClient.java
* /hbase/trunk/examples/thrift/DemoClient.php
* /hbase/trunk/examples/thrift/DemoClient.pl
* /hbase/trunk/examples/thrift/DemoClient.py
* /hbase/trunk/examples/thrift/DemoClient.rb
* /hbase/trunk/examples/thrift/Makefile

HBASE-4658 breaks backward compatibility / example scripts
----------------------------------------------------------
                Key: HBASE-6806
                URL: https://issues.apache.org/jira/browse/HBASE-6806
            Project: HBase
         Issue Type: Bug
         Components: Thrift
   Affects Versions: 0.94.0
           Reporter: Lukas
            Fix For: 0.96.0
        Attachments: HBASE-6806-fix-examples.diff

HBASE-4658 introduces the new 'attributes' argument as a non-optional parameter. This is not backward compatible and also breaks the code in the example section. Resolution: mark it as 'optional'.
[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer
[ https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460259#comment-13460259 ]

Hudson commented on HBASE-4658:
-------------------------------

Integrated in HBase-TRUNK #3363 (See [https://builds.apache.org/job/HBase-TRUNK/3363/])
HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts (Revision 1388318)

Result = FAILURE
stack :
Files :
* /hbase/trunk/examples/thrift/DemoClient.cpp
* /hbase/trunk/examples/thrift/DemoClient.java
* /hbase/trunk/examples/thrift/DemoClient.php
* /hbase/trunk/examples/thrift/DemoClient.pl
* /hbase/trunk/examples/thrift/DemoClient.py
* /hbase/trunk/examples/thrift/DemoClient.rb
* /hbase/trunk/examples/thrift/Makefile

Put attributes are not exposed via the ThriftServer
---------------------------------------------------
                Key: HBASE-4658
                URL: https://issues.apache.org/jira/browse/HBASE-4658
            Project: HBase
         Issue Type: Bug
         Components: Thrift
           Reporter: dhruba borthakur
           Assignee: dhruba borthakur
            Fix For: 0.94.0
        Attachments: ASF.LICENSE.NOT.GRANTED--D1563.1.patch, ASF.LICENSE.NOT.GRANTED--D1563.1.patch, ASF.LICENSE.NOT.GRANTED--D1563.1.patch, ASF.LICENSE.NOT.GRANTED--D1563.2.patch, ASF.LICENSE.NOT.GRANTED--D1563.2.patch, ASF.LICENSE.NOT.GRANTED--D1563.2.patch, ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ThriftPutAttributes1.txt

The Put api also takes in a bunch of arbitrary attributes that an application can use to associate metadata with each put operation. This is not exposed via Thrift.
[jira] [Commented] (HBASE-6524) Hooks for hbase tracing
[ https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460265#comment-13460265 ]

stack commented on HBASE-6524:
------------------------------

Committed the doc as appendix I in the manual. Will show next time I push the doc. Thanks Jonathan.

Hooks for hbase tracing
-----------------------
                Key: HBASE-6524
                URL: https://issues.apache.org/jira/browse/HBASE-6524
            Project: HBase
         Issue Type: Sub-task
           Reporter: Jonathan Leavitt
           Assignee: Jonathan Leavitt
            Fix For: 0.96.0
        Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, createTableTrace.png, hbase-6524.diff

Includes the hooks that use the [htrace|http://www.github.com/cloudera/htrace] library to add dapper-like tracing to hbase.
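The core idea behind the dapper-like tracing hooks above can be illustrated with a toy span abstraction. To be clear, this is NOT the htrace API; it is a self-contained sketch of the concept: each unit of work opens a span, spans nest via a thread-local stack so a child records its parent, and the resulting parent links form the trace tree.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration of dapper-style spans (not the htrace API): nested spans
// on a thread-local stack, each child linked to its parent.
final class Span implements AutoCloseable {
    private static final ThreadLocal<Deque<Span>> STACK =
        ThreadLocal.withInitial(ArrayDeque::new);

    final String description;
    final Span parent;               // null for a trace's root span
    final long startNanos = System.nanoTime();
    long stopNanos;

    private Span(String description, Span parent) {
        this.description = description;
        this.parent = parent;
    }

    // open a span; whatever span is currently on top of the stack becomes the parent
    static Span start(String description) {
        Deque<Span> stack = STACK.get();
        Span span = new Span(description, stack.peek());
        stack.push(span);
        return span;
    }

    @Override public void close() {
        stopNanos = System.nanoTime();
        STACK.get().pop();           // restore the enclosing span as current
    }
}
```

With try-with-resources, an instrumented operation reads as `try (Span s = Span.start("createTable")) { ... }`, and any span started inside that block is automatically recorded as its child.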
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460266#comment-13460266 ]

Hadoop QA commented on HBASE-6299:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12546000/6299v4.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
-1 javadoc. The javadoc tool appears to have generated 139 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2912//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2912//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2912//console

This message is automatically generated.

RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
-----------------------------------------------------------------------------------------------------------------------------------------------------
                Key: HBASE-6299
                URL: https://issues.apache.org/jira/browse/HBASE-6299
            Project: HBase
         Issue Type: Bug
         Components: master
   Affects Versions: 0.90.6, 0.94.0
           Reporter: Maryann Xue
           Assignee: Maryann Xue
           Priority: Critical
            Fix For: 0.92.3, 0.94.3, 0.96.0
        Attachments: 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch

1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open-region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign for a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster finds the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer
[ https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460269#comment-13460269 ]

stack commented on HBASE-4658:
------------------------------

The above comment from hudson is in the wrong place. The parse found the second hbase jira referenced, which is this one rather than HBASE-6806.
[jira] [Commented] (HBASE-6806) HBASE-4658 breaks backward compatibility / example scripts
[ https://issues.apache.org/jira/browse/HBASE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460271#comment-13460271 ]

stack commented on HBASE-6806:
------------------------------

Hmm... it puts the commit in all issues referenced by the commit message, here and HBASE-4658.
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460275#comment-13460275 ]

liang xie commented on HBASE-6852:
----------------------------------

Hi Cheng, for the running time, could you exclude the system resource factor? E.g. you may have run the original version with many physical IOs, but rerun the patched version without similar physical IO requests due to hitting the OS page cache. In other words, can the reduced running time always be reproduced, even if you run the patched version first and then rerun the original version? It'd be better if you can issue "echo 1 > /proc/sys/vm/drop_caches" to free the page cache between tests.
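liang xie's benchmarking-hygiene suggestion can be sketched as a small wrapper. This is a hedged sketch, not part of any patch: the scan commands are placeholders to replace with the actual workload, and dropping the page cache requires root.

```shell
#!/bin/sh
# Sketch: free the OS page cache between the original and patched runs so
# both see comparable physical I/O (per liang xie's suggestion).

drop_page_cache() {
  sync  # flush dirty pages first so drop_caches can actually evict them
  if [ "$(id -u)" -eq 0 ]; then
    # 1 = drop pagecache only; 3 would also drop dentries and inodes
    { echo 1 > /proc/sys/vm/drop_caches; } 2>/dev/null \
      || echo "could not write drop_caches (restricted environment?)" >&2
  else
    echo "not root: skipping page-cache drop" >&2
  fi
}

drop_page_cache
# time <full-table scan with the original version>   (placeholder)
drop_page_cache
# time <full-table scan with the patched version>    (placeholder)
```

Running the two versions in both orders (patched first, then original, and vice versa) with a cache drop in between helps rule out the page cache as the source of the speedup.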
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460280#comment-13460280 ]

Cheng Hao commented on HBASE-6852:
----------------------------------

Lars, the only place that uses the ConcurrentMap in SchemaMetrics is tableAndFamilyToMetrics. In this patch, I pre-create an array of AtomicLongs for all of the possible on-cache-hit metrics items, which avoids the concurrency issue and is easy to index while accessing.

Thanks stack and Lars for the suggestions; I will create another patch file instead.
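The pre-created-array approach Cheng Hao describes can be sketched as below. This is an illustration, not the actual patch: the enum values and the two-slots-per-category layout are assumptions, but the point is the same, updateOnCacheHit becomes a plain array index instead of a ConcurrentMap get/putIfAbsent with key construction on the hot path.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative block categories; the real set lives in
// org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory.
enum BlockCategory { DATA, INDEX, BLOOM, META, ALL_CATEGORIES }

// Sketch: one pre-created AtomicLong per (category, isCompaction) pair,
// addressed by direct array index.
class CacheHitMetrics {
    // two slots per category: [0] = normal read, [1] = compaction read
    private final AtomicLong[] hits =
        new AtomicLong[BlockCategory.values().length * 2];

    CacheHitMetrics() {
        for (int i = 0; i < hits.length; i++) hits[i] = new AtomicLong();
    }

    void updateOnCacheHit(BlockCategory category, boolean isCompaction) {
        // no map lookup and no key allocation on the hot path
        hits[category.ordinal() * 2 + (isCompaction ? 1 : 0)].incrementAndGet();
    }

    long get(BlockCategory category, boolean isCompaction) {
        return hits[category.ordinal() * 2 + (isCompaction ? 1 : 0)].get();
    }
}
```

Because every slot exists up front, there is no lazy-initialization race to guard against, which is what the ConcurrentMap's putIfAbsent was paying for.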
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460297#comment-13460297 ]

Cheng Hao commented on HBASE-6852:
----------------------------------

Hi Liang, that's a really good suggestion. Actually I didn't free the OS page cache before each launch, but I can try that later. In my tests, the table data was about 600GB across 4 machines, so I guess the system cache may not impact the overall performance that much for a full table scan.
[jira] [Updated] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Hao updated HBASE-6852:
-----------------------------

Attachment: onhitcache-trunk.patch

Changed THRESHOLD_METRICS_FLUSH from 2000 to 100, per Lars' suggestion.
[jira] [Updated] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Hao updated HBASE-6852:
-----------------------------

Attachment: (was: onhitcache-trunk.patch)
[jira] [Commented] (HBASE-6381) AssignmentManager should use the same logic for clean startup and failover
[ https://issues.apache.org/jira/browse/HBASE-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460315#comment-13460315 ]

ramkrishna.s.vasudevan commented on HBASE-6381:
-----------------------------------------------

@Jimmy
The fix for Rajesh's comment seems valid. I only have 2 questions:
- Will these changes solve HBASE-6228, or do we still need to add some synchronization while fixupdaughters?
- In GeneralBulkAssigner
{code}
while (regionInfoIterator.hasNext()) {
  HRegionInfo hri = regionInfoIterator.next();
  RegionState state = regionStates.getRegionState(hri);
  if ((!regionStates.isRegionInTransition(hri) && regionStates.isRegionAssigned(hri))
      || state.isSplit() || state.isSplitting()) {
    regionInfoIterator.remove();
{code}
This removal from regionInfoIterator may not be needed; anyway, SSH is handling this case. And also, as part of HBASE-6317, EnableTableHandler will handle RIT regions and already-assigned regions. In CreateTable this problem should not happen. So can we remove this piece of code from GeneralBulkAssigner? What do you think?

Other than that I am +1. The ZKTable change can be done in a new JIRA, as you said.

AssignmentManager should use the same logic for clean startup and failover
--------------------------------------------------------------------------
                Key: HBASE-6381
                URL: https://issues.apache.org/jira/browse/HBASE-6381
            Project: HBase
         Issue Type: Bug
         Components: master
           Reporter: Jimmy Xiang
           Assignee: Jimmy Xiang
        Attachments: hbase-6381-notes.pdf, hbase-6381.pdf, trunk-6381_v5.patch, trunk-6381_v7.patch, trunk-6381_v8.patch

Currently AssignmentManager handles clean startup and failover very differently. Different logic is mingled together, so it is hard to tell which is for which. We should clean it up and share the same logic so that AssignmentManager handles both cases the same way. This way, the code will be much easier to understand and maintain.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460316#comment-13460316 ] Cheng Hao commented on HBASE-6852: -- I didn't remove the cacheHits in HFileReaderV1/V2; hope it's a good start toward designing a lower-overhead metrics framework. SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields Key: HBASE-6852 URL: https://issues.apache.org/jira/browse/HBASE-6852 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.94.0 Reporter: Cheng Hao Priority: Minor Labels: performance Fix For: 0.94.2, 0.96.0 Attachments: onhitcache-trunk.patch The SchemaMetrics.updateOnCacheHit costs too much while I am doing the full table scanning. Here are the top 5 hotspots within the regionserver while full scanning a table:
CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask), count 500
samples   %        image     symbol
98447     13.4324  14033.jo  void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
45814     6.2510   14033.jo  int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
43523     5.9384   14033.jo  boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
42548     5.8054   14033.jo  int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
40572     5.5358   14033.jo  int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
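The batching idea discussed in this thread (touch the shared metric only once every N cache hits, with a threshold like 100, instead of on every hit) can be sketched as follows. This is an illustrative sketch, not the attached patch's code; the class and field names are made up:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: instead of updating a shared AtomicLong on every
// cache hit, accumulate hits in a cheap per-thread counter and flush to
// the shared metric only once every THRESHOLD hits.
public class BatchedCacheHitMetric {
    static final int THRESHOLD = 100;
    static final AtomicLong sharedCacheHits = new AtomicLong();

    // Per-thread unsynchronized counter; incrementing it is an ordinary store.
    static final ThreadLocal<long[]> localHits =
        ThreadLocal.withInitial(() -> new long[1]);

    static void updateOnCacheHit() {
        long[] local = localHits.get();
        if (++local[0] >= THRESHOLD) {
            // One contended atomic update per THRESHOLD hits.
            sharedCacheHits.addAndGet(local[0]);
            local[0] = 0;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 250; i++) updateOnCacheHit();
        // Two flushes of 100 happened; 50 hits are still buffered locally.
        System.out.println(sharedCacheHits.get()); // prints 200
    }
}
```

The trade-off is that up to THRESHOLD - 1 hits per thread are invisible to the metric until the next flush, which is acceptable for a counter that is only sampled periodically.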
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460318#comment-13460318 ] Hadoop QA commented on HBASE-6852: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12546009/onhitcache-trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2913//console This message is automatically generated. SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields Key: HBASE-6852 URL: https://issues.apache.org/jira/browse/HBASE-6852 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.94.0 Reporter: Cheng Hao Priority: Minor Labels: performance Fix For: 0.94.2, 0.96.0 Attachments: onhitcache-trunk.patch The SchemaMetrics.updateOnCacheHit costs too much while I am doing the full table scanning. 
-- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6491) add limit function at ClientScanner
[ https://issues.apache.org/jira/browse/HBASE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460324#comment-13460324 ] ronghai.ma commented on HBASE-6491: --- [~jeason] Scenario: scanning more than one region. add limit function at ClientScanner --- Key: HBASE-6491 URL: https://issues.apache.org/jira/browse/HBASE-6491 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.96.0 Reporter: ronghai.ma Assignee: ronghai.ma Labels: patch Fix For: 0.96.0 Attachments: ClientScanner.java, HBASE-6491.patch Add a new method in ClientScanner to implement a function like LIMIT in MySQL. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
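A MySQL-style LIMIT that holds across region boundaries amounts to counting rows on the client and stopping as soon as the count is reached, no matter how many regions the scan has crossed. A minimal illustrative sketch, independent of the attached patch, using a plain Iterator in place of HBase's ResultScanner:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a client-side LIMIT: wrap any row iterator and
// stop after `limit` rows, regardless of how many regions were scanned.
public class RowLimit {
    static <T> List<T> limit(Iterator<T> scanner, int limit) {
        List<T> out = new ArrayList<>();
        // Stop on whichever comes first: scanner exhausted or limit reached.
        while (scanner.hasNext() && out.size() < limit) {
            out.add(scanner.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(limit(rows.iterator(), 3)); // prints [r1, r2, r3]
    }
}
```

The count must live in the client-side scanner (not per region server call), which is exactly the multi-region scenario ronghai.ma describes.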
[jira] [Commented] (HBASE-6524) Hooks for hbase tracing
[ https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460326#comment-13460326 ] Hudson commented on HBASE-6524: --- Integrated in HBase-TRUNK #3364 (See [https://builds.apache.org/job/HBase-TRUNK/3364/]) HBASE-6524 Hooks for hbase tracing; add documentation as an appendix (Revision 1388337) Result = FAILURE stack : Files : * /hbase/trunk/src/docbkx/book.xml Hooks for hbase tracing --- Key: HBASE-6524 URL: https://issues.apache.org/jira/browse/HBASE-6524 Project: HBase Issue Type: Sub-task Reporter: Jonathan Leavitt Assignee: Jonathan Leavitt Fix For: 0.96.0 Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, createTableTrace.png, hbase-6524.diff Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] library to add dapper-like tracing to hbase. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6783) Make read short circuit the default
[ https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460357#comment-13460357 ] nkeywal commented on HBASE-6783: I checked; I couldn't reproduce the error found on hadoop-qa locally. Committed revision 1388374. Make read short circuit the default --- Key: HBASE-6783 URL: https://issues.apache.org/jira/browse/HBASE-6783 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch Per mailing discussion, read short circuit has little or no drawback, hence should be used by default. As a consequence, we activate it in the default tests. It's possible to launch the tests with -Ddfs.client.read.shortcircuit=false to execute them without the shortcircuit; this will be used for some builds on trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman updated HBASE-6071: Attachment: HBASE-6071.v4.patch Changing debug level to info. getRegionServerWithRetires, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: Client, IPC/RPC Affects Versions: 0.92.0, 0.94.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, HBASE-6071.v3.patch, HBASE-6071.v4.patch, HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt HConnectionImplementation.getRegionServerWithRetries might terminate with an exception different than a DoNotRetryIOException, thus silently dropping exceptions from previous attempts. [~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
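The shape of the fix being discussed is simply to log each failed attempt inside the retry loop instead of discarding all but the last exception. A hedged sketch of the pattern (not the actual HConnectionManager code; names are illustrative):

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

// Hypothetical sketch: a retry loop that logs every unsuccessful attempt
// (attempt number plus exception) rather than silently swallowing them
// until the final failure.
public class RetryLogging {
    static final Logger LOG = Logger.getLogger("client");

    static <T> T withRetries(Callable<T> call, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // The log line that HBASE-6071 adds: make each attempt visible.
                LOG.info("attempt " + attempt + " of " + maxRetries + " failed: " + e);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] tries = {0};
        // Fails twice with a transient error, then succeeds on the third try.
        String r = withRetries(() -> {
            if (++tries[0] < 3) throw new IOException("transient");
            return "ok";
        }, 5);
        System.out.println(r + " after " + tries[0]); // prints ok after 3
    }
}
```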
[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-6488: --- Attachment: HBASE-6488-trunk.txt here is thine trunk. stop renaming paths! HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent sign, e.g. ...%0. This looks like a format string, so in a part of the code that uses the hostname as a prefix to another string interpreted with String.format, you end up with an exception:
2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.util.UnknownFormatConversionException: Conversion = '0'
	at java.util.Formatter.checkText(Formatter.java:2503)
	at java.util.Formatter.parse(Formatter.java:2467)
	at java.util.Formatter.format(Formatter.java:2414)
	at java.util.Formatter.format(Formatter.java:2367)
	at java.lang.String.format(String.java:2769)
	at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68)
	at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299)
	at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185)
	at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227)
	at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
	at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
	at java.lang.Thread.run(Thread.java:680)
2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
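The failure mode is easy to reproduce and to guard against: any literal '%' in the hostname must be escaped as "%%" before the string reaches String.format (which, per the stack trace above, is what ThreadFactoryBuilder.setNameFormat ultimately calls). A small sketch; the escaping helper name is illustrative, not the patch's API:

```java
// Minimal reproduction of the HBASE-6488 failure mode: a hostname with an
// IPv6 zone index (e.g. "fe80::1%0") contains '%', which String.format
// treats as a conversion specifier. Escaping '%' as "%%" avoids it.
public class ZoneIndexFormat {
    static String threadNameFormat(String hostname) {
        // Escape any literal '%' so it survives String.format;
        // "%1$d" remains a real conversion for the thread number.
        return hostname.replace("%", "%%") + "-%1$d";
    }

    public static void main(String[] args) {
        String host = "fe80::1%0";
        // Without the escaping, String.format(host + "-%1$d", 1) throws
        // java.util.UnknownFormatConversionException: Conversion = '0'
        String safe = String.format(threadNameFormat(host), 1);
        System.out.println(safe); // prints fe80::1%0-1
    }
}
```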
[jira] [Commented] (HBASE-6783) Make read short circuit the default
[ https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460400#comment-13460400 ] Hudson commented on HBASE-6783: --- Integrated in HBase-TRUNK #3365 (See [https://builds.apache.org/job/HBase-TRUNK/3365/]) HBASE-6783 Make read short circuit the default (Revision 1388374) Result = FAILURE nkeywal : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java Make read short circuit the default --- Key: HBASE-6783 URL: https://issues.apache.org/jira/browse/HBASE-6783 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6853) IlegalArgument Exception is thrown when an empty region is spliitted.
ramkrishna.s.vasudevan created HBASE-6853: - Summary: IlegalArgument Exception is thrown when an empty region is spliitted. Key: HBASE-6853 URL: https://issues.apache.org/jira/browse/HBASE-6853 Project: HBase Issue Type: Bug Affects Versions: 0.94.1, 0.92.1 Reporter: ramkrishna.s.vasudevan This is w.r.t. a mail sent to the dev mailing list. An empty region split should be handled gracefully. Either we should not allow the split to happen if we know that the region is empty, or we should allow the split to happen by setting the number of threads in the thread pool executor to 1.
{code}
int nbFiles = hstoreFilesToSplit.size();
ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
builder.setNameFormat("StoreFileSplitter-%1$d");
ThreadFactory factory = builder.build();
ThreadPoolExecutor threadPool = (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
{code}
Here nbFiles needs to be a non-zero positive value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
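The crash comes from Executors.newFixedThreadPool rejecting a pool of zero threads: an empty region has zero store files to split, so nbFiles is 0 and the constructor throws IllegalArgumentException. One of the two fixes the report suggests, clamping the pool size to at least one thread, can be sketched as (names illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: Executors.newFixedThreadPool throws
// IllegalArgumentException when nThreads == 0, which is what happens when
// an empty region (zero store files) is split. Clamp the size to >= 1.
public class SafeSplitPool {
    static ExecutorService poolFor(int nbFiles) {
        return Executors.newFixedThreadPool(Math.max(nbFiles, 1));
    }

    public static void main(String[] args) {
        ExecutorService pool = poolFor(0); // would throw without the clamp
        System.out.println(pool != null);  // prints true
        pool.shutdown();
    }
}
```

The alternative the report mentions, refusing to split an empty region at all, would avoid creating the pool in the first place.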
[jira] [Updated] (HBASE-6783) Make read short circuit the default
[ https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6783: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Make read short circuit the default --- Key: HBASE-6783 URL: https://issues.apache.org/jira/browse/HBASE-6783 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6783) Make read short circuit the default
[ https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460406#comment-13460406 ] nkeywal commented on HBASE-6783: I still need to make it non-default for some specific builds mentioned in the mails with Andrew. Will do. Make read short circuit the default --- Key: HBASE-6783 URL: https://issues.apache.org/jira/browse/HBASE-6783 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-6698: - Assignee: Priyadarshini Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: Priyadarshini Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_7.patch, HBASE-6698_8.patch, HBASE-6698_8.patch, HBASE-6698_8.patch, HBASE-6698.patch Currently the checkAndPut and checkAndDelete api internally calls the internalPut and internalDelete. May be we can just call doMiniBatchMutation only. This will help in future like if we have some hooks and the CP handles certain cases in the doMiniBatchMutation the same can be done while doing a put thro checkAndPut or while doing a delete thro checkAndDelete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460415#comment-13460415 ] Harsh J commented on HBASE-6071: Thanks for addressing my comment Igal, this looks good to me (am not a committer on HBase). getRegionServerWithRetires, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: Client, IPC/RPC Affects Versions: 0.92.0, 0.94.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, HBASE-6071.v3.patch, HBASE-6071.v4.patch, HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460416#comment-13460416 ] Hadoop QA commented on HBASE-6488: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12546020/HBASE-6488-trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. -1 javadoc. The javadoc tool appears to have generated 139 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2914//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2914//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2914//console This message is automatically generated. HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460429#comment-13460429 ] Igal Shilman commented on HBASE-6071: - [~qwertymaniac] n/p. getRegionServerWithRetires, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: Client, IPC/RPC Affects Versions: 0.92.0, 0.94.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, HBASE-6071.v3.patch, HBASE-6071.v4.patch, HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460433#comment-13460433 ] Hadoop QA commented on HBASE-6071: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12546017/HBASE-6071.v4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. -1 javadoc. The javadoc tool appears to have generated 139 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestMultiParallel org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2915//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2915//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2915//console This message is automatically generated. getRegionServerWithRetires, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: Client, IPC/RPC Affects Versions: 0.92.0, 0.94.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, HBASE-6071.v3.patch, HBASE-6071.v4.patch, HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6410) Move RegionServer Metrics to metrics2
[ https://issues.apache.org/jira/browse/HBASE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460434#comment-13460434 ] Alex Baranau commented on HBASE-6410: - Oh, sorry, forgot to share some minor notes I took when looking at changes in common classes: 1) TestMasterMetricsSourceFactory: {noformat} //This should throw an exception because there is no compat lib on the class path. CompatibilitySingletonFactory.getInstance(MasterMetricsSource.class); {noformat} This will throw an exception anyways, because now MasterMetricsSource*Factory* is used. 2) MasterMetricsSourceImpl: {noformat} public void getMetrics(MetricsBuilder metricsBuilder, boolean all) { [...] metricsRegistry.snapshot(metricsRecordBuilder, true); } {noformat} Should be metricsRegistry.snapshot(metricsRecordBuilder, *all*) ? Same in hadoop-compat2 Move RegionServer Metrics to metrics2 - Key: HBASE-6410 URL: https://issues.apache.org/jira/browse/HBASE-6410 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6410-1.patch, HBASE-6410.patch Move RegionServer Metrics to metrics2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4658) Put attributes are not exposed via the ThriftServer
[ https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460446#comment-13460446 ] Hudson commented on HBASE-4658: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/]) HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts (Revision 1388318) Result = FAILURE stack : Files : * /hbase/trunk/examples/thrift/DemoClient.cpp * /hbase/trunk/examples/thrift/DemoClient.java * /hbase/trunk/examples/thrift/DemoClient.php * /hbase/trunk/examples/thrift/DemoClient.pl * /hbase/trunk/examples/thrift/DemoClient.py * /hbase/trunk/examples/thrift/DemoClient.rb * /hbase/trunk/examples/thrift/Makefile Put attributes are not exposed via the ThriftServer --- Key: HBASE-4658 URL: https://issues.apache.org/jira/browse/HBASE-4658 Project: HBase Issue Type: Bug Components: Thrift Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: ASF.LICENSE.NOT.GRANTED--D1563.1.patch, ASF.LICENSE.NOT.GRANTED--D1563.1.patch, ASF.LICENSE.NOT.GRANTED--D1563.1.patch, ASF.LICENSE.NOT.GRANTED--D1563.2.patch, ASF.LICENSE.NOT.GRANTED--D1563.2.patch, ASF.LICENSE.NOT.GRANTED--D1563.2.patch, ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ASF.LICENSE.NOT.GRANTED--D1563.3.patch, ThriftPutAttributes1.txt The Put api also takes in a bunch of arbitrary attributes that an application can use to associate metadata with each put operation. This is not exposed via Thrift. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6806) HBASE-4658 breaks backward compatibility / example scripts
[ https://issues.apache.org/jira/browse/HBASE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460445#comment-13460445 ] Hudson commented on HBASE-6806: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/]) HBASE-6806 HBASE-4658 breaks backward compatibility / example scripts (Revision 1388318) Result = FAILURE stack : Files : * /hbase/trunk/examples/thrift/DemoClient.cpp * /hbase/trunk/examples/thrift/DemoClient.java * /hbase/trunk/examples/thrift/DemoClient.php * /hbase/trunk/examples/thrift/DemoClient.pl * /hbase/trunk/examples/thrift/DemoClient.py * /hbase/trunk/examples/thrift/DemoClient.rb * /hbase/trunk/examples/thrift/Makefile HBASE-4658 breaks backward compatibility / example scripts -- Key: HBASE-6806 URL: https://issues.apache.org/jira/browse/HBASE-6806 Project: HBase Issue Type: Bug Components: Thrift Affects Versions: 0.94.0 Reporter: Lukas Fix For: 0.96.0 Attachments: HBASE-6806-fix-examples.diff HBASE-4658 introduces the new 'attributes' argument as a non optional parameter. This is not backward compatible and also breaks the code in the example section. Resolution: Mark as 'optional' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6783) Make read short circuit the default
[ https://issues.apache.org/jira/browse/HBASE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460448#comment-13460448 ] Hudson commented on HBASE-6783: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/]) HBASE-6783 Make read short circuit the default (Revision 1388374) Result = FAILURE nkeywal : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java Make read short circuit the default --- Key: HBASE-6783 URL: https://issues.apache.org/jira/browse/HBASE-6783 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 6783.v2.patch, 6783.v2.patch, HBASE-6783.v1.patch Per mailing list discussion, read short circuit has little or no drawback, hence it should be used by default. As a consequence, we activate it in the default tests. It's possible to launch the tests with -Ddfs.client.read.shortcircuit=false to execute them without the short circuit; this will be used for some builds on trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6524) Hooks for hbase tracing
[ https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460447#comment-13460447 ] Hudson commented on HBASE-6524: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #185 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/185/]) HBASE-6524 Hooks for hbase tracing; add documentation as an appendix (Revision 1388337) Result = FAILURE stack : Files : * /hbase/trunk/src/docbkx/book.xml Hooks for hbase tracing --- Key: HBASE-6524 URL: https://issues.apache.org/jira/browse/HBASE-6524 Project: HBase Issue Type: Sub-task Reporter: Jonathan Leavitt Assignee: Jonathan Leavitt Fix For: 0.96.0 Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, createTableTrace.png, hbase-6524.diff Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] library to add dapper-like tracing to hbase. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6853) IlegalArgument Exception is thrown when an empty region is spliitted.
[ https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Priyadarshini updated HBASE-6853: - Attachment: HBASE-6853_splitfailure.patch IlegalArgument Exception is thrown when an empty region is spliitted. - Key: HBASE-6853 URL: https://issues.apache.org/jira/browse/HBASE-6853 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.1 Reporter: ramkrishna.s.vasudevan Attachments: HBASE-6853_splitfailure.patch This is w.r.t. a mail sent to the dev mailing list. An empty-region split should be handled gracefully: either we should not allow the split when we know the region is empty, or we should let it proceed by setting the thread pool executor's number of threads to 1.
{code}
int nbFiles = hstoreFilesToSplit.size();
ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
builder.setNameFormat("StoreFileSplitter-%1$d");
ThreadFactory factory = builder.build();
ThreadPoolExecutor threadPool = (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
{code}
Here nbFiles must be a non-zero positive value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6853) IlegalArgument Exception is thrown when an empty region is spliitted.
[ https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Priyadarshini updated HBASE-6853: - Attachment: HBASE-6853_2_splitsuccess.patch IlegalArgument Exception is thrown when an empty region is spliitted. - Key: HBASE-6853 URL: https://issues.apache.org/jira/browse/HBASE-6853 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.1 Reporter: ramkrishna.s.vasudevan Attachments: HBASE-6853_2_splitsuccess.patch, HBASE-6853_splitfailure.patch This is w.r.t. a mail sent to the dev mailing list. An empty-region split should be handled gracefully: either we should not allow the split when we know the region is empty, or we should let it proceed by setting the thread pool executor's number of threads to 1.
{code}
int nbFiles = hstoreFilesToSplit.size();
ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
builder.setNameFormat("StoreFileSplitter-%1$d");
ThreadFactory factory = builder.build();
ThreadPoolExecutor threadPool = (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
{code}
Here nbFiles must be a non-zero positive value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6853) IlegalArgument Exception is thrown when an empty region is spliitted.
[ https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460461#comment-13460461 ] Priyadarshini commented on HBASE-6853: -- The two attached patches cover two scenarios. SCENARIO 1 (success): make the split succeed by setting the number of threads for the fixed thread pool to 1 even if the hstoreFilesToSplit size is zero. SCENARIO 2 (failure): make the split fail when there are no store files to split. IlegalArgument Exception is thrown when an empty region is spliitted. - Key: HBASE-6853 URL: https://issues.apache.org/jira/browse/HBASE-6853 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.1 Reporter: ramkrishna.s.vasudevan Attachments: HBASE-6853_2_splitsuccess.patch, HBASE-6853_splitfailure.patch This is w.r.t. a mail sent to the dev mailing list. An empty-region split should be handled gracefully: either we should not allow the split when we know the region is empty, or we should let it proceed by setting the thread pool executor's number of threads to 1.
{code}
int nbFiles = hstoreFilesToSplit.size();
ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
builder.setNameFormat("StoreFileSplitter-%1$d");
ThreadFactory factory = builder.build();
ThreadPoolExecutor threadPool = (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
{code}
Here nbFiles must be a non-zero positive value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
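The "let the split proceed" scenario can be sketched as follows. This is only an illustration of the guard, not code from either attached patch; `safePoolSize` is a name invented here. The point is that `Executors.newFixedThreadPool` throws `IllegalArgumentException` when asked for zero threads, so the pool size must be clamped to at least 1.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the SCENARIO 1 option: clamp the pool size to at least 1 so
// Executors.newFixedThreadPool() does not throw IllegalArgumentException
// when an empty region has no store files to split.
public class SplitPoolSketch {
    public static int safePoolSize(int nbFiles) {
        return Math.max(nbFiles, 1); // newFixedThreadPool(0, ...) would throw
    }

    public static void main(String[] args) {
        int nbFiles = 0; // an empty region has no store files to split
        ExecutorService threadPool = Executors.newFixedThreadPool(safePoolSize(nbFiles));
        threadPool.shutdown(); // nothing to submit; shut the pool down cleanly
    }
}
```

SCENARIO 2 would instead check `nbFiles == 0` up front and abort the split before any pool is created.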
[jira] [Updated] (HBASE-6853) IllegalArgument Exception is thrown when an empty region is spliitted.
[ https://issues.apache.org/jira/browse/HBASE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6853: -- Summary: IllegalArgument Exception is thrown when an empty region is spliitted. (was: IlegalArgument Exception is thrown when an empty region is spliitted.) IllegalArgument Exception is thrown when an empty region is spliitted. -- Key: HBASE-6853 URL: https://issues.apache.org/jira/browse/HBASE-6853 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.1 Reporter: ramkrishna.s.vasudevan Attachments: HBASE-6853_2_splitsuccess.patch, HBASE-6853_splitfailure.patch This is w.r.t. a mail sent to the dev mailing list. An empty-region split should be handled gracefully: either we should not allow the split when we know the region is empty, or we should let it proceed by setting the thread pool executor's number of threads to 1.
{code}
int nbFiles = hstoreFilesToSplit.size();
ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
builder.setNameFormat("StoreFileSplitter-%1$d");
ThreadFactory factory = builder.build();
ThreadPoolExecutor threadPool = (ThreadPoolExecutor) Executors.newFixedThreadPool(nbFiles, factory);
List<Future<Void>> futures = new ArrayList<Future<Void>>(nbFiles);
{code}
Here nbFiles must be a non-zero positive value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6854) Deletion of SPLITTING node on split rollback should clear the region from RIT
ramkrishna.s.vasudevan created HBASE-6854: - Summary: Deletion of SPLITTING node on split rollback should clear the region from RIT Key: HBASE-6854 URL: https://issues.apache.org/jira/browse/HBASE-6854 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Fix For: 0.94.2 If a failure happens in split before OFFLINING_PARENT, we tend to roll back the split, including deleting the znodes created. On deletion of the RS_ZK_SPLITTING node we get a callback but do not remove the region from RIT. We need to remove it from RIT; the SSH logic is well guarded anyway in case the delete event comes due to an RS-down scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6855) Add support for the REST Multi Gets to the RemoteHTable implementation
Erich Hochmuth created HBASE-6855: - Summary: Add support for the REST Multi Gets to the RemoteHTable implementation Key: HBASE-6855 URL: https://issues.apache.org/jira/browse/HBASE-6855 Project: HBase Issue Type: Improvement Components: REST Affects Versions: 0.94.1 Reporter: Erich Hochmuth Priority: Minor REST Multi Gets support was added in HBASE-3541. I'd like to extend this capability into the RemoteHTable implementation. https://issues.apache.org/jira/browse/HBASE-3541 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6798) HDFS always reads checksum from meta file
[ https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460513#comment-13460513 ] LiuLei commented on HBASE-6798: --- Hi stack, yes, I think we should add a new setSkipChecksum(boolean) method in the org.apache.hadoop.hdfs.FileSystem class. When reading HLog files we would use setSkipChecksum(false); when reading HFiles we would use setSkipChecksum(true). HDFS always reads checksum from meta file Key: HBASE-6798 URL: https://issues.apache.org/jira/browse/HBASE-6798 Project: HBase Issue Type: Bug Components: Performance Affects Versions: 0.94.0, 0.94.1 Reporter: LiuLei Priority: Blocker Attachments: 6798.txt I use HBase 0.94.1 and hadoop-0.20.2-cdh3u5. HBase added support for checksums in the HBase block cache in HBASE-5074. HBase supports checksums to decrease the IOPS of HDFS, so that HDFS doesn't need to read the checksum from the meta file of the block file. But in the hadoop-0.20.2-cdh3u5 version, BlockSender still reads the metadata file even if the hbase.regionserver.checksum.verify property is true. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
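The setSkipChecksum(boolean) method is only a proposal in this comment; no such method exists in Hadoop. A self-contained sketch of the proposed behavior, with the class and all names being hypothetical stand-ins rather than the Hadoop API:

```java
// Hypothetical sketch of the proposed setSkipChecksum(boolean) flag.
// HFiles carry their own checksums (HBASE-5074), so the HDFS-level
// checksum read from the block's meta file could be skipped for them,
// while HLog files would still get HDFS checksum verification.
public class ChecksumFlagSketch {
    private boolean skipChecksum = false;

    public void setSkipChecksum(boolean skip) {
        this.skipChecksum = skip;
    }

    // Returns a description of how a read would proceed under the flag.
    public String read(String file) {
        return skipChecksum
            ? "read " + file + " without meta checksum"
            : "read " + file + " with meta checksum";
    }
}
```

The real change would have to live inside the HDFS read path (BlockSender), which is why the reporter filed it as a blocker rather than working around it in HBase.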
[jira] [Commented] (HBASE-6714) TestMultiSlaveReplication#testMultiSlaveReplication may fail
[ https://issues.apache.org/jira/browse/HBASE-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460530#comment-13460530 ] Himanshu Vashishtha commented on HBASE-6714: Any comments/reviews? TestMultiSlaveReplication#testMultiSlaveReplication may fail Key: HBASE-6714 URL: https://issues.apache.org/jira/browse/HBASE-6714 Project: HBase Issue Type: Bug Components: Replication, test Affects Versions: 0.92.0, 0.94.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Minor Attachments: HBase-6714-v1.patch java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.checkRow(TestMultiSlaveReplication.java:203) at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.testMultiSlaveReplication(TestMultiSlaveReplication.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) TestMultiSlaveReplication-testMultiSlaveReplication failed in our local build citing that row was not replicated to second peer. This is because after inserting row, log is rolled and we look for row2 in both the clusters and then we check for existence of row in both clusters. Meanwhile, Replication thread was sleeping for the second cluster and Row row2 is not present in the second cluster from the very beginning. So, the row2 existence check succeeds and control move on to find row in both clusters where it fails for the second cluster. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460532#comment-13460532 ] ramkrishna.s.vasudevan commented on HBASE-6299: --- There are some hanging tests in the Hadoop QA build. +1 on patch otherwise. RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign for a second time, choosing another RS.
5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster considers invalid and ignores the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078,
[jira] [Created] (HBASE-6856) Document the LeaseException thrown in scanner next
Daniel Iancu created HBASE-6856: --- Summary: Document the LeaseException thrown in scanner next Key: HBASE-6856 URL: https://issues.apache.org/jira/browse/HBASE-6856 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.92.0 Reporter: Daniel Iancu In some situations clients that fetch data from a RS get a LeaseException instead of the usual ScannerTimeoutException/UnknownScannerException. This particular case should be documented in the HBase guide. Some key points * the source of the exception is: org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) * it happens in the context of a slow/freezing RS#next * it can be prevented by having hbase.rpc.timeout > hbase.regionserver.lease.period Harsh J investigated the issue and has some conclusions, see http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/browser -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6856) Document the LeaseException thrown in scanner next
[ https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Iancu updated HBASE-6856: Description: In some situations clients that fetch data from a RS get a LeaseException instead of the usual ScannerTimeoutException/UnknownScannerException. This particular case should be documented in the HBase guide. Some key points * the source of the exception is: org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) * it happens in the context of a slow/freezing RS#next * it can be prevented by having hbase.rpc.timeout > hbase.regionserver.lease.period Harsh J investigated the issue and has some conclusions, see http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E was: In some situations clients that fetch data from a RS get a LeaseException instead of the usual ScannerTimeoutException/UnknownScannerException. This particular case should be documented in the HBase guide. Some key points * the source of the exception is: org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) * it happens in the context of a slow/freezing RS#next * it can be prevented by having hbase.rpc.timeout > hbase.regionserver.lease.period Harsh J investigated the issue and has some conclusions, see http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/browser Document the LeaseException thrown in scanner next -- Key: HBASE-6856 URL: https://issues.apache.org/jira/browse/HBASE-6856 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.92.0 Reporter: Daniel Iancu Labels: LeaseException In some situations clients that fetch data from a RS get a LeaseException instead of the usual ScannerTimeoutException/UnknownScannerException. This particular case should be documented in the HBase guide.
Some key points * the source of the exception is: org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) * it happens in the context of a slow/freezing RS#next * it can be prevented by having hbase.rpc.timeout > hbase.regionserver.lease.period Harsh J investigated the issue and has some conclusions, see http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
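The prevention note above can be illustrated with an hbase-site.xml fragment. The values below are examples only; the point is simply that hbase.rpc.timeout stays above hbase.regionserver.lease.period, so the RPC does not give up before the scanner lease has a chance to expire normally.

```xml
<!-- Example values only: the invariant is
     hbase.rpc.timeout > hbase.regionserver.lease.period -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value>
</property>
```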
[jira] [Commented] (HBASE-5937) Refactor HLog into an interface.
[ https://issues.apache.org/jira/browse/HBASE-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460592#comment-13460592 ] Flavio Junqueira commented on HBASE-5937: - Thanks for responding, Stack. bq. Sorry for not getting to your log. What have you been having to do to get tests to pass? How did you fix TestMultiParallel? It is stuff to do w/ this refactoring? It was our bug. bq. Currently Reader and Writer are Interfaces defined inside HLog. You get one by calling a static method on HLog. You'd like to make getReader non-static, an invocation on a particular instance of HLog. bq. That seems fine by me. It makes sense given what you are trying to do. It is less flexible than what we currently have because it presumes a particular implementation of HLog. It is simpler to leave getReader and getWriter as static methods. Given that a reader/writer is for a concrete WAL, Ivan and I thought that it would be best to have these methods available as instance methods. However, it is not looking simple to implement because we don't have HLog objects available in all places we need a reader or a writer, and the initialization of HLog objects makes it tricky to instantiate one only to get a reader or a writer. At this point, I'm tempted to leave them as static methods for now, unless anyone has a strong preference otherwise. bq. I hope you call your HLog Interface WAL! It is fine with me to make the change. bq. I think this work trying to make an Interface for WAL is kinda important. There is this BookKeeper project but the multi-WAL dev (i.e. making the regionserver write more than one WAL at a time into HDFS) could use the result of this effort too. BookKeeper provides the ability to write multiple concurrent logs, but if the regionserver code is not prepared, then we won't be able to benefit from this feature. Consequently, it does sound very important to have the regionserver writing to more than one WAL at a time.
Currently there is one test failing consistently for me: {noformat} org.apache.hadoop.hbase.TestLocalHBaseCluster {noformat} and I believe the culprit is this: {noformat} WARNING! File system needs to be upgraded. You have version null and I want version 7. Run the '${HBASE_HOME}/bin/hbase migrate' script. 2012-09-21 17:27:06,075 FATAL [Master:0;perfectsalt-lm.barcelona.corp.yahoo.com,52906,1348241225714] master.HMaster(1838): Unhandled exception. Starting shutdown. org.apache.hadoop.hbase.util.FileSystemVersionException: File system needs to be upgraded. You have version null and I want version 7. Run the '${HBASE_HOME}/bin/hbase migrate' script. {noformat} Any clue of why this could be happening? Refactor HLog into an interface. Key: HBASE-5937 URL: https://issues.apache.org/jira/browse/HBASE-5937 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Flavio Junqueira Priority: Minor Attachments: org.apache.hadoop.hbase.client.TestMultiParallel-output.txt What the summary says. Create HLog interface. Make current implementation use it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
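The instance-method design discussed above can be sketched as follows. All names and the in-memory backing are illustrative only, not the actual HBase interfaces: the point is that getReader/getWriter on a WAL instance tie each reader and writer to a concrete log, whereas the current static factories do not.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative-only sketch of a WAL interface with instance-level
// reader/writer accessors, as discussed in the comment above.
interface WALSketch {
    interface Reader {
        String next();   // next log entry, or null at end
        void close();
    }
    interface Writer {
        void append(String entry);
        void close();
    }
    Reader getReader();  // instance methods: bound to this concrete WAL
    Writer getWriter();
}

// Toy in-memory implementation, just enough to exercise the interface.
class InMemoryWAL implements WALSketch {
    private final List<String> entries = new ArrayList<>();

    public Reader getReader() {
        return new Reader() {
            private int pos = 0;
            public String next() { return pos < entries.size() ? entries.get(pos++) : null; }
            public void close() {}
        };
    }

    public Writer getWriter() {
        return new Writer() {
            public void append(String entry) { entries.add(entry); }
            public void close() {}
        };
    }
}
```

The drawback Flavio notes shows up immediately in such a design: any call site that wants a Reader must first hold a WAL instance, which the current static-factory style avoids.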
[jira] [Commented] (HBASE-6651) Thread safety of HTablePool is doubtful
[ https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460612#comment-13460612 ] Hiroshi Ikeda commented on HBASE-6651: -- Sorry, I didn't realize that PoolMap is also used in HBaseClient, and it is intended that the round-robin logic gives the same thread-safe object to different threads and sticks to the limit count of the resources (aside from the other factors that break thread-safety of PoolMap). Apparently the requirements of pooling differ in HTablePool and HBaseClient. Thread safety of HTablePool is doubtful --- Key: HBASE-6651 URL: https://issues.apache.org/jira/browse/HBASE-6651 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.1 Reporter: Hiroshi Ikeda Priority: Minor Attachments: sample.zip, sample.zip There are some operations in HTablePool that access PoolMap multiple times without any explicit synchronization. For example, HTablePool.closeTablePool() calls PoolMap.values() and then PoolMap.remove(). If other threads add new instances to the pool in the middle of the calls, the newly added instances might be dropped. (HTablePool.closeTablePool() also has another problem: calling it from multiple threads causes HTable to be accessed by multiple threads.) Moreover, PoolMap is not thread safe for the same reason. For example, PoolMap.put() calls ConcurrentMap.get() and then ConcurrentMap.put(). If another thread adds a new instance to the concurrent map in the middle of the calls, the new instance might be dropped. The implementations of Pool have the same problems. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
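The check-then-act race described in the report (a separate get() followed by a put() on a ConcurrentMap) can be sketched as follows. The class and method names are inventions for illustration, not PoolMap's API; the pattern is the same.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of the non-atomic get()-then-put() race on a ConcurrentMap:
// two threads can each create a pool list, and one list (plus anything
// already added to it) is silently overwritten and lost.
public class PoolPutSketch {
    private final ConcurrentMap<String, List<String>> pools = new ConcurrentHashMap<>();

    // Racy version: get() and put() are individually thread-safe, but the
    // compound "create if absent" operation is not atomic.
    public List<String> getPoolRacy(String key) {
        List<String> pool = pools.get(key);
        if (pool == null) {
            pool = new CopyOnWriteArrayList<>();
            pools.put(key, pool); // may clobber a list another thread just put
        }
        return pool;
    }

    // Atomic version: putIfAbsent guarantees every caller sees the same list.
    public List<String> getPoolSafe(String key) {
        List<String> pool = new CopyOnWriteArrayList<>();
        List<String> existing = pools.putIfAbsent(key, pool);
        return existing != null ? existing : pool;
    }
}
```

The same putIfAbsent pattern (or computeIfAbsent on later JDKs) closes the window the reporter describes in PoolMap.put().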
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460615#comment-13460615 ] Lars Hofhansl commented on HBASE-6852: -- Patch looks good. I'll remain sceptical about the real life impact, though. The expensive part is the memory barriers. As long as we use AtomicLong (or volatiles, or synchronized, or a ConcurrentMap) those are still going to happen. Let me move this out to 0.94.3, so that we can performance test this a bit more. SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields Key: HBASE-6852 URL: https://issues.apache.org/jira/browse/HBASE-6852 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.94.0 Reporter: Cheng Hao Priority: Minor Labels: performance Fix For: 0.94.3, 0.96.0 Attachments: onhitcache-trunk.patch The SchemaMetrics.updateOnCacheHit costs too much while I am doing the full table scanning. Here are the top 5 hotspots within the regionserver while full scanning a table (sorry for the rough formatting):
CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 500
samples % image name symbol name
---
98447 13.4324 14033.jo void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
98447 100.000 14033.jo void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
---
45814 6.2510 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
45814 100.000 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
---
43523 5.9384 14033.jo boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
43523 100.000 14033.jo boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
---
42548 5.8054 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
42548 100.000 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
---
40572 5.5358 14033.jo int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
40572 100.000 14033.jo int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
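The batching idea behind the patch under discussion can be sketched as follows. The class, names, and threshold value are illustrative, not the actual patch: each thread counts hits in a cheap thread-local slot and only pays the AtomicLong's memory-barrier cost once per THRESHOLD hits. This matches Lars's point that the barriers themselves remain; they just fire far less frequently.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of threshold-batched counter updates: the hot path
// does a plain thread-local increment, and the shared AtomicLong (with
// its memory barrier) is touched only once per THRESHOLD hits.
// Trade-off: the published count lags by up to THRESHOLD-1 per thread.
public class CacheHitBatcher {
    public static final int THRESHOLD = 100;
    public static final AtomicLong cacheHits = new AtomicLong();

    private static final ThreadLocal<long[]> pending =
        ThreadLocal.withInitial(() -> new long[1]);

    public static void updateOnCacheHit() {
        long[] p = pending.get();
        if (++p[0] >= THRESHOLD) {      // cheap, barrier-free increment
            cacheHits.addAndGet(p[0]);  // rare, barrier-paying flush
            p[0] = 0;
        }
    }
}
```

With THRESHOLD=100 the shared counter is updated on roughly 1% of cache hits, which is the ~99% reduction in barrier traffic mentioned in the thread.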
[jira] [Updated] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6852: - Fix Version/s: (was: 0.94.2) 0.94.3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6854) Deletion of SPLITTING node on split rollback should clear the region from RIT
[ https://issues.apache.org/jira/browse/HBASE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6854: - Fix Version/s: (was: 0.94.2) 0.94.3 Unless there's a patch today, let's move it to 0.94.3. Deletion of SPLITTING node on split rollback should clear the region from RIT - Key: HBASE-6854 URL: https://issues.apache.org/jira/browse/HBASE-6854 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Fix For: 0.94.3 If a failure happens in a split before OFFLINING_PARENT, we tend to roll back the split, including deleting the znodes created. On deletion of the RS_ZK_SPLITTING node we get a callback but do not remove the region from RIT. We need to remove it from RIT; in any case, the SSH logic is well guarded should the delete event come from an RS-down scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6381) AssignmentManager should use the same logic for clean startup and failover
[ https://issues.apache.org/jira/browse/HBASE-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460618#comment-13460618 ] Jimmy Xiang commented on HBASE-6381: @Ram, thanks a lot for the review. As to the removal from regionInfoIterator, it is used locally to track unassigned regions so that we know if the bulk assign is completed, so I think it is needed. As to HBASE-6228, I added the following to fixupDaughters in HMaster before actually fixing anything, in order to avoid duplicated processing: {noformat} if (!serverManager.isServerDead(sn)) { // Otherwise, let SSH take care of it {noformat} Here sn is the parent server name. So the chance of duplicated processing should be very minimal, right? AssignmentManager should use the same logic for clean startup and failover -- Key: HBASE-6381 URL: https://issues.apache.org/jira/browse/HBASE-6381 Project: HBase Issue Type: Bug Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-6381-notes.pdf, hbase-6381.pdf, trunk-6381_v5.patch, trunk-6381_v7.patch, trunk-6381_v8.patch Currently AssignmentManager handles clean startup and failover very differently. Different logic is mingled together, so it is hard to find out which is for which. We should clean it up and share the same logic, so that AssignmentManager handles both cases the same way. This way, the code will be much easier to understand and maintain. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups
[ https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460619#comment-13460619 ] Lars Hofhansl commented on HBASE-6841: -- Which client did you jstack? There are at least two kinds of clients at work here: the one that attempts the truncate and the ones that still perform the requests. I assume the stuck ones are those that are still performing the requests. I am beginning to doubt that prefetching is an issue here. This has to do with how these clients connect. There should not need to be a connection setup each time. Meta prefetching is slower than doing multiple meta lookups --- Key: HBASE-6841 URL: https://issues.apache.org/jira/browse/HBASE-6841 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Priority: Critical Fix For: 0.94.2 Attachments: 6841-0.94.txt, 6841-0.96.txt I got myself into a situation where I needed to truncate a massive table while it was getting hits, and surprisingly the clients were not recovering. What I see in the logs is that every time we prefetch .META. we set up a new HConnection, because we close it on the way out. It's awfully slow. We should just turn it off or make it useful. jstacks coming up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
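The "connection setup each time" problem described above can be sketched generically. This is a hypothetical stand-in, not HBase's HConnection/HConnectionManager API; the only point it makes is that caching connections by cluster key bounds the number of expensive setups, whereas opening and closing one per meta prefetch does not.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// "Connection" is a stand-in class for demonstration, not HBase's API.
class Connection {
    static int setups = 0;                 // counts expensive setups (demo only)
    Connection(String clusterKey) {
        setups++;                          // stands in for ZK lookup, RPC setup...
    }
}

// Hypothetical cache: at most one setup per cluster key, no matter how
// many lookups are issued against that cluster.
public class ConnectionCache {
    private static final ConcurrentMap<String, Connection> CACHE =
            new ConcurrentHashMap<>();

    public static Connection get(String clusterKey) {
        return CACHE.computeIfAbsent(clusterKey, Connection::new);
    }
}
```

With this shape, a burst of meta lookups from stuck clients amortizes to a single setup instead of one per lookup.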
[jira] [Updated] (HBASE-6651) Thread safety of HTablePool is doubtful
[ https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda updated HBASE-6651: - Attachment: sharedmap_for_hbaseclient.zip Added a sample implementation which might be used to pool and share Connection instances between threads in HBaseClient. I use the name SharedMap, the old name of PoolMap. In HBaseClient I think PoolMap with ThreadLocalPool leaks objects. Connection (extending Thread) automatically tries to remove itself from the pool at the end of its life, but its thread is different from the thread which created the Connection instance and put it into the pool. Thread safety of HTablePool is doubtful --- Key: HBASE-6651 URL: https://issues.apache.org/jira/browse/HBASE-6651 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.1 Reporter: Hiroshi Ikeda Priority: Minor Attachments: sample.zip, sample.zip, sharedmap_for_hbaseclient.zip There are some operations in HTablePool that access PoolMap multiple times without any explicit synchronization. For example, HTablePool.closeTablePool() calls PoolMap.values() and then calls PoolMap.remove(). If other threads add new instances to the pool in the middle of those calls, the newly added instances might be dropped. (HTablePool.closeTablePool() also has another problem: calling it from multiple threads causes HTable to be accessed by multiple threads.) Moreover, PoolMap is not thread safe for the same reason. For example, PoolMap.put() calls ConcurrentMap.get() and then ConcurrentMap.put(). If another thread adds a new instance to the concurrent map in the middle of those calls, the new instance might be dropped. The implementations of Pool have the same problems. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
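The get-then-put race described above can be shown in miniature. This is an illustrative sketch, not PoolMap's actual code; the safe variant relies on ConcurrentMap.putIfAbsent performing the check and the insert as one atomic step.

```java
import java.util.concurrent.ConcurrentMap;

// Two ways to do "insert if missing" on a ConcurrentMap. The first is
// the check-then-act pattern the issue describes: two separate atomic
// operations with a window in between where another thread's insert
// can be silently overwritten.
public class PoolPut {
    // Racy: a concurrent insert between get() and put() is clobbered.
    static <K, V> V racyPut(ConcurrentMap<K, V> map, K key, V value) {
        V existing = map.get(key);
        if (existing == null) {
            map.put(key, value);   // may overwrite another thread's value
            return value;
        }
        return existing;
    }

    // Atomic: the first insert wins; later callers observe it.
    static <K, V> V safePut(ConcurrentMap<K, V> map, K key, V value) {
        V prev = map.putIfAbsent(key, value);
        return prev != null ? prev : value;
    }
}
```

A pooled object dropped by the racy variant is exactly the leak scenario: it exists but is no longer reachable through the pool.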
[jira] [Created] (HBASE-6857) Investigate slow operations
Amitanand Aiyer created HBASE-6857: -- Summary: Investigate slow operations Key: HBASE-6857 URL: https://issues.apache.org/jira/browse/HBASE-6857 Project: HBase Issue Type: Improvement Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor We see that occasionally regionservers have a spate of slow operations. Need to look into what is causing this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5937) Refactor HLog into an interface.
[ https://issues.apache.org/jira/browse/HBASE-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460631#comment-13460631 ] stack commented on HBASE-5937: -- bq. Any clue of why this could be happening? Somehow the test is pointed at the wrong fs? Did you mess w/ that? When HBase starts, it looks for a file named hbase.version. If present, it reads it to check that the version therein matches the version the hbase software expects. We used this facility whenever on-fs formats changed in a way that required you to run a migration step before starting the cluster. So, version == null makes me think hbase is looking in the wrong place for the hbase.version file... looking in localfs rather than in hdfs where it perhaps wrote it on startup? bq. ...and the initialization of HLog objects makes it tricky to instantiate it only to get a reader or a writer HLog construction is the way it is, again, because we presume one implementation only. I'd suggest you look at what it would take to move the heavyweight stuff done in HLog into an init or start method. NP having us change how we do the HLog setup in HBase. Perhaps it won't help much though, as the Reader and Writer might want some of the heavy setup done? I'd also say that HLog is the way it is not because it was designed, but because it evolved this way over the years. If you fellas want to start over, I'd say go for it: make a clean Interface that will work for our current hdfs use case and for the bkfs. We'll shoehorn it into a 0.98 or whatever suits your schedule. Refactor HLog into an interface. Key: HBASE-5937 URL: https://issues.apache.org/jira/browse/HBASE-5937 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Flavio Junqueira Priority: Minor Attachments: org.apache.hadoop.hbase.client.TestMultiParallel-output.txt What the summary says. Create an HLog interface. Make the current implementation use it. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
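stack's suggestion of moving the heavyweight work out of HLog's constructor into an init or start method can be sketched as a lifecycle pattern. The names below are illustrative only, not the eventual HBase interface: construction stays cheap, so a caller that only needs a Reader or Writer never pays for the full setup.

```java
// Hypothetical lifecycle sketch: cheap constructor, explicit start().
public class LazyWal {
    private boolean started = false;

    public LazyWal() {
        // cheap: remember configuration only; no filesystem access here
    }

    public synchronized void start() {
        if (started) return;          // idempotent
        // heavyweight setup would go here: open the filesystem,
        // roll the first log, spawn the syncer thread, ...
        started = true;
    }

    public synchronized void append(String edit) {
        if (!started) throw new IllegalStateException("call start() first");
        // append edit to the current log (elided)
    }

    public synchronized boolean isStarted() { return started; }
}
```

As stack notes, this only helps if the Reader and Writer genuinely don't need the heavy setup; the guard in append() makes the new precondition explicit rather than implicit.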
[jira] [Assigned] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau reassigned HBASE-2038: - Assignee: Jacques Nadeau Coprocessors: Region level indexing --- Key: HBASE-2038 URL: https://issues.apache.org/jira/browse/HBASE-2038 Project: HBase Issue Type: New Feature Components: Coprocessors Reporter: Andrew Purtell Assignee: Jacques Nadeau Priority: Minor HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so that region level indexing can be reimplemented as a coprocessor without any loss of functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460645#comment-13460645 ] stack commented on HBASE-6733: -- [~jdcryans] Review this boss? [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2] --- Key: HBASE-6733 URL: https://issues.apache.org/jira/browse/HBASE-6733 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6733-1.patch, 6733-2.patch The failure is in TestReplication.queueFailover (it fails due to unreplicated rows). I have come across two problems: 1. The sleepMultiplier is not properly reset when the currentPath is changed (in ReplicationSource.java). 2. The ReplicationExecutor sometimes removes files to replicate from the queue too early, resulting in the corresponding edits going missing. Here the problem is that the log-file length the replication executor finds is not the most up-to-date one, so it doesn't read anything from there; ultimately, when there is a log roll, the replication queue gets a new entry and the executor drops the old entry out of the queue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
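The first problem above — the backoff multiplier surviving a switch to a new log path — can be sketched in a few lines. Field and method names here are illustrative, not ReplicationSource's actual members; the point is only where the reset belongs.

```java
// Hypothetical sketch of exponential-backoff state in a replication
// source: the multiplier grows while reads come up empty, and must be
// reset to 1 whenever the source moves to a different log path.
public class BackoffSource {
    static final int MAX_MULTIPLIER = 10;
    int sleepMultiplier = 1;
    String currentPath;

    // Called when a read of the current log returns nothing new.
    void onEmptyRead() {
        if (sleepMultiplier < MAX_MULTIPLIER) sleepMultiplier++;
    }

    // Called when the source advances to a new log file.
    void switchPath(String newPath) {
        if (!newPath.equals(currentPath)) {
            currentPath = newPath;
            sleepMultiplier = 1;   // the reset the issue says is missing
        }
    }
}
```

Without the reset, a new log inherits the maxed-out sleep from the previous one, so fresh edits sit unreplicated far longer than necessary.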
[jira] [Assigned] (HBASE-3340) Eventually Consistent Secondary Indexing via Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau reassigned HBASE-3340: - Assignee: Jacques Nadeau (was: Jonathan Gray) Eventually Consistent Secondary Indexing via Coprocessors - Key: HBASE-3340 URL: https://issues.apache.org/jira/browse/HBASE-3340 Project: HBase Issue Type: New Feature Components: Coprocessors Reporter: Jonathan Gray Assignee: Jacques Nadeau Secondary indexing support via coprocessors with an eventual consistency guarantee. Design to come. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460648#comment-13460648 ] stack commented on HBASE-6758: -- [~devaraj] What you think of Ted comment above boss? [~jdcryans] Any comment on this patch? [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file --- Key: HBASE-6758 URL: https://issues.apache.org/jira/browse/HBASE-6758 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, TEST-org.apache.hadoop.hbase.replication.TestReplication.xml I have seen cases where the replication-executor would lose data to replicate since the file hasn't been closed yet. Upon closing, the new data becomes visible. Before that happens the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made in ReplicationSource.processEndOfFile as well (currentPath related). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6488: - Resolution: Fixed Fix Version/s: 0.96.0 0.94.2 Assignee: ryan rawson Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and trunk. Thanks for the patch Ryan (Should have a test and should be more general escaping but hey...) HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Assignee: ryan rawson Fix For: 0.94.2, 0.96.0 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
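The failure mode in the stack trace above, and the escape-before-format workaround, can be reproduced in isolation. escapePercent and threadNamePrefix below are hypothetical helpers written for this sketch, not HBase API; as discussed later in the thread, escaping can only be applied at the points where the string actually reaches a format() call.

```java
// An IPv6 zone-index hostname such as "fe80::1%0" contains '%', which
// String.format parses as the start of a conversion specifier. Doubling
// it to "%%" (the literal-percent conversion) makes it format-safe.
public class FormatEscape {
    static String escapePercent(String host) {
        return host.replace("%", "%%");
    }

    static String threadNamePrefix(String host) {
        // the escaped host can now be embedded in a format string
        return String.format(escapePercent(host) + "-executor-thread-%d", 1);
    }
}
```

The unescaped form fails exactly as in the report: a trailing "%0" is not a valid conversion, so Formatter's text check throws UnknownFormatConversionException with conversion '0'.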
[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460662#comment-13460662 ] Gregory Chanan commented on HBASE-6752: --- On timeranged reads: if the user specified his own timestamps, couldn't the correct value to return be only in the WAL? On region server failure, serve writes and timeranged reads during the log split Key: HBASE-6752 URL: https://issues.apache.org/jira/browse/HBASE-6752 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Priority: Minor Opening for write on failure would mean:
- Assign the region to a new regionserver. It marks the region as recovering.
-- A specific exception is returned to the client when we cannot serve.
-- Allows clients to know where they stand. The exception can include some time information (failure started on: ...)
-- Allows clients to go immediately to the right regionserver, instead of retrying or calling the region holding meta to get the new address = saves network calls, lowers the load on meta.
- Do the split as today. Priority is given to the region server holding the new regions.
-- Helps share the load balancing code: the split is done by a region server considered available for new regions.
-- Helps locality (the recovered edits are available on the region server) = lowers the network usage.
- When the split is finished, we're done, as of today.
- While the split is progressing, the region server can:
-- serve writes
--- That's useful for all applications that need to write but not read immediately: whatever logs events to analyze them later; opentsdb is a perfect example.
-- serve reads if they have a compatible time range. For heavily used tables, it could help, because:
--- we can expect to have only a few minutes of data (as it's loaded)
--- the heaviest queries often accept a few -or more- minutes of delay.
Some what-ifs:
1) The split fails = Retry until it works. As today. Just that we serve writes. We need to know (as today) that the region has not recovered if we fail again.
2) The regionserver fails during the split = As 1, and as of today.
3) The regionserver fails after the split but before the state change to fully available = New assign. More logs to split (the ones already done and the new ones).
4) The assignment fails = Retry until it works. As today.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460664#comment-13460664 ] Devaraj Das commented on HBASE-6758: [~stack] I have already responded to Ted's comment. In summary, the problem is that the log-splitter couldn't complete its work soon enough, and hence the file wasn't moved to .oldlogs soon enough. The replicator did the maxRetries and gave up. So this is a different issue (and maybe solved by increasing the value of maxRetries in the config.) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file --- Key: HBASE-6758 URL: https://issues.apache.org/jira/browse/HBASE-6758 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, TEST-org.apache.hadoop.hbase.replication.TestReplication.xml I have seen cases where the replication-executor would lose data to replicate since the file hasn't been closed yet. Upon closing, the new data becomes visible. Before that happens the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made in ReplicationSource.processEndOfFile as well (currentPath related). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups
[ https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6841: - Fix Version/s: (was: 0.94.2) 0.94.3 I'd like to move this to 0.94.3. Please pull back if you disagree. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6841) Meta prefetching is slower than doing multiple meta lookups
[ https://issues.apache.org/jira/browse/HBASE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460674#comment-13460674 ] Jean-Daniel Cryans commented on HBASE-6841: --- bq. @J-D: Since this is (presumably) a long standing condition, how do you feel about moving this to 0.94.3? I'm ok. bq. Which client did you jstack? There are at least two kinds of clients at work here: the one that attempts the truncate and the ones that still perform the requests. The clients. I truncated separately. bq. I am beginning to doubt that prefetching is an issue here. This has to do with how these clients connect. There should not need to be a connection setup each time. Something is broken, we agree there :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6856) Document the LeaseException thrown in scanner next
[ https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-6856. -- Resolution: Fixed Fix Version/s: 0.96.0 Assignee: Daniel Iancu Hadoop Flags: Reviewed Committed. Let me push the book up to the site after this commit. Thanks for the mailing list distillation (and thanks Harsh for digging in on this one...) Document the LeaseException thrown in scanner next -- Key: HBASE-6856 URL: https://issues.apache.org/jira/browse/HBASE-6856 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.92.0 Reporter: Daniel Iancu Assignee: Daniel Iancu Labels: LeaseException Fix For: 0.96.0 In some situations clients that fetch data from a RS get a LeaseException instead of the usual ScannerTimeoutException/UnknownScannerException. This particular case should be documented in the HBase guide. Some key points: * the source of the exception is org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) * it happens in the context of a slow/freezing RS#next * it can be prevented by having hbase.rpc.timeout > hbase.regionserver.lease.period Harsh J investigated the issue and has some conclusions, see http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
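The key point about hbase.rpc.timeout and hbase.regionserver.lease.period amounts to a configuration relationship between two settings. Assuming the intended (apparently stripped) comparison is "greater than", it could look like the following hbase-site.xml fragment; the values are illustrative only, not recommendations:

```xml
<!-- Sketch: keep the client RPC timeout above the scanner lease period.
     Values below are illustrative examples, not recommended defaults. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value> <!-- 120 s, greater than the lease period -->
</property>
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value>  <!-- 60 s scanner lease -->
</property>
```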
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460683#comment-13460683 ] Mikhail Bautin commented on HBASE-6852: --- [~lhofhansl]: what are the other cases when metrics came up as performance issues? [~chenghao_sh]: you said that your dataset size was 600GB, and the total amount of block cache was presumably much smaller than that, which makes me think the workload should have been I/O-bound. What was the CPU utilization on your test? What was the disk throughput? -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460686#comment-13460686 ] ryan rawson commented on HBASE-6488: One of the problems is we cant generally escape % wherever they come, because most of the string output doesnt actually use format() so therefore we'd just end up doubling percents. Bummer! HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Assignee: ryan rawson Fix For: 0.94.2, 0.96.0 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.util.UnknownFormatConversionException: Conversion = '0'
    at java.util.Formatter.checkText(Formatter.java:2503)
    at java.util.Formatter.parse(Formatter.java:2467)
    at java.util.Formatter.format(Formatter.java:2414)
    at java.util.Formatter.format(Formatter.java:2367)
    at java.lang.String.format(String.java:2769)
    at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68)
    at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299)
    at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185)
    at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227)
    at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
    at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
    at java.lang.Thread.run(Thread.java:680)
2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
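The failure mode above is easy to reproduce outside HBase. The sketch below (illustrative strings only, not the actual HBase code path) shows a zone-indexed IPv6 host such as "fe80::1%0" breaking String.format(), and the '%%'-doubling escape that works when, and only when, the string is known to go through format() — which is exactly the limitation ryan describes.

```java
// Illustrative demo of the HBASE-6488 failure: a stray '%' from an IPv6
// zone index is parsed by java.util.Formatter as a conversion specifier.
public class PercentEscapeDemo {
    static String safeThreadNameFormat(String host, String name) {
        // Doubling '%' escapes it for format(); this only helps when we
        // know the string will go through format().
        return String.format(host.replace("%", "%%") + "-pool-%s", name);
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            // The '%0-p' run is parsed as a (bogus) format specifier.
            String.format("fe80::1%0-pool-%s", "handler");
        } catch (java.util.IllegalFormatException e) {
            threw = true; // e.g. UnknownFormatConversionException
        }
        System.out.println(threw);
        System.out.println(safeThreadNameFormat("fe80::1%0", "handler"));
    }
}
```

With the escape applied, the zone index survives as a literal percent in the generated thread-name prefix.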
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460689#comment-13460689 ] Todd Lipcon commented on HBASE-6852: I have a full-table-scan-in-isolation benchmark I've been working on. My benchmark currently disables metrics, so I haven't seen this, but I'll add a flag to enable metrics and see if I can reproduce. Since it runs in isolation, it's easy to run under perf stat and get cycle counts, etc., out of it. Will report back next week.
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460695#comment-13460695 ] Lars Hofhansl commented on HBASE-6852: -- HBASE-6603 was the other one. Turns out this is the 2nd time (not the 3rd). The other issues I found through profiling were not metrics related. So I was thinking about what we should generally do about this. The idea in this patch (using an array indexed by metric) is a good one. Can we generally do that? I.e.: # we know the metrics we wish to collect ahead of time # assign an index to each of them, and collect the value in an array # simply use long (not volatile, not AtomicLong, just long) # upon update or read we access the metric array by index. That would eliminate the cost of the ConcurrentMap and of the AtomicXYZ, with the caveat that the metrics are only an approximation, which at the very least will make testing much harder. Maybe we have exact and fuzzy metrics and only use the fuzzy ones on the hot code-paths.
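Lars's array-indexed proposal above could look roughly like this (a minimal hypothetical sketch, not the attached patch; all names are made up): each known metric gets a fixed index into a plain long[], and increments are ordinary non-atomic stores, so concurrent updates can be lost and the counts are the "fuzzy" approximation he describes.

```java
// Sketch of a fuzzy, array-indexed metrics store (hypothetical names).
// Plain long increments are not atomic: under concurrency some updates
// may be lost, which is the approximation trade-off discussed above.
enum Metric { CACHE_HIT, CACHE_MISS, BLOCK_READ }

class FuzzyMetrics {
    private final long[] counts = new long[Metric.values().length];

    void increment(Metric m) {
        counts[m.ordinal()]++;      // plain long: no volatile, no CAS, no map lookup
    }

    long get(Metric m) {
        return counts[m.ordinal()]; // may be slightly stale/low under concurrency
    }
}
```

Single-threaded the counts are exact; the fuzziness only appears under concurrent increments, which is why testing gets harder.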
[jira] [Commented] (HBASE-6856) Document the LeaseException thrown in scanner next
[ https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460697#comment-13460697 ] stack commented on HBASE-6856: -- I meant to say thanks Daniel for distilling the mailing list thread and making this issue. Document the LeaseException thrown in scanner next -- Key: HBASE-6856 URL: https://issues.apache.org/jira/browse/HBASE-6856 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.92.0 Reporter: Daniel Iancu Assignee: Daniel Iancu Labels: LeaseException Fix For: 0.96.0 In some situations clients that fetch data from a RS get a LeaseException instead of the usual ScannerTimeoutException/UnknownScannerException. This particular case should be documented in the HBase guide. Some key points * the source of the exception is: org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) * it happens in the context of a slow/freezing RS#next * it can be prevented by having hbase.rpc.timeout hbase.regionserver.lease.period Harsh J investigated the issue and has some conclusions, see http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E
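The two properties compared above are ordinary hbase-site.xml settings. For reference, a config fragment of that era looked like the following; the values shown are illustrative (both settings defaulted to 60000 ms), not a recommendation:

```xml
<!-- hbase-site.xml fragment; values are illustrative, not prescriptive -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value> <!-- how long the RS keeps an idle scanner lease alive -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>60000</value> <!-- client-side timeout for RPCs such as Scanner.next -->
</property>
```

The relative sizing of the two is exactly what Harsh J's analysis in the linked thread works through.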
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460698#comment-13460698 ] Todd Lipcon commented on HBASE-6852: If we used an array of longs, we'd get a ton of cache-contention effects. Whatever we do should be cache-line padded to avoid this perf hole. Having a per-thread (ThreadLocal) metrics array isn't a bad way to go: no contention, we can use non-volatile types, and it can be stale-read during metrics snapshots by just iterating over all the threads.
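Todd's per-thread scheme above can be sketched roughly as follows (hypothetical names, not HBase code): each thread lazily registers its own plain long[] with the collector, increments touch only thread-local memory, and a snapshot sums all registered arrays, tolerating stale reads.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of per-thread metrics (hypothetical API). A real version would
// pad each array to a cache line, as Todd notes, to avoid false sharing
// between arrays, and would handle thread death/unregistration.
class PerThreadMetrics {
    private static final int NUM_METRICS = 8;
    // Every thread's array is registered here so snapshots can find them.
    private final List<long[]> allArrays = new CopyOnWriteArrayList<>();
    private final ThreadLocal<long[]> local = ThreadLocal.withInitial(() -> {
        long[] a = new long[NUM_METRICS];
        allArrays.add(a);
        return a;
    });

    void increment(int metricIndex) {
        local.get()[metricIndex]++;   // no contention, no volatile, no CAS
    }

    long snapshot(int metricIndex) {
        long sum = 0;
        for (long[] a : allArrays) {
            sum += a[metricIndex];    // stale reads are acceptable for metrics
        }
        return sum;
    }
}
```

Updates stay entirely thread-local, so the hot path costs one array store; only the (rare) snapshot walks all threads' arrays.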
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460699#comment-13460699 ] stack commented on HBASE-6852: -- Perhaps use the Cliff Click counter (if the cost is the volatiles) and not have to do fuzzy metrics?
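The "Cliff Click counter" mentioned above refers to the striped counter from his high-scale-lib (the design JDK 8 later adopted as java.util.concurrent.atomic.LongAdder): increments hit one of several cells chosen per thread, so threads rarely contend on the same slot, and reads sum the cells with possibly momentarily stale results. A minimal sketch of the idea (not the high-scale-lib implementation, which also resizes and cache-line-pads its cells):

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Toy striped counter in the spirit of Cliff Click's Counter / LongAdder.
// Note: adjacent AtomicLongArray slots share cache lines; real
// implementations pad cells to avoid false sharing.
class StripedCounter {
    private final AtomicLongArray cells;

    StripedCounter(int stripes) {
        this.cells = new AtomicLongArray(stripes);
    }

    void increment() {
        // Thread-id probe spreads concurrent threads across cells,
        // reducing CAS contention versus a single AtomicLong.
        int i = (int) (Thread.currentThread().getId() % cells.length());
        cells.incrementAndGet(i);
    }

    long sum() {
        long s = 0;
        for (int i = 0; i < cells.length(); i++) s += cells.get(i);
        return s; // may be stale while increments are in flight
    }
}
```

Unlike the fuzzy plain-long approach, no increments are ever lost here; only the read is a non-atomic snapshot.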
[jira] [Assigned] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan reassigned HBASE-6752: - Assignee: Gregory Chanan On region server failure, serve writes and timeranged reads during the log split Key: HBASE-6752 URL: https://issues.apache.org/jira/browse/HBASE-6752 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: Gregory Chanan Priority: Minor
Opening for write on failure would mean:
- Assign the region to a new regionserver. It marks the region as recovering
-- a specific exception is returned to the client when we cannot serve
-- lets clients know where they stand. The exception can include some time information (failure started at: ...)
-- lets clients go immediately to the right regionserver, instead of retrying or calling the region holding meta to get the new address = saves network calls, lowers the load on meta.
- Do the split as today. Priority is given to the region server holding the new regions
-- helps share the load-balancing code: the split is done by region servers considered available for new regions
-- helps locality (the recovered edits are available on the region server) = lowers the network usage
- When the split is finished, we're done as of today
- While the split is progressing, the region server can
-- serve writes
--- that's useful for all applications that need to write but not read immediately:
--- anything that logs events to analyze them later
--- opentsdb is a perfect example.
-- serve reads if they have a compatible time range. For heavily used tables, this could help, because:
--- we can expect to have only a few minutes of data (as it's loaded)
--- the heaviest queries often accept a few (or more) minutes of delay.
Some what-ifs:
1) The split fails = Retry until it works. As today. Just that we serve writes. We need to know (as today) that the region has not recovered if we fail again.
2) The regionserver fails during the split = As 1, and as of today. 3) The regionserver fails after the split but before the state change to fully available = New assign. More logs to split (the ones already done and the new ones). 4) The assignment fails = Retry until it works. As today.
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460702#comment-13460702 ] Lars Hofhansl commented on HBASE-6852: -- Oh yeah, you mentioned cliffclick... Need to look at that.
[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460703#comment-13460703 ] Gregory Chanan commented on HBASE-6752: --- Assigned to myself. I'm definitely up for the serving-writes part; I need to think some more about the timeranged reads. May file separate JIRAs.
[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6299: - Attachment: 6299v4.txt Retry RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6299v4.txt, 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch
1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open-region request and starts to proceed, with success eventually. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster considers the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
{code}
[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6299: - Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6299: - Hadoop Flags: Reviewed Status: Patch Available (was: Open) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.90.6 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6299v4.txt, 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, eventually succeeding. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attempts to assign a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster considers the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it. 6. The unassigned ZK node remains, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created. 
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead;
{code}
[jira] [Commented] (HBASE-6071) getRegionServerWithRetries, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460709#comment-13460709 ] stack commented on HBASE-6071: -- How much more log does this patch generate, Igal? As I read it, it is logging every retry. Maybe we should commit this even if it ups the log noise in buckets, so we get more conscious about the retrying that currently goes on mostly silently; then we might do something more about it in the client? getRegionServerWithRetries, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: Client, IPC/RPC Affects Versions: 0.92.0, 0.94.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Attachments: HBASE-6071.patch, HBASE-6071.v2.patch, HBASE-6071.v3.patch, HBASE-6071.v4.patch, HConnectionManager_HBASE-6071-0.90.0.patch, lease-exception.txt HConnectionImplementation.getRegionServerWithRetries might terminate with an exception different than a DoNotRetryIOException, thus silently dropping exceptions from previous attempts. [~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
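The per-attempt logging being proposed could be sketched like this. This is an illustrative standalone sketch, not the actual HConnectionManager.getRegionServerWithRetries code; the class and method names here are made up:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Sketch of a retry wrapper that records every failed attempt instead of
// silently dropping intermediate exceptions (illustrative, not HBase code).
public class RetryLogger {
    public static <T> T callWithRetries(Callable<T> callable, int maxRetries)
            throws IOException {
        List<Throwable> exceptions = new ArrayList<>();
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return callable.call();
            } catch (Exception e) {
                exceptions.add(e);
                // Log the attempt number and the exception so earlier
                // failures are not lost when a later attempt throws.
                System.err.println("Attempt " + attempt + " of " + maxRetries
                        + " failed: " + e);
            }
        }
        throw new IOException("Failed after " + maxRetries + " retries; "
                + "collected exceptions: " + exceptions);
    }
}
```

With this shape, the final IOException carries every intermediate failure instead of only the last one, which is the gap the patch closes.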
[jira] [Assigned] (HBASE-6851) Race condition in TableAuthManager.updateGlobalCache()
[ https://issues.apache.org/jira/browse/HBASE-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling reassigned HBASE-6851: Assignee: Gary Helmling Race condition in TableAuthManager.updateGlobalCache() -- Key: HBASE-6851 URL: https://issues.apache.org/jira/browse/HBASE-6851 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.94.1, 0.96.0 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Critical When new global permissions are assigned, there is a race condition, during which further authorization checks relying on global permissions may fail. In TableAuthManager.updateGlobalCache(), we have: {code:java}
USER_CACHE.clear();
GROUP_CACHE.clear();
try {
  initGlobal(conf);
} catch (IOException e) {
  // Never happens
  LOG.error("Error occured while updating the user cache", e);
}
for (Map.Entry<String, TablePermission> entry : userPerms.entries()) {
  if (AccessControlLists.isGroupPrincipal(entry.getKey())) {
    GROUP_CACHE.put(AccessControlLists.getGroupName(entry.getKey()),
        new Permission(entry.getValue().getActions()));
  } else {
    USER_CACHE.put(entry.getKey(),
        new Permission(entry.getValue().getActions()));
  }
}
{code} If authorization checks come in following the .clear() but before repopulating, they will fail. We should have some synchronization here to serialize multiple updates and use a COW type rebuild and reassign of the new maps. This particular issue crept in with the fix in HBASE-6157, so I'm flagging for 0.94 and 0.96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
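The "COW type rebuild and reassign" suggested above could look roughly like this. This is a simplified sketch with placeholder types; the real TableAuthManager caches hold Permission values and have a different API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a copy-on-write cache update: build the replacement maps off to
// the side, then publish them with a single volatile write, so readers never
// observe a cleared-but-not-yet-repopulated cache (types simplified; this is
// not the real TableAuthManager).
public class AuthCache {
    private volatile Map<String, String> userCache = new HashMap<>();
    private volatile Map<String, String> groupCache = new HashMap<>();

    // synchronized serializes concurrent updates, as the issue suggests.
    public synchronized void updateGlobalCache(Map<String, String> userPerms,
                                               Map<String, String> groupPerms) {
        // Populate fresh maps; the live caches stay intact meanwhile.
        Map<String, String> newUsers = new HashMap<>(userPerms);
        Map<String, String> newGroups = new HashMap<>(groupPerms);
        // Atomic publication: a reader sees either the old maps or the new
        // ones, never an empty intermediate state.
        userCache = newUsers;
        groupCache = newGroups;
    }

    public String getUserPermission(String user) {
        return userCache.get(user);  // single volatile read, no locking
    }
}
```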
[jira] [Commented] (HBASE-6714) TestMultiSlaveReplication#testMultiSlaveReplication may fail
[ https://issues.apache.org/jira/browse/HBASE-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460711#comment-13460711 ] stack commented on HBASE-6714: -- I love patches that fix broke tests. In future, rather than this: {code}
+ if (i == NB_RETRIES - 1) {
+   fail("Waited too much time while getting the row.");
+ }
{code} ... just throw an exception rather than call fail? Let me commit. TestMultiSlaveReplication#testMultiSlaveReplication may fail Key: HBASE-6714 URL: https://issues.apache.org/jira/browse/HBASE-6714 Project: HBase Issue Type: Bug Components: Replication, test Affects Versions: 0.92.0, 0.94.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Minor Attachments: HBase-6714-v1.patch java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.checkRow(TestMultiSlaveReplication.java:203) at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.testMultiSlaveReplication(TestMultiSlaveReplication.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) TestMultiSlaveReplication#testMultiSlaveReplication failed in our local build, citing that the row was not replicated to the second peer. This is because, after inserting row, the log is rolled and we look for row2 in both clusters, and then we check for the existence of row in both clusters. Meanwhile, the Replication thread was sleeping for the second cluster, and Row row2 is not present in the second cluster from the very beginning. 
So, the row2 existence check succeeds and control moves on to find row in both clusters, where it fails for the second cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
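Stack's suggestion above — throwing from the wait loop instead of calling fail() — might be sketched as follows. The names are illustrative, not the actual TestMultiSlaveReplication code, and the retry counts are placeholders:

```java
// Sketch of a bounded wait loop that throws on timeout rather than calling
// Assert.fail(), so the failure carries a real exception and stack trace
// (names and constants are illustrative).
public class WaitForRow {
    static final int NB_RETRIES = 10;
    static final long SLEEP_TIME_MS = 10;

    interface RowCheck {
        boolean rowPresent() throws Exception;
    }

    static void waitForRow(RowCheck check) throws Exception {
        for (int i = 0; i < NB_RETRIES; i++) {
            if (check.rowPresent()) {
                return;  // row replicated, done waiting
            }
            Thread.sleep(SLEEP_TIME_MS);
        }
        // Throwing is preferable to fail(): callers see the cause directly.
        throw new IllegalStateException(
            "Waited too much time while getting the row");
    }
}
```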
[jira] [Updated] (HBASE-6714) TestMultiSlaveReplication#testMultiSlaveReplication may fail
[ https://issues.apache.org/jira/browse/HBASE-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6714: - Resolution: Fixed Fix Version/s: 0.96.0 0.94.2 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and to trunk. Thanks for the patch, Himanshu. TestMultiSlaveReplication#testMultiSlaveReplication may fail Key: HBASE-6714 URL: https://issues.apache.org/jira/browse/HBASE-6714 Project: HBase Issue Type: Bug Components: Replication, test Affects Versions: 0.92.0, 0.94.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Minor Fix For: 0.94.2, 0.96.0 Attachments: HBase-6714-v1.patch java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.checkRow(TestMultiSlaveReplication.java:203) at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.testMultiSlaveReplication(TestMultiSlaveReplication.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) TestMultiSlaveReplication#testMultiSlaveReplication failed in our local build, citing that the row was not replicated to the second peer. This is because, after inserting row, the log is rolled and we look for row2 in both clusters, and then we check for the existence of row in both clusters. Meanwhile, the Replication thread was sleeping for the second cluster, and Row row2 is not present in the second cluster from the very beginning. So, the row2 existence check succeeds and control moves on to find row in both clusters, where it fails for the second cluster. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460717#comment-13460717 ] stack commented on HBASE-6852: -- [~lhofhansl] I made my comment before I saw Todd's suggestion SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields Key: HBASE-6852 URL: https://issues.apache.org/jira/browse/HBASE-6852 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.94.0 Reporter: Cheng Hao Priority: Minor Labels: performance Fix For: 0.94.3, 0.96.0 Attachments: onhitcache-trunk.patch The SchemaMetrics.updateOnCacheHit costs too much while I am doing the full table scanning. Here are the top 5 hotspots within the regionserver while full scanning a table: (Sorry for the poor formatting) CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 500
samples % image name symbol name
--- 98447 13.4324 14033.jo void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
98447 100.000 14033.jo void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
--- 45814 6.2510 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
45814 100.000 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
--- 43523 5.9384 14033.jo boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
43523 100.000 14033.jo boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
--- 42548 5.8054 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
42548 100.000 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
--- 40572 5.5358 14033.jo int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
40572 100.000 14033.jo int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460725#comment-13460725 ] Elliott Clark commented on HBASE-6852: -- I think we should start doing more of what this patch does. Collect the values locally and then use a single call into the metrics sources to push the collected metrics. In addition, I think that we should remove some of the lesser-used dynamic metrics, and for others stop using the time-varying rate. For the most part, I think that will keep the cost of metrics from getting too out of control. However, I don't think that we should stop using AtomicLong/AtomicInt. From my understanding, on most architectures the JVM will turn getAndIncrement into just one CPU instruction, rather than using compare and swap. So there's very little gained by sacrificing correctness. SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields Key: HBASE-6852 URL: https://issues.apache.org/jira/browse/HBASE-6852 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.94.0 Reporter: Cheng Hao Priority: Minor Labels: performance Fix For: 0.94.3, 0.96.0 Attachments: onhitcache-trunk.patch The SchemaMetrics.updateOnCacheHit costs too much while I am doing the full table scanning. 
Here are the top 5 hotspots within the regionserver while full scanning a table: (Sorry for the poor formatting) CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 500
samples % image name symbol name
--- 98447 13.4324 14033.jo void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
98447 100.000 14033.jo void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
--- 45814 6.2510 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
45814 100.000 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
--- 43523 5.9384 14033.jo boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
43523 100.000 14033.jo boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
--- 42548 5.8054 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
42548 100.000 14033.jo int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
--- 40572 5.5358 14033.jo int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
40572 100.000 14033.jo int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]
-- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
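The batching idea discussed in this thread — collect cache hits locally and touch the shared counter only once per threshold — can be sketched as follows. This is an illustrative stand-in, not the real SchemaMetrics API; THRESHOLD=100 matches the value Lars suggests, and the ThreadLocal layout is an assumption of this sketch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of batching cache-hit updates: each thread counts hits in a plain
// thread-local long and flushes to the shared AtomicLong once per THRESHOLD
// hits, cutting contended atomic operations by roughly 99% at THRESHOLD=100.
public class BatchedHitCounter {
    static final int THRESHOLD = 100;
    private final AtomicLong globalHits = new AtomicLong();
    private final ThreadLocal<long[]> localHits =
            ThreadLocal.withInitial(() -> new long[1]);

    public void updateOnCacheHit() {
        long[] local = localHits.get();
        if (++local[0] >= THRESHOLD) {
            globalHits.addAndGet(local[0]);  // one contended op per THRESHOLD hits
            local[0] = 0;
        }
    }

    public long getHits() {
        // May lag by up to THRESHOLD - 1 unflushed hits per thread; metrics
        // readers generally tolerate that slack.
        return globalHits.get();
    }
}
```

The trade-off is exactly the one debated above: the global counter is slightly stale between flushes, in exchange for far fewer contended updates on the scan hot path.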
[jira] [Updated] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6738: --- Status: Patch Available (was: Open) Too aggressive task resubmission from the distributed log manager - Key: HBASE-6738 URL: https://issues.apache.org/jira/browse/HBASE-6738 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.1, 0.96.0 Environment: 3-node cluster test, but can occur as well on a much bigger one. It's all luck! Reporter: nkeywal Priority: Critical Attachments: 6738.v1.patch With default settings for hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3. On tests mentioned in HBASE-5843, I have variations around this scenario, 0.94 + HDFS 1.0.3: The regionserver in charge of the split does not answer in less than 25s, so it gets interrupted but actually continues. Sometimes we run out of retries, sometimes not; sometimes we're out of retries but, as the interrupts were ignored, we finish nicely. In the meantime, the same single task is executed in parallel by multiple nodes, increasing the probability of getting into race conditions. Details: t0: unplug a box with DN+RS. t + x: other boxes are already connected, so their connections start to die. Nevertheless, they don't consider this node as suspect. t + 180s: zookeeper - master detects the node as dead. Recovery starts. It can be less than 180s; sometimes it's around 150s. t + 180s: distributed split starts. There is only 1 task; it's immediately acquired by one RS. t + 205s: the RS has multiple errors when splitting, because a datanode is missing as well. The master decides to give the task to someone else. But often the task continues in the first RS. Interrupts are often ignored, as it's well stated in the code (// TODO interrupt often gets swallowed, do what else?) 
{code}
2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
{code} t + 211s: two region servers are processing the same task. They fight for the leases: {code}
2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125
{code} They can fight like this for many files, until the tasks finally get interrupted or finished. The task on the second box can be cancelled as well. In this case, the task is created again for a new box. The master seems to stop after 3 attempts. It can as well renounce splitting the files. Sometimes the tasks were not cancelled on the RS side, so the split is finished despite what the master thinks and logs. In this case, the assignment starts. In the other, it's "we've got a problem". {code}
2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: Skipping resubmissions of task /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832 because threshold 3 reached
{code} t + 300s: split is finished. Assignment starts. t + 330s: assignment is finished, regions are available again. There are a lot of possible subcases depending on the number of log files, of region servers, and so on. The issues are: 1) It's difficult, especially in HBase but not only, to interrupt a task. 
The pattern is often: {code}
void f() throws IOException {
  try {
    // whatever throws InterruptedException
  } catch (InterruptedException e) {
    throw new InterruptedIOException();
  }
}

boolean g() {
  int nbRetry = 0;
  for (;;) {
    try {
      f();
      return true;
    } catch (IOException e) {
      nbRetry++;
      if (nbRetry > maxRetry) {
        return false;
      }
    }
  }
}
{code} This typically swallows the interrupt. There are other variations, but this one seems to be the standard. Even if we fix this in HBase, we need the other layers to be interruptible as well. That's not proven. 2) 25s is very aggressive, considering that we have a default timeout of 180s for zookeeper. In other words, we give 180s to a regionserver before acting, but when it comes to split, it's 25s only. There may be reasons for this, but it seems dangerous, as during a failure the cluster is less available than during normal operations. We could do stuff around this, for example: =
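One way to stop the retry loop from swallowing the interrupt is to treat the converted InterruptedIOException specially: restore the thread's interrupt status and give up instead of retrying. This is a sketch of that fix; MAX_RETRY and the Task interface are illustrative, not HBase types:

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Sketch: the retry loop gives up immediately when the IOException is the
// converted form of an interrupt, instead of retrying through it.
public class InterruptAwareRetry {
    static final int MAX_RETRY = 3;

    interface Task {
        void run() throws IOException;
    }

    static boolean g(Task f) {
        int nbRetry = 0;
        for (;;) {
            try {
                f.run();
                return true;
            } catch (InterruptedIOException e) {
                // Restore the flag and stop: the interrupt was a request to
                // abandon the task, not a transient failure worth retrying.
                Thread.currentThread().interrupt();
                return false;
            } catch (IOException e) {
                nbRetry++;
                if (nbRetry > MAX_RETRY) {
                    return false;
                }
            }
        }
    }
}
```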
[jira] [Updated] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6738: --- Attachment: 6738.v1.patch Too aggressive task resubmission from the distributed log manager - Key: HBASE-6738 URL: https://issues.apache.org/jira/browse/HBASE-6738 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.1, 0.96.0 Environment: 3-node cluster test, but can occur as well on a much bigger one. It's all luck! Reporter: nkeywal Priority: Critical Attachments: 6738.v1.patch With default settings for hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3. On tests mentioned in HBASE-5843, I have variations around this scenario, 0.94 + HDFS 1.0.3: The regionserver in charge of the split does not answer in less than 25s, so it gets interrupted but actually continues. Sometimes we run out of retries, sometimes not; sometimes we're out of retries but, as the interrupts were ignored, we finish nicely. In the meantime, the same single task is executed in parallel by multiple nodes, increasing the probability of getting into race conditions. Details: t0: unplug a box with DN+RS. t + x: other boxes are already connected, so their connections start to die. Nevertheless, they don't consider this node as suspect. t + 180s: zookeeper - master detects the node as dead. Recovery starts. It can be less than 180s; sometimes it's around 150s. t + 180s: distributed split starts. There is only 1 task; it's immediately acquired by one RS. t + 205s: the RS has multiple errors when splitting, because a datanode is missing as well. The master decides to give the task to someone else. But often the task continues in the first RS. Interrupts are often ignored, as it's well stated in the code (// TODO interrupt often gets swallowed, do what else?) 
{code}
2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
{code} t + 211s: two region servers are processing the same task. They fight for the leases: {code}
2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125
{code} They can fight like this for many files, until the tasks finally get interrupted or finished. The task on the second box can be cancelled as well. In this case, the task is created again for a new box. The master seems to stop after 3 attempts. It can as well renounce splitting the files. Sometimes the tasks were not cancelled on the RS side, so the split is finished despite what the master thinks and logs. In this case, the assignment starts. In the other, it's "we've got a problem". {code}
2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: Skipping resubmissions of task /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832 because threshold 3 reached
{code} t + 300s: split is finished. Assignment starts. t + 330s: assignment is finished, regions are available again. There are a lot of possible subcases depending on the number of log files, of region servers, and so on. The issues are: 1) It's difficult, especially in HBase but not only, to interrupt a task. 
The pattern is often: {code}
void f() throws IOException {
  try {
    // whatever throws InterruptedException
  } catch (InterruptedException e) {
    throw new InterruptedIOException();
  }
}

boolean g() {
  int nbRetry = 0;
  for (;;) {
    try {
      f();
      return true;
    } catch (IOException e) {
      nbRetry++;
      if (nbRetry > maxRetry) {
        return false;
      }
    }
  }
}
{code} This typically swallows the interrupt. There are other variations, but this one seems to be the standard. Even if we fix this in HBase, we need the other layers to be interruptible as well. That's not proven. 2) 25s is very aggressive, considering that we have a default timeout of 180s for zookeeper. In other words, we give 180s to a regionserver before acting, but when it comes to split, it's 25s only. There may be reasons for this, but it seems dangerous, as during a failure the cluster is less available than during normal operations. We could do stuff around this, for example: = Obvious
[jira] [Commented] (HBASE-6651) Thread safety of HTablePool is doubtful
[ https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460726#comment-13460726 ] stack commented on HBASE-6651: -- Hiroshi, your work is more palatable as a patch rather than a zip file. This might be of use to you: http://hbase.apache.org/book.html#submitting.patches Thanks for looking into this stuff. Thread safety of HTablePool is doubtful --- Key: HBASE-6651 URL: https://issues.apache.org/jira/browse/HBASE-6651 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.1 Reporter: Hiroshi Ikeda Priority: Minor Attachments: sample.zip, sample.zip, sharedmap_for_hbaseclient.zip There are some operations in HTablePool that access PoolMap multiple times without any explicit synchronization. For example, HTablePool.closeTablePool() calls PoolMap.values(), and then calls PoolMap.remove(). If other threads add new instances to the pool in the middle of the calls, the newly added instances might be dropped. (HTablePool.closeTablePool() also has another problem: calling it from multiple threads causes HTable to be accessed by multiple threads.) Moreover, PoolMap is not thread safe for the same reason. For example, PoolMap.put() calls ConcurrentMap.get() and then calls ConcurrentMap.put(). If other threads add a new instance to the concurrent map in the middle of the calls, the new instance might be dropped. And the implementations of Pool have the same problems. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
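The get()-then-put() races described here can be closed by using the ConcurrentMap's atomic compound operations instead of separate calls. A minimal sketch, not the real PoolMap API (the class and method names are illustrative):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Sketch of an atomically populated pool map: computeIfAbsent replaces the
// racy get()-then-put() sequence, so two threads inserting the first value
// for the same key can no longer drop each other's queue (simplified vs.
// the actual PoolMap).
public class SafePoolMap<K, V> {
    private final ConcurrentMap<K, Queue<V>> pools = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        // Atomic: exactly one thread creates the queue for a given key;
        // losers of the race reuse the winner's queue.
        pools.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>())
             .offer(value);
    }

    public V take(K key) {
        Queue<V> pool = pools.get(key);
        return pool == null ? null : pool.poll();  // null when exhausted
    }
}
```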
[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460730#comment-13460730 ] nkeywal commented on HBASE-6738: This is a low-profile patch, in which I tried to limit the impact to a minimum. I don't know if something more ambitious should not be done (i.e. cleaning up this stuff), but... Reviews welcome. I have not tried it on a real cluster. Too aggressive task resubmission from the distributed log manager - Key: HBASE-6738 URL: https://issues.apache.org/jira/browse/HBASE-6738 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.1, 0.96.0 Environment: 3-node cluster test, but can occur as well on a much bigger one. It's all luck! Reporter: nkeywal Priority: Critical Attachments: 6738.v1.patch With default settings for hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3. On tests mentioned in HBASE-5843, I have variations around this scenario, 0.94 + HDFS 1.0.3: The regionserver in charge of the split does not answer in less than 25s, so it gets interrupted but actually continues. Sometimes we run out of retries, sometimes not; sometimes we're out of retries but, as the interrupts were ignored, we finish nicely. In the meantime, the same single task is executed in parallel by multiple nodes, increasing the probability of getting into race conditions. Details: t0: unplug a box with DN+RS. t + x: other boxes are already connected, so their connections start to die. Nevertheless, they don't consider this node as suspect. t + 180s: zookeeper - master detects the node as dead. Recovery starts. It can be less than 180s; sometimes it's around 150s. t + 180s: distributed split starts. There is only 1 task; it's immediately acquired by one RS. t + 205s: the RS has multiple errors when splitting, because a datanode is missing as well. The master decides to give the task to someone else. 
But often the task continues in the first RS. Interrupts are often ignored, as is well stated in the code (// TODO interrupt often gets swallowed, do what else?) {code} 2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread {code} t + 211s: two regionservers are processing the same task. They fight for the leases: {code} 2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125 {code} They can fight like this for many files, until the tasks finally get interrupted or finished. The task on the second box can be cancelled as well. In this case, the task is created again for a new box. The master seems to stop after 3 attempts. It can also give up splitting the files. Sometimes the tasks were not cancelled on the RS side, so the split is finished despite what the master thinks and logs. In this case, the assignment starts. In the other case, we've got a problem. {code} 2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: Skipping resubmissions of task /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832 because threshold 3 reached {code} t + 300s: the split is finished. Assignment starts. t + 330s: assignment is finished, regions are available again. There are a lot of possible subcases depending on the number of log files, of region servers, and so on. The issues are: 1) it's difficult, especially in HBase but not only, to interrupt a task. 
The pattern is often
{code}
void f() throws IOException {
  try {
    // whatever throws InterruptedException
  } catch (InterruptedException e) {
    throw new InterruptedIOException();
  }
}

boolean g() {
  int nbRetry = 0;
  for (;;) {
    try {
      f();
      return true;
    } catch (IOException e) {
      nbRetry++;
      if (nbRetry > maxRetry) return false;
    }
  }
}
{code}
This typically swallows the interrupt. There are other variations, but this one seems to be the standard. Even if we fix this in HBase, we need the other layers to be interruptible as well. That's not proven. 2) 25s is very aggressive, considering that we have a default timeout of 180s for zookeeper. In other words, we give 180s to
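The pattern above swallows the interrupt because the InterruptedIOException thrown by f() is caught by the generic IOException handler and simply retried. A minimal sketch of an interrupt-aware variant (an illustration of the idea, not the fix in the attached patch; f() here simulates transient failures):

```java
import java.io.IOException;
import java.io.InterruptedIOException;

class InterruptAwareRetry {
    static final int MAX_RETRY = 3;
    static int attempts = 0;

    // Simulates the work: fails twice with a plain IOException, then succeeds.
    static void f() throws IOException {
        attempts++;
        if (attempts < 3) {
            throw new IOException("transient failure");
        }
    }

    // Retries plain IOExceptions, but lets InterruptedIOException escape the
    // retry loop immediately instead of counting it as one more failed attempt.
    static boolean g() {
        int nbRetry = 0;
        for (;;) {
            try {
                f();
                return true;
            } catch (InterruptedIOException iie) {
                // Restore the interrupt flag so callers can see it, and stop.
                Thread.currentThread().interrupt();
                return false;
            } catch (IOException e) {
                nbRetry++;
                if (nbRetry > MAX_RETRY) {
                    return false;
                }
            }
        }
    }
}
```

The key design point is catching the InterruptedIOException subclass before the general IOException, so an interrupted worker stops instead of looping.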
[jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
[ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460736#comment-13460736 ] Todd Lipcon commented on HBASE-6852: bq. getAndIncrement into just one cpu instruction True, but it's a pretty expensive instruction, since it has to steal that cache line from whichever other core used it previously, and I believe it acts as a full memory barrier as well (e.g. flushing write-combining buffers). The Cliff Click counter is effective but uses more memory. Aggregating stuff locally and pushing to metrics seems ideal, but if we can't do that easily, then having the metrics per-thread and then occasionally grabbing them would work too. Memcached metrics work like that. SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields Key: HBASE-6852 URL: https://issues.apache.org/jira/browse/HBASE-6852 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.94.0 Reporter: Cheng Hao Priority: Minor Labels: performance Fix For: 0.94.3, 0.96.0 Attachments: onhitcache-trunk.patch SchemaMetrics.updateOnCacheHit costs too much while I am doing a full table scan. 
Here are the top 5 hotspots within the regionserver while full-scanning a table: CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated). Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask), count 500.
samples  %        image name  symbol name
---
98447    13.4324  14033.jo  void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
  98447  100.000  14033.jo  void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
---
45814    6.2510   14033.jo  int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
  45814  100.000  14033.jo  int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
---
43523    5.9384   14033.jo  boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
  43523  100.000  14033.jo  boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
---
42548    5.8054   14033.jo  int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
  42548  100.000  14033.jo  int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
---
40572    5.5358   14033.jo  int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
  40572  100.000  14033.jo  int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]
-- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
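The "aggregate locally, publish occasionally" idea from the comment above can be sketched with a striped counter. java.util.concurrent.atomic.LongAdder (JDK 8+, so not available to this 0.94-era patch) is essentially the Cliff Click counter mentioned: increments land on per-cell counters instead of one contended cache line, at the cost of extra memory, and the cells are folded together only when the metric is read. A sketch, not HBase's actual fix:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

class CacheHitMetrics {
    // Contended: every increment from every handler thread fights for the
    // same cache line (plus a full memory barrier per increment).
    static final AtomicLong hitsAtomic = new AtomicLong();

    // Striped: increments hit thread-distributed cells; sum() folds them
    // together only when the metric is actually published/read.
    static final LongAdder hitsAdder = new LongAdder();

    static void updateOnCacheHit() {
        hitsAdder.increment();
    }

    // Called rarely (e.g. by the metrics reporter), so the fold is cheap
    // relative to the hot increment path.
    static long snapshot() {
        return hitsAdder.sum();
    }
}
```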
[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460737#comment-13460737 ] stack commented on HBASE-6738: -- What do you mean here: {quote}This allows to continue if the worker cannot actually handle it, + // for any reason.{quote} This seems like a small change, extending the timeout while also reacting faster if the server is actually gone. I'm +1 on the patch. Too aggressive task resubmission from the distributed log manager - Key: HBASE-6738 URL: https://issues.apache.org/jira/browse/HBASE-6738 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.1, 0.96.0 Environment: 3-node cluster test, but it can occur as well on a much bigger one. It's all luck! Reporter: nkeywal Priority: Critical Attachments: 6738.v1.patch With default settings for hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3. On the tests mentioned in HBASE-5843, I have variations around this scenario, on 0.94 + HDFS 1.0.3: The regionserver in charge of the split does not answer in less than 25s, so it gets interrupted but actually continues. Sometimes we run out of retries, sometimes not; sometimes we're out of retries, but as the interrupts were ignored we finish nicely. In the meantime, the same single task is executed in parallel by multiple nodes, increasing the probability of getting into race conditions. Details: t0: unplug a box with DN+RS. t + x: other boxes are already connected, so their connections start to die. Nevertheless, they don't consider this node as suspect. t + 180s: zookeeper - the master detects the node as dead; recovery starts. It can be less than 180s; sometimes it's around 150s. t + 180s: the distributed split starts. There is only 1 task; it's immediately acquired by one RS. t + 205s: the RS has multiple errors when splitting, because a datanode is missing as well. The master decides to give the task to someone else. 
But often the task continues in the first RS. Interrupts are often ignored, as is well stated in the code (// TODO interrupt often gets swallowed, do what else?) {code} 2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread {code} t + 211s: two regionservers are processing the same task. They fight for the leases: {code} 2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125 {code} They can fight like this for many files, until the tasks finally get interrupted or finished. The task on the second box can be cancelled as well. In this case, the task is created again for a new box. The master seems to stop after 3 attempts. It can also give up splitting the files. Sometimes the tasks were not cancelled on the RS side, so the split is finished despite what the master thinks and logs. In this case, the assignment starts. In the other case, we've got a problem. {code} 2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: Skipping resubmissions of task /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832 because threshold 3 reached {code} t + 300s: the split is finished. Assignment starts. t + 330s: assignment is finished, regions are available again. There are a lot of possible subcases depending on the number of log files, of region servers, and so on. The issues are: 1) it's difficult, especially in HBase but not only, to interrupt a task. 
The pattern is often
{code}
void f() throws IOException {
  try {
    // whatever throws InterruptedException
  } catch (InterruptedException e) {
    throw new InterruptedIOException();
  }
}

boolean g() {
  int nbRetry = 0;
  for (;;) {
    try {
      f();
      return true;
    } catch (IOException e) {
      nbRetry++;
      if (nbRetry > maxRetry) return false;
    }
  }
}
{code}
This typically swallows the interrupt. There are other variations, but this one seems to be the standard. Even if we fix this in HBase, we need the other layers to be interruptible as well. That's not proven. 2) 25s is very aggressive, considering that we have a default timeout of 180s for zookeeper. In other
[jira] [Commented] (HBASE-6856) Document the LeaseException thrown in scanner next
[ https://issues.apache.org/jira/browse/HBASE-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460740#comment-13460740 ] Hudson commented on HBASE-6856: --- Integrated in HBase-TRUNK #3366 (See [https://builds.apache.org/job/HBase-TRUNK/3366/]) HBASE-6856 Document the LeaseException thrown in scanner next (Revision 1388604) Result = FAILURE stack : Files : * /hbase/trunk/src/docbkx/troubleshooting.xml Document the LeaseException thrown in scanner next -- Key: HBASE-6856 URL: https://issues.apache.org/jira/browse/HBASE-6856 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.92.0 Reporter: Daniel Iancu Assignee: Daniel Iancu Labels: LeaseException Fix For: 0.96.0 In some situations, clients that fetch data from a RS get a LeaseException instead of the usual ScannerTimeoutException/UnknownScannerException. This particular case should be documented in the HBase guide. Some key points: * the source of the exception is org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) * it happens in the context of a slow/freezing RS#next * it can be prevented by having hbase.rpc.timeout greater than hbase.regionserver.lease.period Harsh J investigated the issue and has some conclusions; see http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
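The prevention condition above (hbase.rpc.timeout greater than hbase.regionserver.lease.period) can be expressed in hbase-site.xml. The values below are illustrative only, not recommended settings:

```xml
<!-- hbase-site.xml fragment: keep the client RPC timeout above the scanner
     lease period, so a slow RS#next surfaces as a timeout on the client side
     rather than a LeaseException on the server. Example values. -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value> <!-- 60s scanner lease -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>90000</value> <!-- 90s, i.e. greater than the 60s lease period -->
</property>
```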
[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460742#comment-13460742 ] nkeywal commented on HBASE-6752: Seems reasonable; there are still some dark areas around timerange. Let's do things smoothly :-). But I think your comment is right. Some various points I had in mind: There is another use case mentioned in HBASE-3745: In some applications, a common access pattern is to frequently scan tables with a time range predicate restricted to a fairly recent time window. For example, you may want to do an incremental aggregation or indexing step only on rows that have changed in the last hour. We do this efficiently by tracking min and max timestamp on an HFile level, so that old HFiles don't have to be read. bq. We do want the old edits to come in the correct order of sequence ids Imho yes, we should not relax any point of the HBase consistency. bq. So, we somehow need to cheaply find the correct sequence id to use for the new puts. It needs to be bigger than the sequence ids for all the edits for that region in the log files. So maybe all that's needed here is to open/recover the latest log file, and scan it to find the last sequence id? I would like HBase to be resilient to log file issues (no replica, corrupted files, overloaded datanodes, bad luck when choosing the datanode to read from...) by not opening them at all during this process. Would a rough estimate be ok? E.g. counting the number of files/blocks to calculate the maximum possible id? bq. Picking a winner among duplicates in two files relies on using the sequence id of the HFile as a tie-break. And therefore, today, compactions always pick a dense subrange of files ordered by sequence ids. I wonder if we need major compactions? I was thinking that they could be skipped. But we need to be able to manage small compactions for sure. 
I imagine that we can have some critical cases where we can be in the intermediate state for a few days: (weekend + trying to fix the broken hlog on a test cluster + waiting for a non-critical moment to fix the production env)... On region server failure, serve writes and timeranged reads during the log split Key: HBASE-6752 URL: https://issues.apache.org/jira/browse/HBASE-6752 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: Gregory Chanan Priority: Minor Opening for write on failure would mean: - Assign the region to a new regionserver. It marks the region as recovering -- a specific exception is returned to the client when we cannot serve -- allows them to know where they stand. The exception can include some time information (failure started on: ...) -- allows them to go immediately to the right regionserver, instead of retrying or calling the region holding meta to get the new address = save network calls, lower the load on meta. - Do the split as today. Priority is given to the region server holding the new regions -- helps to share the load balancing code: the split is done by region servers considered as available for new regions -- helps locality (the recovered edits are available on the region server) = lower network usage - When the split is finished, we're done as of today - While the split is progressing, the region server can -- serve writes --- that's useful for all applications that need to write but not read immediately: --- whatever logs events to analyze them later --- opentsdb is a perfect example. -- serve reads if they have a compatible time range. For heavily used tables, it could be a help, because: --- we can expect to have a few minutes of data only (as it's loaded) --- the heaviest queries often accept a few -or more- minutes of delay. Some what-ifs: 1) the split fails = Retry until it works. As today. Just that we serve writes. 
We need to know (as today) that the region has not recovered if we fail again. 2) the regionserver fails during the split = As 1, and as of today. 3) the regionserver fails after the split but before the state change to fully available = New assign. More logs to split (the ones already done and the new ones). 4) the assignment fails = Retry until it works. As today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
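The HFile time-range pruning quoted above from HBASE-3745 (tracking min and max timestamp per HFile so old files need not be read) boils down to an interval-overlap test. A hypothetical sketch; the method and parameter names are illustrative, not HBase's API:

```java
class TimeRangePruning {
    // A store file whose [fileMinTs, fileMaxTs] interval does not overlap the
    // scan's [scanMinTs, scanMaxTs) time range cannot contribute results and
    // can be skipped without being opened.
    static boolean fileMayContainResults(long fileMinTs, long fileMaxTs,
                                         long scanMinTs, long scanMaxTs) {
        return fileMaxTs >= scanMinTs && fileMinTs < scanMaxTs;
    }
}
```

For the "recent time window" workload in the comment, most older HFiles fail this test, which is why the per-file timestamp metadata makes the scan cheap.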
[jira] [Commented] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460747#comment-13460747 ] Hudson commented on HBASE-6504: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6504 Adding GC details prevents HBase from starting in non-distributed mode (Revision 1385027) Result = SUCCESS stack : Files : * /hbase/branches/0.94/bin/rolling-restart.sh * /hbase/branches/0.94/bin/start-hbase.sh * /hbase/branches/0.94/bin/stop-hbase.sh Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Fix For: 0.94.2, 0.96.0 Attachments: HBASE-6504-output.txt, HBASE-6504.patch, HBASE-6504-v2.patch The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. 
This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout:
{code}
$ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed
false
Heap
 par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186)
  eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a)
  from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b)
  to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c)
 concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0)
 concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008)
$ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed > /dev/null
(nothing printed)
{code}
And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam. If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
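The suggested prefix-match test can be sketched as a shell fragment. This illustrates the idea (matching `false` even when JVM heap-summary spam follows it), not the patch that was actually committed to the start/stop scripts:

```shell
# Tolerate JVM spam appended after the real value: a case pattern matches
# on the "false" prefix instead of requiring an exact string comparison.
distmode_is_false() {
  distMode="$1"
  case "$distMode" in
    false*) return 0 ;;   # "false", possibly followed by heap-summary output
    *)      return 1 ;;
  esac
}
```

In a `case` statement, `*` matches any characters including newlines, so a multi-line value like `false` plus the heap dump still matches `false*`.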
[jira] [Commented] (HBASE-6803) script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
[ https://issues.apache.org/jira/browse/HBASE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460748#comment-13460748 ] Hudson commented on HBASE-6803: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6803 script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH (Revision 1387260) Result = SUCCESS jxiang : Files : * /hbase/branches/0.94/bin/hbase script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH Key: HBASE-6803 URL: https://issues.apache.org/jira/browse/HBASE-6803 Project: HBase Issue Type: Bug Components: shell Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.94.2, 0.96.0 Attachments: trunk-6803.patch Snappy SO fails to load properly if LD_LIBRARY_PATH does not include the path where snappy SO is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460750#comment-13460750 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6847 HBASE-6649 broke replication (Devaraj Das via JD) (Revision 1388160) Result = SUCCESS jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6792) Remove interface audience annotations in 0.94/0.92 introduced by HBASE-6516
[ https://issues.apache.org/jira/browse/HBASE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460752#comment-13460752 ] Hudson commented on HBASE-6792: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6792 Remove interface audience annotations in 0.94/0.92 introduced by HBASE-6516 (Revision 1384947) Result = SUCCESS jmhsieh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/TableInfoMissingException.java Remove interface audience annotations in 0.94/0.92 introduced by HBASE-6516 --- Key: HBASE-6792 URL: https://issues.apache.org/jira/browse/HBASE-6792 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.92.3, 0.94.2 Attachments: hbase-6792.patch bq. An InterfaceAudience slipped into 0.94 here. It breaks 0.94 for older versions of hadoop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6842) the jar used in coprocessor is not deleted in local which will exhaust the space of /tmp
[ https://issues.apache.org/jira/browse/HBASE-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460751#comment-13460751 ] Hudson commented on HBASE-6842: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6842 the jar used in coprocessor is not deleted in local which will exhaust the space of /tmp (Revision 1387862) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java the jar used in coprocessor is not deleted in local which will exhaust the space of /tmp --- Key: HBASE-6842 URL: https://issues.apache.org/jira/browse/HBASE-6842 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.94.1 Reporter: Zhou wenjian Assignee: Zhou wenjian Priority: Critical Fix For: 0.94.2, 0.96.0 Attachments: HBASE-6842-trunk.patch
{code}
FileSystem fs = path.getFileSystem(HBaseConfiguration.create());
Path dst = new Path(System.getProperty("java.io.tmpdir") + java.io.File.separator
    + "." + pathPrefix + "." + className + "." + System.currentTimeMillis() + ".jar");
fs.copyToLocalFile(path, dst);
fs.deleteOnExit(dst);
{code}
change to
{code}
File tmpLocal = new File(dst.toString());
tmpLocal.deleteOnExit();
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
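The distinction the fix above relies on: Hadoop's FileSystem.deleteOnExit only deletes the registered paths when that FileSystem instance is closed, whereas java.io.File.deleteOnExit registers the file with a JVM shutdown hook, which is what a long-lived regionserver needs for its local temp jars. A minimal illustration with a plain local path (the method and its parameter names are hypothetical):

```java
import java.io.File;

class TmpJarCleanup {
    // Build a local temp-jar path shaped like the one in CoprocessorHost and
    // register it with the JVM (not a Hadoop FileSystem) for deletion at exit.
    static File tmpJarPath(String pathPrefix, String className) {
        File tmp = new File(System.getProperty("java.io.tmpdir"),
            "." + pathPrefix + "." + className + "."
                + System.currentTimeMillis() + ".jar");
        // JVM shutdown hook: fires even if no FileSystem is ever closed,
        // so /tmp is not slowly exhausted by leftover coprocessor jars.
        tmp.deleteOnExit();
        return tmp;
    }
}
```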
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460753#comment-13460753 ] Hudson commented on HBASE-6438: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6438 Addendum checks regionAlreadyInTransitionException when generating region plan (Chunhui) (Revision 1387209) HBASE-6438 RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies (Rajesh) (Revision 1385209) Result = SUCCESS tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies -- Key: HBASE-6438 URL: https://issues.apache.org/jira/browse/HBASE-6438 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6438-0.92.txt, 6438.addendum, 6438-addendum.94, 6438-trunk_2.patch, HBASE-6438_2.patch, HBASE-6438_94_3.patch, HBASE-6438_94_4.patch, HBASE-6438_94.patch, HBASE-6438-trunk_2.patch, HBASE-6438_trunk.patch Seeing some of the recent issues in region assignment, RegionAlreadyInTransitionException is one reason after which the region assignment may or may not happen(in the sense we need to wait for the TM to assign). In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on master restart. Consider the following case, due to some reason like master restart or external assign call, we try to assign a region that is already getting opened in a RS. Now the next call to assign has already changed the state of the znode and so the current assign that is going on the RS is affected and it fails. The second assignment that started also fails getting RAITE exception. 
Finally, neither assignment carries on. The idea is to find whether any such RAITE exception can be retried or not. Here again we have the following cases: - The znode is yet to be transitioned from OFFLINE to OPENING in the RS - The RS may be in the step of openRegion() - The RS may be trying to transition OPENING to OPENED - The RS is yet to add the region to its online regions. Here, on any failure in openRegion() and updateMeta() we move the znode to FAILED_OPEN. So in these cases getting an RAITE should be ok. But in the other cases the assignment is stopped. The idea is to just add the current state of the region assignment to the RIT map on the RS side, and using that info we can determine whether the assignment can be retried or not on getting an RAITE. Considering the current work going on in AM, please do share whether this is needed, at least in the 0.92/0.94 versions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
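The proposal above — expose the current stage of the in-flight open on the RS side so the master can decide whether a RegionAlreadyInTransitionException is retryable — can be sketched as a state check. The enum, its stages, and the mapping to retryability are hypothetical illustrations of the idea, not the committed patch:

```java
class RaiteRetryPolicy {
    // Hypothetical stages of an in-flight region open, mirroring the four
    // cases listed in the comment above.
    enum OpenStage {
        OFFLINE_TO_OPENING,   // znode not yet transitioned to OPENING
        OPENING_REGION,       // openRegion()/updateMeta() in progress
        OPENING_TO_OPENED,    // znode transition to OPENED in progress
        ADDING_TO_ONLINE      // about to be added to the RS's online set
    }

    // During the stages where a failure moves the znode to FAILED_OPEN,
    // retrying after a RAITE is safe; past that point a retry could race
    // with an open that is about to succeed.
    static boolean isRetryable(OpenStage stage) {
        switch (stage) {
            case OFFLINE_TO_OPENING:
            case OPENING_REGION:
                return true;
            default:
                return false;
        }
    }
}
```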
[jira] [Commented] (HBASE-6844) upgrade 0.23 version dependency in 0.94
[ https://issues.apache.org/jira/browse/HBASE-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460754#comment-13460754 ] Hudson commented on HBASE-6844: --- Integrated in HBase-0.94-security #53 (See [https://builds.apache.org/job/HBase-0.94-security/53/]) HBASE-6844 upgrade 0.23 version dependency in 0.94 (Revision 1387856) Result = SUCCESS stack : Files : * /hbase/branches/0.94/pom.xml upgrade 0.23 version dependency in 0.94 --- Key: HBASE-6844 URL: https://issues.apache.org/jira/browse/HBASE-6844 Project: HBase Issue Type: Bug Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.92.3, 0.94.2 Attachments: 6844-092.txt, 6844.txt hadoop 0.23 has been promoted to stable. The snapshot jar no longer exists in maven. https://repository.apache.org/content/repositories/releases/org/apache/hadoop/hadoop-common/0.23.3/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira