[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations
[ https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491871#comment-13491871 ]

Jonathan Gray commented on HBASE-4583:
--------------------------------------

My vote (if only for one implementation) would be for the less radical patch that removes in-memory versions that are not visible, rather than doing this cleanup on flush, which has a number of performance implications.

I can see some reasons for wanting to keep versions around (providing support to an Omid-like transaction engine requires retaining old versions for at least some time), but it would be cool to have an option to prevent the deletion of the old versions rather than require that these exist in cases where I won't ever use them.

In all my increment performance tests, of which there have been many, the upsert/removal of old versions is one of the biggest gains, especially if you have particularly hot columns.

I'm not sure which design you are referring to when you talk about being true to HBase's design ;) Or maybe you're referring to the general principles of HBase (append-only), but the increment operation itself was not part of any original design or implementation of HBase and has been a hack in one way or another from the very first implementation. That is because the implementation has been targeted at performance over purity. I've always seen it as an atomic operation where any notion of versioning is opaque to the user of the atomic increment. Again, I can see use cases for it, but I'd lean towards having it as an option rather than a requirement.

Thanks for doing this work, good stuff. +1

Integrate RWCC with Append and Increment operations
---------------------------------------------------

Key: HBASE-4583
URL: https://issues.apache.org/jira/browse/HBASE-4583
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.96.0
Attachments: 4583-trunk-less-radical.txt, 4583-trunk-less-radical-v2.txt, 4583-trunk-less-radical-v3.txt, 4583-trunk-less-radical-v4.txt, 4583-trunk-less-radical-v5.txt, 4583-trunk-less-radical-v6.txt, 4583-trunk-radical.txt, 4583-trunk-radical_v2.txt, 4583-trunk-v3.txt, 4583.txt, 4583-v2.txt, 4583-v3.txt, 4583-v4.txt

Currently Increment and Append operations do not work with RWCC and hence a client could see the results of multiple such operations mixed in the same Get/Scan. The semantics might be a bit more interesting here as upsert adds and removes to and from the memstore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
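Editor's sketch: the upsert behaviour debated in this thread (add the new value, then drop in-memory versions that no reader can still see) can be illustrated roughly as below. This is not the HBase MemStore or any of the attached patches. The class UpsertSketch, its methods, and the idea of passing the oldest active read point as a parameter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-cell counter; illustrative only, not HBase's MemStore. */
final class UpsertSketch {
    /** One in-memory version: a value tagged with the write number that produced it. */
    private static final class Version {
        final long writeNumber;
        final long value;

        Version(long writeNumber, long value) {
            this.writeNumber = writeNumber;
            this.value = value;
        }
    }

    private final List<Version> versions = new ArrayList<>(); // oldest first
    private long nextWriteNumber = 1;

    /**
     * Adds delta to the newest value, then drops every version older than the
     * newest one visible at smallestReadPoint (the oldest active reader).
     */
    synchronized long increment(long delta, long smallestReadPoint) {
        long current = versions.isEmpty() ? 0 : versions.get(versions.size() - 1).value;
        versions.add(new Version(nextWriteNumber++, current + delta));

        int keepFrom = 0;
        for (int i = 0; i < versions.size(); i++) {
            if (versions.get(i).writeNumber <= smallestReadPoint) {
                keepFrom = i; // newest version the oldest reader can still see
            }
        }
        versions.subList(0, keepFrom).clear();
        return current + delta;
    }

    /** Value visible to a reader at readPoint, or 0 if nothing is visible yet. */
    synchronized long read(long readPoint) {
        long visible = 0;
        for (Version v : versions) {
            if (v.writeNumber <= readPoint) {
                visible = v.value;
            }
        }
        return visible;
    }

    /** Number of versions currently held in memory. */
    synchronized int versionCount() {
        return versions.size();
    }
}
```

With a hot column and readers that keep advancing, the version list in this sketch stays short, which is the performance effect the comment describes. Retaining old versions for a transaction engine would amount to passing a lower read point.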
[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations
[ https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492144#comment-13492144 ]

Jonathan Gray commented on HBASE-4583:
--------------------------------------

That makes sense to me (versions = 1 means upsert). Big +1 from me on adding support for setting the timestamp.

Integrate RWCC with Append and Increment operations
---------------------------------------------------

Key: HBASE-4583
URL: https://issues.apache.org/jira/browse/HBASE-4583
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.96.0
Attachments: 4583-mixed.txt, 4583-trunk-less-radical.txt, 4583-trunk-less-radical-v2.txt, 4583-trunk-less-radical-v3.txt, 4583-trunk-less-radical-v4.txt, 4583-trunk-less-radical-v5.txt, 4583-trunk-less-radical-v6.txt, 4583-trunk-radical.txt, 4583-trunk-radical_v2.txt, 4583-trunk-v3.txt, 4583.txt, 4583-v2.txt, 4583-v3.txt, 4583-v4.txt

Currently Increment and Append operations do not work with RWCC and hence a client could see the results of multiple such operations mixed in the same Get/Scan. The semantics might be a bit more interesting here as upsert adds and removes to and from the memstore.
[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions
[ https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114422#comment-13114422 ]

Jonathan Gray commented on HBASE-4014:
--------------------------------------

Ted, why is this JIRA scattered over so many commits? And the commit message is in a non-standard format (the first line is: HBASE-4014 is marked as Improvement).

I've been trying to build some tools to help keep track of and in sync with the Apache repos but this kind of stuff makes it very difficult.

Coprocessors: Flag the presence of coprocessors in logged exceptions
--------------------------------------------------------------------

Key: HBASE-4014
URL: https://issues.apache.org/jira/browse/HBASE-4014
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: Andrew Purtell
Assignee: Eugene Koontz
Fix For: 0.92.0
Attachments: 4014.final, HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch

For some initial triage of bug reports for core versus for deployments with loaded coprocessors, we need something like the Linux kernel's taint flag, and list of linked-in modules that show up in the output of every OOPS, to appear above or below exceptions that appear in the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions
[ https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114040#comment-13114040 ]

Jonathan Gray commented on HBASE-4014:
--------------------------------------

What's the status of this?

Coprocessors: Flag the presence of coprocessors in logged exceptions
--------------------------------------------------------------------

Key: HBASE-4014
URL: https://issues.apache.org/jira/browse/HBASE-4014
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: Andrew Purtell
Assignee: Eugene Koontz
Fix For: 0.92.0
Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch

For some initial triage of bug reports for core versus for deployments with loaded coprocessors, we need something like the Linux kernel's taint flag, and list of linked-in modules that show up in the output of every OOPS, to appear above or below exceptions that appear in the logs.
[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer
[ https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114043#comment-13114043 ]

Jonathan Gray commented on HBASE-4460:
--------------------------------------

Since security stuff can be dealt with in a separate JIRA, what do people think of the patch I have up? Shall I submit to rb?

Support running an embedded ThriftServer within a RegionServer
--------------------------------------------------------------

Key: HBASE-4460
URL: https://issues.apache.org/jira/browse/HBASE-4460
Project: HBase
Issue Type: New Feature
Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Attachments: HBASE-4460-v1.patch

Rather than a separate process, it can be advantageous in some situations for each RegionServer to embed its own ThriftServer. This allows each embedded ThriftServer to short-circuit any queries that should be executed on the local RS and skip the extra hop. This then enables the building of fat Thrift clients that cache region locations and avoid extra hops altogether.

This JIRA is just about the embedded ThriftServer. Will open others for the rest.
[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift
[ https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113669#comment-13113669 ]

Jonathan Gray commented on HBASE-4461:
--------------------------------------

Well my plan is to use it internally on 0.92 (we are porting all the changes necessary for our fat C++ client from our internal 90 branch). But wherever you think it should go is fine.

Expose getRowOrBefore via Thrift
--------------------------------

Key: HBASE-4461
URL: https://issues.apache.org/jira/browse/HBASE-4461
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Attachments: HBASE-4461-v2.patch

In order for fat Thrift-based clients to locate region locations they need to utilize the getRowOrBefore method.
[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift
[ https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113671#comment-13113671 ]

Jonathan Gray commented on HBASE-4461:
--------------------------------------

and I'm saving up my new features to force in 92 to try and get the HLog/Delayable stuff in ;)
[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition
[ https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113674#comment-13113674 ]

Jonathan Gray commented on HBASE-4131:
--------------------------------------

Thanks stack!

Make the Replication Service pluggable via a standard interface definition
--------------------------------------------------------------------------

Key: HBASE-4131
URL: https://issues.apache.org/jira/browse/HBASE-4131
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Fix For: 0.94.0
Attachments: replicationInterface1.txt, replicationInterface2.txt, replicationInterface3.txt

The current HBase code supports a replication service that can be used to sync data from one hbase cluster to another. It would be nice to make it a pluggable interface so that other cross-data-center replication services can be used in conjunction with HBase.
[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift
[ https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113768#comment-13113768 ]

Jonathan Gray commented on HBASE-4461:
--------------------------------------

Man, I remember when I could buy your vote for $2.00!
[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms
[ https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113870#comment-13113870 ]

Jonathan Gray commented on HBASE-4449:
--------------------------------------

Is this done now?

LoadIncrementalHFiles should be able to handle CFs with blooms
--------------------------------------------------------------

Key: HBASE-4449
URL: https://issues.apache.org/jira/browse/HBASE-4449
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Fix For: 0.90.5
Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, HBASE-4449.patch

When LoadIncrementalHFiles loads a store file that crosses region boundaries, it will split the file at the boundary to create two store files. If the store file is for a column family that has a bloom filter, then a java.lang.ArithmeticException: / by zero will be raised because ByteBloomFilter() is called with maxKeys of 0.

The included patch assumes that the number of keys in each split child will be equal to the number of keys in the parent's bloom filter (instead of 0). This is an overestimate, but it's safe and easy.
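Editor's sketch: the failure described in this issue can be reproduced in miniature with a Bloom filter sizing helper. The class below is hypothetical and is not the ByteBloomFilter source. It assumes the hash-function count is derived with an integer division by the expected key count, which is what would produce a "/ by zero" ArithmeticException when maxKeys is 0. The childMaxKeys helper mirrors the patch's stated approach of reusing the parent's key count for each split child.

```java
/** Hypothetical Bloom filter sizing helper; names and formulas are illustrative only. */
final class BloomSizingSketch {
    /** Bits needed for maxKeys keys at the given false-positive rate: -n * ln(p) / (ln 2)^2. */
    static long bitSize(int maxKeys, double errorRate) {
        return (long) Math.ceil(maxKeys * (-Math.log(errorRate) / (Math.log(2) * Math.log(2))));
    }

    /**
     * Number of hash functions: ln 2 * (bits per key).
     * The integer division throws ArithmeticException when maxKeys is 0.
     */
    static int hashCount(int maxKeys, long bitSize) {
        return (int) Math.ceil(Math.log(2) * (bitSize / maxKeys));
    }

    /** Key estimate for a split child: reuse the parent's count, a safe overestimate. */
    static int childMaxKeys(int parentKeyCount) {
        return parentKeyCount;
    }
}
```

Sizing a child filter with childMaxKeys(parentKeyCount) wastes some bits, since each child holds only part of the parent's keys, but it never reaches the zero divisor.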
[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms
[ https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113872#comment-13113872 ]

Jonathan Gray commented on HBASE-4449:
--------------------------------------

It looks like the test change was committed but not the change to LoadIncrementalHFiles?
[jira] [Created] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer
Support running an embedded ThriftServer within a RegionServer
--------------------------------------------------------------

Key: HBASE-4460
URL: https://issues.apache.org/jira/browse/HBASE-4460
Project: HBase
Issue Type: New Feature
Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray

Rather than a separate process, it can be advantageous in some situations for each RegionServer to embed its own ThriftServer. This allows each embedded ThriftServer to short-circuit any queries that should be executed on the local RS and skip the extra hop. This then enables the building of fat Thrift clients that cache region locations and avoid extra hops altogether.

This JIRA is just about the embedded ThriftServer. Will open others for the rest.
[jira] [Updated] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer
[ https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-4460:
---------------------------------

Attachment: HBASE-4460-v1.patch

Adds {{HRegionThriftServer}}, a RegionServer-hosted ThriftServer. It is off by default and can be turned on by setting hbase.regionserver.export.thrift to true.
[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer
[ https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112933#comment-13112933 ]

Jonathan Gray commented on HBASE-4460:
--------------------------------------

Replacing HRPC is another story but I think many of us are in agreement that we'd like to do that eventually. The scope here is much smaller and I'm working on a set of changes to allow fat Thrift-based clients, not necessarily replacing normal HRPC.

Open to your feedback on what I can do to better integrate with security stuff but not sure what I can do at this point.
[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift
[ https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112936#comment-13112936 ]

Jonathan Gray commented on HBASE-4461:
--------------------------------------

Thanks Ted.

Expose getRowOrBefore via Thrift
--------------------------------

Key: HBASE-4461
URL: https://issues.apache.org/jira/browse/HBASE-4461
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray

In order for fat Thrift-based clients to locate region locations they need to utilize the getRowOrBefore method.
[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)
[ https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112935#comment-13112935 ]

Jonathan Gray commented on HBASE-4296:
--------------------------------------

Over in HBASE-4461 I am exposing this method to Thrift to enable building fat Thrift-based clients.

Rather than deprecating this, could we just note that it is an expensive operation and not for normal operations? Or even only allow it to work on ROOT and META?

Deprecate HTable[Interface].getRowOrBefore(...)
-----------------------------------------------

Key: HBASE-4296
URL: https://issues.apache.org/jira/browse/HBASE-4296
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial
Fix For: 0.92.0
Attachments: 4296.txt

HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. That method was created to allow our scanning of .META. (see HBASE-2600).

Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be performant that a user of HTable will not be aware of.

I propose deprecating this in the public interface in 0.92 and removing it from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it will still remain as an internal interface for scanning meta.

Comments?
[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer
[ https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112984#comment-13112984 ]

Jonathan Gray commented on HBASE-4460:
--------------------------------------

Gary, want to open another JIRA and link it here?
[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch
[ https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-4452:
---------------------------------

Fix Version/s: 0.92.0

lgtm. nice catch. pulling in to 0.92

Possibility of RS opening a region though tickleOpening fails due to znode version mismatch
-------------------------------------------------------------------------------------------

Key: HBASE-4452
URL: https://issues.apache.org/jira/browse/HBASE-4452
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.92.0
Attachments: HBASE-4452.patch

Consider the following code
{code}
long period = Math.max(1, assignmentTimeout / 3);
long lastUpdate = now;
while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
    !this.rsServices.isStopping() && (endTime > now)) {
  long elapsed = now - lastUpdate;
  if (elapsed > period) {
    // Only tickle OPENING if postOpenDeployTasks is taking some time.
    lastUpdate = now;
    tickleOpening("post_open_deploy");
  }
{code}
Whenever the postOpenDeploy tasks take considerable time, we call tickleOpening so that no timeout is detected. But before it can do this, if the TimeoutMonitor tries to assign the node to another RS, then the other RS will move the node from OFFLINE to OPENING. Hence when the first RS tries to do tickleOpening the operation will fail.
Now here lies the problem,
{code}
String encodedName = this.regionInfo.getEncodedName();
try {
  this.version =
      ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
          this.regionInfo, this.server.getServerName(), this.version);
} catch (KeeperException e) {
{code}
Now this.version becomes -1 as the operation failed.
As in the first code snippet, the return value is not captured after tickleOpening() fails, so we go on with moving the node to OPENED. Here again we don't have any check for this condition, as the version has already been changed to -1. Hence the OPENING to OPENED transition succeeds. Chances of double assignment.
{noformat}
2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the expected version 2
2011-09-22 00:57:33,494 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, context=post_open_deploy
2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-09-22 00:58:13,956 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
{noformat}
Correct me if this analysis is wrong.
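Editor's sketch: the race analysed in this issue, and one way to guard against it, can be modelled with a toy versioned node. Nothing below is taken from HBASE-4452.patch. TickleSketch, VersionedNode and their methods are hypothetical names, and the node is an in-memory stand-in for a znode. The one ZooKeeper behaviour it relies on is that an expected version of -1 matches any version, which is why a stored -1 lets the later OPENING to OPENED transition succeed.

```java
/** Hypothetical model of the open-region handler's znode updates; not HBase code. */
final class TickleSketch {
    /** In-memory stand-in for a znode whose version grows on every successful update. */
    static final class VersionedNode {
        private int version = 0;

        /**
         * Returns the new version, or -1 if expectedVersion is stale.
         * An expectedVersion of -1 matches any version, as in ZooKeeper.
         */
        synchronized int transition(int expectedVersion) {
            if (expectedVersion != -1 && expectedVersion != version) {
                return -1;
            }
            return ++version;
        }
    }

    private final VersionedNode node;
    private int version;

    TickleSketch(VersionedNode node, int initialVersion) {
        this.node = node;
        this.version = initialVersion;
    }

    /** Refreshes OPENING and reports failure instead of silently storing -1. */
    boolean tickleOpening() {
        int newVersion = node.transition(version);
        if (newVersion == -1) {
            return false; // keep the old version; the caller must stop opening
        }
        version = newVersion;
        return true;
    }

    /** Moves to OPENED only if every earlier tickle succeeded. */
    boolean open(int tickles) {
        for (int i = 0; i < tickles; i++) {
            if (!tickleOpening()) {
                return false; // another server has taken over the node
            }
        }
        return node.transition(version) != -1;
    }
}
```

The design point is simply that the result of the tickle is captured and acted on, so a failed refresh ends the open attempt rather than falling through to the final transition.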
[jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException
[ https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113011#comment-13113011 ]

Jonathan Gray commented on HBASE-4462:
--------------------------------------

+1 on treating STE differently. I think we should treat it as DNRE and kick it back to the client. There could be a configurable policy for socket timeouts (or network level errors in general?) if some people want the HBase client to retry once or something.

Properly treating SocketTimeoutException
----------------------------------------

Key: HBASE-4462
URL: https://issues.apache.org/jira/browse/HBASE-4462
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Fix For: 0.92.0

SocketTimeoutException is currently treated like any IOE inside of HCM.getRegionServerWithRetries and I think this is a problem. This method should only do retries in cases where we are pretty sure the operation will complete, but with STE we already waited for (by default) 60 seconds and nothing happened.

I found this while debugging Douglas Campbell's problem on the mailing list where it seemed like he was using the same scanner from multiple threads, but actually it was just the same client doing retries while the first run didn't even finish yet (that's another problem). You could see the first scanner, then up to two other handlers waiting for it to finish in order to run (because of the synchronization on RegionScanner).

So what should we do? We could treat STE as a DoNotRetryException and let the client deal with it, or we could retry only once. There's also the option of having a different behavior for get/put/icv/scan, the issue with operations that modify a cell is that you don't know if the operation completed or not (same when a RS dies hard after completing let's say a Put but just before returning to the client).
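Editor's sketch: the policy options raised in this issue (never retry a socket timeout, or retry it a configurable number of times, while still retrying other I/O errors) could look roughly like the loop below. It is not HCM.getRegionServerWithRetries and does not use HBase's DoNotRetryIOException. RetrySketch, callWithRetries and the timeoutRetries parameter are hypothetical names, and only standard JDK classes are used.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical retry loop with a separate budget for socket timeouts. */
final class RetrySketch {
    /**
     * Runs call up to maxAttempts times. Ordinary IOExceptions are retried;
     * a SocketTimeoutException is retried only while timeoutRetries remain,
     * so timeoutRetries = 0 hands the first timeout straight back to the caller.
     */
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, int timeoutRetries)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (SocketTimeoutException e) {
                if (timeoutRetries-- <= 0) {
                    throw e; // treated as do-not-retry
                }
                last = e;
            } catch (IOException e) {
                last = e; // assumed transient; try again
            }
        }
        throw last;
    }
}
```

Setting timeoutRetries to 1 corresponds to the "retry only once" option. The sketch does not address the harder question from the description of whether a timed-out mutation was actually applied.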
[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)
[ https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113037#comment-13113037 ]

Jonathan Gray commented on HBASE-4296:
--------------------------------------

We are already using the fat thrift client on our 0.90 branch. I'm in the process of pushing this all out into open source so we can then pull it back in to our 0.92 based branch. I'm happy to put this stuff into 0.92 in Apache as well but it's somewhat featurish :)

Was the method removed in 0.94 already? Can we just hold off on removing it until 2600 happens and that way it won't matter and we can commit it anywhere. Following 2600 we can modify how it works and just use a normal scanner then?
[jira] [Updated] (HBASE-4461) Expose getRowOrBefore via Thrift
[ https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-4461:
---------------------------------

Attachment: HBASE-4461-v2.patch

Adds getRowOrBefore() exposed to Thrift. Also adds server name and port to TRegionInfo so we can get assignment info through existing APIs in Thrift.
[jira] [Commented] (HBASE-4451) Improve zk node naming (/hbase/shutdown)
[ https://issues.apache.org/jira/browse/HBASE-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112155#comment-13112155 ]

Jonathan Gray commented on HBASE-4451:
--------------------------------------

bq. Would changing this have an effect on compatibility?

If you wanted to support this change over a rolling restart or anything like that, it would probably be rather complicated or impractical. So it would require a full restart of the cluster most likely.

In addition, any external ops/monitoring/admin tools people have built might be looking at the specific names.

That shouldn't necessarily stop us though. Perhaps we can do this as part of a fresh look at the names of the ZK nodes in general. We might make some changes with the root node and such as well in 94.

Do you want to look at all the ZK node names and see if there's a new scheme that would be more clear?

Improve zk node naming (/hbase/shutdown)
----------------------------------------

Key: HBASE-4451
URL: https://issues.apache.org/jira/browse/HBASE-4451
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
Fix For: 0.94.0

Right now the node {{/hbase/shutdown}} is used to indicate cluster status (cluster up, cluster down).

However, upon a chat with Lars George today, we feel that having a name {{/hbase/shutdown}} is possibly bad. The {{/hbase/shutdown}} zknode contains a date when the cluster was _started_. Now that is difficult to understand and digest, given that a person may connect to zk and try to look at what it is about (they may think it 'shutdown' at that date.).

I feel a better name may simply be: {{/hbase/running}}. Thoughts?
[jira] [Commented] (HBASE-4132) Extend the WALActionsListener API to accomodate log archival
[ https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112249#comment-13112249 ]

Jonathan Gray commented on HBASE-4132:
--------------------------------------

Looks good. One thing:

{code}oldPath = new Path("/DUMMY-No-preexisting-logfile");{code}

Should we support passing a null path or at least use a static?

Extend the WALActionsListener API to accomodate log archival
------------------------------------------------------------

Key: HBASE-4132
URL: https://issues.apache.org/jira/browse/HBASE-4132
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Fix For: 0.94.0
Attachments: walArchive.txt, walArchive2.txt

The WALObserver interface exposes the log roll events. It would be nice to extend it to accommodate log archival events as well.
[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13112250#comment-13112250 ] Jonathan Gray commented on HBASE-4153: -- Looks like this introduced a compile error in MockRegionServerServices? Handle RegionAlreadyInTransitionException in AssignmentManager -- Key: HBASE-4153 URL: https://issues.apache.org/jira/browse/HBASE-4153 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, HBASE-4153_6.patch Comment from Stack over in HBASE-3741: {quote} Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver? {quote} Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13112251#comment-13112251 ] Jonathan Gray commented on HBASE-4153: -- nevermind! Handle RegionAlreadyInTransitionException in AssignmentManager -- Key: HBASE-4153 URL: https://issues.apache.org/jira/browse/HBASE-4153 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, HBASE-4153_6.patch Comment from Stack over in HBASE-3741: {quote} Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver? {quote} Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4432) Enable/Disable off heap cache with config
[ https://issues.apache.org/jira/browse/HBASE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108104#comment-13108104 ] Jonathan Gray commented on HBASE-4432: -- +1 Enable/Disable off heap cache with config - Key: HBASE-4432 URL: https://issues.apache.org/jira/browse/HBASE-4432 Project: HBase Issue Type: Improvement Reporter: Li Pi Assignee: Li Pi Priority: Trivial Attachments: 4432.v3, enableswitchforoffheapcache.txt, patchv2.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108110#comment-13108110 ] Jonathan Gray commented on HBASE-4433: -- Good stuff. I think the first iteration of the ColumnTracker had the INCLUDE_AND_* primitives but it was simplified. Would be pretty cool to write up a unit test that creates single-KV sized blocks so you could run various queries to see the number of blocks accessed. Especially nice to catch regressions in the future. avoid extra next (potentially a seek) if done with column/row - Key: HBASE-4433 URL: https://issues.apache.org/jira/browse/HBASE-4433 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan [Noticed this in 89, but quite likely true of trunk as well.] When we are done with the requested column(s) the code still does an extra next() call before it realizes that it is actually done. This extra next() call could potentially result in an unnecessary extra block load. This is likely to be especially bad for CFs where the KVs are large blobs where each KV may be occupying a block of its own. So the next() can often load a new unrelated block unnecessarily. -- For the simple case of reading say the top-most column in a row in a single file, where each column (KV) was say a block of its own-- it seems that we are reading 3 blocks, instead of 1 block! I am working on a simple patch and with that the number of seeks is down to 2. [There is still an extra seek left. I think there were two levels of extra/unnecessary next() we were doing without actually confirming that the next was needed. One at the StoreScanner/ScanQueryMatcher level which this diff avoids. I think the other is at hfs.next() (at the storefile scanner level) that's happening whenever a HFile scanner serves out data-- and perhaps that's the additional seek that we need to avoid. 
But I want to tackle this optimization first as the two issues seem unrelated.] -- The basic idea of the patch I am working on/testing is as follows. The ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if the KV needs to be included and then, if done, only in the next call does it return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases when ExplicitColumnTracker knows it is done with a particular column/row, the patch attempts to combine the INCLUDE code and done hint into a single match code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
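To make the combined match codes concrete, here is a minimal sketch of a tracker that reports "include this KV and I am done with the column/row" in a single answer. It is not the HBase ExplicitColumnTracker: the class ToyColumnTracker, the MatchCode enum and the use of String qualifiers are all hypothetical; only the INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW names come from the description above.

```java
import java.util.List;

/** Hypothetical match codes; only the two INCLUDE_AND_* names are taken from the issue text. */
enum MatchCode {
    INCLUDE, SEEK_NEXT_COL, SEEK_NEXT_ROW,
    INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
}

/** Toy tracker for an explicit, sorted list of wanted qualifiers (an assumption, not HBase code). */
final class ToyColumnTracker {
    private final List<String> wanted;  // requested qualifiers, sorted ascending
    private final int maxVersions;      // versions to return per qualifier
    private int index = 0;              // position of the qualifier currently being served
    private int seen = 0;               // versions already included for that qualifier

    ToyColumnTracker(List<String> sortedWanted, int maxVersions) {
        this.wanted = sortedWanted;
        this.maxVersions = maxVersions;
    }

    /** Classifies one KV of the current row, given its qualifier. */
    MatchCode check(String qualifier) {
        while (index < wanted.size()) {
            int cmp = qualifier.compareTo(wanted.get(index));
            if (cmp < 0) {
                return MatchCode.SEEK_NEXT_COL;   // KV sorts before the wanted column
            }
            if (cmp > 0) {                        // wanted column is absent from this row
                index++;
                seen = 0;
                continue;
            }
            seen++;
            if (seen < maxVersions) {
                return MatchCode.INCLUDE;         // more versions of this column may follow
            }
            // Last version needed for this column: include it and give the seek hint
            // in the same answer, so the caller does not need one more next().
            index++;
            seen = 0;
            return index == wanted.size()
                ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
        }
        return MatchCode.SEEK_NEXT_ROW;           // every wanted column has been served
    }
}
```

With wanted columns a and c and maxVersions of 1, the KV for a already carries the hint to seek to the next column, and the KV for c carries the hint to leave the row, which is the extra next() the description wants to avoid.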
[jira] [Commented] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes
[ https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107495#comment-13107495 ] Jonathan Gray commented on HBASE-4410: -- Actually I think Lars is correct. It's a question of whether we should execute all filters in a list in filterKeyValue() or not. I think the right behavior is actually just to make it execute how one would expect this type of conditional to execute: if (conditionA && conditionB) If conditionA fails, we don't expect conditionB to be executed. if (conditionA || conditionB) If conditionA passes, we don't expect conditionB to be executed. This was the previous behavior and my patch undoes it. I will work on a new patch. FilterList.filterKeyValue can return suboptimal ReturnCodes --- Key: HBASE-4410 URL: https://issues.apache.org/jira/browse/HBASE-4410 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4410-v1.patch FilterList.filterKeyValue does not always return the most optimal ReturnCode in both the AND and OR conditions. For example, if you have F1 AND F2, F1 returns SKIP. It immediately returns the SKIP. However, if F2 would have returned NEXT_COL or NEXT_ROW or SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal ReturnCode from F2. For AND conditions, we can always pick the *most restrictive* return code. For OR conditions, we must always pick the *least restrictive* return code. This JIRA is to review the FilterList.filterKeyValue() method to try and make it more optimal and to add a new unit test which verifies the correct behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4373) HBaseAdmin.assign() does not use force flag
[ https://issues.apache.org/jira/browse/HBASE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106760#comment-13106760 ] Jonathan Gray commented on HBASE-4373: -- Trying to understand this patch. So with the force flag removed, what is the default behavior? If the state is not OFFLINE and we try to assign somewhere else, do we force the node to OFFLINE always? HBaseAdmin.assign() does not use force flag --- Key: HBASE-4373 URL: https://issues.apache.org/jira/browse/HBASE-4373 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4373.patch, HBASE-4373_1.patch The HBaseAdmin.assign() {code} public void assign(final byte [] regionName, final boolean force) throws MasterNotRunningException, ZooKeeperConnectionException, IOException { getMaster().assign(regionName, force); } {code} In the HMaster we call {code} Pair<HRegionInfo, ServerName> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toString(regionName)); if (cpHost != null) { if (cpHost.preAssign(pair.getFirst(), force)) { return; } } assignRegion(pair.getFirst()); if (cpHost != null) { cpHost.postAssign(pair.getFirst(), force); } {code} The force flag is not getting used. Maybe we need to update the javadoc, or not provide the force flag as a parameter if we are not going to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4422) Move block cache parameters and references into single CacheConf class
Move block cache parameters and references into single CacheConf class -- Key: HBASE-4422 URL: https://issues.apache.org/jira/browse/HBASE-4422 Project: HBase Issue Type: Improvement Components: io Reporter: Jonathan Gray Assignee: Jonathan Gray Fix For: 0.92.0 From StoreFile down to HFile, we currently use a boolean argument for each of the various block cache configuration parameters that exist. The number of parameters is going to continue to increase as we look at compressed cache, delta encoding, and more specific L1/L2 configuration. Every new config currently requires changing many constructors because it introduces a new boolean. We should move everything into a single class so that modifications are much less disruptive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
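As a rough illustration of the refactoring proposed here, the sketch below gathers the per-setting booleans into one immutable object that is passed down the call chain. Everything in it is hypothetical: CacheSettings, its field names and ToyBlockReader are stand-ins chosen for the example and are not the CacheConf class or any HBase constructor.

```java
/** Hypothetical holder for block cache settings; names are illustrative only. */
final class CacheSettings {
    final boolean cacheDataOnRead;
    final boolean cacheDataOnWrite;
    final boolean inMemory;
    final boolean evictOnClose;

    private CacheSettings(Builder b) {
        this.cacheDataOnRead = b.cacheDataOnRead;
        this.cacheDataOnWrite = b.cacheDataOnWrite;
        this.inMemory = b.inMemory;
        this.evictOnClose = b.evictOnClose;
    }

    static Builder builder() {
        return new Builder();
    }

    /** Adding a new setting means one field and one setter here, not a new constructor argument everywhere. */
    static final class Builder {
        private boolean cacheDataOnRead = true;   // assumed default for the sketch
        private boolean cacheDataOnWrite = false;
        private boolean inMemory = false;
        private boolean evictOnClose = false;

        Builder cacheDataOnRead(boolean v) { this.cacheDataOnRead = v; return this; }
        Builder cacheDataOnWrite(boolean v) { this.cacheDataOnWrite = v; return this; }
        Builder inMemory(boolean v) { this.inMemory = v; return this; }
        Builder evictOnClose(boolean v) { this.evictOnClose = v; return this; }

        CacheSettings build() { return new CacheSettings(this); }
    }
}

/** Stand-in for a reader that previously took one boolean per cache setting. */
final class ToyBlockReader {
    private final String path;
    private final CacheSettings cache;

    ToyBlockReader(String path, CacheSettings cache) {
        this.path = path;
        this.cache = cache;
    }

    boolean shouldCacheOnRead() {
        return cache.cacheDataOnRead;
    }

    String path() {
        return path;
    }
}
```

The design point is that constructors from the store file level down to the file reader keep a stable signature while the set of cache options grows.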
[jira] [Created] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes
FilterList.filterKeyValue can return suboptimal ReturnCodes --- Key: HBASE-4410 URL: https://issues.apache.org/jira/browse/HBASE-4410 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 FilterList.filterKeyValue does not always return the most optimal ReturnCode in both the AND and OR conditions. For example, if you have F1 AND F2, F1 returns SKIP. It immediately returns the SKIP. However, if F2 would have returned NEXT_COL or NEXT_ROW or SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal ReturnCode from F2. For AND conditions, we can always pick the *most restrictive* return code. For OR conditions, we must always pick the *least restrictive* return code. This JIRA is to review the FilterList.filterKeyValue() method to try and make it more optimal and to add a new unit test which verifies the correct behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
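The "most restrictive for AND, least restrictive for OR" rule can be sketched as a small merge over return codes. This is only an illustration under stated assumptions: the Code enum and its ordering (INCLUDE, then SKIP, then NEXT_COL, then NEXT_ROW) are hypothetical, SEEK_NEXT_USING_HINT is left out because it does not fit a simple linear ranking, and the code is not taken from FilterList.

```java
/** Hypothetical codes, declared from least to most restrictive; the ranking is an assumption. */
enum Code { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

final class ReturnCodeMerge {
    private ReturnCodeMerge() {}

    /** AND: every filter must accept, so the most restrictive answer wins. */
    static Code mergeAnd(Code first, Code... rest) {
        Code result = first;
        for (Code c : rest) {
            if (c.ordinal() > result.ordinal()) {
                result = c;
            }
        }
        return result;
    }

    /** OR: any filter may accept, so the least restrictive answer wins. */
    static Code mergeOr(Code first, Code... rest) {
        Code result = first;
        for (Code c : rest) {
            if (c.ordinal() < result.ordinal()) {
                result = c;
            }
        }
        return result;
    }
}
```

For F1 AND F2 with F1 returning SKIP and F2 returning NEXT_ROW, mergeAnd yields NEXT_ROW, which is the case the description calls out. Note that merging this way requires evaluating every filter in the list, so it gives up short-circuit evaluation.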
[jira] [Updated] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes
[ https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-4410: - Attachment: HBASE-4410-v1.patch Implements changes described in description and includes unit test. New test and existing tests are passing, kicking off full suite now. FilterList.filterKeyValue can return suboptimal ReturnCodes --- Key: HBASE-4410 URL: https://issues.apache.org/jira/browse/HBASE-4410 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4410-v1.patch FilterList.filterKeyValue does not always return the most optimal ReturnCode in both the AND and OR conditions. For example, if you have F1 AND F2, F1 returns SKIP. It immediately returns the SKIP. However, if F2 would have returned NEXT_COL or NEXT_ROW or SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal ReturnCode from F2. For AND conditions, we can always pick the *most restrictive* return code. For OR conditions, we must always pick the *least restrictive* return code. This JIRA is to review the FilterList.filterKeyValue() method to try and make it more optimal and to add a new unit test which verifies the correct behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4310) SlabCache metrics bugfix.
[ https://issues.apache.org/jira/browse/HBASE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104949#comment-13104949 ] Jonathan Gray commented on HBASE-4310: -- Can someone explain the three commits on this JIRA? Is the final commit from a different JIRA? It has a different commit message name but is linked to this JIRA and there is nothing in CHANGES.txt and nothing here in the JIRA talking about the change? SlabCache metrics bugfix. - Key: HBASE-4310 URL: https://issues.apache.org/jira/browse/HBASE-4310 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Li Pi Priority: Minor Fix For: 0.92.0 Attachments: metrics.txt, metrics.txt, metrics.txt, metricsv2.txt, metricsv2.txt, metricsv3.txt math error in metrics makes it display incorrect metrics. also no longer logs metrics of size 0 to save space. Also added second log for those things that are successfully cached. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4310) SlabCache metrics bugfix.
[ https://issues.apache.org/jira/browse/HBASE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104954#comment-13104954 ] Jonathan Gray commented on HBASE-4310: -- I see two separate lines for this JIRA in CHANGES as well. Is this what prompted some of those discussions about multiple commits on a JIRA? We should at least amend the CHANGES and commit message to say that it's a follow-up, if nothing else. SlabCache metrics bugfix. - Key: HBASE-4310 URL: https://issues.apache.org/jira/browse/HBASE-4310 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Li Pi Priority: Minor Fix For: 0.92.0 Attachments: metrics.txt, metrics.txt, metrics.txt, metricsv2.txt, metricsv2.txt, metricsv3.txt math error in metrics makes it display incorrect metrics. also no longer logs metrics of size 0 to save space. Also added second log for those things that are successfully cached. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4320) Off Heap Cache never creates Slabs
[ https://issues.apache.org/jira/browse/HBASE-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103794#comment-13103794 ] Jonathan Gray commented on HBASE-4320: -- Looks like this was committed with HBASE-4027 in the message and not HBASE-4320. Guess there's no way to retroactively fix that but in case anyone comes here looking for the revision info it's linked over in the other jira. Off Heap Cache never creates Slabs -- Key: HBASE-4320 URL: https://issues.apache.org/jira/browse/HBASE-4320 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Li Pi Fix For: 0.92.0 Attachments: confnotloading.txt On testing, the configuration file is never loaded by the off heap cache. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4394) Add support for seeking hints to FilterList
Add support for seeking hints to FilterList --- Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Currently FilterList's do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList
[ https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-4394: - Attachment: HBASE-4394-v1.patch Adds support for seek hints to FilterList and adds a unit test to TestFilterList that ensures it does the right thing across the different variations of inputs to a filterlist. Add support for seeking hints to FilterList --- Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4394-v1.patch Currently FilterList's do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList
[ https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-4394: - Status: Patch Available (was: Open) Add support for seeking hints to FilterList --- Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4394-v1.patch Currently FilterList's do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList
[ https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-4394: - Attachment: HBASE-4394-trunk-v2.patch Rebased for trunk Add support for seeking hints to FilterList --- Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4394-trunk-v2.patch, HBASE-4394-v1.patch Currently FilterList's do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4239) HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES
HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES - Key: HBASE-4239 URL: https://issues.apache.org/jira/browse/HBASE-4239 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Ted Yu Priority: Trivial Fix For: 0.92.0 HBASE-4012 introduced Bytes.LONG_SIZE. This is a duplicate of Bytes.SIZEOF_LONG. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4239) HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES
[ https://issues.apache.org/jira/browse/HBASE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089149#comment-13089149 ] Jonathan Gray commented on HBASE-4239: -- +1 HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES - Key: HBASE-4239 URL: https://issues.apache.org/jira/browse/HBASE-4239 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Ted Yu Priority: Trivial Fix For: 0.92.0 Attachments: 4239.txt HBASE-4012 introduced Bytes.LONG_SIZE. This is a duplicate of Bytes.SIZEOF_LONG. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086708#comment-13086708 ] Jonathan Gray commented on HBASE-4218: -- bq. in the mean time there will be places it has to cut a full KeyValue by copying bytes Agreed. There's some other work going on around slab allocators and object reuse that could be paired with this to ameliorate some of that overhead. Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Reporter: Jacek Migdal Labels: compression A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92% total compression ratio: 85% LZO on the same data: 85% LZO after delta encoding: 91% While having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). 
In order to implement it in HBase two important changes in design will be needed: -solidify interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to uncompressed buffer in HFileBlock will have bad performance -extend comparators to support comparison assuming that N first bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
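Since most of the reported savings are attributed to prefix compression of sorted keys, a minimal sketch of that one idea may help. It is not the proposed HBase encoder: the PrefixCodec class and its Entry type are hypothetical, and the int128 encoding, timestamp diffs and bitfields mentioned above are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical prefix codec over sorted keys; an illustration, not HBase code. */
final class PrefixCodec {
    private PrefixCodec() {}

    /** One encoded key: how many leading bytes it shares with the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes each key relative to the key before it; the first key is stored in full. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int limit = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < limit && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the full keys by replaying the shared prefix from the previously decoded key. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

Because each entry records how many bytes it shares with its predecessor, a comparator that already knows the first N bytes match can skip them, which relates to the second design change listed above.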
[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy
[ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084333#comment-13084333 ] Jonathan Gray commented on HBASE-4015: -- Sorry I'm a little late to this discussion but I like the idea of not adding a new state. Instead, we can just pass the znode version number in the RPC to the regionservers. Or encode the servername in the znode. Refactor the TimeoutMonitor to make it less racy Key: HBASE-4015 URL: https://issues.apache.org/jira/browse/HBASE-4015 Project: HBase Issue Type: Sub-task Affects Versions: 0.90.3 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.92.0 Attachments: HBASE-4015_1_trunk.patch, Timeoutmonitor with state diagrams.pdf The current implementation of the TimeoutMonitor acts like a race condition generator, mostly making things worse rather than better. It does its own thing for a while without caring for what's happening in the rest of the master. The first thing that needs to happen is that the regions should not be processed in one big batch, because that sometimes can take minutes to process (meanwhile a region that timed out opening might have opened, then what happens is it will be reassigned by the TimeoutMonitor generating the never ending PENDING_OPEN situation). Those operations should also be done more atomically, although I'm not sure how to do it in a scalable way in this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3899) enhance HBase RPC to support free-ing up server handler threads even if response is not ready
[ https://issues.apache.org/jira/browse/HBASE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071861#comment-13071861 ] Jonathan Gray commented on HBASE-3899: -- Test passes for me on trunk. enhance HBase RPC to support free-ing up server handler threads even if response is not ready - Key: HBASE-3899 URL: https://issues.apache.org/jira/browse/HBASE-3899 Project: HBase Issue Type: Improvement Components: ipc Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.92.0 Attachments: HBASE-3899-2.patch, HBASE-3899.patch, asyncRpc.txt, asyncRpc.txt In the current implementation, the server handler thread picks up an item from the incoming callqueue, processes it and then wraps the response as a Writable and sends it back to the IPC server module. This wastes thread-resources when the thread is blocked for disk IO (transaction logging, read into block cache, etc). It would be nice if we can make the RPC Server Handler threads pick up a call from the IPC queue, hand it over to the application (e.g. HRegion), the application can queue it to be processed asynchronously and send a response back to the IPC server module saying that the response is not ready. The RPC Server Handler thread is now ready to pick up another request from the incoming callqueue. When the queued call is processed by the application, it indicates to the IPC module that the response is now ready to be sent back to the client. The RPC client continues to experience the same behaviour as before. A RPC client is synchronous and blocks till the response arrives. This RPC enhancement allows us to do very powerful things with the RegionServer. In future, we can enhance the RegionServer's threading model to a message-passing model for better performance. We will not be limited by the number of threads in the RegionServer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
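The hand-off described above can be sketched with plain java.util.concurrent primitives. This is an analogy under stated assumptions, not the HBase IPC code: AsyncRpcSketch, process and serve are hypothetical names, a single-thread executor stands in for the RPC handler, and a small pool stands in for the application's asynchronous workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: one handler thread hands calls to the application and moves on. */
final class AsyncRpcSketch {
    private AsyncRpcSketch() {}

    /** Application side: queues the work and immediately returns a "response not ready" future. */
    static CompletableFuture<String> process(String request, Executor appPool) {
        return CompletableFuture.supplyAsync(() -> "done:" + request, appPool);
    }

    /** Serves all requests with a single handler thread and returns responses in request order. */
    static List<String> serve(List<String> requests) {
        ExecutorService handler = Executors.newSingleThreadExecutor(); // the lone RPC handler
        ExecutorService appPool = Executors.newFixedThreadPool(2);     // application workers
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String request : requests) {
                pending.add(CompletableFuture
                    // Handler thread: hand the call to the application, then pick up the next one.
                    .supplyAsync(() -> process(request, appPool), handler)
                    // Resolves later, when the application signals that the response is ready.
                    .thenCompose(ready -> ready));
            }
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> p : pending) {
                responses.add(p.join()); // the caller still blocks, like the synchronous RPC client
            }
            return responses;
        } finally {
            handler.shutdown();
            appPool.shutdown();
        }
    }
}
```

The handler thread never waits on a result, so the number of in-flight calls is no longer bounded by the number of handler threads, while the caller keeps its blocking view of each call.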
[jira] [Commented] (HBASE-4060) Making region assignment more robust
[ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070218#comment-13070218 ] Jonathan Gray commented on HBASE-4060: -- The primary difference between the suggestion by Eran and what is currently implemented is that the per-region znodes are never deleted in Eran's design. The existing implementation uses znodes to track regions that are currently in transition. An assigned and open region doesn't have a znode (nor would an unassigned and closed region of a disabled table). Check out ZKAssign and AssignmentManager for details on how that works. Making region assignment more robust Key: HBASE-4060 URL: https://issues.apache.org/jira/browse/HBASE-4060 Project: HBase Issue Type: Bug Reporter: Ted Yu Fix For: 0.92.0 From Eran Kutner: My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine you can't know for sure if it did or did not receive the request, however there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run dbck it knows that some regions are multiply assigned, the master could do the same and try to resolve the conflict. Another approach would be to handle late responses, even if the response from the remote machine arrives after it was assumed to be dead the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not. Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. 
The system should be able to actively protect itself against such a scenario. It probably doesn't need saying but there is really nothing worse for a data storage system than data loss. In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions. For more background information, see 'Errors after major compaction' discussion on u...@hbase.apache.org -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069204#comment-13069204 ] Jonathan Gray commented on HBASE-3417: -- It does support COW but if it doesn't include changes to how files are named, it will still need this fix. Will follow-up with Mikhail. CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4084) Auto-Split runs only if there are many store files per region
[ https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063476#comment-13063476 ] Jonathan Gray commented on HBASE-4084: -- I thought splits were triggered following a compaction not a flush? Auto-Split runs only if there are many store files per region - Key: HBASE-4084 URL: https://issues.apache.org/jira/browse/HBASE-4084 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: John Heitmann Currently, MemStoreFlusher.flushRegion() is the driver of auto-splitting. It only decides to auto-split a region if there are too many store files per region. Since it's not guaranteed that the number of store files per region always grows above the too many count before compaction reduces the count, there is no guarantee that auto-split will ever happen. In my test setup, compaction seems to always win the race and I haven't noticed auto-splitting happen once. It appears that the intention is to have split be mutually exclusive with compaction, and to have flushing be mutually exclusive with regions badly in need of compaction, but that resulted in auto-splitting being nested in a too-restrictive spot. I'm not sure what the right fix is. Having one method that is essentially requestSplitOrCompact would probably help readability, and could be the ultimate solution if it replaces other calls of requestCompaction(). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4056) Support for using faster storage for write-ahead log
[ https://issues.apache.org/jira/browse/HBASE-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061786#comment-13061786 ] Jonathan Gray commented on HBASE-4056: -- Thanks for opening this JIRA. What do you see as the primary benefit of using flash for the WAL? I've seen some improvement in sequential write throughput, but not drastically different. It seems to me that a significant benefit of using flash is the fast random read access, and there are no random reads on the WAL. One idea that has floated around is to do something like cache-on-write to copy recently written files onto flash (in addition to HDFS) to allow for fast random read access. Or use flash as some kind of extension to the block cache. But regardless, making all of this stuff configurable and supporting more diverse setups is a good thing in general. Some experiments and benchmarks around this would be awesome. Good stuff. Support for using faster storage for write-ahead log Key: HBASE-4056 URL: https://issues.apache.org/jira/browse/HBASE-4056 Project: HBase Issue Type: New Feature Reporter: Praveen Kumar Priority: Minor Labels: features On clusters with heterogeneous storage components like hard drives and flash memory, it could be beneficial to use flash memory for write-ahead log. This can be accomplished by using client side mount table support (HADOOP-7257) that is offered by HDFS federation (HDFS-1052) feature. One can define two HDFS namespaces (faster and slower), and configure HBase to use faster storage namespace for storing WAL. This is an abstract task that captures the idea. More brainstorming and subtasks identification to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061102#comment-13061102 ] Jonathan Gray commented on HBASE-4071: -- I like this idea. It's somewhat related to an idea for a TTKAV (TimeToKeepAllValues) parameter that would allow a point-in-time SnapshotScanner. See HBASE-2376. Data GC: Remove all versions > TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point in time but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than letting go of all versions older than the TTL as we do now, we would let go of all versions EXCEPT the last one written. So, it's like versions==1 once past a TTL of one week. We want to allow that if an error is caught within a week of its happening -- say a user mistakenly removes a critical table -- then we'll be able to restore up to the moment just before catastrophe hit; otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
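The retention rule proposed in HBASE-4071 can be sketched in plain Java. This is only an illustration under stated assumptions: `VersionRetention` and `retain` are hypothetical names, timestamps are plain epoch milliseconds, and the rule is read as "keep everything inside the TTL window, and always keep the newest version even if it has expired". It does not reflect how HBase actually applies TTL during compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not an HBase API.
final class VersionRetention {

    /**
     * Returns the timestamps that survive, newest first.
     * A version survives if it is within ttlMillis of "now",
     * or if it is the newest version of the cell (assumed reading of the proposal).
     */
    static List<Long> retain(List<Long> timestamps, long now, long ttlMillis) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Collections.reverseOrder()); // newest first

        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            long ts = sorted.get(i);
            boolean withinTtl = now - ts <= ttlMillis;
            boolean newestOverall = (i == 0);
            if (withinTtl || newestOverall) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

With a plain TTL the second case below would return an empty list; under the assumed rule one version always remains, which is what makes the "keep one version only" fallback possible.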
[jira] [Commented] (HBASE-4060) Making region assignment more robust
[ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060183#comment-13060183 ] Jonathan Gray commented on HBASE-4060: -- Andrew, we are already doing something like what you describe. It seems the issue is what Ted describes in #2 but it's not clear to me how this bug is being triggered. In TimeoutMonitor, we attempt to do an atomic change of state from OPENING to OFFLINE. If this fails, we don't do anything. If it succeeds, we attempt to do a reassign. In OpenRegionHandler (in the RS), we attempt an atomic change of state from OPENING to OPENED. If this fails, we roll back our open. If it succeeds, we are opened and the node is at OPENED. In OpenedRegionHandler (in the master), the first thing we do is delete a node but only if in OPENED state. If the TimeoutMonitor had done anything, it would have switched the state to OFFLINE. What am I missing? Making region assignment more robust Key: HBASE-4060 URL: https://issues.apache.org/jira/browse/HBASE-4060 Project: HBase Issue Type: Bug Reporter: Ted Yu Fix For: 0.92.0 From Eran Kutner: My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine you can't know for sure if it did or did not receive the request, however there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run dbck it knows that some regions are multiply assigned, the master could do the same and try to resolve the conflict. Another approach would be to handle late responses, even if the response from the remote machine arrives after it was assumed to be dead the master should have enough information to know it had created a conflict by assigning the region to another server. 
An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not. Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying but there is really nothing worse for a data storage system than data loss. In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions. For more background information, see 'Errors after major compaction' discussion on u...@hbase.apache.org -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
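The three transitions Jonathan describes for HBASE-4060 all follow the same compare-and-set pattern, which a small sketch may make easier to reason about. Everything here is an assumption for illustration: `RegionTransitionSketch` is a hypothetical class, an `AtomicReference` stands in for the region's ZooKeeper node, and `null` stands in for a deleted node. The real handlers work against ZooKeeper, not in-process memory.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model of the transitions described in the comment.
final class RegionTransitionSketch {
    enum State { OFFLINE, OPENING, OPENED }

    // Stand-in for the region's ZooKeeper node; null means the node was deleted.
    private final AtomicReference<State> node = new AtomicReference<>(State.OPENING);

    /** TimeoutMonitor: OPENING -> OFFLINE. Returns true if a reassign should follow. */
    boolean timeoutMonitorFires() {
        return node.compareAndSet(State.OPENING, State.OFFLINE);
    }

    /** OpenRegionHandler (region server): OPENING -> OPENED. False means roll back the open. */
    boolean regionServerFinishesOpen() {
        return node.compareAndSet(State.OPENING, State.OPENED);
    }

    /** OpenedRegionHandler (master): delete the node, but only if it is in OPENED. */
    boolean masterDeletesIfOpened() {
        return node.compareAndSet(State.OPENED, null);
    }

    State current() {
        return node.get();
    }
}
```

In this model only one of the first two transitions can succeed for a given OPENING node, which is the property the comment relies on when asking how the bug could still be triggered.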
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054639#comment-13054639 ] Jonathan Gray commented on HBASE-4027: -- In the new HFile v2 over in HBASE-3857 the block cache interface changes from ByteBuffer to HeapSize. So you can now put anything you want into the cache that implements HeapSize (there is a new HFileBlock that is used in HFile v2). One big question is whether you're going to make copies out of the direct byte buffers on each read of that block, or if you're going to change KeyValue to use the ByteBuffer interface (or some other) instead of the byte[] directly. With a DBB you can't get access to an underlying byte[]. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually freed, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
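The copy-on-read question raised in the HBASE-4027 comment can be illustrated with standard `java.nio` calls only. In this sketch `DirectBufferSketch` and `copyOut` are hypothetical names; the point is that code written against `byte[]` has to copy bytes out of a direct buffer, since a direct buffer generally does not expose a backing array.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only.
final class DirectBufferSketch {

    /**
     * Copies the remaining bytes of any buffer into a fresh heap array.
     * Uses duplicate() so the caller's position and limit are left untouched.
     */
    static byte[] copyOut(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);
        return out;
    }

    /** Allocates an off-heap buffer holding the given bytes, ready for reading. */
    static ByteBuffer directOf(byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        return direct;
    }
}
```

Every `copyOut` call allocates on the heap, which is the per-read cost the comment is asking about; the alternative it mentions is to have KeyValue read through the ByteBuffer interface so that no copy is needed.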
[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver
[ https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054172#comment-13054172 ] Jonathan Gray commented on HBASE-4018: -- bq. in many cases the CPU overhead dwarfs (or should) the extra RAM consumption from uncompressing into heap space. This is not necessarily the case. Many applications see a 4-5X compression ratio, and that means being able to increase your cache capacity by that much. Some applications can also be CPU bound, or they might be IO bound, or they might actually be IO bound because they are RAM bound (can't fit working set in memory). In general, it's hard to generalize here I think. bq. Perhaps it's easily offset with a less intensive comp algorithm. That's one of the major motivations for an hbase-specific prefix compression algorithm. Attach memcached as secondary block cache to regionserver - Key: HBASE-4018 URL: https://issues.apache.org/jira/browse/HBASE-4018 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Li Pi Assignee: Li Pi Currently, block caches are limited by heap size, which is limited by garbage collection times in Java. We can get around this by using memcached w/JNI as a secondary block cache. This should be faster than the linux file system's caching, and allow us to very quickly gain access to a high quality slab allocated cache. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4017) BlockCache interface should be truly modular
[ https://issues.apache.org/jira/browse/HBASE-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053439#comment-13053439 ] Jonathan Gray commented on HBASE-4017: -- +1 FYI, in the upcoming HFile v2 stuff, there is a change in the block cache interface so that instead of ByteBuffer it takes HeapSize (so basically, any heap-size-aware structure). BlockCache interface should be truly modular Key: HBASE-4017 URL: https://issues.apache.org/jira/browse/HBASE-4017 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Li Pi Currently, if the BlockCache that is used isn't an LruBlockCache, somewhere in metrics the code will try to cast it to an LruBlockCache and cause an exception. The code should be modular enough to allow for the use of different block caches without throwing an exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver
[ https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053446#comment-13053446 ] Jonathan Gray commented on HBASE-4018: -- The perf gain over the FS caching would be less-so if using short-circuited local reads. But anything that bypasses the DataNode is great for random read perf. Even still, making a copy out of in-process memory should be faster than linux fs caching. Attach memcached as secondary block cache to regionserver - Key: HBASE-4018 URL: https://issues.apache.org/jira/browse/HBASE-4018 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Li Pi Assignee: Li Pi Currently, block caches are limited by heap size, which is limited by garbage collection times in Java. We can get around this by using memcached w/JNI as a secondary block cache. This should be faster than the linux file system's caching, and allow us to very quickly gain access to a high quality slab allocated cache. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver
[ https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053486#comment-13053486 ] Jonathan Gray commented on HBASE-4018: -- bq. Optimal solution would be building a slab allocated block cache within java. Use reference counting for a zero copy solution. This is difficult to implement and debug though. I'm working on this. I think implementing both directions is worthwhile and we can run good comparisons (including against linux fs cache + local datanodes). bq. It would seem best to move in the direction of local HDFS file access and allow plugging in the block cache as a point of comparison / legacy. I think it's best to move in all directions and do comparisons. I've already seen performance differences between fs cache and the actual hbase block cache. There's also compressed vs. decompressed (fs cache will always be compressed) Attach memcached as secondary block cache to regionserver - Key: HBASE-4018 URL: https://issues.apache.org/jira/browse/HBASE-4018 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Li Pi Assignee: Li Pi Currently, block caches are limited by heap size, which is limited by garbage collection times in Java. We can get around this by using memcached w/JNI as a secondary block cache. This should be faster than the linux file system's caching, and allow us to very quickly gain access to a high quality slab allocated cache. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3340) Eventually Consistent Secondary Indexing via Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052307#comment-13052307 ] Jonathan Gray commented on HBASE-3340: -- I'm not actively working on this but it's also a potential intern project at fb. A code drop on GitHub would be great and maybe we can work together. There are quite a few alternative directions to go for indexing. And an endless amount of development that could be done around APIs, schemas, filters, etc. So the more the merrier. The basic design I was thinking would be something similar to google percolator or what the Lily guys are doing (http://www.lilyproject.org/lily/about/playground/hbaserowlog/version/1) Eventually Consistent Secondary Indexing via Coprocessors - Key: HBASE-3340 URL: https://issues.apache.org/jira/browse/HBASE-3340 Project: HBase Issue Type: New Feature Components: coprocessors Reporter: Jonathan Gray Assignee: Jonathan Gray Secondary indexing support via coprocessors with an eventual consistency guarantee. Design to come. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3945) Load balancer shouldn't move the same region in two consective balancing actions
[ https://issues.apache.org/jira/browse/HBASE-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044017#comment-13044017 ] Jonathan Gray commented on HBASE-3945: -- I worry about this approach of more and more knobs, especially when they don't directly address what a good/bad load balance really is. If a region gets moved in two consecutive balancing actions, then something is wrong with the balancer in the first place. While I agree in principle that regions moving multiple times and quickly is not desirable, this will be a common outcome if the balancing algorithm isn't already taking into account metrics over time (rather than short snapshots). If we're using load but then adding all these limits/controls, it's hard to ever understand the behavior of the balancer. Load balancer shouldn't move the same region in two consective balancing actions Key: HBASE-3945 URL: https://issues.apache.org/jira/browse/HBASE-3945 Project: HBase Issue Type: Improvement Reporter: Ted Yu Keeping a region on the same region server would give good stability for active scanners. We shouldn't reassign the same region in two successive calls to balanceCluster(). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-3947) SplitLog in HMaster spend long time, move it to regionserver
[ https://issues.apache.org/jira/browse/HBASE-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray resolved HBASE-3947. -- Resolution: Duplicate This was implemented over in HBASE-1364 and committed into trunk. SplitLog in HMaster spend long time, move it to regionserver Key: HBASE-3947 URL: https://issues.apache.org/jira/browse/HBASE-3947 Project: HBase Issue Type: Improvement Components: master, regionserver, zookeeper Reporter: mingjian Fix For: 0.90.4 One of our 100-node clusters went down because of a namenode crash. After restarting, we found that it spent about two and a half hours splitting hlogs. After the crash, there were about 3,500 hfiles in /hbase/.logs/, and splitting one of them takes about 2~3 seconds. SplitLog runs in a single thread of HMaster. Why not move it to the regionservers, so that HMaster only creates split plans and notifies the regionservers through zookeeper? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3732) New configuration option for client-side compression
[ https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042595#comment-13042595 ] Jonathan Gray commented on HBASE-3732: -- I agree that value compression is easily done at the application level. In cases where you have very large values, compressing that data is something you should always be thinking about. Published or contributed code samples could go a long way. Are there things we could add in Put/Get to make this kind of stuff easily pluggable? If it can be integrated simply, then this might be okay, but it should probably be part of a larger conversation about compression. And anything that touches KV needs to be thought through. I think there could be some substantial savings in hbase-specific prefix or row/family/qualifier compression, both on-disk and in-memory. One idea there would require some complicating of KeyValue and its comparator, or a simpler solution would require short-term memory allocations to reconstitute KVs as they make their way through the KVHeap/KVScanner. I've also done some work on supporting a two-level compressed/uncompressed block cache patch (with lzo). I'm waiting to finish until HBASE-3857 goes in as it adds some things that make life easier in the HFile code. New configuration option for client-side compression Key: HBASE-3732 URL: https://issues.apache.org/jira/browse/HBASE-3732 Project: HBase Issue Type: New Feature Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Attachments: compressed_streams.jar We have a case here where we have to store very fat cells (arrays of integers) which can amount into the hundreds of KBs that we need to read often, concurrently, and possibly keep in cache. Compressing the values on the client using java.util.zip's Deflater before sending them to HBase proved to be in our case almost an order of magnitude faster. 
The reasons are evident: less data sent to hbase, memstore contains compressed data, block cache contains compressed data too, etc. I was thinking that it might be something useful to add to a family schema, so that Put/Result do the conversion for you. The actual compression algo should also be configurable. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
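The application-level approach discussed in HBASE-3732 (compressing a value with java.util.zip's Deflater before it goes into a Put, and inflating it after a Get) might look roughly like the sketch below. `ValueCompressionSketch`, `compress` and `decompress` are hypothetical names, the default compression level is an assumption, and the HBase Put/Get calls themselves are deliberately left out; only standard `java.util.zip` classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical client-side helper for illustration only; not part of HBase.
final class ValueCompressionSketch {

    /** Compresses a cell value before it is handed to the client API. */
    static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(value);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native memory
        }
    }

    /** Restores the original value after it has been read back. */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("value is not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

A caller would pass `compress(value)` as the cell value when writing and call `decompress` on the bytes read back. Making this pluggable in Put/Get or in the family schema, as the thread suggests, would move the same two calls behind the client API.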
[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3725: - Attachment: HBASE-3725-v3.patch This fixes the problem in the only simple way I could think of. A new configuration option is added hbase.hregion.increment.supportdeletes which defaults to true (because it is required for correctness). When this option is true, then when the scan against StoreFiles is done, it will also include the MemStore. This should ensure correctness for cases where delete markers are present in the MemStore that need to apply to KVs in the StoreFiles. I made this a configuration option because it makes increment operations less optimal, so for increment workloads that do not need to support deletes, they can keep the option turned off and avoid the double scan of the MemStore. A potential optimal and correct solution to this could be to use the old Get delete tracker which would retain delete information across files (for in-order file processing rather than one mega merge). Some work is going into re-integrating those, so if they do make it back in the HBase, we could utilize them here. This should suffice for now. HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Attachments: HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch Deleted row values are sometimes used for starting points on new increments. To reproduce: Create a row r. Set column x to some default value. Force hbase to write that value to the file system (such as restarting the cluster). Delete the row. Call table.incrementColumnValue with some_value Get the row. The returned value in the column was incremented from the old value before the row was deleted instead of being initialized to some_value. 
Code to reproduce: {code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {
  static String tableName = "testIncrement";
  static byte[] infoCF = Bytes.toBytes("info");
  static byte[] rowKey = Bytes.toBytes("test-rowKey");
  static byte[] newInc = Bytes.toBytes("new");
  static byte[] oldInc = Bytes.toBytes("old");

  /**
   * This code reproduces a bug with increment column values in hbase
   * Usage: First run part one by passing '1' as the first arg
   *        Then restart the hbase cluster so it writes everything to disk
   *        Run part two by passing '2' as the first arg
   *
   * This will result in the old deleted data being found and used for the increment calls
   *
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    if ("1".equals(args[0]))
      partOne();
    if ("2".equals(args[0]))
      partTwo();
    if ("both".equals(args[0])) {
      partOne();
      partTwo();
    }
  }

  /**
   * Creates a table and increments a column value 10 times by 10 each time.
   * Results in a value of 100 for the column
   *
   * @throws IOException
   */
  static void partOne() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    tableDesc.addFamily(new HColumnDescriptor(infoCF));
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      admin.deleteTable(tableName);
    }
    admin.createTable(tableDesc);

    HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
    HTableInterface table = pool.getTable(Bytes.toBytes(tableName));

    // Increment uninitialized column
    for (int j = 0; j < 10; j++)
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021160#comment-13021160 ] Jonathan Gray commented on HBASE-1364: -- Great work Prakash! [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 1364-v5.txt, HBASE-1364.patch, org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2256) Delete row, followed quickly to put of the same row will sometimes fail.
[ https://issues.apache.org/jira/browse/HBASE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017148#comment-13017148 ] Jonathan Gray commented on HBASE-2256: -- I think this would be a hacky non-solution, regardless of whether it's epoch nanos or not. Delete row, followed quickly to put of the same row will sometimes fail. Key: HBASE-2256 URL: https://issues.apache.org/jira/browse/HBASE-2256 Project: HBase Issue Type: Bug Affects Versions: 0.20.3 Reporter: Clint Morgan Attachments: hbase-2256.patch Doing a Delete of a whole row, followed immediately by a put to that row will sometimes miss a cell. Attached is a test to provoke the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate
[ https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016245#comment-13016245 ] Jonathan Gray commented on HBASE-3729: -- I think the default behavior of the shell should be the default behavior of the client, which is 1 version unless specified otherwise. Specifying a time range and wanting the most recent from within that range is a valid and somewhat common use case. Get cells via shell with a time range predicate --- Key: HBASE-3729 URL: https://issues.apache.org/jira/browse/HBASE-3729 Project: HBase Issue Type: New Feature Components: shell Reporter: Eric Charles Assignee: Ted Yu Attachments: 3729-v2.txt, 3729-v3.txt, 3729.txt The HBase shell allows specifying a timestamp to get a value - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1} If you don't give the exact timestamp, you get nothing... so it's difficult to get the cell's previous versions. It would be fine to have a time range predicate based get. The shell syntax could be (depending on technical feasibility) - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, end_timestamp)} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate
[ https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016248#comment-13016248 ] Jonathan Gray commented on HBASE-3729: -- HTable (Get/Scan) default is 1 version not 3 versions. I think you are thinking of the HColumnDescriptor default. Get cells via shell with a time range predicate --- Key: HBASE-3729 URL: https://issues.apache.org/jira/browse/HBASE-3729 Project: HBase Issue Type: New Feature Components: shell Reporter: Eric Charles Assignee: Ted Yu Attachments: 3729-v2.txt, 3729-v3.txt, 3729-v4.txt, 3729.txt The HBase shell allows specifying a timestamp to get a value - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1} If you don't give the exact timestamp, you get nothing... so it's difficult to get the cell's previous versions. It would be fine to have a time range predicate based get. The shell syntax could be (depending on technical feasibility) - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, end_timestamp)} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
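The behavior argued for in the HBASE-3729 comments (one version by default, and with a time range the most recent version inside that range) can be modelled without any HBase classes. In this sketch `TimeRangeSketch` and `newestInRange` are hypothetical names, a `NavigableMap` from timestamp to value stands in for the versions of one cell, and the range is treated as half-open, [minStamp, maxStamp), which is an assumption about the intended semantics.

```java
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical model of "newest version within a time range"; not an HBase API.
final class TimeRangeSketch {

    /**
     * Returns the value of the newest version with minStamp <= timestamp < maxStamp,
     * or null if no version falls inside the range.
     */
    static String newestInRange(NavigableMap<Long, String> versions, long minStamp, long maxStamp) {
        if (minStamp >= maxStamp) {
            return null; // empty range
        }
        // Greatest timestamp strictly below the exclusive upper bound.
        Map.Entry<Long, String> candidate = versions.lowerEntry(maxStamp);
        if (candidate == null || candidate.getKey() < minStamp) {
            return null;
        }
        return candidate.getValue();
    }
}
```

Under this model an exact-timestamp lookup is just the range [ts, ts + 1), which is why a get with a slightly wrong timestamp returns nothing while a range-based get would still find the nearest earlier version inside the window.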
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13015683#comment-13015683 ] Jonathan Gray commented on HBASE-3725: -- Hey Nathaniel. Thanks for posting the unit test! I will take a look at this sometime this week and try to get a fix out for it. HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Attachments: HBASE-3725.patch Deleted row values are sometimes used for starting points on new increments. To reproduce: Create a row r. Set column x to some default value. Force hbase to write that value to the file system (such as restarting the cluster). Delete the row. Call table.incrementColumnValue with some_value Get the row. The returned value in the column was incremented from the old value before the row was deleted instead of being initialized to some_value. 
Code to reproduce: {code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {
  static String tableName = "testIncrement";
  static byte[] infoCF = Bytes.toBytes("info");
  static byte[] rowKey = Bytes.toBytes("test-rowKey");
  static byte[] newInc = Bytes.toBytes("new");
  static byte[] oldInc = Bytes.toBytes("old");

  /**
   * This code reproduces a bug with increment column values in hbase
   * Usage: First run part one by passing '1' as the first arg
   *        Then restart the hbase cluster so it writes everything to disk
   *        Run part two by passing '2' as the first arg
   *
   * This will result in the old deleted data being found and used for the increment calls
   *
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    if ("1".equals(args[0]))
      partOne();
    if ("2".equals(args[0]))
      partTwo();
    if ("both".equals(args[0])) {
      partOne();
      partTwo();
    }
  }

  /**
   * Creates a table and increments a column value 10 times by 10 each time.
   * Results in a value of 100 for the column
   *
   * @throws IOException
   */
  static void partOne() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    tableDesc.addFamily(new HColumnDescriptor(infoCF));
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      admin.deleteTable(tableName);
    }
    admin.createTable(tableDesc);

    HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
    HTableInterface table = pool.getTable(Bytes.toBytes(tableName));

    // Increment uninitialized column
    for (int j = 0; j < 10; j++) {
      table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
      Increment inc = new Increment(rowKey);
      inc.addColumn(infoCF, newInc, (long)10);
      table.increment(inc);
    }

    Get get = new Get(rowKey);
    Result r = table.get(get);
    System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc))
        + " old " + Bytes.toLong(r.getValue(infoCF, oldInc)));
  }

  /**
   * First deletes the data then increments the column 10 times by 1 each time
   *
   * Should result in a value of 10 but it doesn't, it results in a value of 110
   *
   * @throws IOException
   */
  static void partTwo() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTablePool pool = new
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014748#comment-13014748 ] Jonathan Gray commented on HBASE-3562: -- Thanks for looking into this Evert. This is definitely some tricky stuff. A few comments on your patch...
- Our convention in conditionals is to put the variable first. I find it a little tricky to read the code when the constant is first. For example: {code} if (MatchCode.INCLUDE == mc) {code} should be {code} if (mc == MatchCode.INCLUDE) {code} (And all the other places where you have this type of logic)
- The unit test {{TestColumnMatchAndFilterOrder}} is clever in how it checks correctness, but I think it would be good to actually do a read query and verify the results for a few different combinations of the query to prove correctness of the overall algorithm. Other changes to SQM down the road might change more behavior / order of operations, so this test may no longer apply or give full coverage for correctness. Having some tests which don't rely on the precise server-side interactions but rather confirm the end results will be more applicable as we move forward.
- You have some lines that are > 80 characters, especially in some of the javadoc. Just wrap those so all lines are <= 80 chars.
- There was a comment in SQM that described why the filter was checked first. Can you write some inline comments to describe how this works now? There are a couple of lines at the end but it will be useful to have some explanation of why this has changed and what the behavior is now.
- Is there any particular reason that you had includeLatestColumn take timestamp as a parameter? The timestamp is passed in the check call, and we could just hang on to that. It just feels a little strange to me since you should never pass a different timestamp, and the tracker can know which was the latest column.
Overall this is really solid! Great work Evert!
ValueFilter is being evaluated before performing the column match - Key: HBASE-3562 URL: https://issues.apache.org/jira/browse/HBASE-3562 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.90.0 Reporter: Evert Arckens Attachments: HBASE-3562.patch
When performing a Get operation where both a column and a ValueFilter are specified, the ValueFilter is evaluated before the column match is made, contrary to what the javadoc of Get.setFilter() indicates: "{@link Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column match, deletes and max versions have been run." This is shown in the little test below, which uses a TestComparator extending WritableByteArrayComparable.

public void testFilter() throws Exception {
  byte[] cf = Bytes.toBytes("cf");
  byte[] row = Bytes.toBytes("row");
  byte[] col1 = Bytes.toBytes("col1");
  byte[] col2 = Bytes.toBytes("col2");
  Put put = new Put(row);
  put.add(cf, col1, new byte[]{(byte)1});
  put.add(cf, col2, new byte[]{(byte)2});
  table.put(put);
  Get get = new Get(row);
  get.addColumn(cf, col2); // We only want to retrieve col2
  TestComparator testComparator = new TestComparator();
  Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
  get.setFilter(filter);
  Result result = table.get(get);
}

public class TestComparator extends WritableByteArrayComparable {
  /**
   * Nullary constructor, for Writable
   */
  public TestComparator() {
    super();
  }

  @Override
  public int compareTo(byte[] theirValue) {
    if (theirValue[0] == (byte)1) {
      // If the column match was done before evaluating the filter, we should never get here.
      throw new RuntimeException("I only expect (byte)2 in col2, not (byte)1 from col1");
    }
    if (theirValue[0] == (byte)2) {
      return 0;
    } else
      return 1;
  }
}

When only one column should be retrieved, this can be worked around by using a SingleColumnValueFilter instead of the ValueFilter.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011248#comment-13011248 ] Jonathan Gray commented on HBASE-3562: -- The counter in ColumnTracker is responsible for tracking setMaxVersions. You may have queried for only the latest version, so once the ColumnTracker sees a given column, it will reject subsequent versions of that column. Currently there's no way for the CT to know that a subsequent filter actually prevented a KeyValue from being returned, so that it should not be included in the count of returned versions. We would need to introduce something like {{skippedPreviousKeyValue}} that could be sent back to the CT so it could undo the previous count.
ValueFilter is being evaluated before performing the column match - Key: HBASE-3562 URL: https://issues.apache.org/jira/browse/HBASE-3562 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.90.0 Reporter: Evert Arckens
When performing a Get operation where both a column and a ValueFilter are specified, the ValueFilter is evaluated before the column match is made, contrary to what the javadoc of Get.setFilter() indicates: "{@link Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column match, deletes and max versions have been run." This is shown in the little test below, which uses a TestComparator extending WritableByteArrayComparable.
public void testFilter() throws Exception {
  byte[] cf = Bytes.toBytes("cf");
  byte[] row = Bytes.toBytes("row");
  byte[] col1 = Bytes.toBytes("col1");
  byte[] col2 = Bytes.toBytes("col2");
  Put put = new Put(row);
  put.add(cf, col1, new byte[]{(byte)1});
  put.add(cf, col2, new byte[]{(byte)2});
  table.put(put);
  Get get = new Get(row);
  get.addColumn(cf, col2); // We only want to retrieve col2
  TestComparator testComparator = new TestComparator();
  Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
  get.setFilter(filter);
  Result result = table.get(get);
}

public class TestComparator extends WritableByteArrayComparable {
  /**
   * Nullary constructor, for Writable
   */
  public TestComparator() {
    super();
  }

  @Override
  public int compareTo(byte[] theirValue) {
    if (theirValue[0] == (byte)1) {
      // If the column match was done before evaluating the filter, we should never get here.
      throw new RuntimeException("I only expect (byte)2 in col2, not (byte)1 from col1");
    }
    if (theirValue[0] == (byte)2) {
      return 0;
    } else
      return 1;
  }
}

When only one column should be retrieved, this can be worked around by using a SingleColumnValueFilter instead of the ValueFilter.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
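To make the {{skippedPreviousKeyValue}} idea from the comment above concrete, here is a minimal sketch of a version counter that can undo its last count. The class VersionCountingTracker and its method signatures are hypothetical and heavily simplified; this is not HBase's actual ColumnTracker, only an illustration of the undo step under those assumptions.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for a column tracker; not HBase's ColumnTracker. */
final class VersionCountingTracker {
  private final int maxVersions;
  private byte[] currentColumn;   // qualifier most recently seen
  private int returnedVersions;   // versions counted so far for currentColumn
  private boolean lastWasCounted; // whether the last check incremented the count

  VersionCountingTracker(int maxVersions) {
    this.maxVersions = maxVersions;
  }

  /** Returns true if this version should be included, and counts it if so. */
  boolean checkColumn(byte[] qualifier) {
    if (!Arrays.equals(qualifier, currentColumn)) {
      // New column: start counting versions from zero.
      currentColumn = qualifier.clone();
      returnedVersions = 0;
    }
    if (returnedVersions >= maxVersions) {
      lastWasCounted = false;
      return false;
    }
    returnedVersions++;
    lastWasCounted = true;
    return true;
  }

  /** Undoes the previous count because a later filter rejected that KeyValue. */
  void skippedPreviousKeyValue() {
    if (lastWasCounted) {
      returnedVersions--;
      lastWasCounted = false;
    }
  }
}
```

With maxVersions set to 1, a version that the tracker accepted but a filter later rejected no longer blocks the next version of the same column, provided the caller reports the skip.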
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011452#comment-13011452 ] Jonathan Gray commented on HBASE-3694: -- Do we really want to put things like this into RegionServerMetrics? That class is a mess and is currently only used for the publishing of our metrics (not used for internal state tracking). And we should avoid the hadoop Metrics* classes like the plague... heavily synchronized and generally confusing. My vote would be to add a new class, maybe {{RegionServerHeapManager}} or something like that... might be a good opportunity to clean up and centralize the code related to that. But it could just hold this one AtomicLong for now. Agree that adding a new interface method just for the long is not ideal since it buys us nothing down the road. Better to add something new that we can use later.
high multiput latency due to checking global mem store size in a synchronized function -- Key: HBASE-3694 URL: https://issues.apache.org/jira/browse/HBASE-3694 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: Hbase-3694[r1085306], Hbase-3694[r1085306]_2.patch, Hbase-3694[r1085306]_3.patch, Hbase-3694[r1085508]_4.patch
The problem is that we found the multiput latency to be very high. In our case, we have almost 22 Regions in each RS and no flushes happened during these puts. After investigation, we believe that the root cause is the function getGlobalMemStoreSize, which checks the high water mark of the mem store. This function takes almost 40% of the total execution time of multiput, according to metrics we instrumented in the code. The actual percentage may be even higher. The execution time is spent on synchronization contention. One solution is to keep a static variable in HRegion that holds the global MemStore size instead of calculating it every time. Why use a static variable? Since all the HRegion objects in the same JVM share the same memory heap, they need to share fate as well. The static variable, globalMemStroeSize, naturally shows the total mem usage in this shared memory heap for this JVM. If multiple RS need to run in the same JVM, they still need only one globalMemStroeSize. If multiple RS run on different JVMs, everything is fine. After the change, in our case, the avg multiput latency decreased from 60ms to 10ms. I will submit a patch based on the current trunk.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
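As a rough illustration of the "one AtomicLong" suggestion in the comment above, the following sketch keeps a running global memstore size that is updated on every change and read without taking a lock. The class name is borrowed from the comment, but its methods and the high-water-mark check are assumptions made for illustration, not the code from any attached patch.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the global memstore size; method names are assumptions. */
final class RegionServerHeapManager {
  // Running total of all memstore sizes in this JVM, in bytes.
  private final AtomicLong globalMemStoreSize = new AtomicLong(0);

  /** Applies a size change (positive on put, negative on flush) and returns the new total. */
  long addAndGetGlobalMemStoreSize(long deltaBytes) {
    return globalMemStoreSize.addAndGet(deltaBytes);
  }

  /** Lock-free read, replacing a synchronized walk over all regions. */
  long getGlobalMemStoreSize() {
    return globalMemStoreSize.get();
  }

  /** True when the total has reached the given high water mark. */
  boolean isAboveHighWaterMark(long highWaterMarkBytes) {
    return getGlobalMemStoreSize() >= highWaterMarkBytes;
  }
}
```

The trade-off is that every code path which grows or shrinks a memstore has to remember to report its delta; in exchange, the hot put path only pays for one atomic read.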
[jira] [Commented] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master
[ https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010796#comment-13010796 ] Jonathan Gray commented on HBASE-3669: -- When I've seen this happen, there has been another RS cutting in and transferring to OPENING. As someone in the other JIRA indicates, this kind of thing can happen when one of the RS is unable to open the region because it doesn't have the proper compression lib or some DFS error. If the master successfully transfers to OFFLINE and the RS sees it as OPENING, then almost certainly there's another RS that has gotten in the way. The contents of the RIT znode actually contains serverName, so we should probably add additional debug information when the state transfer fails. (Unable to go from OFFLINE to OPENING because already in OPENING by server #serverName#) Region in PENDING_OPEN keeps being bounced between RS and master Key: HBASE-3669 URL: https://issues.apache.org/jira/browse/HBASE-3669 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.90.2 After going crazy killing region servers after HBASE-3668, most of the cluster recovered except for 3 regions that kept being refused by the region servers. One the master I would see: {code} 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. 
2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. so generated a random one; hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) available servers 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. 
to sv2borg171,60020,1300399357135 {code} Then on the region server: {code} 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempting to transition node f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21 {code} I'm not sure I fully understand what was going on... the master was suppose to OFFLINE the znode but then that's not what the region server was seeing? In any case, I was able to recover by doing a force unassign for each region and then assign. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master
[ https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3669: - Attachment: HBASE-3669-debug-v1.patch Adds more debug Region in PENDING_OPEN keeps being bounced between RS and master Key: HBASE-3669 URL: https://issues.apache.org/jira/browse/HBASE-3669 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.90.2 Attachments: HBASE-3669-debug-v1.patch After going crazy killing region servers after HBASE-3668, most of the cluster recovered except for 3 regions that kept being refused by the region servers. One the master I would see: {code} 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. 
so generated a random one; hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) available servers 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. to sv2borg171,60020,1300399357135 {code} Then on the region server: {code} 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempting to transition node f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21 {code} I'm not sure I fully understand what was going on... the master was suppose to OFFLINE the znode but then that's not what the region server was seeing? In any case, I was able to recover by doing a force unassign for each region and then assign. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3627) NPE in EventHandler when region already reassigned
[ https://issues.apache.org/jira/browse/HBASE-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010807#comment-13010807 ] Jonathan Gray commented on HBASE-3627: -- looks good, +1 NPE in EventHandler when region already reassigned -- Key: HBASE-3627 URL: https://issues.apache.org/jira/browse/HBASE-3627 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.90.2 Attachments: 3627.txt When a region takes too long to open, it will try to update the unassigned znode and will fail on an ugly NPE like this: {quote} DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22dc571dde04ca7 Attempting to transition node 0519dc3b62a569347526875048c37faa from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x22dc571dde04ca7 Unable to get data of znode /hbase/unassigned/0519dc3b62a569347526875048c37faa because node does not exist (not necessarily an error) ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_RS_OPEN_REGION java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672) at org.apache.hadoop.hbase.zookeeper.ZKAssign.retransitionNodeOpening(ZKAssign.java:585) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tickleOpening(OpenRegionHandler.java:322) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:97) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) {quote} I think the region server in this case should be closing the region ASAP. 
[jira] [Commented] (HBASE-3654) Weird blocking between getOnlineRegion and createRegionLoad
[ https://issues.apache.org/jira/browse/HBASE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010811#comment-13010811 ] Jonathan Gray commented on HBASE-3654: -- I'm late to the conversation, but have also seen contention on the onlineRegion map. Changing to CHM helped. Weird blocking between getOnlineRegion and createRegionLoad --- Key: HBASE-3654 URL: https://issues.apache.org/jira/browse/HBASE-3654 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.90.2 Attachments: ConcurrentHM, ConcurrentSKLM, CopyOnWrite, HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL.patch, HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL1.patch, HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_ConcurrentHM.patch, TestOnlineRegions.java, hashmap Saw this when debugging something else: {code} regionserver60020 prio=10 tid=0x7f538c1c nid=0x4c7 runnable [0x7f53931da000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1380) at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:916) - locked 0x000672aa0a00 (a java.util.concurrent.ConcurrentSkipListMap) at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:767) - locked 0x000656f62710 (a java.util.HashMap) at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:722) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591) at java.lang.Thread.run(Thread.java:662) IPC Reader 9 on port 60020 prio=10 tid=0x7f538c1be000 nid=0x4c6 waiting for monitor entry [0x7f53932db000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295) - waiting to lock 0x000656f62710 (a java.util.HashMap) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333) at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379) at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422) at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361) at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316) - locked 0x000656e60068 (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) ... 
IPC Reader 0 on port 60020 prio=10 tid=0x7f538c08b000 nid=0x4bd waiting for monitor entry [0x7f5393be4000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295) - waiting to lock 0x000656f62710 (a java.util.HashMap) at org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333) at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379) at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422) at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361) at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522) at
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010983#comment-13010983 ] Jonathan Gray commented on HBASE-3694: -- Neither of these seem right. Issue with adding another method for this? high multiput latency due to checking global mem store size in a synchronized function -- Key: HBASE-3694 URL: https://issues.apache.org/jira/browse/HBASE-3694 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang The problem is we found the multiput latency is very high. In our case, we have almost 22 Regions in each RS and there are no flush happened during these puts. After investigation, we believe that the root cause is the function getGlobalMemStoreSize, which is to check the high water mark of mem store. This function takes almost 40% of total execution time of multiput when instrumenting some metrics in the code. The actual percentage may be more higher. The execution time is spent on synchronize contention. One solution is to keep a static var in HRegion to keep the global MemStore size instead of calculating them every time. Why using static variable? Since all the HRegion objects in the same JVM share the same memory heap, they need to share fate as well. The static variable, globalMemStroeSize, naturally shows the total mem usage in this shared memory heap for this JVM. If multiple RS need to run in the same JVM, they still need only one globalMemStroeSize. If multiple RS run on different JVMs, everything is fine. After changing, in our cases, the avg multiput latency decrease from 60ms to 10ms. I will submit a patch based on the current trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3052) Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing
[ https://issues.apache.org/jira/browse/HBASE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011070#comment-13011070 ] Jonathan Gray commented on HBASE-3052: -- How the heck do you re-open a task on this new jira? :) Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing Key: HBASE-3052 URL: https://issues.apache.org/jira/browse/HBASE-3052 Project: HBase Issue Type: Improvement Components: test, zookeeper Reporter: Jonathan Gray Assignee: Liyin Tang Priority: Minor Attachments: HBASE_3052[r1083993].patch, HBASE_3052[r1084033].patch Interesting things can happen when you have a ZK quorum of multiple servers and one of them dies. Doing testing here on clusters, this has turned up some bugs with HBase interaction with ZK. Would be good to add the ability to have multiple ZK servers in unit tests and be able to kill them individually. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor
[ https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010234#comment-13010234 ] Jonathan Gray commented on HBASE-3691: -- It's slightly faster for both compression and decompression when compared to LZO (169/434 vs. 250/500). I'm unsure of the difference in compression ratios but we can ship with it, yay Add compressor support for 'snappy', google's compressor Key: HBASE-3691 URL: https://issues.apache.org/jira/browse/HBASE-3691 Project: HBase Issue Type: Task Reporter: stack Priority: Critical Fix For: 0.92.0 http://code.google.com/p/snappy/ is apache licensed. bq. Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. bq. Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as Zippy in some presentations and the likes.) Lets get it in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase
[ https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010252#comment-13010252 ] Jonathan Gray commented on HBASE-3693: -- +1 on caching this. Good stuff!
isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase -- Key: HBASE-3693 URL: https://issues.apache.org/jira/browse/HBASE-3693 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Assignee: Liyin Tang
We noticed that there are lots of listStatus calls on the ColumnFamily directories within each region, coming from this codepath: {code} compactionSelection() --> isMajorCompaction --> getLowestTimestamp() --> FileStatus[] stats = fs.listStatus(p); {code} So on every compactionSelection() we're taking this hit. While not immediately an issue, just from log inspection, this accounts for quite a large number of RPCs to the namenode at the moment and seems like an unnecessary load to be sending to the namenode. Seems like it would be easy to cache the timestamp for each opened/created StoreFile, in memory, in the region server, and avoid going to DFS each time for this information.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
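The caching idea in the issue above could look roughly like this. It is a sketch under stated assumptions: StoreFileTimestampCache is a hypothetical class, store files are identified by plain path strings, and the loader function stands in for whatever DFS call fetches a file's modification time. It needs Java 8 or later for the lambda and computeIfAbsent.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

/** Hypothetical in-memory cache of per-file timestamps; not part of HBase. */
final class StoreFileTimestampCache {
  private final Map<String, Long> modTimeByPath = new ConcurrentHashMap<>();
  // Stand-in for the expensive lookup (e.g. a namenode RPC); supplied by the caller.
  private final ToLongFunction<String> loader;

  StoreFileTimestampCache(ToLongFunction<String> loader) {
    this.loader = loader;
  }

  /** Returns the cached timestamp, invoking the loader only on the first request. */
  long getModificationTime(String path) {
    return modTimeByPath.computeIfAbsent(path, p -> loader.applyAsLong(p));
  }

  /** Drops the entry when a store file is closed or compacted away. */
  void remove(String path) {
    modTimeByPath.remove(path);
  }

  /** Lowest timestamp across the given files, served from the cache where possible. */
  long getLowestTimestamp(Collection<String> paths) {
    long min = Long.MAX_VALUE;
    for (String p : paths) {
      min = Math.min(min, getModificationTime(p));
    }
    return min;
  }
}
```

Because a store file is immutable once written, caching its timestamp for as long as the file stays open should be safe; the remove method covers the case where compaction replaces it.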
[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException
[ https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009842#comment-13009842 ] Jonathan Gray commented on HBASE-3687: -- Shouldn't the RS not check in to the master with an RPC until it is available? Bulk assign on startup should handle a ServerNotRunningException Key: HBASE-3687 URL: https://issues.apache.org/jira/browse/HBASE-3687 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.90.2 Attachments: 3687.txt On startup, we do bulk assign. At the moment, if any problem during bulk assign, we consider startup failed and expectation is that you need to retry (We need to make this better but that is not what this issue is about). One exception that we should handle is the case where a RS is slow coming up and its rpc is not yet up listening. In this case it will throw: ServerNotRunningException. We should retry at least this one exception during bulk assign. We had this happen to us starting up a prod cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException
[ https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009843#comment-13009843 ] Jonathan Gray commented on HBASE-3687: -- and weren't we just saying that we should not be putting in Thread.sleeps ;) Bulk assign on startup should handle a ServerNotRunningException Key: HBASE-3687 URL: https://issues.apache.org/jira/browse/HBASE-3687 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.90.2 Attachments: 3687.txt On startup, we do bulk assign. At the moment, if there is any problem during bulk assign, we consider startup failed and the expectation is that you need to retry (we need to make this better, but that is not what this issue is about). One exception that we should handle is the case where an RS is slow coming up and its RPC is not yet up and listening. In this case it will throw ServerNotRunningException. We should retry at least this one exception during bulk assign. We had this happen to us starting up a prod cluster.
[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException
[ https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009850#comment-13009850 ] Jonathan Gray commented on HBASE-3687: -- I think it's fine for now. The real fix should be having the RS not check in to the master until it is fully online (agree, outside scope of this jira). Bulk assign on startup should handle a ServerNotRunningException Key: HBASE-3687 URL: https://issues.apache.org/jira/browse/HBASE-3687 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.90.2 Attachments: 3687.txt On startup, we do bulk assign. At the moment, if there is any problem during bulk assign, we consider startup failed and the expectation is that you need to retry (we need to make this better, but that is not what this issue is about). One exception that we should handle is the case where an RS is slow coming up and its RPC is not yet up and listening. In this case it will throw ServerNotRunningException. We should retry at least this one exception during bulk assign. We had this happen to us starting up a prod cluster.
[jira] [Commented] (HBASE-1755) Putting 'Meta' table into ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009290#comment-13009290 ] Jonathan Gray commented on HBASE-1755: -- I generally agree that we should store temporary data in ZK, but I see META as largely temporary. Table/region meta data is already persisted on HDFS (we don't properly update, but that can be fixed without much trouble). And we have plans to move schema and configuration information into ZK for online changes, so at least on a running cluster, we'll be depending on ZK for region configuration. Otherwise, META is largely for locations. I also think the possibility exists to keep a META region but maintain region locations in ZK. In general, the special casing and exception handling around the reading and updating of META is extraordinarily painful both in the master and in the regionservers. Putting 'Meta' table into ZooKeeper --- Key: HBASE-1755 URL: https://issues.apache.org/jira/browse/HBASE-1755 Project: HBase Issue Type: Improvement Affects Versions: 0.90.0 Reporter: Erik Holstad Fix For: 0.92.0 Moving to 0.22.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-3322) HLog sync slowdown under heavy load with HBASE-2467
[ https://issues.apache.org/jira/browse/HBASE-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray resolved HBASE-3322. -- Resolution: Won't Fix There is an issue here but upon further investigation, it's not really a bug. The issue is around heavy concurrency / high number of threads in HLog. The current behavior is that each thread does a notify to the LogSyncer and then does a wait on a single object. The LogSyncer waits to be notified, then syncs what is pending, and then does a notifyAll to all the threads waiting for their sync. This is a straightforward and correct pattern but under heavy concurrency, the fact that all threads are waiting on a single object to be notified becomes a bottleneck. Will open other JIRAs to deal with solutions to this. Closing this one as this is not a blocking bug. HLog sync slowdown under heavy load with HBASE-2467 --- Key: HBASE-3322 URL: https://issues.apache.org/jira/browse/HBASE-3322 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.0 Reporter: Jonathan Gray Priority: Blocker Fix For: 0.92.0 Testing HBASE-2467 and HDFS-895 on 100 node cluster w/ a heavy increment workload we experienced significant slowdown. Stack traces show that most threads are on HLog.updateLock. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
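The wait/notifyAll pattern described in the resolution above can be sketched as follows. This is an illustration only, not the HLog code: the class `SingleMonitorSyncSketch` and all of its members are hypothetical, and as a simplification it uses one monitor both for waking the syncer and for waking the writers. The point it tries to show is that every `notifyAll` wakes every waiting writer, each of which must reacquire the same monitor just to re-check its own edit, which is where contention would build up with many threads.

```java
/** Hypothetical sketch of the single-monitor wait/notifyAll pattern; not the HLog code. */
final class SingleMonitorSyncSketch {
    private final Object monitor = new Object();
    private long appended = 0;      // highest edit number handed out so far
    private long synced = 0;        // highest edit number already made durable
    private boolean closed = false;

    /** Writer side: register an edit, wake the syncer, block until the edit is synced. */
    void appendAndWaitForSync() {
        synchronized (monitor) {
            long mine = ++appended;
            monitor.notifyAll();            // wakes the syncer and every waiting writer
            while (synced < mine) {
                try {
                    monitor.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for sync", e);
                }
            }
        }
    }

    /** Syncer side: wait for pending edits, "sync" them, then wake all waiters. */
    void runSyncer() {
        while (true) {
            long target;
            synchronized (monitor) {
                while (!closed && appended == synced) {
                    try {
                        monitor.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                if (appended == synced) {
                    return;                 // closed and nothing left to sync
                }
                target = appended;
            }
            // A real log would sync to the filesystem here, outside the monitor.
            synchronized (monitor) {
                synced = target;
                monitor.notifyAll();        // all waiters wake up and re-check their edit
            }
        }
    }

    void close() {
        synchronized (monitor) {
            closed = true;
            monitor.notifyAll();
        }
    }

    long syncedCount() {
        synchronized (monitor) {
            return synced;
        }
    }

    /** Runs the given number of writer threads against one syncer; returns edits synced. */
    static long runDemo(int writers) {
        SingleMonitorSyncSketch log = new SingleMonitorSyncSketch();
        Thread syncer = new Thread(log::runSyncer);
        syncer.start();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(log::appendAndWaitForSync);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            log.close();
            syncer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.syncedCount();
    }
}
```

The sketch is correct in the sense that no writer returns before its edit is covered by a sync, which matches the "straightforward and correct" characterization above; it says nothing about how the follow-up JIRAs address the bottleneck.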
[jira] [Commented] (HBASE-2549) Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out
[ https://issues.apache.org/jira/browse/HBASE-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009425#comment-13009425 ] Jonathan Gray commented on HBASE-2549: -- punted from 0.92 Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out -- Key: HBASE-2549 URL: https://issues.apache.org/jira/browse/HBASE-2549 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Once we move to all Scans, the trackers could use a refresh. There are often times where we return, for example, a MatchCode.SKIP (which just goes to the next KV not including the current one) where we could be sending a more optimal return code like MatchCode.SEEK_NEXT_ROW. This is a jira to review all of this code after 2248 goes in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2549) Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out
[ https://issues.apache.org/jira/browse/HBASE-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009428#comment-13009428 ] Jonathan Gray commented on HBASE-2549: -- (punting because this was largely done but would be good to do a full analysis at some point down the road) Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out -- Key: HBASE-2549 URL: https://issues.apache.org/jira/browse/HBASE-2549 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Once we move to all Scans, the trackers could use a refresh. There are often times where we return, for example, a MatchCode.SKIP (which just goes to the next KV not including the current one) where we could be sending a more optimal return code like MatchCode.SEEK_NEXT_ROW. This is a jira to review all of this code after 2248 goes in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2832) Priorities and multi-threading for MemStore flushing
[ https://issues.apache.org/jira/browse/HBASE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-2832: - Fix Version/s: (was: 0.92.0) punting from 0.92. still needs to be done but should not be tied to a version until work is being actively done Priorities and multi-threading for MemStore flushing Key: HBASE-2832 URL: https://issues.apache.org/jira/browse/HBASE-2832 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Similar to HBASE-1476 and HBASE-2646 which are for compactions, but do this for flushes. Flushing when we hit the normal flush size is a low priority flush. Other types of flushes (heap pressure, blocking client requests, etc) are high priority. Should have a tunable number of concurrent flushes. Will use the {{HBaseExecutorService}} and {{HBaseEventHandler}} introduced from master/zk changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params
[ https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-2375: - Fix Version/s: (was: 0.92.0) punting from 0.92. still needs to be done but should not be tied to a version until work is being actively done Make decision to split based on aggregate size of all StoreFiles and revisit related config params -- Key: HBASE-2375 URL: https://issues.apache.org/jira/browse/HBASE-2375 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.20.3 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Attachments: HBASE-2375-v8.patch Currently we will make the decision to split a region when a single StoreFile in a single family exceeds the maximum region size. This issue is about changing the decision to split to be based on the aggregate size of all StoreFiles in a single family (but still not aggregating across families). This would move a check to split after flushes rather than after compactions. This issue should also deal with revisiting our default values for some related configuration parameters. The motivating factor for this change comes from watching the behavior of RegionServers during heavy write scenarios. Today the default behavior goes like this: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a compaction on this region. - Compaction queues notwithstanding, this will create a 192MB file, not triggering a split based on max region size (hbase.hregion.max.filesize). - You'll then flush two more 64MB MemStores and hit the compactionThreshold and trigger a compaction. - You end up with 192 + 64 + 64 in a single compaction. This will create a single 320MB and will trigger a split. 
- While you are performing the compaction (which now writes out 64MB more than the split size, so is about 5X slower than the time it takes to do a single flush), you are still taking on additional writes into MemStore. - Compaction finishes, decision to split is made, region is closed. The region now has to flush whichever edits made it to MemStore while the compaction ran. This flushing, in our tests, is by far the dominating factor in how long data is unavailable during a split. We measured about 1 second to do the region closing, master assignment, reopening. Flushing could take 5-6 seconds, during which time the region is unavailable. - The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. Since we cannot currently split a split, we need to not hang on to these references for long. This described behavior is really bad because of how often we have to rewrite data onto HDFS. Imports are usually just IO bound as the RS waits to flush and compact. In the above example, the first cell to be inserted into this region ends up being written to HDFS 4 times (initial flush, first compaction w/ no split decision, second compaction w/ split decision, third compaction on daughter region). In addition, we leave a large window where we take on edits (during the second compaction of 320MB) and then must make the region unavailable as we flush it. If we increased the compactionThreshold to be 5 and determined splits based on aggregate size, the behavior becomes: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After each MemStore flush, we calculate the aggregate size of all StoreFiles. We can also check the compactionThreshold. For the first three flushes, both would not hit the limit. 
On the fourth flush, we would see total aggregate size = 256MB and determine to make a split. - Decision to split is made, region is closed. This time, the region just has to flush out whichever edits made it to the MemStore during the snapshot/flush of the previous MemStore. So this time window has shrunk by more than 75% as it was the time to write 64MB from memory not 320MB from aggregating 5 hdfs files. This will greatly reduce the time data is unavailable during splits. - The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. This would stay the same. In this example, we
[jira] [Updated] (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable
[ https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3641: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to branch and trunk. Thanks stack. LruBlockCache.CacheStats.getHitCount() is not using the correct variable Key: HBASE-3641 URL: https://issues.apache.org/jira/browse/HBASE-3641 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.90.1, 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Fix For: 0.90.2, 0.92.0 Attachments: HBASE-3641-v1.patch, HBASE-3641-v2.patch {code} public long getHitCount() { return hitCachingCount.get(); } {code} This should be {{hitCount.get()}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1110) Distribute the master role to HRS after ZK integration
[ https://issues.apache.org/jira/browse/HBASE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009435#comment-13009435 ] Jonathan Gray commented on HBASE-1110: -- Is this really that important to do now? Seems simple enough to start master processes on slave nodes if you want lots of backups. If each RS can become a master, then you have to reserve heap in each to handle the master role (which is a non-trivial amount). I think this is a fine area to explore and always good to have options (this could make sense on a small cluster). But I'd opt to move out of 0.92. Distribute the master role to HRS after ZK integration -- Key: HBASE-1110 URL: https://issues.apache.org/jira/browse/HBASE-1110 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Fix For: 0.92.0 After ZK integration, the master role can be distributed out to the HRS as group behaviors mediated by synchronization and rendezvous points in ZK. - State sharing, for example load. -- Load information can be shared with neighbors via ephemeral child status znodes of a znode representing the cluster root. -- Region servers can periodically walk the status nodes of their neighbors. If they find themselves loaded relative to others, they can release regions. If they find themselves less loaded relative to others, they can be more aggressive about finding unassigned regions (see below). - Ephemeral znodes for region ownership, e.g. /hbase//region/ephemeral-node -- Use a permanent child of region to serve as a 'dirty' flag, removed during normal close. - A distributed queue for region assignment. -- When coming up, HRS can check the assignment queue for candidates. -- HRS shutdown includes marking regions clean and moving them onto assignment queue. 
-- All/any HRS can do occasional random walks over region leases looking for expired-dirty state (when timeout causes ZK to delete the ephemeral node representing the lease), and can helpfully move them first to a queue (+ barrier) for splitting then onto the assignment queue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009449#comment-13009449 ] Jonathan Gray commented on HBASE-3417: -- Just verified that this is the same as what we have been running with in production (since the patch was put up in January). I'm ready to commit if you want to +1 me :) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, HBASE-3417-v5.patch Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3052) Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing
[ https://issues.apache.org/jira/browse/HBASE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009454#comment-13009454 ] Jonathan Gray commented on HBASE-3052: -- Patch is looking good but I'm confused by a few things. Are you starting all the servers at the beginning? Or do the ZK servers only actually start/run once you kill another one? The idea for this is to create a ZK quorum of servers and then be able to kill individual ones. Ideally, we'd also be able to specifically kill whichever server is the quorum leader. Also, I'm unclear on the meaning of candidate in this context. Is the candidate server the active server? Does that mean it's online? Maybe change the name or at least add some javadoc explaining what exactly is happening. Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing Key: HBASE-3052 URL: https://issues.apache.org/jira/browse/HBASE-3052 Project: HBase Issue Type: Improvement Components: test, zookeeper Reporter: Jonathan Gray Assignee: Liyin Tang Priority: Minor Attachments: HBASE_3052[r1083993].patch Interesting things can happen when you have a ZK quorum of multiple servers and one of them dies. Doing testing here on clusters, this has turned up some bugs with HBase interaction with ZK. Would be good to add the ability to have multiple ZK servers in unit tests and be able to kill them individually. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3658) Alert when heap is over committed
[ https://issues.apache.org/jira/browse/HBASE-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008052#comment-13008052 ] Jonathan Gray commented on HBASE-3658: -- +1 on refusing to start Alert when heap is over committed - Key: HBASE-3658 URL: https://issues.apache.org/jira/browse/HBASE-3658 Project: HBase Issue Type: Improvement Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Something I just witnessed: the block cache setting was at 70% but the max global memstore size was at the default of 40%, meaning that 110% of the heap can potentially be assigned, and you still need more heap to do stuff like flushing and compacting. We should run a configuration check that alerts the user when that happens and maybe even refuse to start.
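The configuration check proposed above amounts to simple arithmetic on the two heap fractions. A minimal sketch, assuming the two settings are available as fractions of the heap; the class `HeapCommitCheck` and its method names are hypothetical and do not correspond to any HBase API:

```java
/** Hypothetical sketch of a startup check for an over-committed heap. */
final class HeapCommitCheck {

    /** Fraction of the heap claimed by the block cache and the global memstore limit. */
    static double committedFraction(double blockCacheFraction, double memstoreFraction) {
        return blockCacheFraction + memstoreFraction;
    }

    /** True when the two settings together claim more than the whole heap. */
    static boolean isOverCommitted(double blockCacheFraction, double memstoreFraction) {
        return committedFraction(blockCacheFraction, memstoreFraction) > 1.0;
    }

    /** Refuses to start (by throwing) when the heap is over-committed. */
    static void validate(double blockCacheFraction, double memstoreFraction) {
        if (isOverCommitted(blockCacheFraction, memstoreFraction)) {
            long percent = Math.round(
                committedFraction(blockCacheFraction, memstoreFraction) * 100);
            throw new IllegalStateException(
                "Block cache plus memstore limit claim " + percent + "% of the heap");
        }
    }
}
```

For the case in the description, `isOverCommitted(0.70, 0.40)` is true because the two fractions sum to 110% of the heap. A stricter check might also reserve some headroom below 100% for flushing and compacting; how much is not stated in the issue.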
[jira] Commented: (HBASE-3663) The starvation problem in current load balance algorithm
[ https://issues.apache.org/jira/browse/HBASE-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008089#comment-13008089 ] Jonathan Gray commented on HBASE-3663: -- I think this is an issue in the 0.20 / 0.89 version of the load balancer which is no longer in any active branches. The starvation problem in current load balance algorithm Key: HBASE-3663 URL: https://issues.apache.org/jira/browse/HBASE-3663 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: result_new_load_balance.txt, result_old_load_balance.txt This is an interesting starvation case. There are 2 conditions to trigger this problem. Let r be the number of regions and s the number of servers. Condition 1: r/s - r/(s+1) < 1. Condition 2: the load of each server is less than or equal to the ceil of the avg load. Here is the unit test to verify this problem: For example, there are 16 servers and 62 regions. The avg load is 3.875. Setting the slot to 0 keeps the load of each server at either 3 or 4. When a new server comes up, no server needs to give regions to this new server, since none has a load larger than the ceil of the avg. (Setting slot to 0 is to easily trigger this situation; otherwise it needs much larger numbers.) The solution is pretty straightforward: just compare against the floor of the avg instead of the ceil. This solution will evenly balance the load away from the servers which are a little more loaded than others. I also attached the comparison result for the case mentioned above between the old balance algorithm and the new balance algorithm. (I set the slot = 0 when testing)
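The arithmetic of the 16-server, 62-region example in the description can be worked through in a few lines. This is a sketch of the threshold comparison only, not the balancer itself; the class `BalanceThresholdSketch` and its methods are hypothetical. With loads of only 3 or 4, the 62 regions must be spread as 14 servers holding 4 regions and 2 servers holding 3, and the ceil and floor of the average are 4 and 3 whether the average is taken over 16 or 17 servers.

```java
/** Hypothetical sketch: which servers would shed regions under a ceil vs. floor threshold. */
final class BalanceThresholdSketch {

    static int ceilOfAverage(int regions, int servers) {
        return (regions + servers - 1) / servers;
    }

    static int floorOfAverage(int regions, int servers) {
        return regions / servers;
    }

    /** Counts servers whose region count is strictly above the threshold. */
    static int serversAbove(int[] loads, int threshold) {
        int count = 0;
        for (int load : loads) {
            if (load > threshold) {
                count++;
            }
        }
        return count;
    }

    /** Loads from the example: 14 servers with 4 regions, 2 with 3, plus one new empty server. */
    static int[] exampleLoads() {
        int[] loads = new int[17];
        for (int i = 0; i < 14; i++) {
            loads[i] = 4;
        }
        loads[14] = 3;
        loads[15] = 3;
        loads[16] = 0;   // the newly added server
        return loads;
    }
}
```

Comparing against the ceil (4), no server counts as overloaded, so the new server would receive nothing. Comparing against the floor (3), the 14 servers holding 4 regions become candidates to give one up. How many regions actually move is up to the balancer and is not modeled here.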
[jira] Created: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable
LruBlockCache.CacheStats.getHitCount() is not using the correct variable Key: HBASE-3641 URL: https://issues.apache.org/jira/browse/HBASE-3641 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.90.1, 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Fix For: 0.90.2, 0.92.0 {code} public long getHitCount() { return hitCachingCount.get(); } {code} This should be {{hitCount.get()}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
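The one-line fix described above can be shown in context. This is a minimal sketch, not the actual `LruBlockCache.CacheStats` class: the field names `hitCount` and `hitCachingCount` come from the issue text, while the class name `CacheStatsSketch`, the `hit(boolean)` method, and the assumption that `hitCachingCount` tracks only hits on requests with caching enabled are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of cache statistics with the getter reading the right counter. */
final class CacheStatsSketch {
    private final AtomicLong hitCount = new AtomicLong();         // all cache hits
    private final AtomicLong hitCachingCount = new AtomicLong();  // assumed: hits with caching on

    /** Records a hit; the caching flag is an assumption about what the second counter tracks. */
    void hit(boolean caching) {
        hitCount.incrementAndGet();
        if (caching) {
            hitCachingCount.incrementAndGet();
        }
    }

    /** Corrected getter: returns hitCount rather than hitCachingCount. */
    public long getHitCount() {
        return hitCount.get();
    }

    public long getHitCachingCount() {
        return hitCachingCount.get();
    }
}
```

With the original getter, any hit recorded without the caching flag would have been missing from the reported hit count.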
[jira] Updated: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable
[ https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3641: - Status: Patch Available (was: Open) LruBlockCache.CacheStats.getHitCount() is not using the correct variable Key: HBASE-3641 URL: https://issues.apache.org/jira/browse/HBASE-3641 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.90.1, 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Fix For: 0.90.2, 0.92.0 Attachments: HBASE-3641-v1.patch {code} public long getHitCount() { return hitCachingCount.get(); } {code} This should be {{hitCount.get()}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable
[ https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3641: - Attachment: HBASE-3641-v1.patch LruBlockCache.CacheStats.getHitCount() is not using the correct variable Key: HBASE-3641 URL: https://issues.apache.org/jira/browse/HBASE-3641 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.90.1, 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Fix For: 0.90.2, 0.92.0 Attachments: HBASE-3641-v1.patch {code} public long getHitCount() { return hitCachingCount.get(); } {code} This should be {{hitCount.get()}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005821#comment-13005821 ] Jonathan Gray commented on HBASE-1364: -- FYI, Prakash Khemani is working on this right now. Not sure when a patch will be up but it's looking good so far. It is built on top of the new ZK stuff in 0.90 and above. [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Alex Newman Priority: Critical Fix For: 0.92.0 Attachments: HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash, but it needs to run even faster. (Below is from HBASE-1008) In the Bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so they pick up a bunch of edit logs, and it should be fine that the logs are elsewhere in hdfs, in an output directory written by all split participants, whether multithreaded or a mapreduce-like distributed process (let's write our distributed sort first as an MR so we learn what's involved; distributed sort, as much as possible, should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants, deleting and clearing the dir when all have been read in. Making it so the split can take multiple logs for input can also make the split process more robust than the current tenuous process, which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so each store only reads its own edits.
[jira] Commented: (HBASE-3622) Deadlock in HBaseServer (JVM bug?)
[ https://issues.apache.org/jira/browse/HBASE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005499#comment-13005499 ] Jonathan Gray commented on HBASE-3622: -- We run with +UseMembar at FB. I ran experiments on CPU-bound workloads and there was no significant difference in performance either way. Deadlock in HBaseServer (JVM bug?) -- Key: HBASE-3622 URL: https://issues.apache.org/jira/browse/HBASE-3622 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3622.patch On Dmitriy's cluster: {code} IPC Reader 0 on port 60020 prio=10 tid=0x2aacb4a82800 nid=0x3a72 waiting on condition [0x429ba000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf5fa6d0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) at java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:103) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:267) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:985) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316) - locked 0x2aaabf580fb0 (a 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ... IPC Server handler 29 on 60020 daemon prio=10 tid=0x2aacbc163800 nid=0x3acc waiting on condition [0x462f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf5e3800 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025) IPC Server handler 28 on 60020 daemon prio=10 tid=0x2aacbc161800 nid=0x3acb waiting on condition [0x461f2000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaabf5e3800 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025 ... {code} This region server stayed in this state for hours. The reader is waiting to put and the handlers are waiting to take, and they wait on different lock ids. It reminds me of the UseMembar thing about the JVM sometime missing to notify waiters. In any case, that RS needed to be closed in order to get out of that state. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3614) Expose per-region request rate metrics
[ https://issues.apache.org/jira/browse/HBASE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004817#comment-13004817 ] Jonathan Gray commented on HBASE-3614: -- I'm not sure if there is a JIRA yet, but some guys at FB did a bunch of work on doing per-family metrics. They did work to dynamically generate new metric names, etc. I think we could work on this at the same time we start to think about using the info for better load balancing and such. This could obviously come first. Expose per-region request rate metrics -- Key: HBASE-3614 URL: https://issues.apache.org/jira/browse/HBASE-3614 Project: HBase Issue Type: Improvement Components: metrics, regionserver Reporter: Gary Helmling Priority: Minor We currently export metrics on request rates for each region server, and this can help with identifying uneven load at a high level. But once you see a given server under high load, you're forced to extrapolate based on your application patterns and the data it's serving what the likely culprit is. This can and should be much easier if we just exported request rate metrics per-region on each server. Dynamically updating the metrics keys based on assigned regions may pose some minor challenges, but this seems a very valuable diagnostic tool to have available. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3573) Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502
[ https://issues.apache.org/jira/browse/HBASE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000613#comment-13000613 ] Jonathan Gray commented on HBASE-3573: -- Not sure if it matters, but one check returns true if the server holds a catalog region. Then another check uses that check to determine that the last two servers *only* hold catalogs. So in that case, couldn't they still be holding other user regions? Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502 -- Key: HBASE-3573 URL: https://issues.apache.org/jira/browse/HBASE-3573 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: 3573.txt, 3573.txt
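The concern in the comment above is the gap between an "any" check and an "all" check. The sketch below is a simplified illustration, not the code from the attached patch: `CatalogChecks`, `holdsCatalog` and `holdsOnlyCatalogs` are hypothetical names, and regions are represented as plain strings whose names start with `-ROOT-` or `.META.` when they belong to a catalog table.

```java
import java.util.List;

// Hypothetical sketch, not taken from the HBASE-3573 patch.
final class CatalogChecks {
    // Simplified assumption: catalog regions are recognized by name prefix.
    static boolean isCatalog(String regionName) {
        return regionName.startsWith("-ROOT-") || regionName.startsWith(".META.");
    }

    // True if at least one hosted region is a catalog region.
    static boolean holdsCatalog(List<String> regions) {
        return regions.stream().anyMatch(CatalogChecks::isCatalog);
    }

    // True only if the server hosts regions and every one of them is a catalog region.
    static boolean holdsOnlyCatalogs(List<String> regions) {
        return !regions.isEmpty() && regions.stream().allMatch(CatalogChecks::isCatalog);
    }
}
```

A server hosting both a `.META.` region and a user region passes `holdsCatalog` but fails `holdsOnlyCatalogs`, which is the case the comment asks about.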
[jira] Commented: (HBASE-3573) Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502
[ https://issues.apache.org/jira/browse/HBASE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000654#comment-13000654 ] Jonathan Gray commented on HBASE-3573: -- Yeah, that all makes sense. Just making sure that's what you intended. +1 if tests pass and you have tried it on a cluster. Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502 -- Key: HBASE-3573 URL: https://issues.apache.org/jira/browse/HBASE-3573 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: 3573.txt, 3573.txt
[jira] Commented: (HBASE-2947) MultiIncrement (MultiGet functionality for increments)
[ https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997163#comment-12997163 ] Jonathan Gray commented on HBASE-2947: -- HBASE-2814 seems to be only about Thrift. This issue is about making Increment a Row operation so it can be used with the existing MultiAction stuff. MultiIncrement (MultiGet functionality for increments) -- Key: HBASE-2947 URL: https://issues.apache.org/jira/browse/HBASE-2947 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Attachments: HBASE-2947-v1.patch HBASE-1845 introduced MultiGet and other cross-row/cross-region batch operations. We should add a way to do that with increments.
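To illustrate why treating an increment as a row-keyed operation lets it share batching with gets, here is a minimal, self-contained sketch. It does not use HBase's client classes or reflect the attached patch: `RowOp`, `GetOp`, `IncrementOp` and `BatchGrouper` are hypothetical stand-ins, and regions are modeled simply as a sorted map from start key to region name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a common "operation on one row" interface.
interface RowOp {
    String row();
}

final class GetOp implements RowOp {
    private final String row;
    GetOp(String row) { this.row = row; }
    public String row() { return row; }
}

final class IncrementOp implements RowOp {
    private final String row;
    final long amount;
    IncrementOp(String row, long amount) { this.row = row; this.amount = amount; }
    public String row() { return row; }
}

final class BatchGrouper {
    // Groups mixed operations by the region whose start key is the greatest
    // key less than or equal to the operation's row.
    static Map<String, List<RowOp>> groupByRegion(List<RowOp> ops,
                                                  TreeMap<String, String> regionByStartKey) {
        Map<String, List<RowOp>> grouped = new LinkedHashMap<>();
        for (RowOp op : ops) {
            Map.Entry<String, String> region = regionByStartKey.floorEntry(op.row());
            if (region == null) {
                throw new IllegalArgumentException("no region covers row " + op.row());
            }
            grouped.computeIfAbsent(region.getValue(), k -> new ArrayList<>()).add(op);
        }
        return grouped;
    }
}
```

Because the grouping depends only on `row()`, gets and increments can travel in the same per-region batch without the grouping code knowing which kind of operation it is handling.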