[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2012-11-06 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491871#comment-13491871
 ] 

Jonathan Gray commented on HBASE-4583:
--

My vote (if only for one implementation) would be for the less radical patch 
that removes in-memory versions that are not visible, rather than doing this 
cleanup on flush, which has a number of performance implications.  I can see 
some reasons for wanting to keep versions around (providing support to an 
Omid-like transaction engine requires retaining old versions for at least some 
time), but it would be cool to have an option to prevent the deletion of the 
old versions rather than require that they exist in cases where I won't ever 
use them.  In all my increment performance tests, of which there have been 
many, the upsert/removal of old versions is one of the biggest gains, 
especially if you have particularly hot columns.

I'm not sure which design you are referring to when you talk about being true 
to HBase's design ;) Or maybe you're referring to the general principles of 
HBase (append-only), but the increment operation itself was not part of any 
original design or implementation of HBase and has been a hack in one way or 
another from the very first implementation, for the simple reason that the 
implementation has been targeted at performance over purity.  I've always seen 
it as an atomic operation whose notion of versioning is opaque to the user of 
the atomic increment.  Again, I can see use cases for it, but I'd lean towards 
having it as an option rather than a requirement.
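
For context, a minimal sketch of the client-side view being described, using 
the 0.90/0.92-era HTable API (table and column names here are hypothetical); 
the caller only ever sees an atomic result, with any internal versioning 
opaque:

{code}
HTable table = new HTable(conf, "counters");  // hypothetical counters table
// Atomically add 1 and get the new value back; whether older cell
// versions are upserted away in the memstore is invisible at this level.
long hits = table.incrementColumnValue(Bytes.toBytes("row1"),
    Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
{code}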

Thanks for doing this work, good stuff.  +1

 Integrate RWCC with Append and Increment operations
 ---

 Key: HBASE-4583
 URL: https://issues.apache.org/jira/browse/HBASE-4583
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0

 Attachments: 4583-trunk-less-radical.txt, 
 4583-trunk-less-radical-v2.txt, 4583-trunk-less-radical-v3.txt, 
 4583-trunk-less-radical-v4.txt, 4583-trunk-less-radical-v5.txt, 
 4583-trunk-less-radical-v6.txt, 4583-trunk-radical.txt, 
 4583-trunk-radical_v2.txt, 4583-trunk-v3.txt, 4583.txt, 4583-v2.txt, 
 4583-v3.txt, 4583-v4.txt


 Currently Increment and Append operations do not work with RWCC and hence a 
 client could see the results of multiple such operations mixed in the same 
 Get/Scan.
 The semantics might be a bit more interesting here as upsert adds to and 
 removes from the memstore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2012-11-06 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492144#comment-13492144
 ] 

Jonathan Gray commented on HBASE-4583:
--

That makes sense to me (versions = 1 means upsert).

Big +1 from me on adding support for setting the timestamp.

 Integrate RWCC with Append and Increment operations
 ---

 Key: HBASE-4583
 URL: https://issues.apache.org/jira/browse/HBASE-4583
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0

 Attachments: 4583-mixed.txt, 4583-trunk-less-radical.txt, 
 4583-trunk-less-radical-v2.txt, 4583-trunk-less-radical-v3.txt, 
 4583-trunk-less-radical-v4.txt, 4583-trunk-less-radical-v5.txt, 
 4583-trunk-less-radical-v6.txt, 4583-trunk-radical.txt, 
 4583-trunk-radical_v2.txt, 4583-trunk-v3.txt, 4583.txt, 4583-v2.txt, 
 4583-v3.txt, 4583-v4.txt


 Currently Increment and Append operations do not work with RWCC and hence a 
 client could see the results of multiple such operations mixed in the same 
 Get/Scan.
 The semantics might be a bit more interesting here as upsert adds to and 
 removes from the memstore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions

2011-09-25 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114422#comment-13114422
 ] 

Jonathan Gray commented on HBASE-4014:
--

Ted, why is this JIRA scattered over so many commits?  And the commit message 
is in a non-standard format (the first line is: HBASE-4014 is marked as 
Improvement).  I've been trying to build some tools to help keep track of and 
stay in sync with the Apache repos, but this kind of stuff makes it very 
difficult.

 Coprocessors: Flag the presence of coprocessors in logged exceptions
 

 Key: HBASE-4014
 URL: https://issues.apache.org/jira/browse/HBASE-4014
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Andrew Purtell
Assignee: Eugene Koontz
 Fix For: 0.92.0

 Attachments: 4014.final, HBASE-4014.patch, HBASE-4014.patch, 
 HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch


 For some initial triage of bug reports for core versus for deployments with 
 loaded coprocessors, we need something like the Linux kernel's taint flag, 
 and its list of linked-in modules that shows up in the output of every OOPS, 
 to appear above or below exceptions that appear in the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions

2011-09-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114040#comment-13114040
 ] 

Jonathan Gray commented on HBASE-4014:
--

What's the status of this?

 Coprocessors: Flag the presence of coprocessors in logged exceptions
 

 Key: HBASE-4014
 URL: https://issues.apache.org/jira/browse/HBASE-4014
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Andrew Purtell
Assignee: Eugene Koontz
 Fix For: 0.92.0

 Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, 
 HBASE-4014.patch, HBASE-4014.patch


 For some initial triage of bug reports for core versus for deployments with 
 loaded coprocessors, we need something like the Linux kernel's taint flag, 
 and its list of linked-in modules that shows up in the output of every OOPS, 
 to appear above or below exceptions that appear in the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114043#comment-13114043
 ] 

Jonathan Gray commented on HBASE-4460:
--

Since security stuff can be dealt with in a separate JIRA, what do people think 
of the patch I have up?  Shall I submit to rb?

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed its own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113669#comment-13113669
 ] 

Jonathan Gray commented on HBASE-4461:
--

Well my plan is to use it internally on 0.92 (we are porting all the changes 
necessary for our fat C++ client from our internal 90 branch).  But wherever 
you think it should go is fine.

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to look up region locations, they need 
 to utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113671#comment-13113671
 ] 

Jonathan Gray commented on HBASE-4461:
--

and I'm saving up my new features to force in 92 to try and get the 
HLog/Delayable stuff in ;)

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to look up region locations, they need 
 to utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113674#comment-13113674
 ] 

Jonathan Gray commented on HBASE-4131:
--

Thanks stack!

 Make the Replication Service pluggable via a standard interface definition
 --

 Key: HBASE-4131
 URL: https://issues.apache.org/jira/browse/HBASE-4131
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: replicationInterface1.txt, replicationInterface2.txt, 
 replicationInterface3.txt


 The current HBase code supports a replication service that can be used to 
 sync data from one HBase cluster to another. It would be nice to make it 
 a pluggable interface so that other cross-data-center replication services 
 can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113768#comment-13113768
 ] 

Jonathan Gray commented on HBASE-4461:
--

Man, I remember when I could buy your vote for $2.00!

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to look up region locations, they need 
 to utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113870#comment-13113870
 ] 

Jonathan Gray commented on HBASE-4449:
--

Is this done now?

 LoadIncrementalHFiles should be able to handle CFs with blooms
 --

 Key: HBASE-4449
 URL: https://issues.apache.org/jira/browse/HBASE-4449
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Fix For: 0.90.5

 Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
 HBASE-4449.patch


 When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
 it will split the file at the boundary to create two store files. If the 
 store file is for a column family that has a bloom filter, then a 
 java.lang.ArithmeticException: / by zero will be raised because 
 ByteBloomFilter() is called with maxKeys of 0.
 The included patch assumes that the number of keys in each split child will 
 be equal to the number of keys in the parent's bloom filter (instead of 0). 
 This is an overestimate, but it's safe and easy.
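 A minimal sketch of the guard this description implies (variable names here 
 are hypothetical):
 {code}
 // Size each split child's bloom for the parent's key count instead of 0,
 // avoiding the divide-by-zero in ByteBloomFilter. Overestimates, but safe.
 int maxKeys = Math.max(1, parentBloomKeyCount);
 {code}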

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113872#comment-13113872
 ] 

Jonathan Gray commented on HBASE-4449:
--

It looks like the test change was committed but not the change to 
LoadIncrementalHFiles?

 LoadIncrementalHFiles should be able to handle CFs with blooms
 --

 Key: HBASE-4449
 URL: https://issues.apache.org/jira/browse/HBASE-4449
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Fix For: 0.90.5

 Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
 HBASE-4449.patch


 When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
 it will split the file at the boundary to create two store files. If the 
 store file is for a column family that has a bloom filter, then a 
 java.lang.ArithmeticException: / by zero will be raised because 
 ByteBloomFilter() is called with maxKeys of 0.
 The included patch assumes that the number of keys in each split child will 
 be equal to the number of keys in the parent's bloom filter (instead of 0). 
 This is an overestimate, but it's safe and easy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)
Support running an embedded ThriftServer within a RegionServer
--

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray


Rather than a separate process, it can be advantageous in some situations for 
each RegionServer to embed its own ThriftServer.  This allows each embedded 
ThriftServer to short-circuit any queries that should be executed on the local 
RS and skip the extra hop.  This then enables the building of fat Thrift 
clients that cache region locations and avoid extra hops altogether.

This JIRA is just about the embedded ThriftServer.  Will open others for the 
rest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4460:
-

Attachment: HBASE-4460-v1.patch

Adds {{HRegionThriftServer}}, a RegionServer-hosted ThriftServer.  Default is 
off; it can be turned on by setting hbase.regionserver.export.thrift to true.
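
As a usage sketch, enabling it programmatically should just be a matter of 
setting that flag (a minimal example; the key name is the one given above):

{code}
Configuration conf = HBaseConfiguration.create();
// Off by default; true makes each RegionServer host an embedded ThriftServer.
conf.setBoolean("hbase.regionserver.export.thrift", true);
{code}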

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed its own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112933#comment-13112933
 ] 

Jonathan Gray commented on HBASE-4460:
--

Replacing HRPC is another story, but I think many of us are in agreement that 
we'd like to do that eventually.  The scope here is much smaller: I'm working 
on a set of changes to allow fat Thrift-based clients, not necessarily 
replacing normal HRPC.

Open to your feedback on what I can do to better integrate with the security 
stuff, but I'm not sure what I can do at this point.

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed its own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112936#comment-13112936
 ] 

Jonathan Gray commented on HBASE-4461:
--

Thanks Ted.

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray

 In order for fat Thrift-based clients to look up region locations, they need 
 to utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112935#comment-13112935
 ] 

Jonathan Gray commented on HBASE-4296:
--

Over in HBASE-4461 I am exposing this method to Thrift to enable building fat 
Thrift-based clients.  Rather than deprecating this, could we just note that 
it is an expensive operation and not for normal use?  Or even only allow it to 
work on ROOT and META?

 Deprecate HTable[Interface].getRowOrBefore(...)
 ---

 Key: HBASE-4296
 URL: https://issues.apache.org/jira/browse/HBASE-4296
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial
 Fix For: 0.92.0

 Attachments: 4296.txt


 HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
 That method was created to allow our scanning of .META. (see HBASE-2600).
 Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
 performant that a user of HTable will not be aware of.
 I propose deprecating this in the public interface in 0.92 and removing it 
 from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
 will still remain as internal interface for scanning meta.
 Comments?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112984#comment-13112984
 ] 

Jonathan Gray commented on HBASE-4460:
--

Gary, want to open another JIRA and link it here?

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed its own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4452:
-

Fix Version/s: 0.92.0

lgtm.  nice catch.  pulling in to 0.92

 Possibility of RS opening a region though tickleOpening fails due to znode 
 version mismatch
 ---

 Key: HBASE-4452
 URL: https://issues.apache.org/jira/browse/HBASE-4452
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4452.patch


 Consider the following code
 {code}
 long period = Math.max(1, assignmentTimeout / 3);
 long lastUpdate = now;
 while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
     !this.rsServices.isStopping() && (endTime > now)) {
   long elapsed = now - lastUpdate;
   if (elapsed > period) {
     // Only tickle OPENING if postOpenDeployTasks is taking some time.
     lastUpdate = now;
     tickleOpening("post_open_deploy");
   }
 {code}
 Whenever postOpenDeployTasks takes considerable time we try to tickleOpening 
 so that the TimeoutMonitor does not time the region out.  But before it can 
 do this, if the TimeoutMonitor tries to assign the node to another RS, then 
 the other RS will move the node from OFFLINE to OPENING.  Hence when the 
 first RS tries to do tickleOpening the operation will fail. Now here lies the 
 problem:
 {code}
 String encodedName = this.regionInfo.getEncodedName();
 try {
   this.version =
 ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
   this.regionInfo, this.server.getServerName(), this.version);
 } catch (KeeperException e) {
 {code}
 Now this.version becomes -1 as the operation failed.
 Now, as in the first code snippet, since the return value of tickleOpening() 
 is not checked after it fails, we go on with moving the node to OPENED.  Here 
 again we don't have any check for this condition, as the version has already 
 been changed to -1.  Hence the OPENING to OPENED transition becomes 
 successful, with a chance of double assignment.
 {noformat}
 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned 
 node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the 
 expected version 2
 2011-09-22 00:57:33,494 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
 refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, 
 context=post_open_deploy
 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 
 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENED
 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 
 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENED
 2011-09-22 00:58:13,956 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
 t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
 {noformat}
 Correct me if this analysis is wrong.
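 A minimal sketch of the fix this analysis points at -- checking 
 tickleOpening()'s result before moving OPENING to OPENED (a sketch only; the 
 surrounding handler code is as in the snippets above):
 {code}
 if (!tickleOpening("post_open_deploy")) {
   // Version mismatch: another process owns the znode now. Abort rather
   // than transitioning OPENING -> OPENED and risking double assignment.
   return;
 }
 {code}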

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113011#comment-13113011
 ] 

Jonathan Gray commented on HBASE-4462:
--

+1 on treating STE differently.  I think we should treat it as DNRE and kick it 
back to the client.  There could be a configurable policy for socket timeouts 
(or network level errors in general?) if some people want the HBase client to 
retry once or something.
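
A minimal sketch of the DNRE idea, assuming the existing DoNotRetryIOException 
class is reused as the wrapper (callable stands in for the retried operation):

{code}
try {
  return callable.call();
} catch (SocketTimeoutException ste) {
  // We already waited the full socket timeout; surface to the client
  // instead of retrying.
  DoNotRetryIOException dnre =
      new DoNotRetryIOException("socket timeout: " + ste.getMessage());
  dnre.initCause(ste);
  throw dnre;
}
{code}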

 Properly treating SocketTimeoutException
 

 Key: HBASE-4462
 URL: https://issues.apache.org/jira/browse/HBASE-4462
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


 SocketTimeoutException is currently treated like any IOE inside of 
 HCM.getRegionServerWithRetries and I think this is a problem. This method 
 should only do retries in cases where we are pretty sure the operation will 
 complete, but with STE we already waited for (by default) 60 seconds and 
 nothing happened.
 I found this while debugging Douglas Campbell's problem on the mailing list 
 where it seemed like he was using the same scanner from multiple threads, but 
 actually it was just the same client doing retries while the first run didn't 
 even finish yet (that's another problem). You could see the first scanner, 
 then up to two other handlers waiting for it to finish in order to run 
 (because of the synchronization on RegionScanner).
 So what should we do? We could treat STE as a DoNotRetryException and let the 
 client deal with it, or we could retry only once.
 There's also the option of having a different behavior for get/put/icv/scan, 
 the issue with operations that modify a cell is that you don't know if the 
 operation completed or not (same when a RS dies hard after completing let's 
 say a Put but just before returning to the client).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113037#comment-13113037
 ] 

Jonathan Gray commented on HBASE-4296:
--

We are already using the fat Thrift client on our 0.90 branch.  I'm in the 
process of pushing this all out into open source so we can then pull it back 
in to our 0.92-based branch.  I'm happy to put this stuff into 0.92 in Apache 
as well but it's somewhat featurish :)

Was the method removed in 0.94 already?  Can we just hold off on removing it 
until 2600 happens?  That way it won't matter and we can commit it anywhere.  
Following 2600 we can modify how it works and just use a normal scanner then?

 Deprecate HTable[Interface].getRowOrBefore(...)
 ---

 Key: HBASE-4296
 URL: https://issues.apache.org/jira/browse/HBASE-4296
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial
 Fix For: 0.92.0

 Attachments: 4296.txt


 HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
 That method was created to allow our scanning of .META. (see HBASE-2600).
 Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
 performant that a user of HTable will not be aware of.
 I propose deprecating this in the public interface in 0.92 and removing it 
 from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
 will still remain as internal interface for scanning meta.
 Comments?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4461:
-

Attachment: HBASE-4461-v2.patch

Adds getRowOrBefore() exposed to Thrift.  Also adds server name and port to 
TRegionInfo so we can get assignment info through existing APIs in Thrift.

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to look up region locations, they need 
 to utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4451) Improve zk node naming (/hbase/shutdown)

2011-09-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112155#comment-13112155
 ] 

Jonathan Gray commented on HBASE-4451:
--

bq. Would changing this have an effect on compatibility? 

If you wanted to support this change over a rolling restart or anything like 
that, it would probably be rather complicated or impractical.  So it would 
require a full restart of the cluster most likely.  In addition, any external 
ops/monitoring/admin tools people have built might be looking at the specific 
names.  That shouldn't necessarily stop us though.

Perhaps we can do this as part of a fresh look at the names of the ZK nodes in 
general.  We might make some changes with the root node and such as well in 
0.94.  Do you want to look at all the ZK node names and see if there's a new 
scheme that would be more clear?

 Improve zk node naming (/hbase/shutdown)
 

 Key: HBASE-4451
 URL: https://issues.apache.org/jira/browse/HBASE-4451
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.94.0


 Right now the node {{/hbase/shutdown}} is used to indicate cluster status 
 (cluster up, cluster down).
 However, upon a chat with Lars George today, we feel that having the name 
 {{/hbase/shutdown}} is possibly bad. The {{/hbase/shutdown}} znode contains 
 the date when the cluster was _started_. That is difficult to understand 
 and digest, given that a person may connect to ZK, try to look at what it 
 is about, and think the cluster was shut down on that date.
 I feel a better name may simply be: {{/hbase/running}}. Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4132) Extend the WALActionsListener API to accommodate log archival

2011-09-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112249#comment-13112249
 ] 

Jonathan Gray commented on HBASE-4132:
--

Looks good.  One thing:

{code}oldPath = new Path("/DUMMY-No-preexisting-logfile");{code}

Should we support passing a null path or at least use a static?
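
Something like the following static (name hypothetical) would avoid both the 
magic string and re-allocating the placeholder on every roll:

{code}
// Hypothetical constant for the no-preexisting-logfile case.
private static final Path DUMMY_NO_PREEXISTING_LOGFILE =
    new Path("/DUMMY-No-preexisting-logfile");
{code}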

 Extend the WALActionsListener API to accommodate log archival
 

 Key: HBASE-4132
 URL: https://issues.apache.org/jira/browse/HBASE-4132
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: walArchive.txt, walArchive2.txt


 The WALObserver interface exposes the log roll events. It would be nice to 
 extend it to accommodate log archival events as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112250#comment-13112250
 ] 

Jonathan Gray commented on HBASE-4153:
--

Looks like this introduced a compile error in MockRegionServerServices?

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
 HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, HBASE-4153_6.patch


 Comment from Stack over in HBASE-3741:
 {quote}
 Question: Looking at this patch again, if we throw a 
 RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
 though RegionAlreadyInTransitionException in at least one case here is saying 
 that the region is already open on this regionserver?
 {quote}
 Indeed looking at the code it's going to be handled the same way other 
 exceptions are. Need to add special cases for assign and unassign.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112251#comment-13112251
 ] 

Jonathan Gray commented on HBASE-4153:
--

nevermind!

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
 HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, HBASE-4153_6.patch


 Comment from Stack over in HBASE-3741:
 {quote}
 Question: Looking at this patch again, if we throw a 
 RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
 though RegionAlreadyInTransitionException in at least one case here is saying 
 that the region is already open on this regionserver?
 {quote}
 Indeed looking at the code it's going to be handled the same way other 
 exceptions are. Need to add special cases for assign and unassign.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4432) Enable/Disable off heap cache with config

2011-09-19 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108104#comment-13108104
 ] 

Jonathan Gray commented on HBASE-4432:
--

+1

 Enable/Disable off heap cache with config
 -

 Key: HBASE-4432
 URL: https://issues.apache.org/jira/browse/HBASE-4432
 Project: HBase
  Issue Type: Improvement
Reporter: Li Pi
Assignee: Li Pi
Priority: Trivial
 Attachments: 4432.v3, enableswitchforoffheapcache.txt, patchv2.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-19 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108110#comment-13108110
 ] 

Jonathan Gray commented on HBASE-4433:
--

Good stuff.  I think the first iteration of the ColumnTracker had the 
INCLUDE_AND_* primitives but it was simplified.  It would be pretty cool to 
write up a unit test that creates single-KV-sized blocks so you could run 
various queries and see the number of blocks accessed.  Especially nice for 
catching regressions in the future.

 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan

 [Noticed this in 89, but quite likely true of trunk as well.]
 When we are done with the requested column(s) the code still does an extra 
 next() call before it realizes that it is actually done. This extra next() 
 call could potentially result in an unnecessary extra block load. This is 
 likely to be especially bad for CFs where the KVs are large blobs where each 
 KV may be occupying a block of its own. So the next() can often load a new 
 unrelated block unnecessarily.
 --
 For the simple case of reading say the top-most column in a row in a single 
 file, where each column (KV) was say a block of its own-- it seems that we 
 are reading 3 blocks, instead of 1 block!
 I am working on a simple patch and with that the number of seeks is down to 
 2. 
 [There is still an extra seek left.  I think there were two levels of 
 extra/unnecessary next() we were doing without actually confirming that the 
 next was needed. One at the StoreScanner/ScanQueryMatcher level, which this 
 diff avoids. I think the other is at hfs.next() (at the storefile scanner 
 level), which happens whenever an HFile scanner serves out data -- and 
 perhaps that's the additional seek that we need to avoid. But I want to 
 tackle this optimization first as the two issues seem unrelated.]
 -- 
 The basic idea of the patch I am working on/testing is as follows. The 
 ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if 
 the KV needs to be included and then if done, only in the the next call it 
 returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
 when ExplicitColumnTracker knows it is done with a particular column/row, the 
 patch attempts to combine the INCLUDE code and done hint into a single match 
 code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
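 A rough sketch of that combined match code (the enum values follow the 
 description; the surrounding tracker logic is hypothetical):
 {code}
 // When the KV is wanted and the tracker knows the column/row is done,
 // return one combined code instead of INCLUDE now and a SEEK hint later.
 if (include) {
   if (doneWithColumn) return MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
   if (doneWithRow) return MatchCode.INCLUDE_AND_SEEK_NEXT_ROW;
   return MatchCode.INCLUDE;
 }
 {code}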

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-09-18 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107495#comment-13107495
 ] 

Jonathan Gray commented on HBASE-4410:
--

Actually I think Lars is correct.  It's a question of whether we should 
execute all filters in a list's filterKeyValue() or not.

I think the right behavior is actually just to make it execute how one would 
expect this type of conditional to execute:

if (conditionA && conditionB)

If conditionA fails, we don't expect conditionB to be executed.

if (conditionA || conditionB)

If conditionA passes, we don't expect conditionB to be executed.

This was the previous behavior and my patch undoes it.  I will work on a new 
patch.
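
A minimal sketch of that short-circuit behavior for the AND (MUST_PASS_ALL) 
case, simplified to ignore seek hints:

{code}
// Mirrors (conditionA && conditionB): stop at the first non-INCLUDE,
// never consulting the remaining filters.
for (Filter filter : filters) {
  ReturnCode code = filter.filterKeyValue(kv);
  if (code != ReturnCode.INCLUDE) {
    return code;
  }
}
return ReturnCode.INCLUDE;
{code}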

 FilterList.filterKeyValue can return suboptimal ReturnCodes
 ---

 Key: HBASE-4410
 URL: https://issues.apache.org/jira/browse/HBASE-4410
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4410-v1.patch


 FilterList.filterKeyValue does not always return the most optimal ReturnCode 
 in both the AND and OR conditions.
 For example, if you have F1 AND F2, F1 returns SKIP.  It immediately returns 
 the SKIP.  However, if F2 would have returned NEXT_COL or NEXT_ROW or 
 SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal 
 ReturnCode from F2.
 For AND conditions, we can always pick the *most restrictive* return code.
 For OR conditions, we must always pick the *least restrictive* return code.
 This JIRA is to review the FilterList.filterKeyValue() method to try and make 
 it more optimal and to add a new unit test which verifies the correct 
 behavior.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4373) HBaseAdmin.assign() does not use force flag

2011-09-16 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106760#comment-13106760
 ] 

Jonathan Gray commented on HBASE-4373:
--

Trying to understand this patch.  So with the force flag removed, what is the 
default behavior?  If the state is not OFFLINE and we try to assign somewhere 
else, do we force the node to OFFLINE always?

 HBaseAdmin.assign() does not use force flag
 ---

 Key: HBASE-4373
 URL: https://issues.apache.org/jira/browse/HBASE-4373
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4373.patch, HBASE-4373_1.patch


 The HBaseAdmin.assign()
 {code}
   public void assign(final byte [] regionName, final boolean force)
   throws MasterNotRunningException, ZooKeeperConnectionException, IOException 
 {
 getMaster().assign(regionName, force);
   }
 {code}
 In the HMaster we call 
 {code}
 Pair<HRegionInfo, ServerName> pair =
   MetaReader.getRegion(this.catalogTracker, regionName);
 if (pair == null) throw new 
 UnknownRegionException(Bytes.toString(regionName));
 if (cpHost != null) {
   if (cpHost.preAssign(pair.getFirst(), force)) {
 return;
   }
 }
 assignRegion(pair.getFirst());
 if (cpHost != null) {
   cpHost.postAssign(pair.getFirst(), force);
 }
 {code}
 The force flag is not getting used.  Maybe we need to update the javadoc, or 
 not provide the force flag as a parameter if we are not going to use it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4422) Move block cache parameters and references into single CacheConf class

2011-09-16 Thread Jonathan Gray (JIRA)
Move block cache parameters and references into single CacheConf class
--

 Key: HBASE-4422
 URL: https://issues.apache.org/jira/browse/HBASE-4422
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.92.0


From StoreFile down to HFile, we currently use a boolean argument for each of 
the various block cache configuration parameters that exist.  The number of 
parameters is going to continue to increase as we look at compressed cache, 
delta encoding, and more specific L1/L2 configuration.  Every new config 
currently requires changing many constructors because it introduces a new 
boolean.

We should move everything into a single class so that modifications are much 
less disruptive.
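
A rough sketch of the shape such a class could take (field and config-key 
names here are hypothetical; the point is one object threaded through 
constructors instead of a growing list of booleans):

{code}
public class CacheConfig {
  private final boolean cacheDataOnRead;
  private final boolean cacheDataOnWrite;
  private final boolean evictOnClose;

  public CacheConfig(Configuration conf) {
    this.cacheDataOnRead = conf.getBoolean("hbase.block.cache.data.on.read", true);
    this.cacheDataOnWrite = conf.getBoolean("hbase.block.cache.data.on.write", false);
    this.evictOnClose = conf.getBoolean("hbase.block.cache.evict.on.close", false);
  }

  public boolean shouldCacheDataOnRead() { return cacheDataOnRead; }
  public boolean shouldCacheDataOnWrite() { return cacheDataOnWrite; }
  public boolean shouldEvictOnClose() { return evictOnClose; }
}
{code}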

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-09-14 Thread Jonathan Gray (JIRA)
FilterList.filterKeyValue can return suboptimal ReturnCodes
---

 Key: HBASE-4410
 URL: https://issues.apache.org/jira/browse/HBASE-4410
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0


FilterList.filterKeyValue does not always return the most optimal ReturnCode in 
both the AND and OR conditions.

For example, if you have F1 AND F2, F1 returns SKIP.  It immediately returns 
the SKIP.  However, if F2 would have returned NEXT_COL or NEXT_ROW or 
SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal 
ReturnCode from F2.

For AND conditions, we can always pick the *most restrictive* return code.

For OR conditions, we must always pick the *least restrictive* return code.

This JIRA is to review the FilterList.filterKeyValue() method to try and make 
it more optimal and to add a new unit test which verifies the correct behavior.
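
A sketch of the most-restrictive selection for the AND case (the ranking below 
is an illustrative assumption, not a committed ordering):

{code}
// Hypothetical restrictiveness ranking, least to most.
static int rank(ReturnCode c) {
  switch (c) {
    case INCLUDE: return 0;
    case SKIP: return 1;
    case NEXT_COL: return 2;
    case NEXT_ROW: return 3;
    case SEEK_NEXT_USING_HINT: return 4;
    default: return 0;
  }
}

// AND list: any skip a single filter can justify is safe for the whole list.
static ReturnCode mostRestrictive(ReturnCode a, ReturnCode b) {
  return rank(a) >= rank(b) ? a : b;
}
{code}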

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-09-14 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4410:
-

Attachment: HBASE-4410-v1.patch

Implements changes described in description and includes unit test.  New test 
and existing tests are passing, kicking off full suite now.

 FilterList.filterKeyValue can return suboptimal ReturnCodes
 ---

 Key: HBASE-4410
 URL: https://issues.apache.org/jira/browse/HBASE-4410
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4410-v1.patch


 FilterList.filterKeyValue does not always return the most optimal ReturnCode 
 in both the AND and OR conditions.
 For example, if you have F1 AND F2, F1 returns SKIP.  It immediately returns 
 the SKIP.  However, if F2 would have returned NEXT_COL or NEXT_ROW or 
 SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal 
 ReturnCode from F2.
 For AND conditions, we can always pick the *most restrictive* return code.
 For OR conditions, we must always pick the *least restrictive* return code.
 This JIRA is to review the FilterList.filterKeyValue() method to try and make 
 it more optimal and to add a new unit test which verifies the correct 
 behavior.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4310) SlabCache metrics bugfix.

2011-09-14 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104949#comment-13104949
 ] 

Jonathan Gray commented on HBASE-4310:
--

Can someone explain the three commits on this JIRA?  Is the final commit from 
a different JIRA?  It has a different commit message but is linked to this 
JIRA, and there is nothing in CHANGES.txt and nothing here in the JIRA talking 
about the change.

 SlabCache metrics bugfix.
 -

 Key: HBASE-4310
 URL: https://issues.apache.org/jira/browse/HBASE-4310
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
Priority: Minor
 Fix For: 0.92.0

 Attachments: metrics.txt, metrics.txt, metrics.txt, metricsv2.txt, 
 metricsv2.txt, metricsv3.txt


 A math error in metrics makes it display incorrect metrics. It also no 
 longer logs metrics of size 0, to save space, and adds a second log for 
 those things that are successfully cached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4310) SlabCache metrics bugfix.

2011-09-14 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104954#comment-13104954
 ] 

Jonathan Gray commented on HBASE-4310:
--

I see two separate lines for this JIRA in CHANGES as well.  Is this what 
prompted some of those discussions about multiple commits on a JIRA?  We 
should at least amend CHANGES and the commit message to note that it's a 
follow-up, if nothing else.

 SlabCache metrics bugfix.
 -

 Key: HBASE-4310
 URL: https://issues.apache.org/jira/browse/HBASE-4310
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
Priority: Minor
 Fix For: 0.92.0

 Attachments: metrics.txt, metrics.txt, metrics.txt, metricsv2.txt, 
 metricsv2.txt, metricsv3.txt


 A math error in metrics makes it display incorrect metrics. It also no 
 longer logs metrics of size 0, to save space, and adds a second log for 
 those things that are successfully cached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4320) Off Heap Cache never creates Slabs

2011-09-13 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103794#comment-13103794
 ] 

Jonathan Gray commented on HBASE-4320:
--

Looks like this was committed with HBASE-4027 in the message and not 
HBASE-4320.  Guess there's no way to retroactively fix that, but in case 
anyone comes here looking for the revision info, it's linked over in the 
other JIRA.

 Off Heap Cache never creates Slabs
 --

 Key: HBASE-4320
 URL: https://issues.apache.org/jira/browse/HBASE-4320
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.92.0

 Attachments: confnotloading.txt


 On testing, the configuration file is never loaded by the off heap cache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)
Add support for seeking hints to FilterList
---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0


Currently FilterLists do not support getNextKeyHint() even if the underlying 
filters are giving hints.  We should add support for FilterList to pass these 
through.
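
A minimal sketch of the pass-through for the MUST_PASS_ALL case (taking the 
furthest-forward hint is an assumption: every filter must pass anyway, so 
jumping to the largest hint is safe):

{code}
public KeyValue getNextKeyHint(KeyValue current) {
  KeyValue best = null;
  for (Filter filter : filters) {
    KeyValue hint = filter.getNextKeyHint(current);
    // Keep the furthest-forward hint among the wrapped filters.
    if (hint != null &&
        (best == null || KeyValue.COMPARATOR.compare(hint, best) > 0)) {
      best = hint;
    }
  }
  return best;
}
{code}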

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Attachment: HBASE-4394-v1.patch

Adds support for seek hints to FilterList and adds a unit test to 
TestFilterList that ensures it does the right thing across the different 
variations of inputs to a filterlist.

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Status: Patch Available  (was: Open)

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Attachment: HBASE-4394-trunk-v2.patch

Rebased for trunk

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-trunk-v2.patch, HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4239) HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES

2011-08-22 Thread Jonathan Gray (JIRA)
HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES
-

 Key: HBASE-4239
 URL: https://issues.apache.org/jira/browse/HBASE-4239
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Ted Yu
Priority: Trivial
 Fix For: 0.92.0


HBASE-4012 introduced Bytes.LONG_SIZE.  This is a duplicate of 
Bytes.SIZEOF_LONG.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4239) HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES

2011-08-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089149#comment-13089149
 ] 

Jonathan Gray commented on HBASE-4239:
--

+1

 HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES
 -

 Key: HBASE-4239
 URL: https://issues.apache.org/jira/browse/HBASE-4239
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Ted Yu
Priority: Trivial
 Fix For: 0.92.0

 Attachments: 4239.txt


 HBASE-4012 introduced Bytes.LONG_SIZE.  This is a duplicate of 
 Bytes.SIZEOF_LONG.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-08-17 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086708#comment-13086708
 ] 

Jonathan Gray commented on HBASE-4218:
--

bq. in the mean time there will be places it has to cut a full KeyValue by 
copying bytes
Agreed.  There's some other work going on around slab allocators and object 
reuse that could be paired with this to ameliorate some of that overhead.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Jacek Migdal
  Labels: compression

 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
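 As a rough illustration (a minimal sketch, not the attached patch), the 
 shared-prefix encoding described above, assuming keys arrive in sorted order; 
 all names here are illustrative:
 {code}
 import java.util.Arrays;

 // Sketch: store each sorted key as (length of prefix shared with the
 // previous key, remaining suffix bytes) instead of the full key.
 public class PrefixEncodeSketch {
   static int commonPrefix(byte[] a, byte[] b) {
     int n = Math.min(a.length, b.length), i = 0;
     while (i < n && a[i] == b[i]) i++;
     return i;
   }

   public static void main(String[] args) {
     byte[][] sortedKeys = {
       "row-0001/info:count".getBytes(),
       "row-0001/info:total".getBytes(),
       "row-0002/info:count".getBytes(),
     };
     byte[] prev = new byte[0];
     for (byte[] key : sortedKeys) {
       int shared = commonPrefix(prev, key);
       byte[] suffix = Arrays.copyOfRange(key, shared, key.length);
       // On disk this would be a varint length plus the suffix bytes.
       System.out.println("shared=" + shared + " suffix=" + new String(suffix));
       prev = key;
     }
   }
 }
 {code}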

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-08-12 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084333#comment-13084333
 ] 

Jonathan Gray commented on HBASE-4015:
--

Sorry I'm a little late to this discussion but I like the idea of not adding a 
new state.  Instead, we can just pass the znode version number in the RPC to 
the regionservers.  Or encode the servername in the znode.

 Refactor the TimeoutMonitor to make it less racy
 

 Key: HBASE-4015
 URL: https://issues.apache.org/jira/browse/HBASE-4015
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.90.3
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.92.0

 Attachments: HBASE-4015_1_trunk.patch, Timeoutmonitor with state 
 diagrams.pdf


 The current implementation of the TimeoutMonitor acts like a race condition 
 generator, mostly making things worse rather than better. It does its own 
 thing for a while without caring for what's happening in the rest of the 
 master.
 The first thing that needs to happen is that the regions should not be 
 processed in one big batch, because that sometimes can take minutes to 
 process (meanwhile a region that timed out opening might have opened, then 
 what happens is it will be reassigned by the TimeoutMonitor generating the 
 never ending PENDING_OPEN situation).
 Those operations should also be done more atomically, although I'm not sure 
 how to do it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3899) enhance HBase RPC to support free-ing up server handler threads even if response is not ready

2011-07-27 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071861#comment-13071861
 ] 

Jonathan Gray commented on HBASE-3899:
--

Test passes for me on trunk.

 enhance HBase RPC to support free-ing up server handler threads even if 
 response is not ready
 -

 Key: HBASE-3899
 URL: https://issues.apache.org/jira/browse/HBASE-3899
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.92.0

 Attachments: HBASE-3899-2.patch, HBASE-3899.patch, asyncRpc.txt, 
 asyncRpc.txt


 In the current implementation, the server handler thread picks up an item 
 from the incoming callqueue, processes it and then wraps the response as a 
 Writable and sends it back to the IPC server module. This wastes 
 thread-resources when the thread is blocked for disk IO (transaction logging, 
 read into block cache, etc).
 It would be nice if we can make the RPC Server Handler threads pick up a call 
 from the IPC queue, hand it over to the application (e.g. HRegion), the 
 application can queue it to be processed asynchronously and send a response 
 back to the IPC server module saying that the response is not ready. The RPC 
 Server Handler thread is now ready to pick up another request from the 
 incoming callqueue. When the queued call is processed by the application, it 
 indicates to the IPC module that the response is now ready to be sent back to 
 the client.
 The RPC client continues to experience the same behaviour as before. A RPC 
 client is synchronous and blocks till the response arrives.
 This RPC enhancement allows us to do very powerful things with the 
 RegionServer. In the future, we can enhance the RegionServer's threading 
 model into a message-passing model for better performance. We will not be 
 limited by the number of threads in the RegionServer.
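 A minimal sketch of the hand-off described above (illustrative only, not the 
 attached patch; the real change lives in the HBase IPC layer):
 {code}
 import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;

 // Sketch: the handler queues the work and returns immediately; a worker
 // completes the response later, so handlers never block on slow IO.
 public class DeferredResponseSketch {
   static final ExecutorService appWorkers = Executors.newFixedThreadPool(4);

   static CompletableFuture<String> handle(String request) {
     CompletableFuture<String> response = new CompletableFuture<>();
     appWorkers.submit(() -> {
       // Stands in for slow work: WAL sync, block cache miss, etc.
       response.complete("result-for-" + request);
     });
     return response; // "response is not ready" from the handler's view
   }

   public static void main(String[] args) throws Exception {
     // The client side stays synchronous: it still blocks on get().
     System.out.println(handle("get-row-17").get());
     appWorkers.shutdown();
   }
 }
 {code}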

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4060) Making region assignment more robust

2011-07-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070218#comment-13070218
 ] 

Jonathan Gray commented on HBASE-4060:
--

The primary difference between the suggestion by Eran and what is currently 
implemented is that the per-region znodes are never deleted in Eran's design.  
The existing implementation uses znodes to track regions that are currently in 
transition.  An assigned and open region doesn't have a znode (nor would an 
unassigned and closed region of a disabled table).

Check out ZKAssign and AssignmentManager for details on how that works.

 Making region assignment more robust
 

 Key: HBASE-4060
 URL: https://issues.apache.org/jira/browse/HBASE-4060
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
 Fix For: 0.92.0


 From Eran Kutner:
 My concern is that the region allocation process seems to rely too much on
 timing considerations and doesn't seem to take enough measures to guarantee
 conflicts do not occur. I understand that in a distributed environment, when
 you don't get a timely response from a remote machine you can't know for
 sure if it did or did not receive the request, however there are things that
 can be done to mitigate this and reduce the conflict time significantly. For
 example, when I run hbck it knows that some regions are multiply assigned,
 the master could do the same and try to resolve the conflict. Another
 approach would be to handle late responses, even if the response from the
 remote machine arrives after it was assumed to be dead the master should
 have enough information to know it had created a conflict by assigning the
 region to another server. An even better solution, I think, is for the RS to
 periodically test that it is indeed the rightful owner of every region it
 holds and relinquish control over the region if it's not.
 Obviously a state where two RSs hold the same region is pathological and can
 lead to data loss, as demonstrated in my case. The system should be able to
 actively protect itself against such a scenario. It probably doesn't need
 saying but there is really nothing worse for a data storage system than data
 loss.
 In my case the problem didn't happen in the initial phase but after
 disabling and enabling a table with about 12K regions.
 For more background information, see 'Errors after major compaction' 
 discussion on u...@hbase.apache.org

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-07-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069204#comment-13069204
 ] 

Jonathan Gray commented on HBASE-3417:
--

It does support COW but if it doesn't include changes to how files are named, 
it will still need this fix.  Will follow-up with Mikhail.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, 
 HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).
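 A hypothetical sketch of the "more easily mocked scheme" suggested above: 
 derive cache keys from a stable per-file id plus block offset rather than 
 from the (temporary) filesystem path. Names are illustrative only:
 {code}
 import java.util.UUID;

 // Sketch: the id is fixed when the writer is created, so cache keys stay
 // valid across the tmp -> final rename.
 public class BlockNameSketch {
   private final String fileId = UUID.randomUUID().toString();

   String blockCacheKey(long blockOffset) {
     return fileId + "_" + blockOffset;
   }

   public static void main(String[] args) {
     BlockNameSketch writer = new BlockNameSketch();
     System.out.println(writer.blockCacheKey(0));
     System.out.println(writer.blockCacheKey(65536));
   }
 }
 {code}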

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4084) Auto-Split runs only if there are many store files per region

2011-07-11 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063476#comment-13063476
 ] 

Jonathan Gray commented on HBASE-4084:
--

I thought splits were triggered following a compaction, not a flush?

 Auto-Split runs only if there are many store files per region
 -

 Key: HBASE-4084
 URL: https://issues.apache.org/jira/browse/HBASE-4084
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: John Heitmann

 Currently, MemStoreFlusher.flushRegion() is the driver of auto-splitting. It 
 only decides to auto-split a region if there are too many store files per 
 region. Since it's not guaranteed that the number of store files per region 
 always grows above the "too many" count before compaction reduces the count, 
 there is no guarantee that auto-split will ever happen. In my test setup, 
 compaction seems to always win the race and I haven't noticed auto-splitting 
 happen once.
 It appears that the intention is to have split be mutually exclusive with 
 compaction, and to have flushing be mutually exclusive with regions badly in 
 need of compaction, but that resulted in auto-splitting being nested in a 
 too-restrictive spot.
 I'm not sure what the right fix is. Having one method that is essentially 
 requestSplitOrCompact would probably help readability, and could be the 
 ultimate solution if it replaces other calls of requestCompaction().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4056) Support for using faster storage for write-ahead log

2011-07-08 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061786#comment-13061786
 ] 

Jonathan Gray commented on HBASE-4056:
--

Thanks for opening this JIRA.

What do you see as the primary benefit of using flash for the WAL?  I've seen 
some improvement in sequential write throughput, but nothing drastically different.

It seems to me that a significant benefit of using flash is the fast random 
read access, and there are no random reads on the WAL.

One idea that has floated around is to do something like cache-on-write to copy 
recently written files onto flash (in addition to HDFS) to allow for fast 
random read access.  Or use flash as some kind of extension to the block cache.

But regardless, making all of this stuff configurable and supporting more 
diverse setups is a good thing in general.  Some experiments and benchmarks 
around this would be awesome.  Good stuff.

 Support for using faster storage for write-ahead log
 

 Key: HBASE-4056
 URL: https://issues.apache.org/jira/browse/HBASE-4056
 Project: HBase
  Issue Type: New Feature
Reporter: Praveen Kumar
Priority: Minor
  Labels: features

 On clusters with heterogeneous storage components like hard drives and flash 
 memory, it could be beneficial to use flash memory for write-ahead log. This 
 can be accomplished by using client side mount table support (HADOOP-7257) 
 that is offered by HDFS federation (HDFS-1052) feature. One can define two 
 HDFS namespaces (faster and slower), and configure HBase to use faster 
 storage namespace for storing WAL.
 This is an abstract task that captures the idea. More brainstorming and 
 subtasks identification to follow.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4071) Data GC: Remove all versions TTL EXCEPT the last written version

2011-07-07 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061102#comment-13061102
 ] 

Jonathan Gray commented on HBASE-4071:
--

I like this idea.  It's somewhat related to an idea for a TTKAV 
(TimeToKeepAllValues) parameter that would allow a point-in-time 
SnapshotScanner.  See HBASE-2376

 Data GC: Remove all versions  TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 it's like versions==1 when TTL > one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up to the moment just before 
 catastrophe hit; otherwise, we keep one version only.
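 As a minimal sketch (illustrative only, not actual HBase code), the retention 
 rule described above boils down to something like:
 {code}
 // Sketch: drop versions older than the TTL unless the version is the
 // newest one written for its cell.
 public class KeepLastVersionSketch {
   static boolean keep(long versionTs, long newestTsForCell,
                       long now, long ttlMillis) {
     return versionTs == newestTsForCell || (now - versionTs) <= ttlMillis;
   }
 }
 {code}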

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4060) Making region assignment more robust

2011-07-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060183#comment-13060183
 ] 

Jonathan Gray commented on HBASE-4060:
--

Andrew, we are already doing something like what you describe.  It seems the 
issue is what Ted describes in #2 but it's not clear to me how this bug is 
being triggered.

In TimeoutMonitor, we attempt to do an atomic change of state from OPENING to 
OFFLINE.  If this fails, we don't do anything.  If it succeeds, we attempt to 
do a reassign.

In OpenRegionHandler (in the RS), we attempt an atomic change of state from 
OPENING to OPENED.  If this fails, we roll back our open.  If it succeeds, we 
are opened and the node is at OPENED.

In OpenedRegionHandler (in the master), the first thing we do is delete a node 
but only if in OPENED state.  If the TimeoutMonitor had done anything, it would 
have switched the state to OFFLINE.
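A minimal sketch (illustrative only, not ZKAssign itself) of the atomic 
transition all three paths rely on: a conditional setData with the expected 
znode version fails with BadVersionException if anyone else moved the node 
first.
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class AtomicTransitionSketch {
  static boolean transition(ZooKeeper zk, String znode,
                            int expectedVersion, byte[] newState) {
    try {
      zk.setData(znode, newState, expectedVersion);
      return true;  // we won the race, e.g. OPENING -> OFFLINE
    } catch (KeeperException.BadVersionException e) {
      return false; // someone else transitioned the node; do nothing
    } catch (KeeperException | InterruptedException e) {
      throw new RuntimeException(e);
    }
  }
}
{code}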


What am I missing?

 Making region assignment more robust
 

 Key: HBASE-4060
 URL: https://issues.apache.org/jira/browse/HBASE-4060
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
 Fix For: 0.92.0


 From Eran Kutner:
 My concern is that the region allocation process seems to rely too much on
 timing considerations and doesn't seem to take enough measures to guarantee
 conflicts do not occur. I understand that in a distributed environment, when
 you don't get a timely response from a remote machine you can't know for
 sure if it did or did not receive the request, however there are things that
 can be done to mitigate this and reduce the conflict time significantly. For
 example, when I run hbck it knows that some regions are multiply assigned,
 the master could do the same and try to resolve the conflict. Another
 approach would be to handle late responses, even if the response from the
 remote machine arrives after it was assumed to be dead the master should
 have enough information to know it had created a conflict by assigning the
 region to another server. An even better solution, I think, is for the RS to
 periodically test that it is indeed the rightful owner of every region it
 holds and relinquish control over the region if it's not.
 Obviously a state where two RSs hold the same region is pathological and can
 lead to data loss, as demonstrated in my case. The system should be able to
 actively protect itself against such a scenario. It probably doesn't need
 saying but there is really nothing worse for a data storage system than data
 loss.
 In my case the problem didn't happen in the initial phase but after
 disabling and enabling a table with about 12K regions.
 For more background information, see 'Errors after major compaction' 
 discussion on u...@hbase.apache.org

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache

2011-06-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054639#comment-13054639
 ] 

Jonathan Gray commented on HBASE-4027:
--

In the new HFile v2 over in HBASE-3857 the block cache interface changes from 
ByteBuffer to HeapSize.  So you can now put anything you want into the cache 
that implements HeapSize (there is a new HFileBlock that is used in HFile v2).

One big question is whether you're going to make copies out of the direct byte 
buffers on each read of that block, or if you're going to change KeyValue to 
use the ByteBuffer interface (or some other) instead of the byte[] directly.  
With a DBB you can't get access to an underlying byte[].
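Roughly, the interface change amounts to this (a sketch; HeapSize is the real 
HBase interface, the two-method cache shown here is illustrative only):
{code}
// Anything heap-size-aware can now live in the cache, not just ByteBuffers.
interface HeapSize {
  long heapSize();
}

interface BlockCacheSketch {
  void cacheBlock(String blockName, HeapSize block);
  HeapSize getBlock(String blockName);
}
{code}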

 Enable direct byte buffers LruBlockCache
 

 Key: HBASE-4027
 URL: https://issues.apache.org/jira/browse/HBASE-4027
 Project: HBase
  Issue Type: Improvement
Reporter: Jason Rutherglen
Priority: Minor

 Java offers the creation of direct byte buffers which are allocated outside 
 of the heap.
 They need to be manually free'd, which can be accomplished using an 
 undocumented {{clean}} method.
 The feature will be optional.  After implementing, we can benchmark for 
 differences in speed and garbage collection observances.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver

2011-06-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054172#comment-13054172
 ] 

Jonathan Gray commented on HBASE-4018:
--

bq. in many cases the CPU overhead dwarfs (or should) the extra RAM consumption 
from uncompressing into heap space.

This is not necessarily the case.  Many applications see 4-5X compression ratio 
and it means being able to increase your cache capacity by that much.  Some 
applications can also be CPU bound, or they might be IO bound, or they might 
actually be IO bound because they are RAM bound (can't fit working set in 
memory).  In general, it's hard to generalize here I think.

bq. Perhaps it's easily offset with a less intensive comp algorithm.

That's one of the major motivations for an hbase-specific prefix compression 
algorithm.

 Attach memcached as secondary block cache to regionserver
 -

 Key: HBASE-4018
 URL: https://issues.apache.org/jira/browse/HBASE-4018
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Li Pi
Assignee: Li Pi

 Currently, block caches are limited by heap size, which is limited by garbage 
 collection times in Java.
 We can get around this by using memcached w/JNI as a secondary block cache. 
 This should be faster than the linux file system's caching, and allow us to 
 very quickly gain access to a high quality slab allocated cache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4017) BlockCache interface should be truly modular

2011-06-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053439#comment-13053439
 ] 

Jonathan Gray commented on HBASE-4017:
--

+1

FYI, in the upcoming HFile v2 stuff, there is a change in the block cache 
interface so that instead of ByteBuffer it takes HeapSize (so basically, any 
heap-size-aware structure).

 BlockCache interface should be truly modular
 

 Key: HBASE-4017
 URL: https://issues.apache.org/jira/browse/HBASE-4017
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Li Pi

 Currently, if the BlockCache being used isn't an LruBlockCache, something 
 in metrics will try to cast it to an LruBlockCache and cause an exception. 
 The code should be modular enough to allow for the use of different block 
 caches without throwing an exception.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver

2011-06-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053446#comment-13053446
 ] 

Jonathan Gray commented on HBASE-4018:
--

The perf gain over the FS caching would be less so if using short-circuited 
local reads.  But anything that bypasses the DataNode is great for random read 
perf.

Even still, making a copy out of in-process memory should be faster than linux 
fs caching.

 Attach memcached as secondary block cache to regionserver
 -

 Key: HBASE-4018
 URL: https://issues.apache.org/jira/browse/HBASE-4018
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Li Pi
Assignee: Li Pi

 Currently, block caches are limited by heap size, which is limited by garbage 
 collection times in Java.
 We can get around this by using memcached w/JNI as a secondary block cache. 
 This should be faster than the linux file system's caching, and allow us to 
 very quickly gain access to a high quality slab allocated cache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver

2011-06-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053486#comment-13053486
 ] 

Jonathan Gray commented on HBASE-4018:
--

bq. Optimal solution would be building a slab allocated block cache within 
java. Use reference counting for a zero copy solution. This is difficult to 
implement and debug though.

I'm working on this.  I think implementing both directions is worthwhile and we 
can run good comparisons (including against linux fs cache + local datanodes).
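A minimal sketch of the reference-counting idea quoted above (illustrative 
only, not the actual slab cache): a slab-backed block is only recycled once 
every reader has released it, which is what makes zero-copy reads safe.
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

class RefCountedBlockSketch {
  private final ByteBuffer slabSlice;
  private final AtomicInteger refCount = new AtomicInteger(1); // cache's ref

  RefCountedBlockSketch(ByteBuffer slabSlice) { this.slabSlice = slabSlice; }

  ByteBuffer retain() {            // called by each reader
    refCount.incrementAndGet();
    return slabSlice.duplicate(); // independent position/limit, shared bytes
  }

  void release() {                 // reader done, or block evicted
    if (refCount.decrementAndGet() == 0) {
      // here the slice would go back to the slab allocator's free list
    }
  }
}
{code}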

bq. It would seem best to move in the direction of local HDFS file access and 
allow plugging in the block cache as a point of comparison / legacy.

I think it's best to move in all directions and do comparisons.  I've already 
seen performance differences between fs cache and the actual hbase block cache. 
 There's also compressed vs. decompressed (fs cache will always be compressed).

 Attach memcached as secondary block cache to regionserver
 -

 Key: HBASE-4018
 URL: https://issues.apache.org/jira/browse/HBASE-4018
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Li Pi
Assignee: Li Pi

 Currently, block caches are limited by heap size, which is limited by garbage 
 collection times in Java.
 We can get around this by using memcached w/JNI as a secondary block cache. 
 This should be faster than the linux file system's caching, and allow us to 
 very quickly gain access to a high quality slab allocated cache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3340) Eventually Consistent Secondary Indexing via Coprocessors

2011-06-20 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052307#comment-13052307
 ] 

Jonathan Gray commented on HBASE-3340:
--

I'm not actively working on this but it's also a potential intern project at 
fb.  A code drop on GitHub would be great and maybe we can work together.  
There are quite a few alternative directions to go for indexing.  And an 
endless amount of development that could be done around APIs, schemas, filters, 
etc.  So the more the merrier.

The basic design I was thinking would be something similar to google percolator 
or what the Lily guys are doing 
(http://www.lilyproject.org/lily/about/playground/hbaserowlog/version/1)

 Eventually Consistent Secondary Indexing via Coprocessors
 -

 Key: HBASE-3340
 URL: https://issues.apache.org/jira/browse/HBASE-3340
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors
Reporter: Jonathan Gray
Assignee: Jonathan Gray

 Secondary indexing support via coprocessors with an eventual consistency 
 guarantee.  Design to come.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3945) Load balancer shouldn't move the same region in two consecutive balancing actions

2011-06-03 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044017#comment-13044017
 ] 

Jonathan Gray commented on HBASE-3945:
--

I worry about this approach of more and more knobs, especially when they don't 
directly address what a good/bad load balance really is.

If a region gets moved in two consecutive balancing actions, then something is 
wrong with the balancer in the first place.  While I agree in principle that 
regions moving multiple times and quickly is not desirable, this will be a 
common outcome if the balancing algorithm isn't already taking into account 
metrics over time (rather than short snapshots).  If we're using load but then 
adding all these limits/controls, it's hard to ever understand the behavior of 
the balancer.

 Load balancer shouldn't move the same region in two consecutive balancing 
 actions
 

 Key: HBASE-3945
 URL: https://issues.apache.org/jira/browse/HBASE-3945
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu

 Keeping a region on the same region server would give good stability for 
 active scanners.
 We shouldn't reassign the same region in two successive calls to 
 balanceCluster().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3947) SplitLog in HMaster spend long time, move it to regionserver

2011-06-01 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-3947.
--

Resolution: Duplicate

This was implemented over in HBASE-1364 and committed into trunk.

 SplitLog in HMaster spend long time, move it to regionserver
 

 Key: HBASE-3947
 URL: https://issues.apache.org/jira/browse/HBASE-3947
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, zookeeper
Reporter: mingjian
 Fix For: 0.90.4


 One of our 100-node clusters crashed because of a namenode crash.
 We restarted and found it spent about two and a half hours splitting hlogs.
 After the crash, there were about 3,500 log files in /hbase/.logs/. Splitting 
 one of them needs about 2~3 seconds.
 SplitLog works in a single thread of HMaster. Why not move it to 
 regionservers, with HMaster only creating split plans and notifying 
 regionservers through zookeeper?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3732) New configuration option for client-side compression

2011-06-01 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042595#comment-13042595
 ] 

Jonathan Gray commented on HBASE-3732:
--

I agree that value compression is easily done at the application level.  In 
cases where you have very large values, compressing that data is something you 
should always be thinking about.

Published or contributed code samples could go a long way.  Are there things we 
could add in Put/Get to make this kind of stuff easily pluggable?
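For example, a minimal client-side codec along the lines the issue description 
suggests (a sketch using java.util.zip, not a proposed API; error handling 
trimmed for brevity):
{code}
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: deflate values before adding them to a Put, inflate after a Get.
public class ValueCodecSketch {
  static byte[] compress(byte[] value) {
    Deflater deflater = new Deflater();
    deflater.setInput(value);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] decompress(byte[] value) throws DataFormatException {
    Inflater inflater = new Inflater();
    inflater.setInput(value);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!inflater.finished()) {
      out.write(buf, 0, inflater.inflate(buf));
    }
    inflater.end();
    return out.toByteArray();
  }
}
{code}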

If it can be integrated simply, then this might be okay, but it should probably 
be part of a larger conversation about compression.  And anything that touches 
KV needs to be thought through.

I think there could be some substantial savings in hbase-specific prefix or 
row/family/qualifier compression, both on-disk and in-memory.  One idea there 
would require complicating KeyValue and its comparator; a simpler 
solution would require short-term memory allocations to reconstitute KVs as 
they make their way through the KVHeap/KVScanner.

I've also done some work on supporting a two-level compressed/uncompressed 
block cache patch (with lzo).  I'm waiting to finish until HBASE-3857 goes in 
as it adds some things that make life easier in the HFile code.

 New configuration option for client-side compression
 

 Key: HBASE-3732
 URL: https://issues.apache.org/jira/browse/HBASE-3732
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

 Attachments: compressed_streams.jar


 We have a case here where we have to store very fat cells (arrays of 
 integers) which can amount to hundreds of KBs and that we need to read 
 often, concurrently, and possibly keep in cache. Compressing the values on 
 the client using java.util.zip's Deflater before sending them to HBase proved 
 to be in our case almost an order of magnitude faster.
 The reasons are evident: less data sent to hbase, memstore contains 
 compressed data, block cache contains compressed data too, etc.
 I was thinking that it might be something useful to add to a family schema, 
 so that Put/Result do the conversion for you. The actual compression algo 
 should also be configurable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-28 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3725:
-

Attachment: HBASE-3725-v3.patch

This fixes the problem in the only simple way I could think of.

A new configuration option, hbase.hregion.increment.supportdeletes, is added; 
it defaults to true (because it is required for correctness).

When this option is true, the scan against the StoreFiles will also include the 
MemStore.  This should ensure correctness for cases where 
delete markers are present in the MemStore that need to apply to KVs in the 
StoreFiles.

I made this a configuration option because it makes increment operations less 
optimal, so for increment workloads that do not need to support deletes, they 
can keep the option turned off and avoid the double scan of the MemStore.

A potential optimal and correct solution to this could be to use the old Get 
delete tracker which would retain delete information across files (for in-order 
file processing rather than one mega merge).  Some work is going into 
re-integrating those, so if they do make it back in the HBase, we could utilize 
them here.

This should suffice for now.

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
 Attachments: HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
 HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
* This code reproduces a bug with increment column values in hbase
* Usage: First run part one by passing '1' as the first arg
* Then restart the hbase cluster so it writes everything to disk
* Run part two by passing '2' as the first arg
*
* This will result in the old deleted data being found and used for 
 the increment calls
*
* @param args
* @throws IOException
*/
   public static void main(String[] args) throws IOException
   {
   if ("1".equals(args[0]))
   partOne();
   if ("2".equals(args[0]))
   partTwo();
   if ("both".equals(args[0]))
   {
   partOne();
   partTwo();
   }
   }
   /**
* Creates a table and increments a column value 10 times by 10 each 
 time.
* Results in a value of 100 for the column
*
* @throws IOException
*/
   static void partOne() throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin admin = new HBaseAdmin(conf);
   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
   tableDesc.addFamily(new HColumnDescriptor(infoCF));
   if(admin.tableExists(tableName))
   {
   admin.disableTable(tableName);
   admin.deleteTable(tableName);
   }
   admin.createTable(tableDesc);
   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
   // Increment uninitialized column
   for (int j = 0; j < 10; j++)
  

[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-18 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021160#comment-13021160
 ] 

Jonathan Gray commented on HBASE-1364:
--

Great work Prakash!

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Let's write our distributed sort first as a MR so we learn what's involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2256) Delete row, followed quickly to put of the same row will sometimes fail.

2011-04-07 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017148#comment-13017148
 ] 

Jonathan Gray commented on HBASE-2256:
--

I think this would be a hacky non-solution, regardless of whether it's epoch 
nanos or not.

 Delete row, followed quickly to put of the same row will sometimes fail.
 

 Key: HBASE-2256
 URL: https://issues.apache.org/jira/browse/HBASE-2256
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.20.3
Reporter: Clint Morgan
 Attachments: hbase-2256.patch


 Doing a Delete of a whole row, followed immediately by a put to that row will 
 sometimes miss a cell. Attached is a test to provoke the issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016245#comment-13016245
 ] 

Jonathan Gray commented on HBASE-3729:
--

I think the default behavior of the shell should be the default behavior of the 
client, which is 1 version unless specified otherwise.  Specifying a time range 
and wanting the most recent from within that range is a valid and somewhat 
common use case.
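On the client API that maps to something like this (a sketch; row, family and 
qualifier names are placeholders):
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeGetSketch {
  // Most recent version of 'c1' within [start, end).
  static Get timeRangeGet(long start, long end) throws IOException {
    Get get = new Get(Bytes.toBytes("r1"));
    get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c1"));
    get.setTimeRange(start, end); // half-open range
    get.setMaxVersions(1);        // the client default; shown for clarity
    return get;
  }
}
{code}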

 Get cells via shell with a time range predicate
 ---

 Key: HBASE-3729
 URL: https://issues.apache.org/jira/browse/HBASE-3729
 Project: HBase
  Issue Type: New Feature
  Components: shell
Reporter: Eric Charles
Assignee: Ted Yu
 Attachments: 3729-v2.txt, 3729-v3.txt, 3729.txt


 HBase shell allows to specify a timestamp to get a value
 - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
 If you don't give the exact timestamp, you get nothing... so it's difficult 
 to get the cell previous versions.
 It would be fine to have a time range predicate based get.
 The shell syntax could be (depending on technical feasibility)
 - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
 end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016248#comment-13016248
 ] 

Jonathan Gray commented on HBASE-3729:
--

HTable (Get/Scan) default is 1 version, not 3 versions.  I think you are 
thinking of the HColumnDescriptor default.

 Get cells via shell with a time range predicate
 ---

 Key: HBASE-3729
 URL: https://issues.apache.org/jira/browse/HBASE-3729
 Project: HBase
  Issue Type: New Feature
  Components: shell
Reporter: Eric Charles
Assignee: Ted Yu
 Attachments: 3729-v2.txt, 3729-v3.txt, 3729-v4.txt, 3729.txt


 HBase shell allows to specify a timestamp to get a value
 - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
 If you don't give the exact timestamp, you get nothing... so it's difficult 
 to get the cell previous versions.
 It would be fine to have a time range predicate based get.
 The shell syntax could be (depending on technical feasibility)
 - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
 end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-04 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13015683#comment-13015683
 ] 

Jonathan Gray commented on HBASE-3725:
--

Hey Nathaniel.  Thanks for posting the unit test!

I will take a look at this sometime this week and try to get a fix out for it.

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
 Attachments: HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
* This code reproduces a bug with increment column values in hbase
* Usage: First run part one by passing '1' as the first arg
* Then restart the hbase cluster so it writes everything to disk
* Run part two by passing '2' as the first arg
*
* This will result in the old deleted data being found and used for 
 the increment calls
*
* @param args
* @throws IOException
*/
   public static void main(String[] args) throws IOException
   {
   if ("1".equals(args[0]))
   partOne();
   if ("2".equals(args[0]))
   partTwo();
   if ("both".equals(args[0]))
   {
   partOne();
   partTwo();
   }
   }
   /**
* Creates a table and increments a column value 10 times by 10 each 
 time.
* Results in a value of 100 for the column
*
* @throws IOException
*/
   static void partOne() throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin admin = new HBaseAdmin(conf);
   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
   tableDesc.addFamily(new HColumnDescriptor(infoCF));
   if(admin.tableExists(tableName))
   {
   admin.disableTable(tableName);
   admin.deleteTable(tableName);
   }
   admin.createTable(tableDesc);
   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
   // Increment uninitialized column
   for (int j = 0; j < 10; j++)
   {
   table.incrementColumnValue(rowKey, infoCF, oldInc, 
 (long)10);
   Increment inc = new Increment(rowKey);
   inc.addColumn(infoCF, newInc, (long)10);
   table.increment(inc);
   }
   Get get = new Get(rowKey);
   Result r = table.get(get);
   System.out.println("initial values: new " + 
 Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
 Bytes.toLong(r.getValue(infoCF, oldInc)));
   }
   /**
* First deletes the data then increments the column 10 times by 1 each 
 time
*
* Should result in a value of 10 but it doesn't; it results in a 
 value of 110
*
* @throws IOException
*/
   static void partTwo() throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HTablePool pool = new 

[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match

2011-04-01 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014748#comment-13014748
 ] 

Jonathan Gray commented on HBASE-3562:
--

Thanks for looking into this Evert.  This is definitely some tricky stuff.

A few comments on your patch...

- Our convention in conditionals is to put the variable first.  I find it a 
little tricky to read the code when the constant is first.  For example:
{code}
if (MatchCode.INCLUDE == mc)
{code}
should be
{code}
if (mc == MatchCode.INCLUDE)
{code}
(And all the other places where you have this type of logic)

- The unit test {{TestColumnMatchAndFilterOrder}} is clever how you check 
correctness, but I think it would be good to actually do a read query and 
verify the results for a few different combinations of the query to prove 
correctness of the overall algorithm.  Other changes to SQM down the road might 
change more behavior / order of operations, so this test may no longer apply or 
give full coverage for correctness.  Having some tests which don't rely on the 
precise server-side interactions but rather confirm the end results will be 
more applicable as we move forward.

- You have some lines that are > 80 characters, especially in some of the 
javadoc.  Just wrap that so all lines are <= 80 chars.

- There was a comment in SQM that described why the filter was checked first.  
Can you write some inline comments to describe how this works now?  There are a 
couple lines at the end but it will be useful to have some explanation on why 
this has changed and what the behavior is now.

- Is there any particular reason that you had includeLatestColumn take 
timestamp as a parameter?  The timestamp is passed in the check call, and we 
could just hang on to that.  It just feels a little strange to me since you 
should never pass a different timestamp, and the tracker can know which was the 
latest column.

Overall this is really solid!  Great work Evert!

 ValueFilter is being evaluated before performing the column match
 -

 Key: HBASE-3562
 URL: https://issues.apache.org/jira/browse/HBASE-3562
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.90.0
Reporter: Evert Arckens
 Attachments: HBASE-3562.patch


 When performing a Get operation where both a column is specified and a 
 ValueFilter, the ValueFilter is evaluated before making the column match, 
 contrary to the javadoc of Get.setFilter(): "{@link 
 Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column 
 match, deletes and max versions have been run." 
 This is shown in the little test below, which uses a TestComparator extending 
 a WritableByteArrayComparable.
 public void testFilter() throws Exception {
   byte[] cf = Bytes.toBytes("cf");
   byte[] row = Bytes.toBytes("row");
   byte[] col1 = Bytes.toBytes("col1");
   byte[] col2 = Bytes.toBytes("col2");
   Put put = new Put(row);
   put.add(cf, col1, new byte[]{(byte)1});
   put.add(cf, col2, new byte[]{(byte)2});
   table.put(put);
   Get get = new Get(row);
   get.addColumn(cf, col2); // We only want to retrieve col2
   TestComparator testComparator = new TestComparator();
   Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
   get.setFilter(filter);
   Result result = table.get(get);
 }
 public class TestComparator extends WritableByteArrayComparable {
 /**
  * Nullary constructor, for Writable
  */
 public TestComparator() {
 super();
 }
 
 @Override
 public int compareTo(byte[] theirValue) {
 if (theirValue[0] == (byte)1) {
 // If the column match was done before evaluating the filter, we 
 should never get here.
 throw new RuntimeException("I only expect (byte)2 in col2, not 
 (byte)1 from col1");
 }
 if (theirValue[0] == (byte)2) {
 return 0;
 }
 else return 1;
 }
 }
 When only one column should be retrieved, this can be worked around by using 
 a SingleColumnValueFilter instead of the ValueFilter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match

2011-03-25 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011248#comment-13011248
 ] 

Jonathan Gray commented on HBASE-3562:
--

The counter in ColumnTracker is responsible for tracking setMaxVersions.  You 
may have queried for only the latest version, so once the ColumnTracker sees a 
given column, it will reject subsequent versions of that column.  Currently 
there's no way for the CT to know that subsequent filters actually prevented it 
from being returned, so it should not be included in the count of returned 
versions.

We would need to introduce something like {{skippedPreviousKeyValue}} that 
could be sent back to the CT so it could undo the previous count.
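A hypothetical sketch of that undo hook (the method name is taken from the 
suggestion above, not from any existing patch):
{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: the tracker counts a version when it is matched, and
// skippedPreviousKeyValue() rolls the count back if a later filter
// dropped the KeyValue after all.
public class ColumnCountSketch {
  private final int maxVersions;
  private final Map<String, Integer> counts = new HashMap<>();
  private String lastColumn;

  ColumnCountSketch(int maxVersions) { this.maxVersions = maxVersions; }

  boolean include(String column) {
    int seen = counts.merge(column, 1, Integer::sum);
    lastColumn = column;
    return seen <= maxVersions;
  }

  void skippedPreviousKeyValue() {
    counts.merge(lastColumn, -1, Integer::sum);
  }
}
{code}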

 ValueFilter is being evaluated before performing the column match
 -

 Key: HBASE-3562
 URL: https://issues.apache.org/jira/browse/HBASE-3562
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.90.0
Reporter: Evert Arckens

 When performing a Get operation where both a column is specified and a 
 ValueFilter, the ValueFilter is evaluated before making the column match, 
 contrary to the javadoc of Get.setFilter(): "{@link 
 Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column 
 match, deletes and max versions have been run." 
 This is shown in the little test below, which uses a TestComparator extending 
 a WritableByteArrayComparable.
 public void testFilter() throws Exception {
   byte[] cf = Bytes.toBytes("cf");
   byte[] row = Bytes.toBytes("row");
   byte[] col1 = Bytes.toBytes("col1");
   byte[] col2 = Bytes.toBytes("col2");
   Put put = new Put(row);
   put.add(cf, col1, new byte[]{(byte)1});
   put.add(cf, col2, new byte[]{(byte)2});
   table.put(put);
   Get get = new Get(row);
   get.addColumn(cf, col2); // We only want to retrieve col2
   TestComparator testComparator = new TestComparator();
   Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
   get.setFilter(filter);
   Result result = table.get(get);
 }
 public class TestComparator extends WritableByteArrayComparable {
 /**
  * Nullary constructor, for Writable
  */
 public TestComparator() {
 super();
 }
 
 @Override
 public int compareTo(byte[] theirValue) {
 if (theirValue[0] == (byte)1) {
 // If the column match was done before evaluating the filter, we 
 should never get here.
 throw new RuntimeException("I only expect (byte)2 in col2, not 
 (byte)1 from col1");
 }
 if (theirValue[0] == (byte)2) {
 return 0;
 }
 else return 1;
 }
 }
 When only one column should be retrieved, this can be worked around by using 
 a SingleColumnValueFilter instead of the ValueFilter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function

2011-03-25 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011452#comment-13011452
 ] 

Jonathan Gray commented on HBASE-3694:
--

Do we really want to put things like this into RegionServerMetrics?  That class 
is a mess and is currently only used for the publishing of our metrics (not 
used for internal state tracking).  And we should avoid the hadoop Metrics* 
classes like the plague... heavily synchronized and generally confusing.

My vote would be to add a new class, maybe {{RegionServerHeapManager}} or 
something like that... might be a good opportunity to cleanup and centralize 
the code related to that.  But could just hold this one AtomicLong for now.  
Agree that adding a new interface method just for the long is not ideal since 
it buys us nothing down the road.  Better to add something new that we can use 
later.
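A hypothetical sketch of that class (the name and methods follow the 
suggestion above; nothing here is existing HBase code):
{code}
import java.util.concurrent.atomic.AtomicLong;

// Sketch: one lock-free counter for the global memstore size, so the put
// path never enters a synchronized method just to read it.
public class RegionServerHeapManager {
  private static final AtomicLong globalMemStoreSize = new AtomicLong();

  public static long addAndGetGlobalMemStoreSize(long delta) {
    return globalMemStoreSize.addAndGet(delta);
  }

  public static boolean aboveHighWaterMark(long limit) {
    return globalMemStoreSize.get() > limit;
  }
}
{code}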

 high multiput latency due to checking global mem store size in a synchronized 
 function
 --

 Key: HBASE-3694
 URL: https://issues.apache.org/jira/browse/HBASE-3694
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: Hbase-3694[r1085306], Hbase-3694[r1085306]_2.patch, 
 Hbase-3694[r1085306]_3.patch, Hbase-3694[r1085508]_4.patch


 The problem is we found the multiput latency is very high.
 In our case, we have almost 22 Regions in each RS and there are no flush 
 happened during these puts.
 After investigation, we believe that the root cause is the function 
 getGlobalMemStoreSize, which is to check the high water mark of mem store. 
 This function takes almost 40% of total execution time of multiput when 
 instrumenting some metrics in the code.  
 The actual percentage may be even higher. The execution time is spent on 
 synchronization contention.
 One solution is to keep a static var in HRegion to keep the global MemStore 
 size instead of calculating them every time.
 Why using static variable?
 Since all the HRegion objects in the same JVM share the same memory heap, 
 they need to share fate as well.
 The static variable, globalMemStroeSize, naturally shows the total mem usage 
 in this shared memory heap for this JVM.
 If multiple RS need to run in the same JVM, they still need only one 
 globalMemStroeSize.
 If multiple RS run on different JVMs, everything is fine.
 After changing, in our cases, the avg multiput latency decrease from 60ms to 
 10ms.
 I will submit a patch based on the current trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master

2011-03-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010796#comment-13010796
 ] 

Jonathan Gray commented on HBASE-3669:
--

When I've seen this happen, there has been another RS cutting in and 
transferring to OPENING.

As someone in the other JIRA indicates, this kind of thing can happen when one 
of the RS is unable to open the region because it doesn't have the proper 
compression lib or hits some DFS error.

If the master successfully transfers to OFFLINE and the RS sees it as OPENING, 
then almost certainly there's another RS that has gotten in the way.

The contents of the RIT znode actually contain the serverName, so we should 
probably add additional debug information when the state transfer fails 
("Unable to go from OFFLINE to OPENING because already in OPENING by server 
#serverName#").
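Something along these lines in ZKAssign when the transition check fails (a sketch; the variable and accessor names are assumptions, not the actual code):
{code}
// Sketch of the suggested warning; 'data' stands for the
// RegionTransitionData read back from the RIT znode.
LOG.warn("Attempt to transition node " + encoded +
    " from " + beginState + " to " + endState +
    " failed; node already in state " + data.getEventType() +
    " set by server " + data.getServerName());
{code}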

 Region in PENDING_OPEN keeps being bounced between RS and master
 

 Key: HBASE-3669
 URL: https://issues.apache.org/jira/browse/HBASE-3669
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.2


 After going crazy killing region servers after HBASE-3668, most of the 
 cluster recovered except for 3 regions that kept being refused by the region 
 servers.
 On the master I would see:
 {code}
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  so generated a random one; 
 hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) 
 available servers
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  to sv2borg171,60020,1300399357135
 {code}
 Then on the region server:
 {code}
 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempting to transition node 
 f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode 
 /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; 
 data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned 
 node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING failed, the node existed but was in the state 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
 transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
 {code}
 I'm not sure I fully understand what was going on... the master was supposed 
 to OFFLINE the znode, but then that's not what the region server was seeing? 
 In any case, I was able to recover by doing a force unassign for each region 
 and then assign.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master

2011-03-24 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3669:
-

Attachment: HBASE-3669-debug-v1.patch

Adds more debug

 Region in PENDING_OPEN keeps being bounced between RS and master
 

 Key: HBASE-3669
 URL: https://issues.apache.org/jira/browse/HBASE-3669
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.2

 Attachments: HBASE-3669-debug-v1.patch


 After going crazy killing region servers after HBASE-3668, most of the 
 cluster recovered except for 3 regions that kept being refused by the region 
 servers.
 On the master I would see:
 {code}
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  so generated a random one; 
 hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) 
 available servers
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  to sv2borg171,60020,1300399357135
 {code}
 Then on the region server:
 {code}
 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempting to transition node 
 f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode 
 /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; 
 data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned 
 node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING failed, the node existed but was in the state 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
 transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
 {code}
 I'm not sure I fully understand what was going on... the master was supposed 
 to OFFLINE the znode, but then that's not what the region server was seeing? 
 In any case, I was able to recover by doing a force unassign for each region 
 and then assign.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3627) NPE in EventHandler when region already reassigned

2011-03-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010807#comment-13010807
 ] 

Jonathan Gray commented on HBASE-3627:
--

looks good, +1

 NPE in EventHandler when region already reassigned
 --

 Key: HBASE-3627
 URL: https://issues.apache.org/jira/browse/HBASE-3627
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3627.txt


 When a region takes too long to open, it will try to update the unassigned 
 znode and will fail on an ugly NPE like this:
 {quote}
 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22dc571dde04ca7 Attempting to transition node 
 0519dc3b62a569347526875048c37faa from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 regionserver:60020-0x22dc571dde04ca7 Unable to get data of znode 
 /hbase/unassigned/0519dc3b62a569347526875048c37faa because node does not 
 exist (not necessarily an error)
 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while 
 processing event M_RS_OPEN_REGION
 java.lang.NullPointerException
   at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
   at 
 org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.retransitionNodeOpening(ZKAssign.java:585)
   at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tickleOpening(OpenRegionHandler.java:322)
   at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:97)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 {quote}
 I think the region server in this case should be closing the region ASAP.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3654) Weird blocking between getOnlineRegion and createRegionLoad

2011-03-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010811#comment-13010811
 ] 

Jonathan Gray commented on HBASE-3654:
--

I'm late to the conversation, but have also seen contention on the onlineRegions 
map.  Changing to a ConcurrentHashMap helped.
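For reference, a minimal sketch of that change (names follow HRegionServer loosely; treat it as illustrative):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: with a ConcurrentHashMap, getFromOnlineRegions() reads without
// blocking on the single monitor that the metrics path holds.
class OnlineRegionsSketch {
  private final Map<String, Object> onlineRegions =
      new ConcurrentHashMap<String, Object>();

  Object getFromOnlineRegions(String encodedRegionName) {
    return onlineRegions.get(encodedRegionName); // lock-free read
  }
}
{code}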

 Weird blocking between getOnlineRegion and createRegionLoad
 ---

 Key: HBASE-3654
 URL: https://issues.apache.org/jira/browse/HBASE-3654
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.2

 Attachments: ConcurrentHM, ConcurrentSKLM, CopyOnWrite, 
 HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL.patch,
  
 HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL1.patch,
  
 HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_ConcurrentHM.patch,
  TestOnlineRegions.java, hashmap


 Saw this when debugging something else:
 {code}
 regionserver60020 prio=10 tid=0x7f538c1c nid=0x4c7 runnable 
 [0x7f53931da000]
java.lang.Thread.State: RUNNABLE
   at 
 org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1380)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:916)
   - locked 0x000672aa0a00 (a 
 java.util.concurrent.ConcurrentSkipListMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:767)
   - locked 0x000656f62710 (a java.util.HashMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:722)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
   at java.lang.Thread.run(Thread.java:662)
 IPC Reader 9 on port 60020 prio=10 tid=0x7f538c1be000 nid=0x4c6 waiting 
 for monitor entry [0x7f53932db000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
   - waiting to lock 0x000656f62710 (a java.util.HashMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
   - locked 0x000656e60068 (a 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 ...
 IPC Reader 0 on port 60020 prio=10 tid=0x7f538c08b000 nid=0x4bd waiting 
 for monitor entry [0x7f5393be4000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
   - waiting to lock 0x000656f62710 (a java.util.HashMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
   at 
 

[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function

2011-03-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010983#comment-13010983
 ] 

Jonathan Gray commented on HBASE-3694:
--

Neither of these seems right.  Any issue with adding another method for this?

 high multiput latency due to checking global mem store size in a synchronized 
 function
 --

 Key: HBASE-3694
 URL: https://issues.apache.org/jira/browse/HBASE-3694
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 The problem is that we found the multiput latency is very high.
 In our case, we have almost 22 regions in each RS and no flushes happened 
 during these puts.
 After investigation, we believe that the root cause is the function 
 getGlobalMemStoreSize, which checks the high water mark of the memstore. 
 When we instrumented the code with some metrics, this function took almost 
 40% of the total execution time of multiput.
 The actual percentage may be even higher. The execution time is spent on 
 synchronization contention.
 One solution is to keep a static var in HRegion holding the global MemStore 
 size instead of recalculating it every time.
 Why use a static variable?
 Since all the HRegion objects in the same JVM share the same memory heap, 
 they need to share fate as well.
 The static variable, globalMemStoreSize, naturally shows the total mem usage 
 in this shared memory heap for this JVM.
 If multiple RS need to run in the same JVM, they still need only one 
 globalMemStoreSize.
 If multiple RS run on different JVMs, everything is fine.
 After the change, in our case, the avg multiput latency decreased from 60ms 
 to 10ms.
 I will submit a patch based on the current trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3052) Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing

2011-03-24 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011070#comment-13011070
 ] 

Jonathan Gray commented on HBASE-3052:
--

How the heck do you re-open a task on this new jira? :)

 Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster 
 for test writing
 

 Key: HBASE-3052
 URL: https://issues.apache.org/jira/browse/HBASE-3052
 Project: HBase
  Issue Type: Improvement
  Components: test, zookeeper
Reporter: Jonathan Gray
Assignee: Liyin Tang
Priority: Minor
 Attachments: HBASE_3052[r1083993].patch, HBASE_3052[r1084033].patch


 Interesting things can happen when you have a ZK quorum of multiple servers 
 and one of them dies.  Testing here on clusters has turned up some bugs 
 with HBase's interaction with ZK.
 Would be good to add the ability to have multiple ZK servers in unit tests 
 and be able to kill them individually.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor

2011-03-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010234#comment-13010234
 ] 

Jonathan Gray commented on HBASE-3691:
--

It's slightly faster for both compression and decompression when compared to 
LZO (169/434 vs. 250/500).

I'm unsure of the difference in compression ratios but we can ship with it, yay

 Add compressor support for 'snappy', google's compressor
 

 Key: HBASE-3691
 URL: https://issues.apache.org/jira/browse/HBASE-3691
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.92.0


 http://code.google.com/p/snappy/ is apache licensed.
 bq. Snappy is a compression/decompression library. It does not aim for 
 maximum compression, or compatibility with any other compression library; 
 instead, it aims for very high speeds and reasonable compression. For 
 instance, compared to the fastest mode of zlib, Snappy is an order of 
 magnitude faster for most inputs, but the resulting compressed files are 
 anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 
 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses 
 at about 500 MB/sec or more.
 bq. Snappy is widely used inside Google, in everything from BigTable and 
 MapReduce to our internal RPC systems. (Snappy has previously been referred 
 to as Zippy in some presentations and the likes.)
 Lets get it in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

2011-03-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010252#comment-13010252
 ] 

Jonathan Gray commented on HBASE-3693:
--

+1 on caching this.  Good stuff!

 isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase
 --

 Key: HBASE-3693
 URL: https://issues.apache.org/jira/browse/HBASE-3693
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Liyin Tang

 We noticed that there are lots of listStatus calls on the ColumnFamily 
 directories within each region, coming from this codepath:
 {code}
 compactionSelection()
   --> isMajorCompaction()
     --> getLowestTimestamp()
       --> FileStatus[] stats = fs.listStatus(p);
 {code}
 So on every compactionSelection() we're taking this hit. While not 
 immediately an issue, just from log inspection, this accounts for quite a 
 large number of RPCs to namenode at the moment and seems like an unnecessary 
 load to be sending to the namenode.
 Seems like it would be easy to cache the timestamp for each opened/created 
 StoreFile, in memory, in the region server, and avoid going to DFS each time 
 for this information.
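A sketch of what that cache could look like (class and method names here are illustrative, not the eventual patch):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: read the file's modification time once at open,
// then serve the lowest-timestamp check from memory.
public class CachedStoreFileTimestamp {
  private final long modificationTime;

  public CachedStoreFileTimestamp(FileSystem fs, Path path) throws IOException {
    // One namenode RPC at open/create time...
    this.modificationTime = fs.getFileStatus(path).getModificationTime();
  }

  public long getModificationTime() {
    return modificationTime; // ...and none on later compaction checks
  }
}
{code}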

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException

2011-03-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009842#comment-13009842
 ] 

Jonathan Gray commented on HBASE-3687:
--

Shouldn't the RS not check in to the master with an RPC until it is available?

 Bulk assign on startup should handle a ServerNotRunningException
 

 Key: HBASE-3687
 URL: https://issues.apache.org/jira/browse/HBASE-3687
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.2

 Attachments: 3687.txt


 On startup, we do bulk assign.  At the moment, if any problem during bulk 
 assign, we consider startup failed and expectation is that you need to retry 
 (We need to make this better but that is not what this issue is about).  One 
 exception that we should handle is the case where a RS is slow coming up and 
 its rpc is not yet up listening.  In this case it will throw: 
 ServerNotRunningException.  We should retry at least this one exception 
 during bulk assign.
 We had this happen to us starting up a prod cluster.
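A sketch of the retry being described (the call site, retry count and backoff are illustrative, not the attached patch):
{code}
// Sketch: retry only ServerNotRunningException during bulk assign, since
// it just means the RS rpc isn't listening yet.
void openWithRetry(ServerManager serverManager, HServerInfo destination,
    List<HRegionInfo> regions) throws Exception {
  for (int tries = 0; ; tries++) {
    try {
      serverManager.sendRegionOpen(destination, regions); // hypothetical signature
      return;
    } catch (ServerNotRunningException e) {
      if (tries >= 10) throw e; // give up eventually
      Thread.sleep(1000);       // RS still coming up; wait and retry
    }
  }
}
{code}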

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException

2011-03-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009843#comment-13009843
 ] 

Jonathan Gray commented on HBASE-3687:
--

and weren't we just saying that we should not be putting in Thread.sleeps ;)

 Bulk assign on startup should handle a ServerNotRunningException
 

 Key: HBASE-3687
 URL: https://issues.apache.org/jira/browse/HBASE-3687
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.2

 Attachments: 3687.txt


 On startup, we do bulk assign.  At the moment, if any problem during bulk 
 assign, we consider startup failed and expectation is that you need to retry 
 (We need to make this better but that is not what this issue is about).  One 
 exception that we should handle is the case where a RS is slow coming up and 
 its rpc is not yet up listening.  In this case it will throw: 
 ServerNotRunningException.  We should retry at least this one exception 
 during bulk assign.
 We had this happen to us starting up a prod cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException

2011-03-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009850#comment-13009850
 ] 

Jonathan Gray commented on HBASE-3687:
--

I think it's fine for now.  The real fix should be having the RS not check in 
with the master until it is fully online (agree, outside the scope of this jira).

 Bulk assign on startup should handle a ServerNotRunningException
 

 Key: HBASE-3687
 URL: https://issues.apache.org/jira/browse/HBASE-3687
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.2

 Attachments: 3687.txt


 On startup, we do bulk assign.  At the moment, if any problem during bulk 
 assign, we consider startup failed and expectation is that you need to retry 
 (We need to make this better but that is not what this issue is about).  One 
 exception that we should handle is the case where a RS is slow coming up and 
 its rpc is not yet up listening.  In this case it will throw: 
 ServerNotRunningException.  We should retry at least this one exception 
 during bulk assign.
 We had this happen to us starting up a prod cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1755) Putting 'Meta' table into ZooKeeper

2011-03-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009290#comment-13009290
 ] 

Jonathan Gray commented on HBASE-1755:
--

I generally agree that we should only store temporary data in ZK, but I see 
META as largely temporary.

Table/region meta data is already persisted on HDFS (we don't properly update it, 
but that can be fixed without much trouble).  And we have plans to move schema 
and configuration information into ZK for online changes, so at least on a 
running cluster, we'll be depending on ZK for region configuration.

Otherwise, META is largely for locations.

I also think the possibility exists to keep a META region but maintain region 
locations in ZK.

In general, the special casing and exception handling around the reading and 
updating of META is extraordinarily painful both in the master and in the 
regionservers.

 Putting 'Meta' table into ZooKeeper
 ---

 Key: HBASE-1755
 URL: https://issues.apache.org/jira/browse/HBASE-1755
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.0
Reporter: Erik Holstad
 Fix For: 0.92.0


 Moving to 0.22.0

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3322) HLog sync slowdown under heavy load with HBASE-2467

2011-03-21 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-3322.
--

Resolution: Won't Fix

There is an issue here but upon further investigation, it's not really a bug.

The issue is around heavy concurrency / high number of threads in HLog.  The 
current behavior is that each thread does a notify to the LogSyncer and then 
does a wait on a single object.  The LogSyncer waits to be notified, then syncs 
what is pending, and then does a notifyAll to all the threads waiting for their 
sync.

This is a straightforward and correct pattern but under heavy concurrency, the 
fact that all threads are waiting on a single object to be notified becomes a 
bottleneck.

Will open other JIRAs to deal with solutions to this.  Closing this one as this 
is not a blocking bug.
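For the record, a simplified sketch of the pattern described above (not the actual HLog code; spurious-wakeup guards omitted for brevity):
{code}
// Every writer notifies the syncer, then waits on one shared monitor.
// Under heavy concurrency that single monitor is the bottleneck.
class LogSyncPattern {
  private final Object wakeSyncer = new Object();
  private final Object syncDone = new Object();

  /** Each handler thread requesting a sync of its pending edits. */
  void requestSync() throws InterruptedException {
    synchronized (wakeSyncer) { wakeSyncer.notify(); }
    synchronized (syncDone) { syncDone.wait(); } // all writers park here
  }

  /** The single LogSyncer thread. */
  void syncerLoop() throws InterruptedException {
    while (true) {
      synchronized (wakeSyncer) { wakeSyncer.wait(); }
      // ... sync pending edits to HDFS ...
      synchronized (syncDone) { syncDone.notifyAll(); } // herd wakeup
    }
  }
}
{code}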

 HLog sync slowdown under heavy load with HBASE-2467
 ---

 Key: HBASE-3322
 URL: https://issues.apache.org/jira/browse/HBASE-3322
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Priority: Blocker
 Fix For: 0.92.0


 Testing HBASE-2467 and HDFS-895 on 100 node cluster w/ a heavy increment 
 workload we experienced significant slowdown.
 Stack traces show that most threads are on HLog.updateLock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2549) Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out

2011-03-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009425#comment-13009425
 ] 

Jonathan Gray commented on HBASE-2549:
--

punted from 0.92

 Review Trackers (column, delete, etc) on Trunk after 2248 goes in for 
 correctness and optimal earlying-out
 --

 Key: HBASE-2549
 URL: https://issues.apache.org/jira/browse/HBASE-2549
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical

 Once we move to all Scans, the trackers could use a refresh.  There are often 
 times where we return, for example, a MatchCode.SKIP (which just goes to the 
 next KV not including the current one) where we could be sending a more 
 optimal return code like MatchCode.SEEK_NEXT_ROW.
 This is a jira to review all of this code after 2248 goes in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2549) Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out

2011-03-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009428#comment-13009428
 ] 

Jonathan Gray commented on HBASE-2549:
--

(punting because this was largely done but would be good to do a full analysis 
at some point down the road)

 Review Trackers (column, delete, etc) on Trunk after 2248 goes in for 
 correctness and optimal earlying-out
 --

 Key: HBASE-2549
 URL: https://issues.apache.org/jira/browse/HBASE-2549
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical

 Once we move to all Scans, the trackers could use a refresh.  There are often 
 times where we return, for example, a MatchCode.SKIP (which just goes to the 
 next KV not including the current one) where we could be sending a more 
 optimal return code like MatchCode.SEEK_NEXT_ROW.
 This is a jira to review all of this code after 2248 goes in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-2832) Priorities and multi-threading for MemStore flushing

2011-03-21 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-2832:
-

Fix Version/s: (was: 0.92.0)

punting from 0.92.  still needs to be done but should not be tied to a version 
until work is being actively done

 Priorities and multi-threading for MemStore flushing
 

 Key: HBASE-2832
 URL: https://issues.apache.org/jira/browse/HBASE-2832
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical

 Similar to HBASE-1476 and HBASE-2646 which are for compactions, but do this 
 for flushes.
 Flushing when we hit the normal flush size is a low priority flush.  Other 
 types of flushes (heap pressure, blocking client requests, etc) are high 
 priority.
 Should have a tunable number of concurrent flushes.
 Will use the {{HBaseExecutorService}} and {{HBaseEventHandler}} introduced 
 from master/zk changes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params

2011-03-21 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-2375:
-

Fix Version/s: (was: 0.92.0)

punting from 0.92.  still needs to be done but should not be tied to a version 
until work is being actively done

 Make decision to split based on aggregate size of all StoreFiles and revisit 
 related config params
 --

 Key: HBASE-2375
 URL: https://issues.apache.org/jira/browse/HBASE-2375
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.20.3
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
  Labels: moved_from_0_20_5
 Attachments: HBASE-2375-v8.patch


 Currently we will make the decision to split a region when a single StoreFile 
 in a single family exceeds the maximum region size.  This issue is about 
 changing the decision to split to be based on the aggregate size of all 
 StoreFiles in a single family (but still not aggregating across families).  
 This would move a check to split after flushes rather than after compactions. 
  This issue should also deal with revisiting our default values for some 
 related configuration parameters.
 The motivating factor for this change comes from watching the behavior of 
 RegionServers during heavy write scenarios.
 Today the default behavior goes like this:
 - We fill up regions, and as long as you are not under global RS heap 
 pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) 
 StoreFiles.
 - After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a 
 compaction on this region.
 - Compaction queues notwithstanding, this will create a 192MB file, not 
 triggering a split based on max region size (hbase.hregion.max.filesize).
 - You'll then flush two more 64MB MemStores and hit the compactionThreshold 
 and trigger a compaction.
 You end up with 192 + 64 + 64 in a single compaction.  This will create a 
 single 320MB file and will trigger a split.
 - While you are performing the compaction (which now writes out 64MB more 
 than the split size, so is about 5X slower than the time it takes to do a 
 single flush), you are still taking on additional writes into MemStore.
 - Compaction finishes, decision to split is made, region is closed.  The 
 region now has to flush whichever edits made it to MemStore while the 
 compaction ran.  This flushing, in our tests, is by far the dominating factor 
 in how long data is unavailable during a split.  We measured about 1 second 
 to do the region closing, master assignment, reopening.  Flushing could take 
 5-6 seconds, during which time the region is unavailable.
 - The daughter regions re-open on the same RS.  Immediately when the 
 StoreFiles are opened, a compaction is triggered across all of their 
 StoreFiles because they contain references.  Since we cannot currently split 
 a split, we need to not hang on to these references for long.
 This described behavior is really bad because of how often we have to rewrite 
 data onto HDFS.  Imports are usually just IO bound as the RS waits to flush 
 and compact.  In the above example, the first cell to be inserted into this 
 region ends up being written to HDFS 4 times (initial flush, first compaction 
 w/ no split decision, second compaction w/ split decision, third compaction 
 on daughter region).  In addition, we leave a large window where we take on 
 edits (during the second compaction of 320MB) and then must make the region 
 unavailable as we flush it.
 If we increased the compactionThreshold to be 5 and determined splits based 
 on aggregate size, the behavior becomes:
 - We fill up regions, and as long as you are not under global RS heap 
 pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) 
 StoreFiles.
 - After each MemStore flush, we calculate the aggregate size of all 
 StoreFiles.  We can also check the compactionThreshold.  For the first three 
 flushes, both would not hit the limit.  On the fourth flush, we would see 
 total aggregate size = 256MB and determine to make a split.
 - Decision to split is made, region is closed.  This time, the region just 
 has to flush out whichever edits made it to the MemStore during the 
 snapshot/flush of the previous MemStore.  So this time window has shrunk by 
 more than 75% as it was the time to write 64MB from memory not 320MB from 
 aggregating 5 hdfs files.  This will greatly reduce the time data is 
 unavailable during splits.
 - The daughter regions re-open on the same RS.  Immediately when the 
 StoreFiles are opened, a compaction is triggered across all of their 
 StoreFiles because they contain references.  This would stay the same.
 In this example, we 
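A sketch of the aggregate-size check described above (the method shape is illustrative, and the HBase types are referenced without their surrounding class):
{code}
// Sketch: after each flush, decide to split from the sum of all StoreFile
// sizes in a family rather than from any single file's size.
boolean shouldSplit(Store store, long maxFileSize) {
  long aggregate = 0;
  for (StoreFile sf : store.getStorefiles()) {
    aggregate += sf.getReader().length();
  }
  return aggregate >= maxFileSize; // e.g. 4 x 64MB flushes = 256MB
}
{code}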

[jira] [Updated] (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-21 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3641:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to branch and trunk.  Thanks stack.

 LruBlockCache.CacheStats.getHitCount() is not using the correct variable
 

 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0

 Attachments: HBASE-3641-v1.patch, HBASE-3641-v2.patch


 {code}
 public long getHitCount() {
   return hitCachingCount.get();
 }
 {code}
 This should be {{hitCount.get()}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1110) Distribute the master role to HRS after ZK integration

2011-03-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009435#comment-13009435
 ] 

Jonathan Gray commented on HBASE-1110:
--

Is this really that important to do now?  Seems simple enough to start master 
processes on slave nodes if you want lots of backups.  If each RS can become a 
master, then you have to reserve heap in each to handle the master role (which 
is a non-trivial amount).

I think this is a fine area to explore and always good to have options (this 
could make sense on a small cluster).  But I'd opt to move out of 0.92.

 Distribute the master role to HRS after ZK integration
 --

 Key: HBASE-1110
 URL: https://issues.apache.org/jira/browse/HBASE-1110
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
 Fix For: 0.92.0


 After ZK integration, the master role can be distributed out to the HRS as 
 group behaviors mediated by synchronization and rendezvous points in ZK.
 - State sharing, for example load.
   -- Load information can be shared with neighbors via ephemeral child 
 status znodes of a znode representing the cluster root.
   -- Region servers can periodically walk the status nodes of their 
 neighbors. If they find themselves loaded relative to others, they can 
 release regions. If they find themselves less loaded relative to others, they 
 can be more aggressive about finding unassigned regions (see below).
 - Ephemeral znodes for region ownership, e.g. 
 /hbase//region/ephemeral-node
   -- Use a permanent child of region to serve as a 'dirty' flag, removed 
 during normal close.
 - A distributed queue for region assignment.
   -- When coming up, HRS can check the assignment queue for candidates.
   -- HRS shutdown includes marking regions clean and moving them onto the 
 assignment queue.
   -- All/any HRS can do occasional random walks over region leases looking 
 for expired-dirty state (when timeout causes ZK to delete the ephemeral node 
 representing the lease), and can helpfully move them first to a queue (+ 
 barrier) for splitting, then onto the assignment queue.
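To make the ownership idea concrete, a sketch against the plain ZooKeeper API (paths and names are illustrative):
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of per-region ownership znodes as proposed above.
class RegionLeaseSketch {
  void takeOwnership(ZooKeeper zk, String regionPath, String serverName)
      throws Exception {
    // Permanent 'dirty' flag child, removed during a normal close:
    zk.create(regionPath + "/dirty", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    // Ephemeral lease: ZK deletes it automatically if the RS session dies.
    zk.create(regionPath + "/owner", serverName.getBytes(),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  }
}
{code}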

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-03-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009449#comment-13009449
 ] 

Jonathan Gray commented on HBASE-3417:
--

Just verified that this is the same as what we have been running with in 
production (since the patch was put up in January).

I'm ready to commit if you want to +1 me :)

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, 
 HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3052) Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing

2011-03-21 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009454#comment-13009454
 ] 

Jonathan Gray commented on HBASE-3052:
--

Patch is looking good but I'm confused by a few things.

Are you starting all the servers at the beginning?  Or do the ZK servers only 
actually start/run once you kill another one?

The idea for this is to create a ZK quorum of servers and then be able to kill 
individual ones.  Ideally, we'd also be able to specifically kill whichever 
server is the quorum leader.

Also, I'm unclear on the meaning of "candidate" in this context.  Is the 
"candidate" server the active server?  Does that mean it's online?  Maybe 
change the name or at least add some javadoc explaining what exactly is 
happening.

 Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster 
 for test writing
 

 Key: HBASE-3052
 URL: https://issues.apache.org/jira/browse/HBASE-3052
 Project: HBase
  Issue Type: Improvement
  Components: test, zookeeper
Reporter: Jonathan Gray
Assignee: Liyin Tang
Priority: Minor
 Attachments: HBASE_3052[r1083993].patch


 Interesting things can happen when you have a ZK quorum of multiple servers 
 and one of them dies.  Testing here on clusters has turned up some bugs 
 with HBase's interaction with ZK.
 Would be good to add the ability to have multiple ZK servers in unit tests 
 and be able to kill them individually.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-3658) Alert when heap is over committed

2011-03-17 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008052#comment-13008052
 ] 

Jonathan Gray commented on HBASE-3658:
--

+1 on refusing to start

 Alert when heap is over committed
 -

 Key: HBASE-3658
 URL: https://issues.apache.org/jira/browse/HBASE-3658
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


 Something I just witnessed: the block cache setting was at 70% but the max 
 global memstore size was at the default of 40% meaning that 110% of the heap 
 can potentially be assigned and then you need more heap to do stuff like 
 flushing and compacting.
 We should run a configuration check that alerts the user when that happens 
 and maybe even refuse to start.
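A sketch of such a check (the two config keys are the standard ones; the 0.8 threshold and the refusal mechanism are illustrative):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: fail fast when block cache + global memstore could overcommit heap.
class HeapCommitCheck {
  static void check(Configuration conf) {
    float blockCache = conf.getFloat("hfile.block.cache.size", 0.2f);
    float memstore =
        conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
    if (blockCache + memstore > 0.8f) { // leave headroom for everything else
      throw new RuntimeException("Block cache (" + blockCache +
          ") plus memstore (" + memstore + ") exceed 0.8 of heap; refusing to start");
    }
  }
}
{code}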

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-3663) The starvation problem in current load balance algorithm

2011-03-17 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008089#comment-13008089
 ] 

Jonathan Gray commented on HBASE-3663:
--

I think this is an issue in the 0.20 / 0.89 version of the load balancer which 
is no longer in any active branches.

 The starvation problem in current load balance algorithm
 

 Key: HBASE-3663
 URL: https://issues.apache.org/jira/browse/HBASE-3663
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
 Attachments: result_new_load_balance.txt, result_old_load_balance.txt


 This is an interesting starvation case. There are 2 conditions to trigger 
 this problem.
 Condition 1: r/s - r/(s+1) < 1 
 Let r: the number of regions
 Let s: the number of servers
 Condition 2: the load of each server is less than or equal to the ceil of 
 the avg load.
 Here is the unit test to verify this problem: 
 For example, there are 16 servers and 62 regions. The avg load is 
 3.875. And setting the slop to 0 keeps the load of each server at either 3 
 or 4. 
 When a new server comes in, no server needs to assign regions to this new 
 server, since no server's load is larger than the ceil of the avg.
 (Setting slop to 0 is to easily trigger this situation; otherwise it needs 
 much larger numbers.)
 The solution is pretty straightforward: just compare against the floor of 
 the avg instead of the ceil. This solution will evenly balance the load from 
 the servers that are a little more loaded than others. 
 I also attached the comparison result for the case mentioned above between 
 the old balance algorithm and the new balance algorithm. (I set the slop = 0 
 when testing.)
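The numbers from the description, worked through as a small sketch:
{code}
// 62 regions on 16 servers holding 3 or 4 each; one new empty server joins.
public class BalancerStarvationSketch {
  public static void main(String[] args) {
    int regions = 62;
    int servers = 17;                        // after the new server joins
    double avg = (double) regions / servers; // ~3.647
    int ceil = (int) Math.ceil(avg);         // 4
    int floor = (int) Math.floor(avg);       // 3
    // Old check: only servers with load > ceil (> 4) shed regions.
    // Every server holds 3 or 4, so nothing moves; the newcomer starves.
    // Proposed: servers with load > floor (> 3) shed regions, so the
    // servers holding 4 each hand one to the new server.
    System.out.println("avg=" + avg + " ceil=" + ceil + " floor=" + floor);
  }
}
{code}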

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-14 Thread Jonathan Gray (JIRA)
LruBlockCache.CacheStats.getHitCount() is not using the correct variable


 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0


{code}
public long getHitCount() {
  return hitCachingCount.get();
}
{code}

This should be {{hitCount.get()}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-14 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3641:
-

Status: Patch Available  (was: Open)

 LruBlockCache.CacheStats.getHitCount() is not using the correct variable
 

 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0

 Attachments: HBASE-3641-v1.patch


 {code}
 public long getHitCount() {
   return hitCachingCount.get();
 }
 {code}
 This should be {{hitCount.get()}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-14 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3641:
-

Attachment: HBASE-3641-v1.patch

 LruBlockCache.CacheStats.getHitCount() is not using the correct variable
 

 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0

 Attachments: HBASE-3641-v1.patch


 {code}
 public long getHitCount() {
   return hitCachingCount.get();
 }
 {code}
 This should be {{hitCount.get()}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-03-11 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005821#comment-13005821
 ] 

Jonathan Gray commented on HBASE-1364:
--

FYI, Prakash Khemani is working on this right now.  Not sure when a patch will 
be up but it's looking good so far.  It is built on top of the new ZK stuff in 
0.90 and above.

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Alex Newman
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-1364.patch

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In the bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Let's write our distributed sort first as an MR so we learn what's involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-3622) Deadlock in HBaseServer (JVM bug?)

2011-03-10 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005499#comment-13005499
 ] 

Jonathan Gray commented on HBASE-3622:
--

We run with +UseMembar at FB.  I ran experiments on CPU-bound workloads and 
there was no significant difference in performance either way.

 Deadlock in HBaseServer (JVM bug?)
 --

 Key: HBASE-3622
 URL: https://issues.apache.org/jira/browse/HBASE-3622
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3622.patch


 On Dmitriy's cluster:
 {code}
 IPC Reader 0 on port 60020 prio=10 tid=0x2aacb4a82800 nid=0x3a72 
 waiting on condition [0x429ba000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aaabf5fa6d0 (a 
 java.util.concurrent.locks.ReentrantLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
 at 
 java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
 at 
 java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
 at 
 java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:103)
 at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:267)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:985)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
 - locked 0x2aaabf580fb0 (a 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 ...
 IPC Server handler 29 on 60020 daemon prio=10 tid=0x2aacbc163800 
 nid=0x3acc waiting on condition [0x462f3000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aaabf5e3800 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
 IPC Server handler 28 on 60020 daemon prio=10 tid=0x2aacbc161800 
 nid=0x3acb waiting on condition [0x461f2000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aaabf5e3800 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
 ...
 {code}
 This region server stayed in this state for hours. The reader is waiting to 
 put and the handlers are waiting to take, and they wait on different lock 
 ids. It reminds me of the UseMembar issue, where the JVM sometimes fails to 
 notify waiters. In any case, that RS needed to be closed in order to get out 
 of that state. 
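
To make the shape of the hang concrete, here is a minimal, self-contained 
sketch (editor's illustration, not HBase code) of the reader/handler 
hand-off shown in the stack traces. LinkedBlockingQueue uses separate put 
and take locks internally, which is why the parked threads in the dump wait 
on different lock ids; normally each side wakes the other through the 
queue's internal conditions, and the hang above is consistent with a missed 
wakeup at the JVM level rather than an application-level lock-ordering bug.

{code}
// Minimal sketch of the call-queue hand-off; not HBase code.
import java.util.concurrent.LinkedBlockingQueue;

public class CallQueueSketch {
    // Bounded queue standing in for the IPC call queue.
    private static final LinkedBlockingQueue<Runnable> callQueue =
            new LinkedBlockingQueue<>(10);

    public static void main(String[] args) {
        // Handler threads park in take() until a call arrives (takeLock).
        for (int i = 0; i < 2; i++) {
            new Thread(() -> {
                try {
                    while (true) {
                        callQueue.take().run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "handler-" + i).start();
        }
        // The reader thread parks in put() when the queue is full (putLock).
        new Thread(() -> {
            try {
                for (int n = 0; n < 1000; n++) {
                    final int call = n;
                    callQueue.put(() -> System.out.println("call " + call));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "reader").start();
    }
}
{code}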

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-3614) Expose per-region request rate metrics

2011-03-09 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004817#comment-13004817
 ] 

Jonathan Gray commented on HBASE-3614:
--

I'm not sure if there is a JIRA yet, but some guys at FB did a bunch of work 
on per-family metrics, including dynamically generating new metric names, 
etc.

I think we could work on this at the same time we start to think about using 
the info for better load balancing and such.  This could obviously come first.

 Expose per-region request rate metrics
 --

 Key: HBASE-3614
 URL: https://issues.apache.org/jira/browse/HBASE-3614
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Gary Helmling
Priority: Minor

 We currently export metrics on request rates for each region server, and this 
 can help with identifying uneven load at a high level. But once you see a 
 given server under high load, you're forced to extrapolate the likely culprit 
 from your application patterns and the data the server is serving. This can 
 and should be much easier if we just exported per-region request rate metrics 
 on each server.
 Dynamically updating the metrics keys based on assigned regions may pose some 
 minor challenges, but this seems a very valuable diagnostic tool to have 
 available.
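
As a rough illustration of what such dynamically keyed metrics might look 
like, here is an editor's sketch; the class and method names are 
hypothetical, not HBase APIs.

{code}
// Hypothetical sketch of per-region request counters with dynamically
// created metric keys; names are illustrative only.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class PerRegionRequestMetrics {
    // One counter per encoded region name, created on first use.
    private final ConcurrentMap<String, AtomicLong> requestCounts =
            new ConcurrentHashMap<>();

    /** Bump the counter for every request served for the region. */
    public void incrementRequests(String encodedRegionName) {
        requestCounts
                .computeIfAbsent(encodedRegionName, k -> new AtomicLong())
                .incrementAndGet();
    }

    /** Drop the key when the region is closed or moved away. */
    public void regionClosed(String encodedRegionName) {
        requestCounts.remove(encodedRegionName);
    }
}
{code}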

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-3573) Move shutdown messaging OFF heartbeat; prereq for fix of hbase-1502

2011-02-28 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000613#comment-13000613
 ] 

Jonathan Gray commented on HBASE-3573:
--

Not sure if it matters, but one check returns true if the server holds a 
catalog region.  Then another check uses that first check to determine that 
the last two servers *only* hold catalog regions.  So in that case, couldn't 
they still be holding other user regions?
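
In other words (an illustrative sketch by the editor, not code from the 
attached patch), the concern is the gap between these two predicates:

{code}
// Illustrative only: "holds a catalog region" vs. "holds ONLY catalog
// regions". The first being true does not imply the second.
import java.util.List;

public class CatalogCheckSketch {
    static boolean isCatalog(String regionName) {
        // -ROOT- and .META. were the catalog tables in this era of HBase.
        return regionName.startsWith("-ROOT-") || regionName.startsWith(".META.");
    }

    /** True if the server holds at least one catalog region. */
    static boolean hasCatalogRegion(List<String> regions) {
        return regions.stream().anyMatch(CatalogCheckSketch::isCatalog);
    }

    /** The stricter check the comment is asking about. */
    static boolean holdsOnlyCatalogRegions(List<String> regions) {
        return !regions.isEmpty()
                && regions.stream().allMatch(CatalogCheckSketch::isCatalog);
    }
}
{code}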

 Move shutdown messaging OFF heartbeat; prereq for fix of hbase-1502
 --

 Key: HBASE-3573
 URL: https://issues.apache.org/jira/browse/HBASE-3573
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 3573.txt, 3573.txt




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HBASE-3573) Move shutdown messaging OFF heartbeat; prereq for fix of hbase-1502

2011-02-28 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000654#comment-13000654
 ] 

Jonathan Gray commented on HBASE-3573:
--

Yeah, that all makes sense.  Just making sure that's what you intended.  +1 if 
tests pass and you've tried it out on a cluster.

 Move shutdown messaging OFF heartbeat; prereq for fix of hbase-1502
 --

 Key: HBASE-3573
 URL: https://issues.apache.org/jira/browse/HBASE-3573
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 3573.txt, 3573.txt




-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HBASE-2947) MultiIncrement (MultiGet functionality for increments)

2011-02-20 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12997163#comment-12997163
 ] 

Jonathan Gray commented on HBASE-2947:
--

HBASE-2814 seems to be only about Thrift.  This issue is to make Increment a 
Row operation so it can be used with the existing MultiAction machinery.
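
A hedged usage sketch of what that would enable, written by the editor 
against the 0.90-era client API (method shapes approximate, not taken from 
the attached patch; the family/qualifier names are placeholders):

{code}
// Sketch: once Increment implements Row, cross-row increments can ride
// the existing batch()/MultiAction path.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiIncrementSketch {
    public static void incrementAll(HTable table, List<byte[]> rows)
            throws Exception {
        List<Row> actions = new ArrayList<Row>();
        for (byte[] row : rows) {
            Increment inc = new Increment(row);
            inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("hits"), 1L);
            actions.add(inc);  // legal only once Increment is a Row
        }
        Object[] results = new Object[actions.size()];
        table.batch(actions, results);  // existing MultiAction machinery
    }
}
{code}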

 MultiIncrement (MultiGet functionality for increments)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Attachments: HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



