from:"Jonathan Gray \(JIRA\)"

[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2012-11-06 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491871#comment-13491871
 ] 

Jonathan Gray commented on HBASE-4583:
--

My vote (if only for one implementation) would be for the less radical patch 
that removes in-memory versions that are not visible, rather than doing this 
cleanup on flush, which has a number of performance implications.  I can see 
some reasons for wanting to keep versions around (providing support to an 
Omid-like transaction engine requires retaining old versions for at least some 
time), but it would be cool to have an option to prevent the deletion of the 
old versions rather than requiring that they exist in cases where I won't ever 
use them.  In all my increment performance tests, of which there have been many, 
the upsert/removal of old versions is one of the biggest gains, especially if 
you have particularly hot columns.
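
One way to picture the "remove versions that are not visible" behaviour is the following minimal sketch. It is a simplified model, not the patch itself: the class, the field names, and the `oldestReadPoint` parameter are all hypothetical, and it assumes write numbers increase with every upsert.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of one hot column in the memstore.
// None of these names come from HBase.
final class UpsertSketch {
    static final class Version {
        final long value;
        final long writeNumber; // assumed to increase with every upsert
        Version(long value, long writeNumber) {
            this.value = value;
            this.writeNumber = writeNumber;
        }
    }

    // Versions of the column, newest first.
    private final List<Version> versions = new ArrayList<>();

    // Adds the new version, then drops every version that is older than the
    // newest one already visible to the oldest active reader.
    void upsert(long value, long writeNumber, long oldestReadPoint) {
        versions.add(0, new Version(value, writeNumber));
        boolean foundVisible = false;
        for (Iterator<Version> it = versions.iterator(); it.hasNext(); ) {
            Version v = it.next();
            if (foundVisible) {
                it.remove(); // no reader can still see this version
            } else if (v.writeNumber <= oldestReadPoint) {
                foundVisible = true; // newest version the oldest reader sees
            }
        }
    }

    // Value seen by a reader at readPoint, or null if nothing is visible.
    Long read(long readPoint) {
        for (Version v : versions) {
            if (v.writeNumber <= readPoint) {
                return v.value;
            }
        }
        return null;
    }

    int size() {
        return versions.size();
    }
}
```

With this shape a hot column never holds more versions than there are distinct read points still in flight, which is the property the performance argument above relies on.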

I'm not sure which design you are referring to when you talk about being true 
to HBase's design ;) Maybe you're referring to the general principles of 
HBase (append-only), but the increment operation itself was not part of any 
original design or implementation of HBase and has been a hack in one way or 
another from the very first implementation, because the implementation has 
been targeted at performance over purity.  I've always seen it as an atomic 
operation in which any notion of versioning is opaque to the user of the 
atomic increment.  Again, I can see use cases for it, but I'd lean towards 
having it as an option rather than a requirement.

Thanks for doing this work, good stuff.  +1

 Integrate RWCC with Append and Increment operations
 ---

 Key: HBASE-4583
 URL: https://issues.apache.org/jira/browse/HBASE-4583
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0

 Attachments: 4583-trunk-less-radical.txt, 
 4583-trunk-less-radical-v2.txt, 4583-trunk-less-radical-v3.txt, 
 4583-trunk-less-radical-v4.txt, 4583-trunk-less-radical-v5.txt, 
 4583-trunk-less-radical-v6.txt, 4583-trunk-radical.txt, 
 4583-trunk-radical_v2.txt, 4583-trunk-v3.txt, 4583.txt, 4583-v2.txt, 
 4583-v3.txt, 4583-v4.txt


 Currently Increment and Append operations do not work with RWCC and hence a 
 client could see the results of multiple such operations mixed in the same 
 Get/Scan.
 The semantics might be a bit more interesting here as upsert adds and removes 
 to and from the memstore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2012-11-06 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492144#comment-13492144
 ] 

Jonathan Gray commented on HBASE-4583:
--

That makes sense to me (versions = 1 means upsert).

Big +1 from me on adding support for setting the timestamp.

 Integrate RWCC with Append and Increment operations
 ---

 Key: HBASE-4583
 URL: https://issues.apache.org/jira/browse/HBASE-4583
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0

 Attachments: 4583-mixed.txt, 4583-trunk-less-radical.txt, 
 4583-trunk-less-radical-v2.txt, 4583-trunk-less-radical-v3.txt, 
 4583-trunk-less-radical-v4.txt, 4583-trunk-less-radical-v5.txt, 
 4583-trunk-less-radical-v6.txt, 4583-trunk-radical.txt, 
 4583-trunk-radical_v2.txt, 4583-trunk-v3.txt, 4583.txt, 4583-v2.txt, 
 4583-v3.txt, 4583-v4.txt


 Currently Increment and Append operations do not work with RWCC and hence a 
 client could see the results of multiple such operations mixed in the same 
 Get/Scan.
 The semantics might be a bit more interesting here as upsert adds and removes 
 to and from the memstore.


[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions

2011-09-25 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114422#comment-13114422
 ] 

Jonathan Gray commented on HBASE-4014:
--

Ted, why is this JIRA scattered over so many commits?  And the commit message 
is in a non-standard format (the first line is: HBASE-4014 is marked as 
Improvement).  I've been trying to build some tools to help keep track of and 
in sync with the Apache repos but this kind of stuff makes it very difficult.

 Coprocessors: Flag the presence of coprocessors in logged exceptions
 

 Key: HBASE-4014
 URL: https://issues.apache.org/jira/browse/HBASE-4014
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Andrew Purtell
Assignee: Eugene Koontz
 Fix For: 0.92.0

 Attachments: 4014.final, HBASE-4014.patch, HBASE-4014.patch, 
 HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch


 For some initial triage of bug reports for core versus for deployments with 
 loaded coprocessors, we need something like the Linux kernel's taint flag, 
 and list of linked in modules that show up in the output of every OOPS, to 
 appear above or below exceptions that appear in the logs.


[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions

2011-09-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114040#comment-13114040
 ] 

Jonathan Gray commented on HBASE-4014:
--

What's the status of this?

 Coprocessors: Flag the presence of coprocessors in logged exceptions
 

 Key: HBASE-4014
 URL: https://issues.apache.org/jira/browse/HBASE-4014
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Andrew Purtell
Assignee: Eugene Koontz
 Fix For: 0.92.0

 Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, 
 HBASE-4014.patch, HBASE-4014.patch


 For some initial triage of bug reports for core versus for deployments with 
 loaded coprocessors, we need something like the Linux kernel's taint flag, 
 and list of linked in modules that show up in the output of every OOPS, to 
 appear above or below exceptions that appear in the logs.


[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114043#comment-13114043
 ] 

Jonathan Gray commented on HBASE-4460:
--

Since security stuff can be dealt with in a separate JIRA, what do people think 
of the patch I have up?  Shall I submit to rb?

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed their own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.


[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113669#comment-13113669
 ] 

Jonathan Gray commented on HBASE-4461:
--

Well my plan is to use it internally on 0.92 (we are porting all the changes 
necessary for our fat C++ client from our internal 0.90 branch).  But wherever 
you think it should go is fine.

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to locate region locations they need to 
 utilize the getRowOrBefore method.
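
As a rough illustration of why a "row or before" lookup is what a client-side region cache needs, here is a minimal sketch. It assumes regions are keyed by their start key and uses `String` keys for brevity (HBase compares raw bytes); `RegionLocatorSketch` and its methods are hypothetical names, not part of HBase or its Thrift API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side cache: region start key -> "host:port".
final class RegionLocatorSketch {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void cache(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // "Row or before": the region that holds a row is the one with the
    // greatest start key that is less than or equal to the row.
    String locate(String row) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }
}
```

A fat client would fill such a cache from the catalog table and fall back to a remote lookup on a miss or a stale entry.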


[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113671#comment-13113671
 ] 

Jonathan Gray commented on HBASE-4461:
--

and I'm saving up my new features to force in 92 to try and get the 
HLog/Delayable stuff in ;)

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to locate region locations they need to 
 utilize the getRowOrBefore method.


[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113674#comment-13113674
 ] 

Jonathan Gray commented on HBASE-4131:
--

Thanks stack!

 Make the Replication Service pluggable via a standard interface definition
 --

 Key: HBASE-4131
 URL: https://issues.apache.org/jira/browse/HBASE-4131
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: replicationInterface1.txt, replicationInterface2.txt, 
 replicationInterface3.txt


 The current HBase code supports a replication service that can be used to 
 sync data from one hbase cluster to another. It would be nice to make it 
 a pluggable interface so that other cross-data-center replication services 
 can be used in conjunction with HBase.


[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113768#comment-13113768
 ] 

Jonathan Gray commented on HBASE-4461:
--

Man, I remember when I could buy your vote for $2.00!

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to locate region locations they need to 
 utilize the getRowOrBefore method.


[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113870#comment-13113870
 ] 

Jonathan Gray commented on HBASE-4449:
--

Is this done now?

 LoadIncrementalHFiles should be able to handle CFs with blooms
 --

 Key: HBASE-4449
 URL: https://issues.apache.org/jira/browse/HBASE-4449
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Fix For: 0.90.5

 Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
 HBASE-4449.patch


 When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
 it will split the file at the boundary to create two store files. If the 
 store file is for a column family that has a bloom filter, then a 
 java.lang.ArithmeticException: / by zero will be raised because 
 ByteBloomFilter() is called with maxKeys of 0.
 The included patch assumes that the number of keys in each split child will 
 be equal to the number of keys in the parent's bloom filter (instead of 0). 
 This is an overestimate, but it's safe and easy.
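
The failure mode and the fix described above can be sketched as follows. This is an illustration under assumptions, not the HBase code: `BloomSizingSketch` and its methods are hypothetical, and the sizing formulas are the textbook ones for a Bloom filter.

```java
// Hypothetical sizing helpers; not the ByteBloomFilter implementation.
final class BloomSizingSketch {
    // Textbook bit count for maxKeys entries at the given false-positive rate.
    static long bitsFor(long maxKeys, double errorRate) {
        double bitsPerKey = -Math.log(errorRate) / (Math.log(2) * Math.log(2));
        return (long) Math.ceil(maxKeys * bitsPerKey);
    }

    // Hash count derived from bits per key. The integer division throws
    // ArithmeticException ("/ by zero") when maxKeys is 0.
    static int hashCountFor(long bitSize, long maxKeys) {
        return (int) Math.ceil(Math.log(2) * (bitSize / maxKeys));
    }

    // The approach described above: size each split child as if it held
    // every key of the parent. This overestimates but is never zero for a
    // non-empty parent.
    static long maxKeysForSplitChild(long parentKeyCount) {
        return parentKeyCount;
    }
}
```

The cost of the overestimate is only space: each child filter is sized for the parent's key count, so its false-positive rate is no worse than the configured one.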


[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113872#comment-13113872
 ] 

Jonathan Gray commented on HBASE-4449:
--

It looks like the test change was committed but not the change to 
LoadIncrementalHFiles?

 LoadIncrementalHFiles should be able to handle CFs with blooms
 --

 Key: HBASE-4449
 URL: https://issues.apache.org/jira/browse/HBASE-4449
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Fix For: 0.90.5

 Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
 HBASE-4449.patch


 When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
 it will split the file at the boundary to create two store files. If the 
 store file is for a column family that has a bloom filter, then a 
 java.lang.ArithmeticException: / by zero will be raised because 
 ByteBloomFilter() is called with maxKeys of 0.
 The included patch assumes that the number of keys in each split child will 
 be equal to the number of keys in the parent's bloom filter (instead of 0). 
 This is an overestimate, but it's safe and easy.


[jira] [Created] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)

Support running an embedded ThriftServer within a RegionServer
--

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray


Rather than a separate process, it can be advantageous in some situations for 
each RegionServer to embed their own ThriftServer.  This allows each embedded 
ThriftServer to short-circuit any queries that should be executed on the local 
RS and skip the extra hop.  This then enables the building of fat Thrift 
clients that cache region locations and avoid extra hops altogether.

This JIRA is just about the embedded ThriftServer.  Will open others for the 
rest.


[jira] [Updated] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4460:
-

Attachment: HBASE-4460-v1.patch

Adds {{HRegionThriftServer}}, a RegionServer hosted ThriftServer.  Default is 
off, can be turned on with hbase.regionserver.export.thrift set to true.

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed their own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.


[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112933#comment-13112933
 ] 

Jonathan Gray commented on HBASE-4460:
--

Replacing HRPC is another story but I think many of us are in agreement that 
we'd like to do that eventually.  The scope here is much smaller and I'm 
working on a set of changes to allow fat Thrift-based clients, not necessarily 
replacing normal HRPC.

Open to your feedback on what I can do to better integrate with security stuff 
but not sure what I can do at this point.

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed their own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.


[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112936#comment-13112936
 ] 

Jonathan Gray commented on HBASE-4461:
--

Thanks Ted.

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray

 In order for fat Thrift-based clients to locate region locations they need to 
 utilize the getRowOrBefore method.


[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112935#comment-13112935
 ] 

Jonathan Gray commented on HBASE-4296:
--

Over in HBASE-4461 I am exposing this method to Thrift to enable building fat 
Thrift-based clients.  Rather than deprecating this, could we just note that 
it is an expensive operation and not for normal operations?  Or even only allow 
it to work on ROOT and META?

 Deprecate HTable[Interface].getRowOrBefore(...)
 ---

 Key: HBASE-4296
 URL: https://issues.apache.org/jira/browse/HBASE-4296
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial
 Fix For: 0.92.0

 Attachments: 4296.txt


 HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
 That method was created to allow our scanning of .META. (see HBASE-2600).
 Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
 performant that a user of HTable will not be aware of.
 I propose deprecating this in the public interface in 0.92 and removing it 
 from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
 will still remain as internal interface for scanning meta.
 Comments?


[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112984#comment-13112984
 ] 

Jonathan Gray commented on HBASE-4460:
--

Gary, want to open another JIRA and link it here?

 Support running an embedded ThriftServer within a RegionServer
 --

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4460-v1.patch


 Rather than a separate process, it can be advantageous in some situations for 
 each RegionServer to embed their own ThriftServer.  This allows each embedded 
 ThriftServer to short-circuit any queries that should be executed on the 
 local RS and skip the extra hop.  This then enables the building of fat 
 Thrift clients that cache region locations and avoid extra hops altogether.
 This JIRA is just about the embedded ThriftServer.  Will open others for the 
 rest.


[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4452:
-

Fix Version/s: 0.92.0

lgtm.  nice catch.  pulling in to 0.92

 Possibility of RS opening a region though tickleOpening fails due to znode 
 version mismatch
 ---

 Key: HBASE-4452
 URL: https://issues.apache.org/jira/browse/HBASE-4452
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4452.patch


 Consider the following code
 {code}
 long period = Math.max(1, assignmentTimeout / 3);
 long lastUpdate = now;
 while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
     !this.rsServices.isStopping() && (endTime > now)) {
   long elapsed = now - lastUpdate;
   if (elapsed > period) {
     // Only tickle OPENING if postOpenDeployTasks is taking some time.
     lastUpdate = now;
     tickleOpening("post_open_deploy");
   }
 {code}
 Whenever the postOpenDeploy tasks take considerable time, we call 
 tickleOpening so that no timeout is detected.  But if, before it can do 
 this, the TimeoutMonitor tries to assign the node to another RS, then the 
 other RS will move the node from OFFLINE to OPENING.  Hence when the first RS 
 tries to do tickleOpening the operation will fail. Now here lies the problem,
 {code}
 String encodedName = this.regionInfo.getEncodedName();
 try {
   this.version =
 ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
   this.regionInfo, this.server.getServerName(), this.version);
 } catch (KeeperException e) {
 {code}
 Now this.version becomes -1 as the operation failed.
 As the first code snippet shows, the return value of tickleOpening() is not 
 checked, so after it fails we go on with moving the node to OPENED.  Here 
 again we don't have any check for this condition, and because the version has 
 already been changed to -1 (which ZooKeeper treats as "match any version"), 
 the OPENING to OPENED transition succeeds. This creates a chance of double 
 assignment.
 {noformat}
 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned 
 node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the 
 expected version 2
 2011-09-22 00:57:33,494 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
 refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, 
 context=post_open_deploy
 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 
 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENED
 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 
 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENED
 2011-09-22 00:58:13,956 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
 t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
 {noformat}
 Correct me if this analysis is wrong.
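
The race described above can be reproduced in a small model. The sketch below is not ZooKeeper or HBase code: `Node`, `tickleOpening`, and `openRegion` are hypothetical stand-ins, and the only behaviour borrowed from ZooKeeper is that an expected version of -1 matches any version. It shows one possible guard, which is to stop when the tickle fails instead of continuing with a version of -1.

```java
// Hypothetical model of the OPENING -> OPENED handshake.
final class TickleSketch {
    // Minimal stand-in for a versioned znode.
    static final class Node {
        int version;
        String state = "OPENING";

        // Versioned compare-and-set. An expected version of -1 matches any
        // version, mirroring ZooKeeper's conditional update semantics.
        boolean transition(String newState, int expectedVersion) {
            if (expectedVersion != -1 && expectedVersion != version) {
                return false;
            }
            state = newState;
            version++;
            return true;
        }
    }

    // Refreshes OPENING; returns the new version, or -1 if the node was
    // changed by someone else in the meantime.
    static int tickleOpening(Node node, int expectedVersion) {
        return node.transition("OPENING", expectedVersion) ? node.version : -1;
    }

    // Guarded open: give up as soon as the tickle fails, so that -1 is never
    // used as the expected version for the final transition.
    static boolean openRegion(Node node, int myVersion) {
        int refreshed = tickleOpening(node, myVersion);
        if (refreshed == -1) {
            return false; // another server has taken over the node
        }
        return node.transition("OPENED", refreshed);
    }
}
```

Without the check in `openRegion`, the final call would be `transition("OPENED", -1)`, which succeeds regardless of who owns the node, matching the log excerpt above.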


[jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException

2011-09-22 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113011#comment-13113011
]

Jonathan Gray commented on HBASE-4462:
--

+1 on treating STE differently. I think we should treat it as DNRE and kick it
back to the client. There could be a configurable policy for socket timeouts
(or network level errors in general?) if some people want the HBase client to
retry once or something.

Properly treating SocketTimeoutException

Key: HBASE-4462
URL: https://issues.apache.org/jira/browse/HBASE-4462
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Fix For: 0.92.0

SocketTimeoutException is currently treated like any IOE inside of
HCM.getRegionServerWithRetries and I think this is a problem. This method
should only do retries in cases where we are pretty sure the operation will
complete, but with STE we already waited for (by default) 60 seconds and
nothing happened.
I found this while debugging Douglas Campbell's problem on the mailing list
where it seemed like he was using the same scanner from multiple threads, but
actually it was just the same client doing retries while the first run didn't
even finish yet (that's another problem). You could see the first scanner,
then up to two other handlers waiting for it to finish in order to run
(because of the synchronization on RegionScanner).
So what should we do? We could treat STE as a DoNotRetryException and let the
client deal with it, or we could retry only once.
There's also the option of having a different behavior for get/put/icv/scan,
the issue with operations that modify a cell is that you don't know if the
operation completed or not (same when a RS dies hard after completing let's
say a Put but just before returning to the client).
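
The "treat STE as a DoNotRetryException" option could look roughly like the sketch below. It is a generic retry loop written for illustration: `RetrySketch.callWithRetries` is a hypothetical helper, not HCM.getRegionServerWithRetries, and it uses only JDK exception types.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical retry helper; not the HBase client code.
final class RetrySketch {
    static <T> T callWithRetries(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException e) {
                // The caller already waited for the full timeout and the
                // first request may still be running, so hand it back.
                throw e;
            } catch (IOException e) {
                last = e; // other I/O errors are assumed to be retriable
            }
        }
        throw last;
    }
}
```

A configurable policy, as suggested in the comment above, would replace the unconditional `throw e` with a check against a per-operation retry limit for timeouts.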


[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113037#comment-13113037
 ] 

Jonathan Gray commented on HBASE-4296:
--

We are already using the fat thrift client on our 0.90 branch.  I'm in the 
process of pushing this all out into open source so we can then pull it back in 
to our 0.92 based branch.  I'm happy to put this stuff into 0.92 in Apache as 
well but it's somewhat featurish :)

Was the method removed in 0.94 already?  Can we just hold off on removing it 
until 2600 happens?  That way it won't matter and we can commit it anywhere.  
Following 2600 we can modify how it works and just use a normal scanner then?

 Deprecate HTable[Interface].getRowOrBefore(...)
 ---

 Key: HBASE-4296
 URL: https://issues.apache.org/jira/browse/HBASE-4296
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial
 Fix For: 0.92.0

 Attachments: 4296.txt


 HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
 That method was created to allow our scanning of .META. (see HBASE-2600).
 Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
 performant that a user of HTable will not be aware of.
 I propose deprecating this in the public interface in 0.92 and removing it 
 from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
 will still remain as internal interface for scanning meta.
 Comments?


[jira] [Updated] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4461:
-

Attachment: HBASE-4461-v2.patch

Adds getRowOrBefore() exposed to Thrift.  Also adds server name and port to 
TRegionInfo so we can get assignment info through existing APIs in Thrift.

 Expose getRowOrBefore via Thrift
 

 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-4461-v2.patch


 In order for fat Thrift-based clients to locate region locations they need to 
 utilize the getRowOrBefore method.


[jira] [Commented] (HBASE-4451) Improve zk node naming (/hbase/shutdown)

2011-09-21 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112155#comment-13112155
]

Jonathan Gray commented on HBASE-4451:
--

bq. Would changing this have an effect on compatibility?

If you wanted to support this change over a rolling restart or anything like
that, it would probably be rather complicated or impractical. So it would
require a full restart of the cluster most likely. In addition, any external
ops/monitoring/admin tools people have built might be looking at the specific
names. That shouldn't necessarily stop us though.

Perhaps we can do this as part of a fresh look at the names of the ZK nodes in
general. We might make some changes with the root node and such as well in 94.
Do you want to look at all the ZK node names and see if there's a new scheme
that would be more clear?

Improve zk node naming (/hbase/shutdown)

Key: HBASE-4451
URL: https://issues.apache.org/jira/browse/HBASE-4451
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
Fix For: 0.94.0

Right now the node {{/hbase/shutdown}} is used to indicate cluster status
(cluster up, cluster down).
However, upon a chat with Lars George today, we feel that having a name
{{/hbase/shutdown}} is possibly bad. The {{/hbase/shutdown}} zknode contains
a date when the cluster was _started_. Now that is difficult to understand
and digest, given that a person may connect to zk and try to look at what it
is about (they may think it 'shutdown' at that date.).
I feel a better name may simply be: {{/hbase/running}}. Thoughts?


[jira] [Commented] (HBASE-4132) Extend the WALActionsListener API to accomodate log archival

2011-09-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13112249#comment-13112249
 ] 

Jonathan Gray commented on HBASE-4132:
--

Looks good.  One thing:

{code}oldPath = new Path("/DUMMY-No-preexisting-logfile");{code}

Should we support passing a null path or at least use a static?

 Extend the WALActionsListener API to accomodate log archival
 

 Key: HBASE-4132
 URL: https://issues.apache.org/jira/browse/HBASE-4132
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: walArchive.txt, walArchive2.txt


 The WALObserver interface exposes the log roll events. It would be nice to 
 extend it to accommodate log archival events as well.


[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13112250#comment-13112250
 ] 

Jonathan Gray commented on HBASE-4153:
--

Looks like this introduced a compile error in MockRegionServerServices?

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
 HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, HBASE-4153_6.patch


 Comment from Stack over in HBASE-3741:
 {quote}
 Question: Looking at this patch again, if we throw a 
 RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
 though RegionAlreadyInTransitionException in at least one case here is saying 
 that the region is already open on this regionserver?
 {quote}
 Indeed looking at the code it's going to be handled the same way other 
 exceptions are. Need to add special cases for assign and unassign.


[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13112251#comment-13112251
 ] 

Jonathan Gray commented on HBASE-4153:
--

nevermind!


[jira] [Commented] (HBASE-4432) Enable/Disable off heap cache with config

2011-09-19 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108104#comment-13108104
 ] 

Jonathan Gray commented on HBASE-4432:
--

+1

 Enable/Disable off heap cache with config
 -

 Key: HBASE-4432
 URL: https://issues.apache.org/jira/browse/HBASE-4432
 Project: HBase
  Issue Type: Improvement
Reporter: Li Pi
Assignee: Li Pi
Priority: Trivial
 Attachments: 4432.v3, enableswitchforoffheapcache.txt, patchv2.txt





[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-19 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108110#comment-13108110
]

Jonathan Gray commented on HBASE-4433:
--

Good stuff. I think the first iteration of the ColumnTracker had the
INCLUDE_AND_* primitives but it was simplified. It would be pretty cool to
write up a unit test that creates single-KV sized blocks so you could run
various queries and see the number of blocks accessed. That would be especially
nice for catching regressions in the future.

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan

[Noticed this in 89, but quite likely true of trunk as well.]
When we are done with the requested column(s) the code still does an extra
next() call before it realizes that it is actually done. This extra next()
call could potentially result in an unnecessary extra block load. This is
likely to be especially bad for CFs where the KVs are large blobs where each
KV may be occupying a block of its own. So the next() can often load a new
unrelated block unnecessarily.
--
For the simple case of reading say the top-most column in a row in a single
file, where each column (KV) was say a block of its own-- it seems that we
are reading 3 blocks, instead of 1 block!
I am working on a simple patch and with that the number of seeks is down to
2.
[There is still an extra seek left. I think there were two levels of
extra/unnecessary next() we were doing without actually confirming that the
next was needed. One is at the StoreScanner/ScanQueryMatcher level, which this
diff avoids. I think the other is at hfs.next() (at the storefile scanner
level), which happens whenever an HFile scanner serves out data, and
perhaps that's the additional seek that we need to avoid. But I want to
tackle this optimization first as the two issues seem unrelated.]
--
The basic idea of the patch I am working on/testing is as follows. The
ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if
the KV needs to be included and then, if done, only in the next call does it
return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases
when ExplicitColumnTracker knows it is done with a particular column/row, the
patch attempts to combine the INCLUDE code and done hint into a single match
code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
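
A minimal sketch of the combined-code idea, for illustration only. This is not
the patch: ToyMatchCode and ToyExplicitColumnTracker are hypothetical names, the
tracker works on plain strings, and it assumes one version per column and that
the columns of a row arrive in sorted order. Only the two INCLUDE_AND_* names
are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy match codes; only the two INCLUDE_AND_* names come from the description. */
enum ToyMatchCode {
  SEEK_NEXT_COL, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
}

/** Hypothetical tracker for a sorted list of wanted columns, one version each. */
final class ToyExplicitColumnTracker {
  private final List<String> wanted;
  private int index = 0;

  ToyExplicitColumnTracker(List<String> sortedWantedColumns) {
    this.wanted = new ArrayList<>(sortedWantedColumns);
  }

  /** Columns of a row must be passed in sorted order. */
  ToyMatchCode checkColumn(String column) {
    // Wanted columns that sort before the current one are absent from this row.
    while (index < wanted.size() && column.compareTo(wanted.get(index)) > 0) {
      index++;
    }
    if (index == wanted.size()) {
      return ToyMatchCode.SEEK_NEXT_ROW; // nothing left to look for in this row
    }
    if (column.compareTo(wanted.get(index)) < 0) {
      return ToyMatchCode.SEEK_NEXT_COL; // not a wanted column
    }
    // Match: report "include" and the "done" hint in one code, so the caller
    // does not need an extra next() just to learn that it is finished.
    index++;
    return index == wanted.size()
        ? ToyMatchCode.INCLUDE_AND_SEEK_NEXT_ROW
        : ToyMatchCode.INCLUDE_AND_SEEK_NEXT_COL;
  }
}
```

With wanted columns "a" and "c", a row holding a, b, c yields
INCLUDE_AND_SEEK_NEXT_COL, SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW, so the
scan can leave the row right after "c" without touching the following KV.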


[jira] [Commented] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-09-18 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107495#comment-13107495
 ] 

Jonathan Gray commented on HBASE-4410:
--

Actually I think Lars is correct.  It's a question of whether we should execute 
all filters in a list filterKeyValue() or not.

I think the right behavior is actually just to make it execute how one would 
expect this type of conditional to execute:

if (conditionA && conditionB)

If conditionA fails, we don't expect conditionB to be executed.

if (conditionA || conditionB)

If conditionA passes, we don't expect conditionB to be executed.

This was the previous behavior and my patch undoes it.  I will work on a new 
patch.
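
To make the short-circuit behavior concrete, here is a rough sketch using plain
Java predicates rather than the real Filter API. ToyFilterList and
lastEvaluated() are made up for illustration; only the operator names mirror
FilterList.Operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy stand-in for a filter list; not the HBase FilterList class. */
final class ToyFilterList<T> {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  private final Operator operator;
  private final List<Predicate<T>> filters;
  private int lastEvaluated = 0;

  ToyFilterList(Operator operator, List<Predicate<T>> filters) {
    this.operator = operator;
    this.filters = new ArrayList<>(filters);
  }

  /** Evaluates filters in order and stops as soon as the outcome is known. */
  boolean accept(T value) {
    lastEvaluated = 0;
    for (Predicate<T> filter : filters) {
      lastEvaluated++;
      boolean passed = filter.test(value);
      if (operator == Operator.MUST_PASS_ALL && !passed) {
        return false; // like &&: a failure decides the result
      }
      if (operator == Operator.MUST_PASS_ONE && passed) {
        return true; // like ||: a success decides the result
      }
    }
    // No early exit: every filter passed (AND) or every filter failed (OR).
    return operator == Operator.MUST_PASS_ALL;
  }

  /** Number of filters consulted by the most recent accept() call. */
  int lastEvaluated() {
    return lastEvaluated;
  }
}
```

The trade-off is the one discussed above: later filters never get a chance to
offer a more specific return code once the result is decided.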

 FilterList.filterKeyValue can return suboptimal ReturnCodes
 ---

 Key: HBASE-4410
 URL: https://issues.apache.org/jira/browse/HBASE-4410
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4410-v1.patch


 FilterList.filterKeyValue does not always return the most optimal ReturnCode 
 in both the AND and OR conditions.
 For example, if you have F1 AND F2, F1 returns SKIP.  It immediately returns 
 the SKIP.  However, if F2 would have returned NEXT_COL or NEXT_ROW or 
 SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal 
 ReturnCode from F2.
 For AND conditions, we can always pick the *most restrictive* return code.
 For OR conditions, we must always pick the *least restrictive* return code.
 This JIRA is to review the FilterList.filterKeyValue() method to try and make 
 it more optimal and to add a new unit test which verifies the correct 
 behavior.


[jira] [Commented] (HBASE-4373) HBaseAdmin.assign() does not use force flag

2011-09-16 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106760#comment-13106760
 ] 

Jonathan Gray commented on HBASE-4373:
--

Trying to understand this patch.  So with the force flag removed, what is the 
default behavior?  If the state is not OFFLINE and we try to assign somewhere 
else, do we force the node to OFFLINE always?

 HBaseAdmin.assign() does not use force flag
 ---

 Key: HBASE-4373
 URL: https://issues.apache.org/jira/browse/HBASE-4373
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4373.patch, HBASE-4373_1.patch


 The HBaseAdmin.assign()
 {code}
 public void assign(final byte [] regionName, final boolean force)
 throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
   getMaster().assign(regionName, force);
 }
 {code}
 In the HMaster we call 
 {code}
 Pair<HRegionInfo, ServerName> pair =
   MetaReader.getRegion(this.catalogTracker, regionName);
 if (pair == null) {
   throw new UnknownRegionException(Bytes.toString(regionName));
 }
 if (cpHost != null) {
   if (cpHost.preAssign(pair.getFirst(), force)) {
     return;
   }
 }
 assignRegion(pair.getFirst());
 if (cpHost != null) {
   cpHost.postAssign(pair.getFirst(), force);
 }
 {code}
 The force flag is not getting used.  Maybe we need to update the javadoc, or 
 not provide the force flag as a parameter if we are not going to use it.


[jira] [Created] (HBASE-4422) Move block cache parameters and references into single CacheConf class

2011-09-16 Thread Jonathan Gray (JIRA)

Move block cache parameters and references into single CacheConf class
--

 Key: HBASE-4422
 URL: https://issues.apache.org/jira/browse/HBASE-4422
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.92.0


From StoreFile down to HFile, we currently use a boolean argument for each of 
the various block cache configuration parameters that exist.  The number of 
parameters is going to continue to increase as we look at compressed cache, 
delta encoding, and more specific L1/L2 configuration.  Every new config 
currently requires changing many constructors because it introduces a new 
boolean.

We should move everything into a single class so that modifications are much 
less disruptive.
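
As a rough illustration of the shape this could take (not the eventual class):
the names ToyCacheConf and ToyStoreFileReader and the three flags below are
placeholders, chosen only to show how one object can replace a growing list of
boolean constructor arguments.

```java
/** Hypothetical holder for block cache settings; the field names are illustrative. */
final class ToyCacheConf {
  private final boolean cacheDataOnRead;
  private final boolean cacheDataOnWrite;
  private final boolean inMemory;

  private ToyCacheConf(Builder b) {
    this.cacheDataOnRead = b.cacheDataOnRead;
    this.cacheDataOnWrite = b.cacheDataOnWrite;
    this.inMemory = b.inMemory;
  }

  boolean shouldCacheDataOnRead() { return cacheDataOnRead; }
  boolean shouldCacheDataOnWrite() { return cacheDataOnWrite; }
  boolean isInMemory() { return inMemory; }

  static Builder builder() { return new Builder(); }

  /** New settings get a default here; existing constructors stay untouched. */
  static final class Builder {
    private boolean cacheDataOnRead = true;
    private boolean cacheDataOnWrite = false;
    private boolean inMemory = false;

    Builder cacheDataOnRead(boolean v) { cacheDataOnRead = v; return this; }
    Builder cacheDataOnWrite(boolean v) { cacheDataOnWrite = v; return this; }
    Builder inMemory(boolean v) { inMemory = v; return this; }
    ToyCacheConf build() { return new ToyCacheConf(this); }
  }
}

/** Stand-in for a reader that used to take one boolean per cache setting. */
final class ToyStoreFileReader {
  private final String path;
  private final ToyCacheConf cacheConf;

  ToyStoreFileReader(String path, ToyCacheConf cacheConf) {
    this.path = path;
    this.cacheConf = cacheConf;
  }

  String path() { return path; }
  boolean cachesOnWrite() { return cacheConf.shouldCacheDataOnWrite(); }
}
```

Adding, say, a compressed-cache flag would then touch only the config class and
the code that reads the flag, not every constructor between StoreFile and HFile.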


[jira] [Created] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-09-14 Thread Jonathan Gray (JIRA)

FilterList.filterKeyValue can return suboptimal ReturnCodes
---

 Key: HBASE-4410
 URL: https://issues.apache.org/jira/browse/HBASE-4410
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0


FilterList.filterKeyValue does not always return the most optimal ReturnCode in 
both the AND and OR conditions.

For example, if you have F1 AND F2, F1 returns SKIP.  It immediately returns 
the SKIP.  However, if F2 would have returned NEXT_COL or NEXT_ROW or 
SEEK_NEXT_USING_HINT, we would actually be able to return the more optimal 
ReturnCode from F2.

For AND conditions, we can always pick the *most restrictive* return code.

For OR conditions, we must always pick the *least restrictive* return code.

This JIRA is to review the FilterList.filterKeyValue() method to try and make 
it more optimal and to add a new unit test which verifies the correct behavior.
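
A small sketch of the "most restrictive" / "least restrictive" rule, assuming a
simplified ordering INCLUDE < SKIP < NEXT_COL < NEXT_ROW. ToyReturnCode and
ReturnCodeMerger are illustrative names, and SEEK_NEXT_USING_HINT is left out
because it does not fit a simple linear ordering.

```java
import java.util.Collections;
import java.util.List;

/** Toy codes, declared from least to most restrictive. */
enum ToyReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

final class ReturnCodeMerger {
  private ReturnCodeMerger() {}

  /** AND: every filter must accept, so the most restrictive answer wins. */
  static ToyReturnCode mergeAnd(List<ToyReturnCode> codes) {
    return Collections.max(codes); // enums compare by declaration order
  }

  /** OR: any filter may accept, so the least restrictive answer wins. */
  static ToyReturnCode mergeOr(List<ToyReturnCode> codes) {
    return Collections.min(codes);
  }
}
```

For F1 = SKIP and F2 = NEXT_ROW, mergeAnd gives NEXT_ROW (F2 rules out the whole
row, so the conjunction does too) while mergeOr gives SKIP (F1 may still accept
a later KV in the row). Both methods expect a non-empty list.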


[jira] [Updated] (HBASE-4410) FilterList.filterKeyValue can return suboptimal ReturnCodes

2011-09-14 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4410:
-

Attachment: HBASE-4410-v1.patch

Implements changes described in description and includes unit test.  New test 
and existing tests are passing, kicking off full suite now.


[jira] [Commented] (HBASE-4310) SlabCache metrics bugfix.

2011-09-14 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104949#comment-13104949
 ] 

Jonathan Gray commented on HBASE-4310:
--

Can someone explain the three commits on this JIRA?  Is the final commit from a 
different JIRA?  It has a different commit message name but is linked to this 
JIRA and there is nothing in CHANGES.txt and nothing here in the JIRA talking 
about the change?

 SlabCache metrics bugfix.
 -

 Key: HBASE-4310
 URL: https://issues.apache.org/jira/browse/HBASE-4310
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
Priority: Minor
 Fix For: 0.92.0

 Attachments: metrics.txt, metrics.txt, metrics.txt, metricsv2.txt, 
 metricsv2.txt, metricsv3.txt


 A math error in the metrics makes it display incorrect values. It also no longer logs 
 metrics of size 0, to save space. Also added a second log for those things that 
 are successfully cached.


[jira] [Commented] (HBASE-4310) SlabCache metrics bugfix.

2011-09-14 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104954#comment-13104954
 ] 

Jonathan Gray commented on HBASE-4310:
--

I see two separate lines for this JIRA in CHANGES as well.  Is this what 
prompted some of those discussions about multiple commits on a JIRA?  We should 
at least amend CHANGES and the commit message to say that it's a follow-up, if 
nothing else.


[jira] [Commented] (HBASE-4320) Off Heap Cache never creates Slabs

2011-09-13 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103794#comment-13103794
 ] 

Jonathan Gray commented on HBASE-4320:
--

Looks like this was committed with HBASE-4027 in the message and not 
HBASE-4320.  Guess there's no way to retroactively fix that but in case anyone 
comes here looking for the revision info it's linked over in the other jira.

 Off Heap Cache never creates Slabs
 --

 Key: HBASE-4320
 URL: https://issues.apache.org/jira/browse/HBASE-4320
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.92.0

 Attachments: confnotloading.txt


 On testing, the configuration file is never loaded by the off heap cache.


[jira] [Created] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)

Add support for seeking hints to FilterList
---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0


Currently FilterLists do not support getNextKeyHint() even if the underlying 
filters are giving hints.  We should add support for FilterList to pass these 
through.
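
One possible way to pass hints through, sketched with strings standing in for
keys. This is an assumption, not what the patch does: ToyHintingFilter and
ToyHintingFilterList are hypothetical, and the list simply returns the smallest
hint offered by any member, which is conservative because it never seeks past a
key that another member might still want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Toy filter that may or may not offer a "seek to this key next" hint. */
interface ToyHintingFilter {
  Optional<String> nextKeyHint();
}

/** Hypothetical list that forwards the smallest hint of its members. */
final class ToyHintingFilterList implements ToyHintingFilter {
  private final List<ToyHintingFilter> filters;

  ToyHintingFilterList(List<ToyHintingFilter> filters) {
    this.filters = new ArrayList<>(filters);
  }

  @Override
  public Optional<String> nextKeyHint() {
    String smallest = null;
    for (ToyHintingFilter filter : filters) {
      Optional<String> hint = filter.nextKeyHint();
      if (hint.isPresent() && (smallest == null || hint.get().compareTo(smallest) < 0)) {
        smallest = hint.get();
      }
    }
    // Empty when no member offered a hint.
    return Optional.ofNullable(smallest);
  }
}
```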


[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Attachment: HBASE-4394-v1.patch

Adds support for seek hints to FilterList and adds a unit test to 
TestFilterList that ensures it does the right thing across the different 
variations of inputs to a filterlist.

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.


[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Status: Patch Available  (was: Open)


[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Attachment: HBASE-4394-trunk-v2.patch

Rebased for trunk

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-trunk-v2.patch, HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.


[jira] [Created] (HBASE-4239) HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES

2011-08-22 Thread Jonathan Gray (JIRA)

HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES
-

 Key: HBASE-4239
 URL: https://issues.apache.org/jira/browse/HBASE-4239
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Ted Yu
Priority: Trivial
 Fix For: 0.92.0


HBASE-4012 introduced Bytes.LONG_SIZE.  This is a duplicate of 
Bytes.SIZEOF_LONG.


[jira] [Commented] (HBASE-4239) HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES

2011-08-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089149#comment-13089149
 ] 

Jonathan Gray commented on HBASE-4239:
--

+1

 HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES
 -

 Key: HBASE-4239
 URL: https://issues.apache.org/jira/browse/HBASE-4239
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Ted Yu
Priority: Trivial
 Fix For: 0.92.0

 Attachments: 4239.txt


 HBASE-4012 introduced Bytes.LONG_SIZE.  This is a duplicate of 
 Bytes.SIZEOF_LONG.


[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-08-17 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086708#comment-13086708
]

Jonathan Gray commented on HBASE-4218:
--

bq. in the mean time there will be places it has to cut a full KeyValue by
copying bytes
Agreed. There's some other work going on around slab allocators and object
reuse that could be paired with this to ameliorate some of that overhead.

Delta Encoding of KeyValues (aka prefix compression)
-

Key: HBASE-4218
URL: https://issues.apache.org/jira/browse/HBASE-4218
Project: HBase
Issue Type: Improvement
Components: io
Reporter: Jacek Migdal
Labels: compression

A compression scheme for keys. Keys are sorted in an HFile and are usually very
similar. Because of that, it is possible to design better compression than
general-purpose algorithms.
It is an additional step designed to be used in memory. It aims to save
memory in the cache as well as to speed up seeks within HFileBlocks. It should
improve performance a lot if key lengths are larger than value lengths. For
example, it makes a lot of sense to use it when the value is a counter.
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes)
show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
while having much better performance (20-80% faster decompression than
LZO). Moreover, it should allow far more efficient seeking, which should
improve performance a bit.
It seems that simple compression algorithms are good enough. Most of the
savings are due to prefix compression, int128 encoding, timestamp diffs and
bitfields to avoid duplication. That way, comparisons of compressed data can
be much faster than a byte comparator (thanks to prefix compression and
bitfields).
In order to implement it in HBase, two important changes in design will be
needed:
- solidify the interface to HFileBlock / HFileReader Scanner to provide seeking
and iterating; access to the uncompressed buffer in HFileBlock will have bad
performance
- extend comparators to support comparison assuming that the first N bytes are
equal (or some fields are equal)
Link to a discussion about something similar:
http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
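
To illustrate just the prefix-compression part, here is a toy codec over sorted
string keys. It is a sketch under simplifying assumptions: real keys are byte
arrays, and ToyPrefixCodec and its Entry type are made-up names that cover none
of the timestamp or bitfield tricks mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy prefix codec: each key is stored as (bytes shared with previous key, suffix). */
final class ToyPrefixCodec {
  static final class Entry {
    final int sharedPrefixLength;
    final String suffix;

    Entry(int sharedPrefixLength, String suffix) {
      this.sharedPrefixLength = sharedPrefixLength;
      this.suffix = suffix;
    }
  }

  private ToyPrefixCodec() {}

  static List<Entry> encode(List<String> sortedKeys) {
    List<Entry> out = new ArrayList<>();
    String previous = "";
    for (String key : sortedKeys) {
      int limit = Math.min(previous.length(), key.length());
      int shared = 0;
      while (shared < limit && previous.charAt(shared) == key.charAt(shared)) {
        shared++;
      }
      out.add(new Entry(shared, key.substring(shared)));
      previous = key;
    }
    return out;
  }

  static List<String> decode(List<Entry> entries) {
    List<String> out = new ArrayList<>();
    String previous = "";
    for (Entry entry : entries) {
      // Rebuild the key from the previous key's prefix plus the stored suffix.
      String key = previous.substring(0, entry.sharedPrefixLength) + entry.suffix;
      out.add(key);
      previous = key;
    }
    return out;
  }
}
```

The shared-prefix length is also what makes the comparator change attractive:
when two adjacent keys are known to agree on their first N characters, a
comparison can start at offset N instead of 0.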


[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

2011-08-12 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084333#comment-13084333
 ] 

Jonathan Gray commented on HBASE-4015:
--

Sorry I'm a little late to this discussion but I like the idea of not adding a 
new state.  Instead, we can just pass the znode version number in the RPC to 
the regionservers.  Or encode the servername in the znode.
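
Roughly what I have in mind with the version number, as a toy model rather than
real ZooKeeper code. ToyVersionedZNode and the state strings are made up; the
only borrowed idea is a conditional update that succeeds only if the caller's
expected version still matches.

```java
/** Toy znode with a version counter; a stand-in, not the ZooKeeper client API. */
final class ToyVersionedZNode {
  private int version = 0;
  private String state;

  ToyVersionedZNode(String initialState) {
    this.state = initialState;
  }

  synchronized int version() { return version; }
  synchronized String state() { return state; }

  /** Applies the change only if nobody has touched the node since expectedVersion. */
  synchronized boolean transition(int expectedVersion, String newState) {
    if (expectedVersion != version) {
      return false; // stale request, e.g. the region was reassigned meanwhile
    }
    state = newState;
    version++;
    return true;
  }
}
```

The master would send the version it saw along with the open RPC. If the
TimeoutMonitor reassigns the region in the meantime the version moves on, and a
late open attempt from the first regionserver fails its transition instead of
racing with the new assignment.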

 Refactor the TimeoutMonitor to make it less racy
 

 Key: HBASE-4015
 URL: https://issues.apache.org/jira/browse/HBASE-4015
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.90.3
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.92.0

 Attachments: HBASE-4015_1_trunk.patch, Timeoutmonitor with state 
 diagrams.pdf


 The current implementation of the TimeoutMonitor acts like a race condition 
 generator, mostly making things worse rather than better. It does its own 
 thing for a while without caring for what's happening in the rest of the 
 master.
 The first thing that needs to happen is that the regions should not be 
 processed in one big batch, because that sometimes can take minutes to 
 process (meanwhile a region that timed out opening might have opened, then 
 what happens is it will be reassigned by the TimeoutMonitor generating the 
 never ending PENDING_OPEN situation).
 Those operations should also be done more atomically, although I'm not sure 
 how to do it in a scalable way in this case.


[jira] [Commented] (HBASE-3899) enhance HBase RPC to support free-ing up server handler threads even if response is not ready

2011-07-27 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071861#comment-13071861
]

Jonathan Gray commented on HBASE-3899:
--

Test passes for me on trunk.

enhance HBase RPC to support free-ing up server handler threads even if
response is not ready
-

Key: HBASE-3899
URL: https://issues.apache.org/jira/browse/HBASE-3899
Project: HBase
Issue Type: Improvement
Components: ipc
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Fix For: 0.92.0

Attachments: HBASE-3899-2.patch, HBASE-3899.patch, asyncRpc.txt,
asyncRpc.txt

In the current implementation, the server handler thread picks up an item
from the incoming callqueue, processes it and then wraps the response as a
Writable and sends it back to the IPC server module. This wastes
thread-resources when the thread is blocked for disk IO (transaction logging,
read into block cache, etc).
It would be nice if we can make the RPC Server Handler threads pick up a call
from the IPC queue, hand it over to the application (e.g. HRegion), the
application can queue it to be processed asynchronously and send a response
back to the IPC server module saying that the response is not ready. The RPC
Server Handler thread is now ready to pick up another request from the
incoming callqueue. When the queued call is processed by the application, it
indicates to the IPC module that the response is now ready to be sent back to
the client.
The RPC client continues to experience the same behaviour as before. An RPC
client is synchronous and blocks till the response arrives.
This RPC enhancement allows us to do very powerful things with the
RegionServer. In future, we can enhance the RegionServer's threading
model to a message-passing model for better performance. We will not be
limited by the number of threads in the RegionServer.
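
The flow described above can be sketched with standard java.util.concurrent
classes. This is only an illustration of the idea, not the HBase IPC code:
ToyDelayedRpcServer is a made-up name, a scheduled executor stands in for the
application (e.g. HRegion) finishing its work later, and the 50 ms delay is
arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy server with one handler thread that never blocks on slow work. */
final class ToyDelayedRpcServer {
  private final ExecutorService handlers = Executors.newFixedThreadPool(1);
  private final ScheduledExecutorService application =
      Executors.newSingleThreadScheduledExecutor();

  /** Returns at once; the future completes when the response is ready. */
  CompletableFuture<String> call(String request) {
    CompletableFuture<String> response = new CompletableFuture<>();
    handlers.submit(() -> {
      // The handler only hands the call over and is then free for the next one.
      application.schedule(() -> {
        response.complete("done:" + request);
      }, 50, TimeUnit.MILLISECONDS);
    });
    return response;
  }

  void shutdown() {
    handlers.shutdown();
    application.shutdown();
  }
}
```

A synchronous client keeps its old behaviour by blocking on the returned future
(for example with join()), while the single handler thread can accept further
calls before earlier responses are ready.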


[jira] [Commented] (HBASE-4060) Making region assignment more robust

2011-07-24 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070218#comment-13070218
]

Jonathan Gray commented on HBASE-4060:
--

The primary difference between the suggestion by Eran and what is currently
implemented is that the per-region znodes are never deleted in Eran's design.
The existing implementation uses znodes to track regions that are currently in
transition. An assigned and open region doesn't have a znode (nor would an
unassigned and closed region of a disabled table).

Check out ZKAssign and AssignmentManager for details on how that works.

Making region assignment more robust

Key: HBASE-4060
URL: https://issues.apache.org/jira/browse/HBASE-4060
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Fix For: 0.92.0

From Eran Kutner:
My concern is that the region allocation process seems to rely too much on
timing considerations and doesn't seem to take enough measures to guarantee
conflicts do not occur. I understand that in a distributed environment, when
you don't get a timely response from a remote machine you can't know for
sure if it did or did not receive the request, however there are things that
can be done to mitigate this and reduce the conflict time significantly. For
example, when I run hbck it knows that some regions are multiply assigned,
the master could do the same and try to resolve the conflict. Another
approach would be to handle late responses, even if the response from the
remote machine arrives after it was assumed to be dead the master should
have enough information to know it had created a conflict by assigning the
region to another server. An even better solution, I think, is for the RS to
periodically test that it is indeed the rightful owner of every region it
holds and relinquish control over the region if it's not.
Obviously a state where two RSs hold the same region is pathological and can
lead to data loss, as demonstrated in my case. The system should be able to
actively protect itself against such a scenario. It probably doesn't need
saying but there is really nothing worse for a data storage system than data
loss.
In my case the problem didn't happen in the initial phase but after
disabling and enabling a table with about 12K regions.
For more background information, see 'Errors after major compaction'
discussion on u...@hbase.apache.org


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-07-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069204#comment-13069204
 ] 

Jonathan Gray commented on HBASE-3417:
--

It does support COW but if it doesn't include changes to how files are named, 
it will still need this fix.  Will follow-up with Mikhail.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, 
 HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).


[jira] [Commented] (HBASE-4084) Auto-Split runs only if there are many store files per region

2011-07-11 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063476#comment-13063476
 ] 

Jonathan Gray commented on HBASE-4084:
--

I thought splits were triggered following a compaction not a flush?

 Auto-Split runs only if there are many store files per region
 -

 Key: HBASE-4084
 URL: https://issues.apache.org/jira/browse/HBASE-4084
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: John Heitmann

 Currently, MemStoreFlusher.flushRegion() is the driver of auto-splitting. It 
 only decides to auto-split a region if there are too many store files per 
 region. Since it's not guaranteed that the number of store files per region 
 always grows above the too many count before compaction reduces the count, 
 there is no guarantee that auto-split will ever happen. In my test setup, 
 compaction seems to always win the race and I haven't noticed auto-splitting 
 happen once.
 It appears that the intention is to have split be mutually exclusive with 
 compaction, and to have flushing be mutually exclusive with regions badly in 
 need of compaction, but that resulted in auto-splitting being nested in a 
 too-restrictive spot.
 I'm not sure what the right fix is. Having one method that is essentially 
 requestSplitOrCompact would probably help readability, and could be the 
 ultimate solution if it replaces other calls of requestCompaction().
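
One possible reading of the requestSplitOrCompact idea, as a sketch only: the split check is made on its own rather than nested under the too-many-store-files condition. The class name, the Action enum, the parameters and the size-based split test are assumptions for illustration and are not taken from the HBase source.

```java
// Hypothetical sketch; thresholds and names are illustrative, not HBase's.
class SplitOrCompactSketch {
    enum Action { SPLIT, COMPACT, NONE }

    // Decide in one place, so a split is never hidden behind the
    // "too many store files" check.
    static Action requestSplitOrCompact(long regionBytes, long maxRegionBytes,
                                        int storeFiles, int compactionThreshold) {
        if (regionBytes > maxRegionBytes) {
            return Action.SPLIT;
        }
        if (storeFiles >= compactionThreshold) {
            return Action.COMPACT;
        }
        return Action.NONE;
    }
}
```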


[jira] [Commented] (HBASE-4056) Support for using faster storage for write-ahead log

2011-07-08 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061786#comment-13061786
]

Jonathan Gray commented on HBASE-4056:
--

Thanks for opening this JIRA.

What do you see as the primary benefit of using flash for the WAL? I've seen
some improvement in sequential write throughput, but not drastically different.

It seems to me that a significant benefit of using flash is the fast random
read access, and there are no random reads on the WAL.

One idea that has floated around is to do something like cache-on-write to copy
recently written files onto flash (in addition to HDFS) to allow for fast
random read access. Or use flash as some kind of extension to the block cache.

But regardless, making all of this stuff configurable and supporting more
diverse setups is a good thing in general. Some experiments and benchmarks
around this would be awesome. Good stuff.

Support for using faster storage for write-ahead log

Key: HBASE-4056
URL: https://issues.apache.org/jira/browse/HBASE-4056
Project: HBase
Issue Type: New Feature
Reporter: Praveen Kumar
Priority: Minor
Labels: features

On clusters with heterogeneous storage components like hard drives and flash
memory, it could be beneficial to use flash memory for write-ahead log. This
can be accomplished by using client side mount table support (HADOOP-7257)
that is offered by HDFS federation (HDFS-1052) feature. One can define two
HDFS namespaces (faster and slower), and configure HBase to use faster
storage namespace for storing WAL.
This is an abstract task that captures the idea. More brainstorming and
subtasks identification to follow.


[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version

2011-07-07 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061102#comment-13061102
 ] 

Jonathan Gray commented on HBASE-4071:
--

I like this idea.  It's somewhat related to an idea for a TTKAV 
(TimeToKeepAllValues) parameter that would allow a point-in-time 
SnapshotScanner.  See HBASE-2376

 Data GC: Remove all versions > TTL EXCEPT the last written version
 --

 Key: HBASE-4071
 URL: https://issues.apache.org/jira/browse/HBASE-4071
 Project: HBase
  Issue Type: New Feature
Reporter: stack

 We were chatting today about our backup cluster.  What we want is to be able 
 to restore the dataset from any point of time but only within a limited 
 timeframe -- say one week.  Thereafter, if the versions are older than one 
 week, rather than as we do with TTL where we let go of all versions older 
 than TTL, instead, let go of all versions EXCEPT the last one written.  So, 
 it's like versions==1 when TTL > one week.  We want to allow that if an error 
 is caught within a week of its happening -- user mistakenly removes a 
 critical table -- then we'll be able to restore up to the moment just before 
 catastrophe hit; otherwise, we keep one version only.
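
The retention rule described above could be sketched like this. It is a toy model under assumptions: KeepLastVersionSketch and retain are hypothetical names, and a cell is reduced to a list of version timestamps sorted newest first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the retention rule; not HBase code.
class KeepLastVersionSketch {
    // Given version timestamps sorted newest first, keep every version that is
    // still within the TTL, plus the single newest version overall.
    static List<Long> retain(List<Long> newestFirst, long now, long ttlMillis) {
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < newestFirst.size(); i++) {
            long ts = newestFirst.get(i);
            boolean withinTtl = now - ts <= ttlMillis;
            if (withinTtl || i == 0) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

With plain TTL the second case (everything older than the TTL) would return nothing; here the newest version survives.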


[jira] [Commented] (HBASE-4060) Making region assignment more robust

2011-07-05 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060183#comment-13060183
]

Jonathan Gray commented on HBASE-4060:
--

Andrew, we are already doing something like what you describe. It seems the
issue is what Ted describes in #2 but it's not clear to me how this bug is
being triggered.

In TimeoutMonitor, we attempt to do an atomic change of state from OPENING to
OFFLINE. If this fails, we don't do anything. If it succeeds, we attempt to
do a reassign.

In OpenRegionHandler (in the RS), we attempt an atomic change of state from
OPENING to OPENED. If this fails, we roll back our open. If it succeeds, we
are opened and the node is at OPENED.

In OpenedRegionHandler (in the master), the first thing we do is delete a node
but only if in OPENED state. If the TimeoutMonitor had done anything, it would
have switched the state to OFFLINE.

What am I missing?
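
To make the race concrete, here is a minimal model of the two competing transitions. It is a sketch under assumptions: an AtomicReference stands in for the region's znode, and RegionTransitionSketch and its method names are made up for illustration rather than taken from ZKAssign or AssignmentManager.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model: an AtomicReference stands in for the region's znode.
class RegionTransitionSketch {
    enum State { OFFLINE, OPENING, OPENED }

    final AtomicReference<State> znode = new AtomicReference<>(State.OPENING);

    // TimeoutMonitor: OPENING -> OFFLINE; only reassign if this returns true.
    boolean timeoutMonitorFires() {
        return znode.compareAndSet(State.OPENING, State.OFFLINE);
    }

    // OpenRegionHandler on the RS: OPENING -> OPENED; roll back the open if false.
    boolean regionServerFinishesOpen() {
        return znode.compareAndSet(State.OPENING, State.OPENED);
    }
}
```

In this model exactly one of the two transitions can succeed, whichever order they run in.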

Making region assignment more robust

Key: HBASE-4060
URL: https://issues.apache.org/jira/browse/HBASE-4060
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Fix For: 0.92.0


[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache

2011-06-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054639#comment-13054639
 ] 

Jonathan Gray commented on HBASE-4027:
--

In the new HFile v2 over in HBASE-3857 the block cache interface changes from 
ByteBuffer to HeapSize.  So you can now put anything you want into the cache 
that implements HeapSize (there is a new HFileBlock that is used in HFile v2).

One big question is whether you're going to make copies out of the direct byte 
buffers on each read of that block, or if you're going to change KeyValue to 
use the ByteBuffer interface (or some other) instead of the byte[] directly.  
With a DBB you can't get access to an underlying byte[].
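
For illustration, a small sketch of the copy-on-read option using only java.nio. DirectBufferSketch and copyOut are hypothetical names; the sketch shows what "making a copy out of the direct byte buffer" costs in code terms, not how the block cache would actually do it.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; only java.nio is used.
class DirectBufferSketch {
    // Copies the readable bytes of a buffer onto the heap without moving
    // the position of the original buffer.
    static byte[] copyOut(ByteBuffer buf) {
        byte[] onHeap = new byte[buf.remaining()];
        buf.duplicate().get(onHeap);
        return onHeap;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {1, 2, 3, 4});
        direct.flip();
        // A direct buffer typically reports no accessible backing array,
        // so anything that needs a byte[] has to copy.
        System.out.println("hasArray: " + direct.hasArray());
        System.out.println("copied bytes: " + copyOut(direct).length);
    }
}
```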

 Enable direct byte buffers LruBlockCache
 

 Key: HBASE-4027
 URL: https://issues.apache.org/jira/browse/HBASE-4027
 Project: HBase
  Issue Type: Improvement
Reporter: Jason Rutherglen
Priority: Minor

 Java offers the creation of direct byte buffers which are allocated outside 
 of the heap.
 They need to be manually free'd, which can be accomplished using an 
 undocumented {{clean}} method.
 The feature will be optional.  After implementing, we can benchmark for 
 differences in speed and garbage collection observances.


[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver

2011-06-23 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054172#comment-13054172
]

Jonathan Gray commented on HBASE-4018:
--

bq. in many cases the CPU overhead dwarfs (or should) the extra RAM consumption
from uncompressing into heap space.

This is not necessarily the case. Many applications see 4-5X compression ratio
and it means being able to increase your cache capacity by that much. Some
applications can also be CPU bound, or they might be IO bound, or they might
actually be IO bound because they are RAM bound (can't fit working set in
memory). In general, it's hard to generalize here I think.

bq. Perhaps it's easily offset with a less intensive comp algorithm.

That's one of the major motivations for an hbase-specific prefix compression
algorithm.

Attach memcached as secondary block cache to regionserver
-

Key: HBASE-4018
URL: https://issues.apache.org/jira/browse/HBASE-4018
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: Li Pi
Assignee: Li Pi

Currently, block caches are limited by heap size, which is limited by garbage
collection times in Java.
We can get around this by using memcached w/JNI as a secondary block cache.
This should be faster than the linux file system's caching, and allow us to
very quickly gain access to a high quality slab allocated cache.


[jira] [Commented] (HBASE-4017) BlockCache interface should be truly modular

2011-06-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053439#comment-13053439
 ] 

Jonathan Gray commented on HBASE-4017:
--

+1

FYI, in the upcoming HFile v2 stuff, there is a change in the block cache 
interface so that instead of ByteBuffer it takes HeapSize (so basically, any 
heap-size-aware structure).

 BlockCache interface should be truly modular
 

 Key: HBASE-4017
 URL: https://issues.apache.org/jira/browse/HBASE-4017
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Li Pi

 Currently, if the BlockCache that is used isn't an LruBlockCache, code somewhere 
 in metrics will try to cast it to an LruBlockCache and cause an exception. 
 The code should be modular enough to allow for the use of different block 
 caches without throwing an exception.
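
A minimal sketch of the kind of guard that would avoid the exception. The types here (BlockCacheLike, LruLike, MetricsSketch) are hypothetical stand-ins, not the real BlockCache or LruBlockCache interfaces, and evictedCount is just an example of an LRU-only metric.

```java
// Hypothetical stand-ins; the real BlockCache and LruBlockCache have richer APIs.
interface BlockCacheLike {
    long size();
}

class LruLike implements BlockCacheLike {
    public long size() {
        return 10;
    }

    long evictedCount() {
        return 3;
    }
}

class MetricsSketch {
    // Report the LRU-only metric when available, rather than casting blindly.
    static long evictedCount(BlockCacheLike cache) {
        if (cache instanceof LruLike) {
            return ((LruLike) cache).evictedCount();
        }
        return 0; // metric not supported by this cache implementation
    }
}
```

A cleaner long-term option would be to put such metrics on the shared interface, but that is a design choice beyond this sketch.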


[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver

2011-06-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053446#comment-13053446
 ] 

Jonathan Gray commented on HBASE-4018:
--

The perf gain over the FS caching would be less-so if using short-circuited 
local reads.  But anything that bypasses the DataNode is great for random read 
perf.

Even still, making a copy out of in-process memory should be faster than linux 
fs caching.

 Attach memcached as secondary block cache to regionserver
 -

 Key: HBASE-4018
 URL: https://issues.apache.org/jira/browse/HBASE-4018
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Li Pi
Assignee: Li Pi

 Currently, block caches are limited by heap size, which is limited by garbage 
 collection times in Java.
 We can get around this by using memcached w/JNI as a secondary block cache. 
 This should be faster than the linux file system's caching, and allow us to 
 very quickly gain access to a high quality slab allocated cache.


[jira] [Commented] (HBASE-4018) Attach memcached as secondary block cache to regionserver

2011-06-22 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053486#comment-13053486
]

Jonathan Gray commented on HBASE-4018:
--

bq. Optimal solution would be building a slab allocated block cache within
java. Use reference counting for a zero copy solution. This is difficult to
implement and debug though.

I'm working on this. I think implementing both directions is worthwhile and we
can run good comparisons (including against linux fs cache + local datanodes).

bq. It would seem best to move in the direction of local HDFS file access and
allow plugging in the block cache as a point of comparison / legacy.

I think it's best to move in all directions and do comparisons. I've already
seen performance differences between fs cache and the actual hbase block cache.
There's also compressed vs. decompressed (fs cache will always be compressed)

Attach memcached as secondary block cache to regionserver
-

Key: HBASE-4018
URL: https://issues.apache.org/jira/browse/HBASE-4018
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: Li Pi
Assignee: Li Pi


[jira] [Commented] (HBASE-3340) Eventually Consistent Secondary Indexing via Coprocessors

2011-06-20 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052307#comment-13052307
 ] 

Jonathan Gray commented on HBASE-3340:
--

I'm not actively working on this but it's also a potential intern project at 
fb.  A code drop on GitHub would be great and maybe we can work together.  
There are quite a few alternative directions to go for indexing.  And an 
endless amount of development that could be done around APIs, schemas, filters, 
etc.  So the more the merrier.

 The basic design I was thinking of would be something similar to Google Percolator 
or what the Lily guys are doing 
(http://www.lilyproject.org/lily/about/playground/hbaserowlog/version/1)

 Eventually Consistent Secondary Indexing via Coprocessors
 -

 Key: HBASE-3340
 URL: https://issues.apache.org/jira/browse/HBASE-3340
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors
Reporter: Jonathan Gray
Assignee: Jonathan Gray

 Secondary indexing support via coprocessors with an eventual consistency 
 guarantee.  Design to come.


[jira] [Commented] (HBASE-3945) Load balancer shouldn't move the same region in two consecutive balancing actions

2011-06-03 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044017#comment-13044017
 ] 

Jonathan Gray commented on HBASE-3945:
--

I worry about this approach of more and more knobs, especially when they don't 
directly address what a good/bad load balance really is.

If a region gets moved in two consecutive balancing actions, then something is 
wrong with the balancer in the first place.  While I agree in principle that 
regions moving multiple times and quickly is not desirable, this will be a 
common outcome if the balancing algorithm isn't already taking into account 
metrics over time (rather than short snapshots).  If we're using load but then 
adding all these limits/controls, it's hard to ever understand the behavior of 
the balancer.

 Load balancer shouldn't move the same region in two consecutive balancing 
 actions
 

 Key: HBASE-3945
 URL: https://issues.apache.org/jira/browse/HBASE-3945
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu

 Keeping a region on the same region server would give good stability for 
 active scanners.
 We shouldn't reassign the same region in two successive calls to 
 balanceCluster().


[jira] [Resolved] (HBASE-3947) SplitLog in HMaster spend long time, move it to regionserver

2011-06-01 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-3947.
--

Resolution: Duplicate

This was implemented over in HBASE-1364 and committed into trunk.

 SplitLog in HMaster spend long time, move it to regionserver
 

 Key: HBASE-3947
 URL: https://issues.apache.org/jira/browse/HBASE-3947
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, zookeeper
Reporter: mingjian
 Fix For: 0.90.4


 One of our 100-node clusters went down because of a namenode crash.
 We restarted and found it spent about two and a half hours splitting hlogs.
 After the crash, there were about 3,500 hfiles in /hbase/.logs/. Splitting one of 
 them takes about 2~3 seconds.
 SplitLog works in a single thread of HMaster. Why not move it to 
 regionservers? HMaster would then only create split plans and notify 
 regionservers through zookeeper.


[jira] [Commented] (HBASE-3732) New configuration option for client-side compression

2011-06-01 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042595#comment-13042595
]

Jonathan Gray commented on HBASE-3732:
--

I agree that value compression is easily done at the application level. In
cases where you have very large values, compressing that data is something you
should always be thinking about.

Published or contributed code samples could go a long way. Are there things we
could add in Put/Get to make this kind of stuff easily pluggable?

If it can be integrated simply, then this might be okay, but it should probably
be part of a larger conversation about compression. And anything that touches
KV needs to be thought through.

I think there could be some substantial savings in hbase-specific prefix or
row/family/qualifier compression, both on-disk and in-memory. One idea there
would require some complicating of KeyValue and its comparator, or a simpler
solution would require short-term memory allocations to reconstitute KVs as
they make their way through the KVHeap/KVScanner.

I've also done some work on supporting a two-level compressed/uncompressed
block cache patch (with lzo). I'm waiting to finish until HBASE-3857 goes in
as it adds some things that make life easier in the HFile code.

New configuration option for client-side compression

Key: HBASE-3732
URL: https://issues.apache.org/jira/browse/HBASE-3732
Project: HBase
Issue Type: New Feature
Reporter: Jean-Daniel Cryans
Fix For: 0.92.0

Attachments: compressed_streams.jar

We have a case here where we have to store very fat cells (arrays of
integers) which can amount into the hundreds of KBs that we need to read
often, concurrently, and possibly keep in cache. Compressing the values on
the client using java.util.zip's Deflater before sending them to HBase proved
to be in our case almost an order of magnitude faster.
The reasons are evident: less data sent to hbase, memstore contains
compressed data, block cache contains compressed data too, etc.
I was thinking that it might be something useful to add to a family schema,
so that Put/Result do the conversion for you. The actual compression algo
should also be configurable.
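
As a rough sketch of the application-level approach, the following compresses and restores a value with java.util.zip before it would be handed to a Put or after it comes back in a Result. ValueCompressionSketch and its method names are hypothetical, and nothing here is wired into HBase.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper; only java.util.zip is used.
class ValueCompressionSketch {
    // Compress a cell value on the client before writing it.
    static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater();
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore a value read back from the table.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

How much this helps depends entirely on the data; arrays of integers with repeated patterns compress well, random bytes will not.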


[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-28 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3725:
-

Attachment: HBASE-3725-v3.patch

This fixes the problem in the only simple way I could think of.

A new configuration option, hbase.hregion.increment.supportdeletes, is added; 
it defaults to true (because it is required for correctness).

When this option is true, then when the scan against StoreFiles is done, it 
will also include the MemStore.  This should ensure correctness for cases where 
delete markers are present in the MemStore that need to apply to KVs in the 
StoreFiles.
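
A toy model of the effect, under heavy simplification: a single cell, one value in the StoreFiles and at most one delete marker in the MemStore. IncrementBaseSketch and baseValue are hypothetical names and do not correspond to the code in the patch.

```java
// Hypothetical model of one cell; not the code from the patch.
class IncrementBaseSketch {
    // storeFileValue / storeFileTs: newest value for the cell found in the StoreFiles.
    // memstoreDeleteTs: timestamp of a delete marker in the MemStore, or -1 if none.
    // Returns the value the increment should start from.
    static long baseValue(long storeFileValue, long storeFileTs,
                          long memstoreDeleteTs, boolean supportDeletes) {
        // Only when the MemStore is included in the scan can its delete
        // marker mask the older value sitting in the StoreFiles.
        boolean masked = supportDeletes && memstoreDeleteTs >= storeFileTs;
        return masked ? 0L : storeFileValue;
    }
}
```

With the option off, the delete marker is never seen and the increment starts from the old value, which is the behaviour reported in this issue.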

I made this a configuration option because it makes increment operations less 
optimal, so increment workloads that do not need to support deletes can turn 
the option off and avoid the double scan of the MemStore.

A potential optimal and correct solution to this could be to use the old Get 
delete tracker which would retain delete information across files (for in-order 
file processing rather than one mega merge).  Some work is going into 
re-integrating those, so if they do make it back into HBase, we could utilize 
them here.

This should suffice for now.

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
 Attachments: HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
 HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName  = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
    * This code reproduces a bug with increment column values in hbase
    * Usage: First run part one by passing '1' as the first arg
    *        Then restart the hbase cluster so it writes everything to disk
    *        Run part two by passing '2' as the first arg
    *
    * This will result in the old deleted data being found and used for
    * the increment calls
    *
    * @param args
    * @throws IOException
    */
   public static void main(String[] args) throws IOException
   {
     if ("1".equals(args[0]))
       partOne();
     if ("2".equals(args[0]))
       partTwo();
     if ("both".equals(args[0]))
     {
       partOne();
       partTwo();
     }
   }
   /**
    * Creates a table and increments a column value 10 times by 10 each
    * time.
    * Results in a value of 100 for the column
    *
    * @throws IOException
    */
   static void partOne() throws IOException
   {
     Configuration conf = HBaseConfiguration.create();
     HBaseAdmin admin = new HBaseAdmin(conf);
     HTableDescriptor tableDesc = new HTableDescriptor(tableName);
     tableDesc.addFamily(new HColumnDescriptor(infoCF));
     if (admin.tableExists(tableName))
     {
       admin.disableTable(tableName);
       admin.deleteTable(tableName);
     }
     admin.createTable(tableDesc);
     HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
     HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
     //Increment uninitialized column
     for (int j = 0; j < 10; j++)

[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-18 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021160#comment-13021160
]

Jonathan Gray commented on HBASE-1364:
--

Great work Prakash!

[performance] Distributed splitting of regionserver commit logs
---

Key: HBASE-1364
URL: https://issues.apache.org/jira/browse/HBASE-1364
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
Fix For: 0.92.0

Attachments: 1364-v5.txt, HBASE-1364.patch,
org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

Time Spent: 8h
Remaining Estimate: 0h

HBASE-1008 has some improvements to our log splitting on regionserver crash;
but it needs to run even faster.
(Below is from HBASE-1008)
In bigtable paper, the split is distributed. If we're going to have 1000
logs, we need to distribute or at least multithread the splitting.
1. As is, regions starting up expect to find one reconstruction log only.
Need to make it so pick up a bunch of edit logs and it should be fine that
logs are elsewhere in hdfs in an output directory written by all split
participants whether multithreaded or a mapreduce-like distributed process
(Lets write our distributed sort first as a MR so we learn whats involved;
distributed sort, as much as possible should use MR framework pieces). On
startup, regions go to this directory and pick up the files written by split
participants deleting and clearing the dir when all have been read in. Making
it so can take multiple logs for input, can also make the split process more
robust rather than current tenuous process which loses all edits if it
doesn't make it to the end without error.
2. Each column family rereads the reconstruction log to find its edits. Need
to fix that. Split can sort the edits by column family so store only reads
its edits.


[jira] [Commented] (HBASE-2256) Delete row, followed quickly to put of the same row will sometimes fail.

2011-04-07 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017148#comment-13017148
 ] 

Jonathan Gray commented on HBASE-2256:
--

I think this would be a hacky non-solution, regardless of whether it's epoch 
nanos or not.

 Delete row, followed quickly to put of the same row will sometimes fail.
 

 Key: HBASE-2256
 URL: https://issues.apache.org/jira/browse/HBASE-2256
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.20.3
Reporter: Clint Morgan
 Attachments: hbase-2256.patch


 Doing a Delete of a whole row, followed immediately by a put to that row will 
 sometimes miss a cell. Attached is a test to provoke the issue.


[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-05 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016245#comment-13016245
 ] 

Jonathan Gray commented on HBASE-3729:
--

I think the default behavior of the shell should be the default behavior of the 
client, which is 1 version unless specified otherwise.  Specifying a time range 
and wanting the most recent from within that range is a valid and somewhat 
common use case.
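
To illustrate "most recent version within a time range", here is a toy model of one cell as a sorted map from timestamp to value. TimeRangeGetSketch and latestInRange are hypothetical names, this is not the HBase client API, and the half-open range (start inclusive, end exclusive) is an assumption of the sketch.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one cell: timestamp -> value. Not the HBase client API.
class TimeRangeGetSketch {
    // Returns the most recent value with start <= timestamp < end, or null.
    static String latestInRange(NavigableMap<Long, String> versions, long start, long end) {
        Map.Entry<Long, String> e = versions.lowerEntry(end);
        return (e != null && e.getKey() >= start) ? e.getValue() : null;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> cell = new TreeMap<>();
        cell.put(10L, "a");
        cell.put(20L, "b");
        cell.put(30L, "c");
        // One version is returned even though two fall inside the range.
        System.out.println(latestInRange(cell, 5L, 30L));
    }
}
```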

 Get cells via shell with a time range predicate
 ---

 Key: HBASE-3729
 URL: https://issues.apache.org/jira/browse/HBASE-3729
 Project: HBase
  Issue Type: New Feature
  Components: shell
Reporter: Eric Charles
Assignee: Ted Yu
 Attachments: 3729-v2.txt, 3729-v3.txt, 3729.txt


 The HBase shell allows specifying a timestamp to get a value:
 - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
 If you don't give the exact timestamp, you get nothing, so it's difficult 
 to get a cell's previous versions.
 It would be good to have a "time range" predicate-based get.
 The shell syntax could be (depending on technical feasibility):
 - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
 end_timestamp)}


[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-05 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016248#comment-13016248
 ] 

Jonathan Gray commented on HBASE-3729:
--

HTable (Get/Scan) default is 1 version not 3 versions.  I think you are 
thinking of the HColumnDescriptor default.

 Get cells via shell with a time range predicate
 ---

 Key: HBASE-3729
 URL: https://issues.apache.org/jira/browse/HBASE-3729
 Project: HBase
  Issue Type: New Feature
  Components: shell
Reporter: Eric Charles
Assignee: Ted Yu
 Attachments: 3729-v2.txt, 3729-v3.txt, 3729-v4.txt, 3729.txt


 The HBase shell allows specifying a timestamp to get a value:
 - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
 If you don't give the exact timestamp, you get nothing, so it's difficult 
 to get a cell's previous versions.
 It would be good to have a "time range" predicate-based get.
 The shell syntax could be (depending on technical feasibility):
 - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
 end_timestamp)}


[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-04 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13015683#comment-13015683
 ] 

Jonathan Gray commented on HBASE-3725:
--

Hey Nathaniel.  Thanks for posting the unit test!

I will take a look at this sometime this week and try to get a fix out for it.

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
 Attachments: HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
     static String tableName = "testIncrement";
     static byte[] infoCF = Bytes.toBytes("info");
     static byte[] rowKey = Bytes.toBytes("test-rowKey");
     static byte[] newInc = Bytes.toBytes("new");
     static byte[] oldInc = Bytes.toBytes("old");
     /**
      * This code reproduces a bug with increment column values in hbase
      * Usage: First run part one by passing '1' as the first arg
      *        Then restart the hbase cluster so it writes everything to disk
      *        Run part two by passing '2' as the first arg
      *
      * This will result in the old deleted data being found and used for
      * the increment calls
      *
      * @param args
      * @throws IOException
      */
     public static void main(String[] args) throws IOException
     {
         if ("1".equals(args[0]))
             partOne();
         if ("2".equals(args[0]))
             partTwo();
         if ("both".equals(args[0]))
         {
             partOne();
             partTwo();
         }
     }
     /**
      * Creates a table and increments a column value 10 times by 10 each
      * time.
      * Results in a value of 100 for the column
      *
      * @throws IOException
      */
     static void partOne() throws IOException
     {
         Configuration conf = HBaseConfiguration.create();
         HBaseAdmin admin = new HBaseAdmin(conf);
         HTableDescriptor tableDesc = new HTableDescriptor(tableName);
         tableDesc.addFamily(new HColumnDescriptor(infoCF));
         // Start from a clean table on every run
         if (admin.tableExists(tableName))
         {
             admin.disableTable(tableName);
             admin.deleteTable(tableName);
         }
         admin.createTable(tableDesc);
         HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
         HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
         // Increment uninitialized column
         for (int j = 0; j < 10; j++)
         {
             table.incrementColumnValue(rowKey, infoCF, oldInc, (long) 10);
             Increment inc = new Increment(rowKey);
             inc.addColumn(infoCF, newInc, (long) 10);
             table.increment(inc);
         }
         Get get = new Get(rowKey);
         Result r = table.get(get);
         System.out.println("initial values: new " +
             Bytes.toLong(r.getValue(infoCF, newInc)) + " old " +
             Bytes.toLong(r.getValue(infoCF, oldInc)));
     }
     /**
      * First deletes the data then increments the column 10 times by 1 each
      * time
      *
      * Should result in a value of 10 but it doesn't, it results in a
      * value of 110
      *
      * @throws IOException
      */
     static void partTwo() throws IOException
     {
         Configuration conf = HBaseConfiguration.create();
         HTablePool pool = new

[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match

2011-04-01 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014748#comment-13014748
 ] 

Jonathan Gray commented on HBASE-3562:
--

Thanks for looking into this Evert.  This is definitely some tricky stuff.

A few comments on your patch...

- Our convention in conditionals is to put the variable first.  I find it a 
little tricky to read the code when the constant is first.  For example:
{code}
if (MatchCode.INCLUDE == mc)
{code}
should be
{code}
if (mc == MatchCode.INCLUDE)
{code}
(And all the other places where you have this type of logic)

- The unit test {{TestColumnMatchAndFilterOrder}} is clever in how it checks 
correctness, but I think it would be good to actually do a read query and 
verify the results for a few different combinations of the query to prove 
correctness of the overall algorithm.  Other changes to SQM down the road might 
change more behavior / order of operations, so this test may no longer apply or 
give full coverage for correctness.  Having some tests which don't rely on the 
precise server-side interactions but rather confirm the end results will be 
more applicable as we move forward.

- You have some lines that are > 80 characters, especially in some of the 
javadoc.  Just wrap that so all lines are <= 80 chars.

- There was a comment in SQM that described why the filter was checked first.  
Can you write some inline comments to describe how this works now?  There are a 
couple lines at the end but it will be useful to have some explanation on why 
this has changed and what the behavior is now.

- Is there any particular reason that you had includeLatestColumn take 
timestamp as a parameter?  The timestamp is passed in the check call, and we 
could just hang on to that.  It just feels a little strange to me since you 
should never pass a different timestamp, and the tracker can know which was the 
latest column.

Overall this is really solid!  Great work Evert!

 ValueFilter is being evaluated before performing the column match
 -

 Key: HBASE-3562
 URL: https://issues.apache.org/jira/browse/HBASE-3562
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.90.0
Reporter: Evert Arckens
 Attachments: HBASE-3562.patch


 When performing a Get operation where both a column and a ValueFilter are 
 specified, the ValueFilter is evaluated before the column match is made, 
 contrary to what the javadoc of Get.setFilter() indicates: "{@link 
 Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column 
 match, deletes and max versions have been run."
 This is shown in the little test below, which uses a TestComparator extending 
 WritableByteArrayComparable.
 public void testFilter() throws Exception {
   byte[] cf = Bytes.toBytes("cf");
   byte[] row = Bytes.toBytes("row");
   byte[] col1 = Bytes.toBytes("col1");
   byte[] col2 = Bytes.toBytes("col2");
   Put put = new Put(row);
   put.add(cf, col1, new byte[]{(byte)1});
   put.add(cf, col2, new byte[]{(byte)2});
   table.put(put);
   Get get = new Get(row);
   get.addColumn(cf, col2); // We only want to retrieve col2
   TestComparator testComparator = new TestComparator();
   Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
   get.setFilter(filter);
   Result result = table.get(get);
 }
 public class TestComparator extends WritableByteArrayComparable {
   /**
    * Nullary constructor, for Writable
    */
   public TestComparator() {
     super();
   }
 
   @Override
   public int compareTo(byte[] theirValue) {
     if (theirValue[0] == (byte)1) {
       // If the column match was done before evaluating the filter, we
       // should never get here.
       throw new RuntimeException("I only expect (byte)2 in col2, not (byte)1 from col1");
     }
     if (theirValue[0] == (byte)2) {
       return 0;
     }
     else return 1;
   }
 }
 When only one column should be retrieved, this can be worked around by using 
 a SingleColumnValueFilter instead of the ValueFilter.


[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match

2011-03-25 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011248#comment-13011248
 ] 

Jonathan Gray commented on HBASE-3562:
--

The counter in ColumnTracker is responsible for tracking setMaxVersions.  You 
may have queried for only the latest version, so once the ColumnTracker sees a 
given column, it will reject subsequent versions of that column.  Currently 
there's no way for the CT to know that a subsequent filter actually prevented a 
KeyValue from being returned, so that it should not be included in the count of 
returned versions.

We would need to introduce something like {{skippedPreviousKeyValue}} that 
could be sent back to the CT so it could undo the previous count.
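
To make the idea concrete, here is a minimal sketch of a per-column version counter with such an undo hook. It is not the real ColumnTracker; the class name VersionCountingTracker and the checkColumn method are hypothetical, and only {{skippedPreviousKeyValue}} comes from the suggestion above.

```java
/** Hypothetical per-column version counter; not the real ColumnTracker. */
class VersionCountingTracker {
    private final int maxVersions;
    private int countedVersions = 0;

    VersionCountingTracker(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /**
     * Called for each version of the column, newest first.
     * Counts the version and returns true while fewer than maxVersions
     * have been counted; returns false once the limit is reached.
     */
    boolean checkColumn() {
        if (countedVersions >= maxVersions) {
            return false;
        }
        countedVersions++;
        return true;
    }

    /**
     * Called when a later filter rejected the KeyValue that the previous
     * checkColumn() call counted, so it must not consume a version slot.
     */
    void skippedPreviousKeyValue() {
        if (countedVersions > 0) {
            countedVersions--;
        }
    }
}
```

With maxVersions = 1, a newest version that the filter drops no longer blocks the next version: checkColumn() returns true, the filter rejects the value, skippedPreviousKeyValue() undoes the count, and the following checkColumn() returns true again.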

 ValueFilter is being evaluated before performing the column match
 -

 Key: HBASE-3562
 URL: https://issues.apache.org/jira/browse/HBASE-3562
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.90.0
Reporter: Evert Arckens

 When performing a Get operation where both a column and a ValueFilter are 
 specified, the ValueFilter is evaluated before the column match is made, 
 contrary to what the javadoc of Get.setFilter() indicates: "{@link 
 Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column 
 match, deletes and max versions have been run."
 This is shown in the little test below, which uses a TestComparator extending 
 WritableByteArrayComparable.
 public void testFilter() throws Exception {
   byte[] cf = Bytes.toBytes("cf");
   byte[] row = Bytes.toBytes("row");
   byte[] col1 = Bytes.toBytes("col1");
   byte[] col2 = Bytes.toBytes("col2");
   Put put = new Put(row);
   put.add(cf, col1, new byte[]{(byte)1});
   put.add(cf, col2, new byte[]{(byte)2});
   table.put(put);
   Get get = new Get(row);
   get.addColumn(cf, col2); // We only want to retrieve col2
   TestComparator testComparator = new TestComparator();
   Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
   get.setFilter(filter);
   Result result = table.get(get);
 }
 public class TestComparator extends WritableByteArrayComparable {
   /**
    * Nullary constructor, for Writable
    */
   public TestComparator() {
     super();
   }
 
   @Override
   public int compareTo(byte[] theirValue) {
     if (theirValue[0] == (byte)1) {
       // If the column match was done before evaluating the filter, we
       // should never get here.
       throw new RuntimeException("I only expect (byte)2 in col2, not (byte)1 from col1");
     }
     if (theirValue[0] == (byte)2) {
       return 0;
     }
     else return 1;
   }
 }
 When only one column should be retrieved, this can be worked around by using 
 a SingleColumnValueFilter instead of the ValueFilter.


[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function

2011-03-25 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011452#comment-13011452
 ] 

Jonathan Gray commented on HBASE-3694:
--

Do we really want to put things like this into RegionServerMetrics?  That class 
is a mess and is currently only used for the publishing of our metrics (not 
used for internal state tracking).  And we should avoid the hadoop Metrics* 
classes like the plague... heavily synchronized and generally confusing.

My vote would be to add a new class, maybe {{RegionServerHeapManager}} or 
something like that... might be a good opportunity to cleanup and centralize 
the code related to that.  But could just hold this one AtomicLong for now.  
Agree that adding a new interface method just for the long is not ideal since 
it buys us nothing down the road.  Better to add something new that we can use 
later.
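
A minimal sketch of what such a class could look like to start with, assuming it holds nothing but the one counter. {{RegionServerHeapManager}} is the name floated above; the method names are hypothetical and not taken from any of the attached patches.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a central holder for region server heap accounting. */
class RegionServerHeapManager {
    // Single lock-free counter shared by all regions on this region server.
    private final AtomicLong globalMemStoreSize = new AtomicLong(0);

    /** Returns the current global memstore size without taking any lock. */
    long getGlobalMemStoreSize() {
        return globalMemStoreSize.get();
    }

    /**
     * Applies a size change (positive on writes, negative on flush)
     * and returns the updated total.
     */
    long addAndGetGlobalMemStoreSize(long delta) {
        return globalMemStoreSize.addAndGet(delta);
    }
}
```

Readers on the put path then call getGlobalMemStoreSize() instead of summing over regions inside a synchronized method, and more heap-related state can move into the class later.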

 high multiput latency due to checking global mem store size in a synchronized 
 function
 --

 Key: HBASE-3694
 URL: https://issues.apache.org/jira/browse/HBASE-3694
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: Hbase-3694[r1085306], Hbase-3694[r1085306]_2.patch, 
 Hbase-3694[r1085306]_3.patch, Hbase-3694[r1085508]_4.patch


 The problem is that we found the multiput latency to be very high.
 In our case, we have almost 22 Regions in each RS and no flushes 
 happened during these puts.
 After investigation, we believe that the root cause is the function 
 getGlobalMemStoreSize, which checks the high water mark of the mem store. 
 This function takes almost 40% of the total execution time of multiput, as 
 measured by instrumenting the code with some metrics.  
 The actual percentage may be even higher. The execution time is spent on 
 synchronization contention.
 One solution is to keep a static variable in HRegion that holds the global 
 MemStore size instead of calculating it every time.
 Why use a static variable?
 Since all the HRegion objects in the same JVM share the same memory heap, 
 they need to share fate as well.
 The static variable, globalMemStroeSize, naturally shows the total mem usage 
 in this shared memory heap for this JVM.
 If multiple RS need to run in the same JVM, they still need only one 
 globalMemStroeSize.
 If multiple RS run on different JVMs, everything is fine.
 After this change, in our case, the avg multiput latency decreased from 60ms 
 to 10ms.
 I will submit a patch based on the current trunk.


[jira] [Commented] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master

2011-03-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010796#comment-13010796
 ] 

Jonathan Gray commented on HBASE-3669:
--

When I've seen this happen, there has been another RS cutting in and 
transferring to OPENING.

As someone in the other JIRA indicates, this kind of thing can happen when one 
of the RS is unable to open the region because it doesn't have the proper 
compression lib or some DFS error.

If the master successfully transfers to OFFLINE and the RS sees it as OPENING, 
then almost certainly there's another RS that has gotten in the way.

The contents of the RIT znode actually contain serverName, so we should 
probably add additional debug information when the state transfer fails.  
("Unable to go from OFFLINE to OPENING because already in OPENING by server 
#serverName#")
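
A small sketch of how that message could be assembled from the state and server name found in the znode. The TransitionFailureMessage class and its describe method are hypothetical helpers for illustration, not ZKAssign code and not the attached debug patch.

```java
/** Hypothetical helper that formats a failed znode state transition. */
class TransitionFailureMessage {
    /**
     * @param from          state the caller expected the znode to be in
     * @param to            state the caller tried to move the znode to
     * @param currentState  state actually read from the znode
     * @param currentServer server name actually read from the znode
     */
    static String describe(String from, String to,
                           String currentState, String currentServer) {
        return "Unable to go from " + from + " to " + to
            + " because already in " + currentState
            + " by server " + currentServer;
    }
}
```

For the region server log quoted in this issue, the extra information would name sv2borg180,60020,1300384550966 as the server that already holds the node in OPENING.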

 Region in PENDING_OPEN keeps being bounced between RS and master
 

 Key: HBASE-3669
 URL: https://issues.apache.org/jira/browse/HBASE-3669
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.2


 After going crazy killing region servers after HBASE-3668, most of the 
 cluster recovered except for 3 regions that kept being refused by the region 
 servers.
 On the master I would see:
 {code}
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  so generated a random one; 
 hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) 
 available servers
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  to sv2borg171,60020,1300399357135
 {code}
 Then on the region server:
 {code}
 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempting to transition node 
 f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode 
 /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; 
 data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned 
 node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING failed, the node existed but was in the state 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
 transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
 {code}
 I'm not sure I fully understand what was going on... the master was supposed 
 to OFFLINE the znode but then that's not what the region server was seeing? 
 In any case, I was able to recover by doing a force unassign for each region 
 and then assign.


[jira] [Updated] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master

2011-03-24 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3669:
-

Attachment: HBASE-3669-debug-v1.patch

Adds more debug

 Region in PENDING_OPEN keeps being bounced between RS and master
 

 Key: HBASE-3669
 URL: https://issues.apache.org/jira/browse/HBASE-3669
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.2

 Attachments: HBASE-3669-debug-v1.patch


 After going crazy killing region servers after HBASE-3668, most of the 
 cluster recovered except for 3 regions that kept being refused by the region 
 servers.
 On the master I would see:
 {code}
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  state=PENDING_OPEN, ts=1300400554826
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  so generated a random one; 
 hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) 
 available servers
 2011-03-17 22:23:14,828 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
  to sv2borg171,60020,1300399357135
 {code}
 Then on the region server:
 {code}
 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempting to transition node 
 f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode 
 /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; 
 data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
  server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned 
 node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
 RS_ZK_REGION_OPENING failed, the node existed but was in the state 
 RS_ZK_REGION_OPENING
 2011-03-17 22:23:14,832 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
 transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
 {code}
 I'm not sure I fully understand what was going on... the master was supposed 
 to OFFLINE the znode but then that's not what the region server was seeing? 
 In any case, I was able to recover by doing a force unassign for each region 
 and then assign.


[jira] [Commented] (HBASE-3627) NPE in EventHandler when region already reassigned

2011-03-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010807#comment-13010807
 ] 

Jonathan Gray commented on HBASE-3627:
--

looks good, +1

 NPE in EventHandler when region already reassigned
 --

 Key: HBASE-3627
 URL: https://issues.apache.org/jira/browse/HBASE-3627
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3627.txt


 When a region takes too long to open, it will try to update the unassigned 
 znode and will fail on an ugly NPE like this:
 {quote}
 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x22dc571dde04ca7 Attempting to transition node 
 0519dc3b62a569347526875048c37faa from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 regionserver:60020-0x22dc571dde04ca7 Unable to get data of znode 
 /hbase/unassigned/0519dc3b62a569347526875048c37faa because node does not 
 exist (not necessarily an error)
 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while 
 processing event M_RS_OPEN_REGION
 java.lang.NullPointerException
   at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
   at 
 org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.retransitionNodeOpening(ZKAssign.java:585)
   at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tickleOpening(OpenRegionHandler.java:322)
   at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:97)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 {quote}
 I think the region server in this case should be closing the region ASAP.


[jira] [Commented] (HBASE-3654) Weird blocking between getOnlineRegion and createRegionLoad

2011-03-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010811#comment-13010811
 ] 

Jonathan Gray commented on HBASE-3654:
--

I'm late to the conversation, but have also seen contention on the onlineRegion 
map.  Changing to CHM helped.
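
Roughly what that change looks like, as a sketch that assumes a map keyed by encoded region name. The OnlineRegions class and its method names are illustrative only, not the actual HRegionServer code or any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of an online-region registry backed by a ConcurrentHashMap. */
class OnlineRegions<R> {
    private final ConcurrentMap<String, R> onlineRegions = new ConcurrentHashMap<>();

    void addToOnlineRegions(String encodedName, R region) {
        onlineRegions.put(encodedName, region);
    }

    /** Lookup on the RPC path: no synchronized block, so readers never queue up. */
    R getFromOnlineRegions(String encodedName) {
        return onlineRegions.get(encodedName);
    }

    R removeFromOnlineRegions(String encodedName) {
        return onlineRegions.remove(encodedName);
    }

    /**
     * Copy of the current regions for slow work such as building the server
     * load report; the copy is weakly consistent and taken without a lock.
     */
    List<R> snapshot() {
        return new ArrayList<>(onlineRegions.values());
    }
}
```

The point of the design is that the periodic report iterates over a snapshot while IPC readers keep calling getFromOnlineRegions(), instead of both sides contending for one monitor on a plain HashMap.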

 Weird blocking between getOnlineRegion and createRegionLoad
 ---

 Key: HBASE-3654
 URL: https://issues.apache.org/jira/browse/HBASE-3654
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.2

 Attachments: ConcurrentHM, ConcurrentSKLM, CopyOnWrite, 
 HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL.patch,
  
 HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL1.patch,
  
 HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_ConcurrentHM.patch,
  TestOnlineRegions.java, hashmap


 Saw this when debugging something else:
 {code}
 regionserver60020 prio=10 tid=0x7f538c1c nid=0x4c7 runnable 
 [0x7f53931da000]
java.lang.Thread.State: RUNNABLE
   at 
 org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1380)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:916)
   - locked 0x000672aa0a00 (a 
 java.util.concurrent.ConcurrentSkipListMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:767)
   - locked 0x000656f62710 (a java.util.HashMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:722)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
   at java.lang.Thread.run(Thread.java:662)
 IPC Reader 9 on port 60020 prio=10 tid=0x7f538c1be000 nid=0x4c6 waiting 
 for monitor entry [0x7f53932db000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
   - waiting to lock 0x000656f62710 (a java.util.HashMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
   - locked 0x000656e60068 (a 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 ...
 IPC Reader 0 on port 60020 prio=10 tid=0x7f538c08b000 nid=0x4bd waiting 
 for monitor entry [0x7f5393be4000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
   - waiting to lock 0x000656f62710 (a java.util.HashMap)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
   at

[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function

2011-03-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010983#comment-13010983
 ] 

Jonathan Gray commented on HBASE-3694:
--

Neither of these seems right.  Any issue with adding another method for this?

 high multiput latency due to checking global mem store size in a synchronized 
 function
 --

 Key: HBASE-3694
 URL: https://issues.apache.org/jira/browse/HBASE-3694
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 The problem is that we found the multiput latency to be very high.
 In our case, we have almost 22 Regions in each RS and no flushes 
 happened during these puts.
 After investigation, we believe that the root cause is the function 
 getGlobalMemStoreSize, which checks the high water mark of the mem store. 
 This function took almost 40% of the total execution time of multiput when 
 we instrumented the code with some metrics.  
 The actual percentage may be even higher. The execution time is spent on 
 synchronization contention.
 One solution is to keep a static variable in HRegion that holds the global 
 MemStore size instead of calculating it every time.
 Why use a static variable?
 Since all the HRegion objects in the same JVM share the same memory heap, 
 they need to share fate as well.
 The static variable, globalMemStroeSize, naturally shows the total mem usage 
 in this shared memory heap for this JVM.
 If multiple RS need to run in the same JVM, they still need only one 
 globalMemStroeSize.
 If multiple RS run on different JVMs, everything is fine.
 After the change, in our case, the avg multiput latency decreased from 60ms to 
 10ms.
 I will submit a patch based on the current trunk.
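
 A minimal sketch of the shared-counter idea described above, assuming an 
 AtomicLong is an acceptable stand-in for the static variable. The class and 
 method names are hypothetical and are not taken from the actual patch.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical JVM-wide accounting of MemStore bytes; not the actual HBase patch. */
final class GlobalMemStoreSize {
  // One counter per JVM, shared by every region hosted in this process.
  private static final AtomicLong SIZE = new AtomicLong();

  private GlobalMemStoreSize() {}

  /** Called with the bytes added by a write, or with a negative delta after a flush. */
  static long add(long delta) {
    return SIZE.addAndGet(delta);
  }

  /** Lock-free read, so callers do not contend on a synchronized method. */
  static long get() {
    return SIZE.get();
  }

  /** True when total usage is at or above the given high water mark in bytes. */
  static boolean aboveHighWaterMark(long highWaterMarkBytes) {
    return SIZE.get() >= highWaterMarkBytes;
  }
}
```

 Each write would pay one atomic add, and the high water mark check becomes a 
 single read rather than a recalculation across regions.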

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3052) Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing

2011-03-24 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011070#comment-13011070
 ] 

Jonathan Gray commented on HBASE-3052:
--

How the heck do you re-open a task on this new jira? :)

 Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster 
 for test writing
 

 Key: HBASE-3052
 URL: https://issues.apache.org/jira/browse/HBASE-3052
 Project: HBase
  Issue Type: Improvement
  Components: test, zookeeper
Reporter: Jonathan Gray
Assignee: Liyin Tang
Priority: Minor
 Attachments: HBASE_3052[r1083993].patch, HBASE_3052[r1084033].patch


 Interesting things can happen when you have a ZK quorum of multiple servers 
 and one of them dies.  Doing testing here on clusters, this has turned up 
 some bugs with HBase interaction with ZK.
 Would be good to add the ability to have multiple ZK servers in unit tests 
 and be able to kill them individually.


[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor

2011-03-23 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010234#comment-13010234
]

Jonathan Gray commented on HBASE-3691:
--

It's slightly faster for both compression and decompression when compared to
LZO (250/500 MB/sec for Snappy vs. 169/434 MB/sec for LZO).

I'm unsure of the difference in compression ratios but we can ship with it, yay

Add compressor support for 'snappy', google's compressor

Key: HBASE-3691
URL: https://issues.apache.org/jira/browse/HBASE-3691
Project: HBase
Issue Type: Task
Reporter: stack
Priority: Critical
Fix For: 0.92.0

http://code.google.com/p/snappy/ is apache licensed.
bq. Snappy is a compression/decompression library. It does not aim for
maximum compression, or compatibility with any other compression library;
instead, it aims for very high speeds and reasonable compression. For
instance, compared to the fastest mode of zlib, Snappy is an order of
magnitude faster for most inputs, but the resulting compressed files are
anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in
64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses
at about 500 MB/sec or more.
bq. Snappy is widely used inside Google, in everything from BigTable and
MapReduce to our internal RPC systems. (Snappy has previously been referred
to as Zippy in some presentations and the likes.)
Lets get it in.


[jira] [Commented] (HBASE-3693) isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase

2011-03-23 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010252#comment-13010252
 ] 

Jonathan Gray commented on HBASE-3693:
--

+1 on caching this.  Good stuff!

 isMajorCompaction() check triggers lots of listStatus DFS RPC calls from HBase
 --

 Key: HBASE-3693
 URL: https://issues.apache.org/jira/browse/HBASE-3693
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Liyin Tang

 We noticed that there are lots of listStatus calls on the ColumnFamily 
 directories within each region, coming from this codepath:
 {code}
 compactionSelection()
   -- isMajorCompaction()
     -- getLowestTimestamp()
       -- FileStatus[] stats = fs.listStatus(p);
 {code}
 So on every compactionSelection() we're taking this hit. While not 
 immediately an issue, just from log inspection, this accounts for quite a 
 large number of RPCs to namenode at the moment and seems like an unnecessary 
 load to be sending to the namenode.
 Seems like it would be easy to cache the timestamp for each opened/created 
 StoreFile, in memory, in the region server, and avoid going to DFS each time 
 for this information.
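
 The caching suggested above could look roughly like the following sketch. 
 Everything here is hypothetical: the class name, the use of a path string as 
 the key, and the loader function that stands in for the DFS lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

/** Hypothetical in-memory cache of StoreFile timestamps, kept in the region server. */
final class StoreFileTimestampCache {
  private final Map<String, Long> timestamps = new ConcurrentHashMap<>();
  // Stands in for the expensive lookup (a listStatus call in the description above).
  private final ToLongFunction<String> loader;

  StoreFileTimestampCache(ToLongFunction<String> loader) {
    this.loader = loader;
  }

  /** Returns the cached timestamp, calling the loader only on the first request. */
  long timestampOf(String storeFilePath) {
    return timestamps.computeIfAbsent(storeFilePath, p -> loader.applyAsLong(p));
  }

  /** Drops the entry, for example when a StoreFile is closed or compacted away. */
  void evict(String storeFilePath) {
    timestamps.remove(storeFilePath);
  }

  /** Lowest timestamp across the given files, or Long.MAX_VALUE when there are none. */
  long lowestTimestamp(Iterable<String> storeFilePaths) {
    long lowest = Long.MAX_VALUE;
    for (String path : storeFilePaths) {
      lowest = Math.min(lowest, timestampOf(path));
    }
    return lowest;
  }
}
```

 With something like this, repeated compaction selection on the same set of 
 files would touch the loader once per file instead of once per check.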


[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException

2011-03-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009842#comment-13009842
 ] 

Jonathan Gray commented on HBASE-3687:
--

Shouldn't the RS not check in to the master with an RPC until it is available?

 Bulk assign on startup should handle a ServerNotRunningException
 

 Key: HBASE-3687
 URL: https://issues.apache.org/jira/browse/HBASE-3687
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.2

 Attachments: 3687.txt


 On startup, we do bulk assign.  At the moment, if any problem during bulk 
 assign, we consider startup failed and expectation is that you need to retry 
 (We need to make this better but that is not what this issue is about).  One 
 exception that we should handle is the case where a RS is slow coming up and 
 its rpc is not yet up listening.  In this case it will throw: 
 ServerNotRunningException.  We should retry at least this one exception 
 during bulk assign.
 We had this happen to us starting up a prod cluster.
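
 A rough sketch of retrying only this one exception is below. The exception 
 class is a local stand-in so the example compiles on its own, and the helper 
 name, attempt limit, and pause are illustrative assumptions rather than 
 anything from the attached patch. The fixed pause is only a placeholder.

```java
import java.util.concurrent.Callable;

/** Local stand-in for the HBase exception of the same name. */
class ServerNotRunningException extends Exception {
  ServerNotRunningException(String message) {
    super(message);
  }
}

/** Hypothetical helper; not part of HBase. */
final class BulkAssignRetry {
  private BulkAssignRetry() {}

  /**
   * Runs the call and retries only when it throws ServerNotRunningException.
   * Any other exception propagates immediately. maxAttempts and pauseMillis
   * are illustrative knobs, not HBase configuration.
   */
  static <T> T callWithRetry(Callable<T> call, int maxAttempts, long pauseMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    ServerNotRunningException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (ServerNotRunningException e) {
        last = e; // the RS exists but its RPC server is not listening yet
        if (attempt < maxAttempts) {
          Thread.sleep(pauseMillis);
        }
      }
    }
    throw last; // every attempt hit ServerNotRunningException
  }
}
```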


[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException

2011-03-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009843#comment-13009843
 ] 

Jonathan Gray commented on HBASE-3687:
--

and weren't we just saying that we should not be putting in Thread.sleeps ;)

 Bulk assign on startup should handle a ServerNotRunningException
 

 Key: HBASE-3687
 URL: https://issues.apache.org/jira/browse/HBASE-3687
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.2

 Attachments: 3687.txt


 On startup, we do bulk assign.  At the moment, if any problem during bulk 
 assign, we consider startup failed and expectation is that you need to retry 
 (We need to make this better but that is not what this issue is about).  One 
 exception that we should handle is the case where a RS is slow coming up and 
 its rpc is not yet up listening.  In this case it will throw: 
 ServerNotRunningException.  We should retry at least this one exception 
 during bulk assign.
 We had this happen to us starting up a prod cluster.


[jira] [Commented] (HBASE-3687) Bulk assign on startup should handle a ServerNotRunningException

2011-03-22 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009850#comment-13009850
 ] 

Jonathan Gray commented on HBASE-3687:
--

I think it's fine for now.  The real fix should be having the RS not check in 
with the master until it is fully online (agree, outside scope of this jira).

 Bulk assign on startup should handle a ServerNotRunningException
 

 Key: HBASE-3687
 URL: https://issues.apache.org/jira/browse/HBASE-3687
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.2

 Attachments: 3687.txt


 On startup, we do bulk assign.  At the moment, if any problem during bulk 
 assign, we consider startup failed and expectation is that you need to retry 
 (We need to make this better but that is not what this issue is about).  One 
 exception that we should handle is the case where a RS is slow coming up and 
 its rpc is not yet up listening.  In this case it will throw: 
 ServerNotRunningException.  We should retry at least this one exception 
 during bulk assign.
 We had this happen to us starting up a prod cluster.


[jira] [Commented] (HBASE-1755) Putting 'Meta' table into ZooKeeper

2011-03-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009290#comment-13009290
 ] 

Jonathan Gray commented on HBASE-1755:
--

I generally agree that we should only store temporary data in ZK, but I see 
META as largely temporary.

Table/region meta data is already persisted on HDFS (we don't properly update, 
but that can be fixed without much trouble).  And we have plans to move schema 
and configuration information into ZK for online changes, so at least on a 
running cluster, we'll be depending on ZK for region configuration.

Otherwise, META is largely for locations.

I also think the possibility exists to keep a META region but maintain region 
locations in ZK.

In general, the special casing and exception handling around the reading and 
updating of META is extraordinarily painful both in the master and in the 
regionservers.

 Putting 'Meta' table into ZooKeeper
 ---

 Key: HBASE-1755
 URL: https://issues.apache.org/jira/browse/HBASE-1755
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.0
Reporter: Erik Holstad
 Fix For: 0.92.0


 Moving to 0.22.0


[jira] [Resolved] (HBASE-3322) HLog sync slowdown under heavy load with HBASE-2467

2011-03-21 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-3322.
--

Resolution: Won't Fix

There is an issue here but upon further investigation, it's not really a bug.

The issue is around heavy concurrency / high number of threads in HLog.  The 
current behavior is that each thread does a notify to the LogSyncer and then 
does a wait on a single object.  The LogSyncer waits to be notified, then syncs 
what is pending, and then does a notifyAll to all the threads waiting for their 
sync.

This is a straightforward and correct pattern but under heavy concurrency, the 
fact that all threads are waiting on a single object to be notified becomes a 
bottleneck.

Will open other JIRAs to deal with solutions to this.  Closing this one as this 
is not a blocking bug.

 HLog sync slowdown under heavy load with HBASE-2467
 ---

 Key: HBASE-3322
 URL: https://issues.apache.org/jira/browse/HBASE-3322
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Priority: Blocker
 Fix For: 0.92.0


 Testing HBASE-2467 and HDFS-895 on 100 node cluster w/ a heavy increment 
 workload we experienced significant slowdown.
 Stack traces show that most threads are on HLog.updateLock.


[jira] [Commented] (HBASE-2549) Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out

2011-03-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009425#comment-13009425
 ] 

Jonathan Gray commented on HBASE-2549:
--

punted from 0.92

 Review Trackers (column, delete, etc) on Trunk after 2248 goes in for 
 correctness and optimal earlying-out
 --

 Key: HBASE-2549
 URL: https://issues.apache.org/jira/browse/HBASE-2549
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical

 Once we move to all Scans, the trackers could use a refresh.  There are often 
 times where we return, for example, a MatchCode.SKIP (which just goes to the 
 next KV not including the current one) where we could be sending a more 
 optimal return code like MatchCode.SEEK_NEXT_ROW.
 This is a jira to review all of this code after 2248 goes in.


[jira] [Commented] (HBASE-2549) Review Trackers (column, delete, etc) on Trunk after 2248 goes in for correctness and optimal earlying-out

2011-03-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009428#comment-13009428
 ] 

Jonathan Gray commented on HBASE-2549:
--

(punting because this was largely done but would be good to do a full analysis 
at some point down the road)

 Review Trackers (column, delete, etc) on Trunk after 2248 goes in for 
 correctness and optimal earlying-out
 --

 Key: HBASE-2549
 URL: https://issues.apache.org/jira/browse/HBASE-2549
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical

 Once we move to all Scans, the trackers could use a refresh.  There are often 
 times where we return, for example, a MatchCode.SKIP (which just goes to the 
 next KV not including the current one) where we could be sending a more 
 optimal return code like MatchCode.SEEK_NEXT_ROW.
 This is a jira to review all of this code after 2248 goes in.


[jira] [Updated] (HBASE-2832) Priorities and multi-threading for MemStore flushing

2011-03-21 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-2832:
-

Fix Version/s: (was: 0.92.0)

punting from 0.92.  still needs to be done but should not be tied to a version 
until work is being actively done

 Priorities and multi-threading for MemStore flushing
 

 Key: HBASE-2832
 URL: https://issues.apache.org/jira/browse/HBASE-2832
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical

 Similar to HBASE-1476 and HBASE-2646 which are for compactions, but do this 
 for flushes.
 Flushing when we hit the normal flush size is a low priority flush.  Other 
 types of flushes (heap pressure, blocking client requests, etc) are high 
 priority.
 Should have a tunable number of concurrent flushes.
 Will use the {{HBaseExecutorService}} and {{HBaseEventHandler}} introduced 
 from master/zk changes.


[jira] [Updated] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params

2011-03-21 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Gray updated HBASE-2375:
-

Fix Version/s: (was: 0.92.0)

punting from 0.92. still needs to be done but should not be tied to a version
until work is being actively done

Make decision to split based on aggregate size of all StoreFiles and revisit
related config params
--

Key: HBASE-2375
URL: https://issues.apache.org/jira/browse/HBASE-2375
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.20.3
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
Labels: moved_from_0_20_5
Attachments: HBASE-2375-v8.patch

Currently we will make the decision to split a region when a single StoreFile
in a single family exceeds the maximum region size. This issue is about
changing the decision to split to be based on the aggregate size of all
StoreFiles in a single family (but still not aggregating across families).
This would move a check to split after flushes rather than after compactions.
This issue should also deal with revisiting our default values for some
related configuration parameters.
The motivating factor for this change comes from watching the behavior of
RegionServers during heavy write scenarios.
Today the default behavior goes like this:
- We fill up regions, and as long as you are not under global RS heap
pressure, you will write out 64MB (hbase.hregion.memstore.flush.size)
StoreFiles.
- After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a
compaction on this region.
- Compaction queues notwithstanding, this will create a 192MB file, not
triggering a split based on max region size (hbase.hregion.max.filesize).
- You'll then flush two more 64MB MemStores and hit the compactionThreshold
and trigger a compaction.
- You end up with 192 + 64 + 64 in a single compaction. This will create a
single 320MB file and will trigger a split.
- While you are performing the compaction (which now writes out 64MB more
than the split size, so is about 5X slower than the time it takes to do a
single flush), you are still taking on additional writes into MemStore.
- Compaction finishes, decision to split is made, region is closed. The
region now has to flush whichever edits made it to MemStore while the
compaction ran. This flushing, in our tests, is by far the dominating factor
in how long data is unavailable during a split. We measured about 1 second
to do the region closing, master assignment, reopening. Flushing could take
5-6 seconds, during which time the region is unavailable.
- The daughter regions re-open on the same RS. Immediately when the
StoreFiles are opened, a compaction is triggered across all of their
StoreFiles because they contain references. Since we cannot currently split
a split, we need to not hang on to these references for long.
This described behavior is really bad because of how often we have to rewrite
data onto HDFS. Imports are usually just IO bound as the RS waits to flush
and compact. In the above example, the first cell to be inserted into this
region ends up being written to HDFS 4 times (initial flush, first compaction
w/ no split decision, second compaction w/ split decision, third compaction
on daughter region). In addition, we leave a large window where we take on
edits (during the second compaction of 320MB) and then must make the region
unavailable as we flush it.
If we increased the compactionThreshold to be 5 and determined splits based
on aggregate size, the behavior becomes:
- We fill up regions, and as long as you are not under global RS heap
pressure, you will write out 64MB (hbase.hregion.memstore.flush.size)
StoreFiles.
- After each MemStore flush, we calculate the aggregate size of all
StoreFiles. We can also check the compactionThreshold. For the first three
flushes, neither would hit the limit. On the fourth flush, we would see
total aggregate size = 256MB and decide to split.
- Decision to split is made, region is closed. This time, the region just
has to flush out whichever edits made it to the MemStore during the
snapshot/flush of the previous MemStore. So this time window has shrunk by
more than 75% as it was the time to write 64MB from memory not 320MB from
aggregating 5 hdfs files. This will greatly reduce the time data is
unavailable during splits.
- The daughter regions re-open on the same RS. Immediately when the
StoreFiles are opened, a compaction is triggered across all of their
StoreFiles because they contain references. This would stay the same.
In this example, we
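
The two split rules compared above can be sketched as follows. The class and
method names are hypothetical, and the 256MB limit used in the usage note is
an assumption inferred from the figures in the description (192MB does not
split, 256MB aggregate does).

```java
/** Hypothetical comparison of the two split rules; not HBase code. */
final class SplitDecision {
  static final long MB = 1024L * 1024L;

  private SplitDecision() {}

  /** Behavior described as current: split once a single StoreFile reaches the limit. */
  static boolean splitOnLargestFile(long[] storeFileSizes, long maxFileSize) {
    long largest = 0L;
    for (long size : storeFileSizes) {
      largest = Math.max(largest, size);
    }
    return largest >= maxFileSize;
  }

  /** Proposed behavior: split once the family's StoreFiles together reach the limit. */
  static boolean splitOnAggregate(long[] storeFileSizes, long maxFileSize) {
    long total = 0L;
    for (long size : storeFileSizes) {
      total += size;
    }
    return total >= maxFileSize;
  }
}
```

With four 64MB StoreFiles and a 256MB limit, splitOnAggregate returns true
while splitOnLargestFile still returns false, which is the difference the
description relies on.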

[jira] [Updated] (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-21 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3641:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to branch and trunk.  Thanks stack.

 LruBlockCache.CacheStats.getHitCount() is not using the correct variable
 

 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0

 Attachments: HBASE-3641-v1.patch, HBASE-3641-v2.patch


 {code}
 public long getHitCount() {
   return hitCachingCount.get();
 }
 {code}
 This should be {{hitCount.get()}}
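
 For illustration, a simplified stand-in for the stats class with the getter 
 reading the intended counter. Only the two hit counters are modeled, the 
 class name is hypothetical, and the meaning given to hitCachingCount in the 
 comments is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified stand-in for LruBlockCache.CacheStats; not the real class. */
final class CacheStatsSketch {
  // All cache hits.
  private final AtomicLong hitCount = new AtomicLong();
  // Assumed meaning: hits from requests that asked for the block to be cached.
  private final AtomicLong hitCachingCount = new AtomicLong();

  void hit(boolean caching) {
    hitCount.incrementAndGet();
    if (caching) {
      hitCachingCount.incrementAndGet();
    }
  }

  /** Reads hitCount, not hitCachingCount. */
  long getHitCount() {
    return hitCount.get();
  }

  long getHitCachingCount() {
    return hitCachingCount.get();
  }
}
```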


[jira] [Commented] (HBASE-1110) Distribute the master role to HRS after ZK integration

2011-03-21 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009435#comment-13009435
]

Jonathan Gray commented on HBASE-1110:
--

Is this really that important to do now? Seems simple enough to start master
processes on slave nodes if you want lots of backups. If each RS can become a
master, then you have to reserve heap in each to handle the master role (which
is a non-trivial amount).

I think this is a fine area to explore and always good to have options (this
could make sense on a small cluster). But I'd opt to move out of 0.92.

Distribute the master role to HRS after ZK integration
--

Key: HBASE-1110
URL: https://issues.apache.org/jira/browse/HBASE-1110
Project: HBase
Issue Type: Improvement
Reporter: Andrew Purtell
Fix For: 0.92.0

After ZK integration, the master role can be distributed out to the HRS as
group behaviors mediated by synchronization and rendezvous points in ZK.
- State sharing, for example load.
-- Load information can be shared with neighbors via ephemeral child
status znodes of a znode representing the cluster root.
-- Region servers can periodically walk the status nodes of their
neighbors. If they find themselves loaded relative to others, they can
release regions. If they find themselves less loaded relative to others, they
can be more aggressive about finding unassigned regions (see below).
- Ephemeral znodes for region ownership, e.g.
/hbase//region/ephemeral-node
-- Use a permanent child of region to serve as a 'dirty' flag, removed
during normal close.
- A distributed queue for region assignment.
-- When coming up, HRS can check the assignment queue for candidates.
-- HRS shutdown includes marking regions clean and moving them onto
assignment queue.
-- All/any HRS can do occasional random walks over region leases looking
for expired-dirty state (when timeout causes ZK to delete the ephemeral node
representing the lease), and can helpfully move them first to a queue (+
barrier) for splitting then onto the assignment queue.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-03-21 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009449#comment-13009449
 ] 

Jonathan Gray commented on HBASE-3417:
--

Just verified that this is the same as what we have been running with in 
production (since the patch was put up in January).

I'm ready to commit if you want to +1 me :)

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch, 
 HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).


[jira] [Commented] (HBASE-3052) Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster for test writing

2011-03-21 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009454#comment-13009454
]

Jonathan Gray commented on HBASE-3052:
--

Patch is looking good but I'm confused by a few things.

Are you starting all the servers at the beginning? Or do the ZK servers only
actually start/run once you kill another one?

The idea for this is to create a ZK quorum of servers and then be able to kill
individual ones. Ideally, we'd also be able to specifically kill whichever
server is the quorum leader.

Also, I'm unclear on the meaning of candidate in this context. Is the
candidate server the active server? Does that mean it's online? Maybe
change the name or at least add some javadoc explaining what exactly is
happening.

Add ability to have multiple ZK servers in a quorum in MiniZooKeeperCluster
for test writing

Key: HBASE-3052
URL: https://issues.apache.org/jira/browse/HBASE-3052
Project: HBase
Issue Type: Improvement
Components: test, zookeeper
Reporter: Jonathan Gray
Assignee: Liyin Tang
Priority: Minor
Attachments: HBASE_3052[r1083993].patch

Interesting things can happen when you have a ZK quorum of multiple servers
and one of them dies. Doing testing here on clusters, this has turned up
some bugs with HBase interaction with ZK.
Would be good to add the ability to have multiple ZK servers in unit tests
and be able to kill them individually.


[jira] Commented: (HBASE-3658) Alert when heap is over committed

2011-03-17 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008052#comment-13008052
 ] 

Jonathan Gray commented on HBASE-3658:
--

+1 on refusing to start

 Alert when heap is over committed
 -

 Key: HBASE-3658
 URL: https://issues.apache.org/jira/browse/HBASE-3658
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


 Something I just witnessed, the block cache setting was at 70% but the max 
 global memstore size was at the default of 40% meaning that 110% of the heap 
 can potentially be assigned and then you need more heap to do stuff like 
 flushing and compacting.
 We should run a configuration check that alerts the user when that happens 
 and maybe even refuse to start.
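
 A minimal sketch of such a check, assuming both settings are available as 
 fractions of the heap. The class, method names, and message text are 
 hypothetical.

```java
import java.util.Locale;

/** Hypothetical startup check; not HBase code. */
final class HeapCommitCheck {
  private HeapCommitCheck() {}

  /** True when the two fractions together promise more than the whole heap. */
  static boolean isOverCommitted(double blockCacheFraction, double memstoreFraction) {
    return blockCacheFraction + memstoreFraction > 1.0;
  }

  /**
   * Returns null when the settings fit, a warning message when they do not,
   * or throws when refuseToStart is set and the heap is over committed.
   */
  static String check(double blockCacheFraction, double memstoreFraction,
                      boolean refuseToStart) {
    if (!isOverCommitted(blockCacheFraction, memstoreFraction)) {
      return null;
    }
    String message = String.format(Locale.ROOT,
        "block cache (%.0f%%) + global memstore (%.0f%%) exceed 100%% of heap",
        blockCacheFraction * 100, memstoreFraction * 100);
    if (refuseToStart) {
      throw new IllegalStateException(message);
    }
    return message;
  }
}
```

 A real check would probably also leave headroom below 100% for flushing and 
 compacting, as the description points out.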


[jira] Commented: (HBASE-3663) The starvation problem in current load balance algorithm

2011-03-17 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008089#comment-13008089
]

Jonathan Gray commented on HBASE-3663:
--

I think this is an issue in the 0.20 / 0.89 version of the load balancer which
is no longer in any active branches.

The starvation problem in current load balance algorithm

Key: HBASE-3663
URL: https://issues.apache.org/jira/browse/HBASE-3663
Project: HBase
Issue Type: Bug
Reporter: Liyin Tang
Attachments: result_new_load_balance.txt, result_old_load_balance.txt

This is an interesting starvation case. There are 2 conditions to trigger
this problem.
Condition1: r/s - r/(s+1) 1
Let r: the number of regions
Let s: the number of servers
Condition2: the load of each server is less than or equal to the
ceil of the avg load.
Here is the unit test to verify this problem:
For example, there are 16 servers and 62 regions. The avg load is
3.875. Setting the slot to 0 keeps the load of each server at either 3 or
4.
When a new server comes up, no server needs to assign regions to this new
server, since no server's load is larger than the ceil of the avg.
(Setting slot to 0 is to easily trigger this situation, otherwise it needs
much larger numbers)
The solution is pretty straightforward. Just compare against the floor of the
avg instead of the ceil. This solution will evenly balance the load from the
servers which are a little more loaded than others.
I also attached the comparison result for the case mentioned above between
the old balance algorithm and new balance algorithm. (I set the slot = 0 when
testing)
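
The arithmetic of the example above can be checked with a small sketch. The
class and method names are hypothetical, and the count it produces is only an
upper bound on the regions that servers above the threshold could give up,
not what the balancer would actually move.

```java
/** Hypothetical arithmetic helper for the starvation example; not the balancer. */
final class BalanceThreshold {
  private BalanceThreshold() {}

  static int total(int[] loads) {
    int sum = 0;
    for (int load : loads) {
      sum += load;
    }
    return sum;
  }

  /** Floor of the average load per server. */
  static int floorAvg(int[] loads) {
    return total(loads) / loads.length;
  }

  /** Ceil of the average load per server. */
  static int ceilAvg(int[] loads) {
    return (total(loads) + loads.length - 1) / loads.length;
  }

  /** Regions held above the threshold, summed over all servers. */
  static int regionsAboveThreshold(int[] loads, int threshold) {
    int above = 0;
    for (int load : loads) {
      above += Math.max(0, load - threshold);
    }
    return above;
  }
}
```

For 62 regions spread as 3 or 4 per server over 16 servers plus one new empty
server, the ceil of the average is 4 and no server exceeds it, so the new
server gets nothing. The floor is 3, and the 14 servers holding 4 regions each
sit one above it.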


[jira] Created: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-14 Thread Jonathan Gray (JIRA)

LruBlockCache.CacheStats.getHitCount() is not using the correct variable


 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0


{code}
public long getHitCount() {
  return hitCachingCount.get();
}
{code}

This should be {{hitCount.get()}}


[jira] Updated: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-14 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3641:
-

Status: Patch Available  (was: Open)

 LruBlockCache.CacheStats.getHitCount() is not using the correct variable
 

 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0

 Attachments: HBASE-3641-v1.patch


 {code}
 public long getHitCount() {
   return hitCachingCount.get();
 }
 {code}
 This should be {{hitCount.get()}}


[jira] Updated: (HBASE-3641) LruBlockCache.CacheStats.getHitCount() is not using the correct variable

2011-03-14 Thread Jonathan Gray (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3641:
-

Attachment: HBASE-3641-v1.patch

 LruBlockCache.CacheStats.getHitCount() is not using the correct variable
 

 Key: HBASE-3641
 URL: https://issues.apache.org/jira/browse/HBASE-3641
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.1, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.2, 0.92.0

 Attachments: HBASE-3641-v1.patch


 {code}
 public long getHitCount() {
   return hitCachingCount.get();
 }
 {code}
 This should be {{hitCount.get()}}


[jira] Commented: (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-03-11 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005821#comment-13005821
]

Jonathan Gray commented on HBASE-1364:
--

FYI, Prakash Khemani is working on this right now. Not sure when a patch will
be up but it's looking good so far. It is built on top of the new ZK stuff in
0.90 and above.

[performance] Distributed splitting of regionserver commit logs
---

Key: HBASE-1364
URL: https://issues.apache.org/jira/browse/HBASE-1364
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: stack
Assignee: Alex Newman
Priority: Critical
Fix For: 0.92.0

Attachments: HBASE-1364.patch

Time Spent: 8h
Remaining Estimate: 0h


[jira] Commented: (HBASE-3622) Deadlock in HBaseServer (JVM bug?)

2011-03-10 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13005499#comment-13005499
 ] 

Jonathan Gray commented on HBASE-3622:
--

We run with +UseMembar at FB.  I ran experiments on CPU-bound workloads and 
there was no significant difference in performance either way.

 Deadlock in HBaseServer (JVM bug?)
 --

 Key: HBASE-3622
 URL: https://issues.apache.org/jira/browse/HBASE-3622
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3622.patch


 On Dmitriy's cluster:
 {code}
 IPC Reader 0 on port 60020 prio=10 tid=0x2aacb4a82800 nid=0x3a72 waiting on condition [0x429ba000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  0x2aaabf5fa6d0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
     at java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:103)
     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:267)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:985)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
     - locked 0x2aaabf580fb0 (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 ...
 IPC Server handler 29 on 60020 daemon prio=10 tid=0x2aacbc163800 nid=0x3acc waiting on condition [0x462f3000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  0x2aaabf5e3800 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
 IPC Server handler 28 on 60020 daemon prio=10 tid=0x2aacbc161800 nid=0x3acb waiting on condition [0x461f2000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  0x2aaabf5e3800 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
 ...
 {code}
 This region server stayed in this state for hours. The reader is waiting to 
 put and the handlers are waiting to take, and they wait on different lock 
 ids. It reminds me of the UseMembar issue, where the JVM sometimes fails to 
 notify waiters. In any case, that RS needed to be shut down in order to get 
 out of that state. 
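
 For readers unfamiliar with the pattern in the trace, here is a minimal sketch of the reader/handler hand-off over a LinkedBlockingQueue. The class and method names are hypothetical and this is not HBaseServer code. In the OpenJDK implementation, put() and take() are guarded by separate locks, and put() briefly acquires the take-side lock in signalNotEmpty() to wake a consumer waiting on the "not empty" condition, which is consistent with the reader parking on a ReentrantLock while the handlers park on a ConditionObject.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a reader thread handing a call to a handler thread.
final class CallQueueSketch {

    // Puts one call on a bounded queue and returns what the handler took,
    // or null if the handler did not finish within the timeout.
    static String handOff(String call) {
        LinkedBlockingQueue<String> callQueue = new LinkedBlockingQueue<>(10);
        AtomicReference<String> handled = new AtomicReference<>();

        // Handler side: blocks in take() until a call is available.
        Thread handler = new Thread(() -> {
            try {
                handled.set(callQueue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "handler-sketch");
        handler.setDaemon(true);
        handler.start();

        try {
            // Reader side: put() enqueues, then signals the waiting handler.
            callQueue.put(call);
            handler.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during hand-off", e);
        }
        return handled.get();
    }
}
```

 In a healthy JVM the handler wakes up as soon as the call is enqueued. The state described above is the case where that wake-up never arrives.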


[jira] Commented: (HBASE-3614) Expose per-region request rate metrics

2011-03-09 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004817#comment-13004817
 ] 

Jonathan Gray commented on HBASE-3614:
--

I'm not sure if there is a JIRA yet, but some guys at FB did a bunch of work on 
doing per-family metrics.  They did work to dynamically generate new metric 
names, etc.

I think we could work on this at the same time we start to think about using 
the info for better load balancing and such.  This could obviously come first.

 Expose per-region request rate metrics
 --

 Key: HBASE-3614
 URL: https://issues.apache.org/jira/browse/HBASE-3614
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Gary Helmling
Priority: Minor

 We currently export metrics on request rates for each region server, and this 
 can help with identifying uneven load at a high level. But once you see a 
 given server under high load, you're forced to extrapolate based on your 
 application patterns and the data it's serving what the likely culprit is.  
 This can and should be much easier if we just exported request rate metrics 
 per-region on each server.
 Dynamically updating the metrics keys based on assigned regions may pose some 
 minor challenges, but this seems a very valuable diagnostic tool to have 
 available.
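
 One possible shape for this, as a rough sketch only: keep a counter per assigned region and add or drop the entry as regions open and close. All names here, including the metric key format, are hypothetical and do not come from the HBase metrics code. A rate would be derived by sampling these counters periodically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-region request counters keyed by region name.
final class RegionRequestCounters {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Register a counter when a region is assigned to this server.
    void regionOpened(String regionName) {
        counters.putIfAbsent(regionName, new AtomicLong());
    }

    // Drop the counter when the region moves away or closes.
    void regionClosed(String regionName) {
        counters.remove(regionName);
    }

    // Count one request; requests for unknown regions are ignored.
    void recordRequest(String regionName) {
        AtomicLong counter = counters.get(regionName);
        if (counter != null) {
            counter.incrementAndGet();
        }
    }

    long requests(String regionName) {
        AtomicLong counter = counters.get(regionName);
        return counter == null ? 0L : counter.get();
    }

    // Assumed key format for the dynamically generated metric name.
    static String metricKey(String regionName) {
        return "region." + regionName + ".requests";
    }
}
```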


[jira] Commented: (HBASE-3573) Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502

2011-02-28 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000613#comment-13000613
 ] 

Jonathan Gray commented on HBASE-3573:
--

not sure if it matters, but one check returns true if the server holds a 
catalog region.  then another check uses that check to determine that the last 
two servers *only* hold catalogs.  so in that case, they could still be holding 
other user regions?

 Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502
 --

 Key: HBASE-3573
 URL: https://issues.apache.org/jira/browse/HBASE-3573
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 3573.txt, 3573.txt





[jira] Commented: (HBASE-3573) Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502

2011-02-28 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000654#comment-13000654
 ] 

Jonathan Gray commented on HBASE-3573:
--

Yeah, that all makes sense.  Just making sure that's what you intended.  +1 if 
tests pass and you tried it up on cluster.

 Move shutdown messaging OFF hearbeat; prereq for fix of hbase-1502
 --

 Key: HBASE-3573
 URL: https://issues.apache.org/jira/browse/HBASE-3573
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 3573.txt, 3573.txt





[jira] Commented: (HBASE-2947) MultiIncrement (MultiGet functionality for increments)

2011-02-20 Thread Jonathan Gray (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12997163#comment-12997163
 ] 

Jonathan Gray commented on HBASE-2947:
--

HBASE-2814 seems to only be about thrift.  This is to make Increment a Row 
operation so it can be used with the existing MultiAction stuff.

 MultiIncrement (MultiGet functionality for increments)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Attachments: HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

