[jira] [Created] (HBASE-4495) CatalogTracker has an identity crisis; needs to be cut-back in scope

2011-09-26 Thread stack (Created) (JIRA)
CatalogTracker has an identity crisis; needs to be cut-back in scope


 Key: HBASE-4495
 URL: https://issues.apache.org/jira/browse/HBASE-4495
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: stack


CT needs a good reworking.  I'd suggest its scope be cut way down to only deal 
in zk transactions rather than zk and reading meta location in hbase (over an 
HConnection) and being a purveyor of HRegionInterfaces on meta and root servers 
and being an Abortable and a verifier of catalog locations.  Once this is done, 
I would suggest it then belongs under the zk package and that the 
Meta* classes then move to the client package.

Here are some messy notes I added to the head of the CT class in HBASE-3446, 
where I spent some time trying to make out what it was CT did.

{code}
  // TODO: This class needs a rethink.  The original intent was that it would be
  // the one-stop-shop for root and meta locations and that it would get this
  // info from reading and watching zk state.  The class was to be used by
  // servers when they needed to know of root and meta movement but also by
  // client-side (inside in HTable) so rather than figure root and meta
  // locations on fault, the client would instead get notifications out of zk.
  // 
  // But this original intent is frustrated by the fact that this class has to
  // read an hbase table, the -ROOT- table, to figure out the .META. region
  // location which means we depend on an HConnection.  HConnection will do
  // retrying but also, it has its own mechanism for finding root and meta
  // locations (and for 'verifying'; it tries the location and if it fails, does
  // new lookup, etc.).  So, at least for now, HConnection (or HTable) can't
  // have a CT since CT needs a HConnection (Even then, do want HT to have a CT?
  // For HT keep up a session with ZK?  Rather, shouldn't we do like asynchbase
  // where we'd open a connection to zk, read what we need then let the
  // connection go?).  The 'fix' is to make it so both root and meta addresses
  // are wholly up in zk -- not split between zk (root) and an hbase table (meta).
  //
  // But even then, this class does 'verification' of the location and it does
  // this by making a call over an HConnection (which will do its own root
  // and meta lookups).  Isn't this verification 'useless' since when we
  // return, whatever is dependent on the result of this call then needs to
  // use HConnection; what we have verified may change in meantime (HConnection
  // uses the CT primitives, the root and meta trackers finding root locations).
  //
  // When meta is moved to zk, this class may make more sense.  In the
  // meantime, it does not cohere.  It should just watch meta and root and
  // NOT do verification -- let that be out in HConnection since it's going to
  // be done there ultimately anyway.
  //
  // This class has spread throughout the codebase.  It needs to be reined in.
  // This class should be used server-side only, even if we move meta location
  // up into zk.  Currently it's used over in the client package.  It's used in
  // MetaReader and MetaEditor classes usually just to get the Configuration
  // it's using (It does this indirectly by asking its HConnection for its
  // Configuration and even then this is just used to get an HConnection out on
  // the other end). St.Ack 10/23/2011.
  //
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-26 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3446:
-

Status: Patch Available  (was: Open)

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> 
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.0
>Reporter: Todd Lipcon
>Assignee: stack
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 
> 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 
> 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and 
> afterwards had LOTS of regions left orphaned. The issue appears to be that 
> ProcessServerShutdown failed because the server hosting META was restarted 
> around the same time as another server was being processed





[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115261#comment-13115261
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/
---

Review request for hbase and Jonathan Gray.


Summary
---

Make the Meta* operations against meta retry.  We do it by using HTable 
instances
(HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
In 0.89, we had a special RetryableMetaOperation class that was a
subclass of Callable which reproduced the guts of 
HConnection.getRegionServerWithRetries
with its retry loop.  Now we just use HTable instead (costs some on setup but
otherwise we avoid duplicating code).  Upped the retries on the server side too.
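
For readers unfamiliar with the pattern, the kind of retry loop that a
Callable-based helper would have to duplicate can be pictured roughly as
below.  This is a minimal sketch only, not HBase code: RetrySketch,
withRetries, maxAttempts and pauseMs are hypothetical names introduced for
illustration, and the real HConnection.getRegionServerWithRetries does more
than this.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration of a generic retry loop; not the HBase implementation.
final class RetrySketch {
  /** Runs op up to maxAttempts times, sleeping pauseMs after each failed attempt but the last. */
  static <T> T withRetries(Callable<T> op, int maxAttempts, long pauseMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // do not swallow interrupts by retrying
      } catch (Exception e) {
        last = e; // remember the failure and try again
        if (attempt < maxAttempts) {
          Thread.sleep(pauseMs);
        }
      }
    }
    throw last; // every attempt failed; surface the last failure
  }
}
```

Keeping a loop like this in one place is the design point of the patch: the
Meta* classes get retrying by going through HTable rather than carrying their
own copy.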

Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
heavily on CT methods getting proxy connections to meta and root servers.
CT needs to be cut back.  This patch closes down access on (unused) public
methods and removes being able to get an HRegionInterface on meta and root
-- this stuff is used internally to CT only now; use MetaEditor or
MetaReader if you want to update or read catalog tables.  Opening new issue
to cut back CT use over the code base.

A little off topic, but since I was in MetaReader and MetaEditor trying to
clean them up, I ended up moving the meta migration code out to its own class
rather than have it all inside MetaEditor.

Here is some detail to help reviews.

M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
  Clean up.  Shutdown access on some of these unused methods.  Don't
  let out HRegionInterface instances in particular since we are going
  away from raw HRI use to instead use a connection with retries:
  i.e. HTable.

  Comments on state of this class. Javadoc edits.
  getZooKeeperWatcher on HConnection is deprecated so don't use it
  in constructor.  Override MetaNodeTracker and on node delete
  reset meta location (We used to do this over in MetaNodeTracker
  but to do that we had to have a CatalogTracker over in zk package
  which is silly -- bad package encapsulation).

  (waitForRootServer) Renamed getRootServerConnection and change it
  from public to package private.
  (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
  (getMetaServerConnection) Change from public to package private.
  Use MetaReader to read the meta location in root rather than a
  raw HRegionInterface so we get retrying.
  (remaining, timedout) Added utility methods.
  (waitForMetaServer) Changed from public to private.
  (resetMetaLocation) Made it synchronized on metaAvailable.
  Not all accesses were synchronized.

M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
  Refactor to use HTable instead of raw HRegionInterface so we get
  retrying.  For each operation we get an HTable, use it, then close it.
  (putToMetaTable, putsToMetaTable, etc) Utility methods.
  (updateRootWithMetaMigrationStatus, etc.) Moved out to own
  class since these classes are for a one-time migration only.

A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
  New class that holds all Meta* methods updating meta table used
  doing the one-time migration done to meta on startup.  This class
  is marked deprecated because it's going to be dropped in 0.94.

M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
  Retrofit methods in here to use fullScan methods with Visitor.
  (getCatalogRegionInterface, getCatalogRegionNameForTable,
getCatalogRegionNameForRegion) Removed.
  (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
  (fullScanOfResults) Renamed as fullScan override.
  (fullScanOfRoot) Added as deprecated. We should be doing
  this against zk.
  (metaRowToRegionPair, getServerNameFromResult) Moved to Result
  (CollectAllVisitor) Added

M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
  Handle few cases where methods throw InterruptedException
  (Don't let it out on the HBaseAdmin public API)

M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
  on failure. Call ServerCallable connect AFTER beforeCall rather than
  ServerCallable.instantiateServer BEFORE beforeCall.

M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
  Add to DEBUG message the connection name we were using.

M src/main/java/org/apache/hadoop/hbase/client/Result.java
  (getServerNameFromCatalogResult, parseCatalogResult,
parseHRegionInfoFromCatalogResult) Added

M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
  Added new ThrowableWithExtra

[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115249#comment-13115249
 ] 

Ted Yu commented on HBASE-4492:
---

Found the following in output for the above timeout case:
{code}
2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZooKeeperWatcher(233): regionserver:57539-0x132a95afa18000d Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/root-region-server
2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,58748,131710132-EventThread] zookeeper.ZKUtil(226): regionserver:58748-0x132a95afa18000a /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,047 DEBUG [Master:0;us.ciq.com,56327,1317101304726-EventThread] zookeeper.ZKUtil(226): hconnection-0x132a95afa180005 /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,048 DEBUG [Thread-1-EventThread] zookeeper.ZooKeeperWatcher(233): master:51567-0x132a95afa180008 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned
2011-09-27 05:28:59,048 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZKUtil(226): regionserver:57539-0x132a95afa18000d /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,049 INFO  [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): No previous transition plan was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, dest=us.ciq.com,57500,1317101330748; 3 (online=3, exclude=null) available servers
2011-09-27 05:28:59,049 INFO  [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): Assigning region -ROOT-,,0.70236052 to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.ServerManager(448): New connection to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [Thread-1-EventThread] zookeeper.ZKUtil(224): master:51567-0x132a95afa180008 Set watcher on existing znode /hbase/unassigned/70236052
2011-09-27 05:28:59,049 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1181): Master server abort: loaded coprocessors are: []
2011-09-27 05:28:59,050 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1186): Unexpected state trying to OFFLINE; -ROOT-,,0.70236052 state=PENDING_OPEN, ts=1317101339049, server=us.ciq.com,57500,1317101330748
java.lang.IllegalStateException
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1517)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1392)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1169)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1144)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1139)
    at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1816)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:105)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:123)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:186)
{code}

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
> Attachments: 4492.txt
>
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/


[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115242#comment-13115242
 ] 

Ted Yu commented on HBASE-4492:
---

But my patch wouldn't solve the following error:
{code}
  testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart): 
test timed out after 30 milliseconds
{code}
Basically this call hangs:
{code}
waitForRSShutdownToStartAndFinish(activeMaster,
metaServer.getRegionServer().getServerName());
{code}
When this happens, TestRollingRestart-output.txt gets much bigger than the case 
shown in build 19.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
> Attachments: 4492.txt
>
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115233#comment-13115233
 ] 

Jonathan Gray commented on HBASE-4492:
--

+1 (let hudson run it as well).  We can dig to see if there's an actual bug but 
it does look like a race condition as you point out.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
> Attachments: 4492.txt
>
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Resolved] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-26 Thread Jonathan Gray (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-4433.
--

   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: Reviewed

Good stuff Kannan!  Thanks for review Ted.  I also looked and I'm +1.

Committed to trunk.

> avoid extra next (potentially a seek) if done with column/row
> -
>
> Key: HBASE-4433
> URL: https://issues.apache.org/jira/browse/HBASE-4433
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Fix For: 0.94.0
>
>
> [Noticed this in 89, but quite likely true of trunk as well.]
> When we are done with the requested column(s) the code still does an extra 
> next() call before it realizes that it is actually done. This extra next() 
> call could potentially result in an unnecessary extra block load. This is 
> likely to be especially bad for CFs where the KVs are large blobs where each 
> KV may be occupying a block of its own. So the next() can often load a new 
> unrelated block unnecessarily.
> --
> For the simple case of reading say the top-most column in a row in a single 
> file, where each column (KV) was say a block of its own-- it seems that we 
> are reading 3 blocks, instead of 1 block!
> I am working on a simple patch and with that the number of seeks is down to 
> 2. 
> [There is still an extra seek left.  I think there were two levels of 
> extra/unnecessary next() we were doing without actually confirming that the 
> next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
> diff avoids. I think the other is at hfs.next() (at the storefile scanner 
> level) that's happening whenever an HFile scanner serves out data-- and 
> perhaps that's the additional seek that we need to avoid. But I want to 
> tackle this optimization first as the two issues seem unrelated.]
> -- 
> The basic idea of the patch I am working on/testing is as follows. The 
> ExplicitColumnTracker currently returns "INCLUDE" to the ScanQueryMatcher if 
> the KV needs to be included and then if done, only in the next call it 
> returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
> when ExplicitColumnTracker knows it is done with a particular column/row, the 
> patch attempts to combine the INCLUDE code and done hint into a single match 
> code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
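
The combined match-code idea described above can be sketched as follows.
This is a simplified illustration under stated assumptions, not the patch:
the two INCLUDE_AND_SEEK_* names come from the description, but
ColumnTrackerSketch, checkColumn, and the use of plain strings for columns
are hypothetical, and the sketch assumes columns arrive in sorted order.

```java
import java.util.List;

// Match codes; the two combined codes are the ones named in the description.
enum MatchCode { INCLUDE, SKIP, INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW }

// Hypothetical, much-simplified stand-in for an explicit column tracker.
final class ColumnTrackerSketch {
  private final List<String> wanted; // requested columns, in sorted order
  private final int maxVersions;     // versions to return per column
  private int index = 0;             // position of the column we are currently serving
  private int versionsSeen = 0;      // versions already returned for that column

  ColumnTrackerSketch(List<String> wanted, int maxVersions) {
    this.wanted = wanted;
    this.maxVersions = maxVersions;
  }

  MatchCode checkColumn(String column) {
    if (index >= wanted.size() || !wanted.get(index).equals(column)) {
      return MatchCode.SKIP; // not a column we are waiting for
    }
    versionsSeen++;
    if (versionsSeen < maxVersions) {
      return MatchCode.INCLUDE; // more versions of this column may follow
    }
    // Done with this column: say so in the same answer instead of
    // waiting for one more call to hand out the seek hint.
    index++;
    versionsSeen = 0;
    return index == wanted.size()
        ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
        : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
  }
}
```

The point of the combined codes is that the caller learns it is done at the
moment the last wanted KV is included, so it does not have to fetch one more
KV just to be told to seek.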





[jira] [Updated] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4492:
--

Attachment: 4492.txt

Here is what I proposed.
We should use a constant for the 60s timeout.

I ran TestRollingRestart#testBasicRollingRestart once which passed on MacBook.

More loops should be performed.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
> Attachments: 4492.txt
>
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115221#comment-13115221
 ] 

Jonathan Gray commented on HBASE-4488:
--

Got it.  Nice investigation.

+1 for commit

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4488:
-

Priority: Major  (was: Critical)

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115220#comment-13115220
 ] 

Lars Hofhansl commented on HBASE-4488:
--

Looking at the StoreScanner/ScanQueryMatcher code, it seems that luckily this 
cannot be triggered.
In order for this to happen the ScanQueryMatcher in StoreScanner has to return 
one of DONE_SCAN or SEEK_NEXT_ROW.
The matcher only returns DONE_SCAN for filters. In this case there are none, so 
that won't happen.
SEEK_NEXT_ROW can be returned from filters (so also out) or if the next row is < 
the current row (not sure how to make that happen... Should never happen).
The column tracker used here is ScanWildcardColumnTracker, which will only ever 
return SKIP, SEEK_NEXT_COL, or INCLUDE.

So we were actually lucky here, and this bug cannot be triggered at all. Based 
on this I'll change the priority. Unless we rig it, this bug cannot be 
triggered.

It should still be changed, though, but it will be for readability, and future 
correctness if somebody changes Matcher/Tracker.
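
To make the loop problem concrete, here is a minimal sketch of the general
failure mode.  It is not the HBase code: RowScanner, FlushLoopSketch, over,
buggyDrain and drain are hypothetical names, and the toy scanner is assumed
to return false from next() on the same call that delivers the final row.
As noted above, the real StoreScanner in this configuration does not hit that
case.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical scanner contract used only for this illustration.
interface RowScanner {
  /** Appends the next row's values to out; returns true if more rows remain after it. */
  boolean next(List<String> out);
}

final class FlushLoopSketch {
  /** Toy scanner that hands out one row per call. */
  static RowScanner over(List<String> rows) {
    Iterator<String> it = rows.iterator();
    return out -> {
      if (it.hasNext()) {
        out.add(it.next());
      }
      return it.hasNext(); // false on the call that delivered the last row
    };
  }

  /** Loses whatever the final call to next() put in kvs. */
  static List<String> buggyDrain(RowScanner scanner) {
    List<String> flushed = new ArrayList<>();
    List<String> kvs = new ArrayList<>();
    while (scanner.next(kvs)) {
      flushed.addAll(kvs);
      kvs.clear();
    }
    return flushed;
  }

  /** Processes the results of every call, including the one that returns false. */
  static List<String> drain(RowScanner scanner) {
    List<String> flushed = new ArrayList<>();
    List<String> kvs = new ArrayList<>();
    boolean hasMore;
    do {
      hasMore = scanner.next(kvs);
      flushed.addAll(kvs);
      kvs.clear();
    } while (hasMore);
    return flushed;
  }
}
```

With three rows in the toy scanner, buggyDrain returns only the first two
while drain returns all three.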


> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115210#comment-13115210
 ] 

Ted Yu commented on HBASE-4492:
---

@Jonathan:
I just provided some observation. We can narrow the scope of test failure and 
make this test deterministic.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115211#comment-13115211
 ] 

Jonathan Gray commented on HBASE-4489:
--

+1 that keyspace split should be 0x00..00 to 0xff..ff and not ascii or 0x7f.
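
As an illustration of splitting the whole byte keyspace evenly, here is a
rough sketch.  It is not the code from the attached patches:
UniformSplitSketch, splitPoints, toFixedWidth and hex are hypothetical names,
and fixed-width keys are assumed for simplicity.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of uniform splits over 0x00..00 to 0xff..ff.
final class UniformSplitSketch {
  /** Returns the numRegions - 1 interior boundaries for keys of keyWidth bytes. */
  static List<byte[]> splitPoints(int numRegions, int keyWidth) {
    if (numRegions < 2 || keyWidth < 1) {
      throw new IllegalArgumentException("need at least 2 regions and 1 key byte");
    }
    // Largest key of this width, i.e. 0xff..ff.
    BigInteger max = BigInteger.ONE.shiftLeft(8 * keyWidth).subtract(BigInteger.ONE);
    List<byte[]> splits = new ArrayList<>();
    for (int i = 1; i < numRegions; i++) {
      BigInteger point = max.multiply(BigInteger.valueOf(i))
          .divide(BigInteger.valueOf(numRegions));
      splits.add(toFixedWidth(point, keyWidth));
    }
    return splits;
  }

  /** Converts a non-negative value that fits in width bytes to a big-endian array of that width. */
  static byte[] toFixedWidth(BigInteger value, int width) {
    byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter than width
    byte[] out = new byte[width];
    int n = Math.min(raw.length, width);
    System.arraycopy(raw, raw.length - n, out, width - n, n);
    return out;
  }

  /** Renders a key in the \xNN notation used in the issue description. */
  static String hex(byte[] key) {
    StringBuilder sb = new StringBuilder();
    for (byte b : key) {
      sb.append(String.format("\\x%02X", b & 0xff));
    }
    return sb.toString();
  }
}
```

For five regions and two-byte keys this yields the interior boundaries
\x33\x33, \x66\x66, \x99\x99 and \xCC\xCC; the end of the keyspace,
\xFF\xFF, is not returned as a split point in this sketch.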

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115209#comment-13115209
 ] 

Jonathan Gray commented on HBASE-4488:
--

@Lars, patch seems fine.  Do you think there's a way to trigger the bug in a 
test so we can catch these edge cases in the future?

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
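
To make the failure mode concrete, here is a small self-contained sketch. It is not the code from 4488.txt: BatchScanner is a hypothetical stand-in, and it assumes the scanner contract is "fill the list, then return true only if more rows remain after this call", so the final batch arrives together with a false return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ScannerLoopSketch {

    // Hypothetical scanner: adds the next row to 'out' and returns true only
    // if further rows remain after this call.
    interface BatchScanner {
        boolean next(List<String> out);
    }

    static BatchScanner scannerOver(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return out -> {
            if (it.hasNext()) {
                out.add(it.next());
            }
            return it.hasNext();
        };
    }

    // Buggy shape: the last call returns false, so the loop body never runs
    // for the final batch and it is silently dropped.
    static List<String> drainBuggy(BatchScanner scanner) {
        List<String> seen = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            seen.addAll(kvs);
            kvs.clear();
        }
        return seen;
    }

    // Fixed shape: always process what next() returned, then test the flag.
    static List<String> drainFixed(BatchScanner scanner) {
        List<String> seen = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            seen.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return seen;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println("buggy: " + drainBuggy(scannerOver(rows)));
        System.out.println("fixed: " + drainFixed(scannerOver(rows)));
    }
}
```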





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115208#comment-13115208
 ] 

Jonathan Gray commented on HBASE-4492:
--

Nice idea, Ted.  I can work on that tomorrow if you don't get to a patch before 
then.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115206#comment-13115206
 ] 

Ted Yu commented on HBASE-4492:
---

One possibility is to give the assertion below some grace period (within the 60 
sec limit) by calling blockUntilNoRIT() repeatedly:
{code}
blockUntilNoRIT(zkw, master);
log("Verifying there are " + numRegions + " assigned on cluster");
assertRegionsAssigned(cluster, regions);
{code}
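
A grace period of this kind could be sketched as a poll-until-deadline helper. The names GracePeriodSketch and waitFor are hypothetical and are not part of TestRollingRestart; in the test the condition would wrap the "no regions in transition and all regions assigned" check.

```java
import java.util.function.BooleanSupplier;

final class GracePeriodSketch {

    // Re-evaluates 'condition' every intervalMs until it holds or timeoutMs
    // has elapsed. Returns true if the condition held before the deadline.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true once 50 ms have passed.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 50, 1000, 10);
        System.out.println("condition met: " + ok);
    }
}
```

The assertion then runs only after waitFor returns true, and the test can fail with a clear message when it returns false.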

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115191#comment-13115191
 ] 

Ted Yu commented on HBASE-4433:
---

+1 on patch.
Nice work.

> avoid extra next (potentially a seek) if done with column/row
> -
>
> Key: HBASE-4433
> URL: https://issues.apache.org/jira/browse/HBASE-4433
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
>
> [Noticed this in 89, but quite likely true of trunk as well.]
> When we are done with the requested column(s) the code still does an extra 
> next() call before it realizes that it is actually done. This extra next() 
> call could potentially result in an unnecessary extra block load. This is 
> likely to be especially bad for CFs where the KVs are large blobs where each 
> KV may be occupying a block of its own. So the next() can often load a new 
> unrelated block unnecessarily.
> --
> For the simple case of reading say the top-most column in a row in a single 
> file, where each column (KV) was say a block of its own-- it seems that we 
> are reading 3 blocks, instead of 1 block!
> I am working on a simple patch and with that the number of seeks is down to 
> 2. 
> [There is still an extra seek left.  I think there were two levels of 
> extra/unnecessary next() we were doing without actually confirming that the 
> next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
> diff avoids. I think the other is at hfs.next() (at the storefile scanner 
> level) that's happening whenever an HFile scanner serves out data-- and 
> perhaps that's the additional seek that we need to avoid. But I want to 
> tackle this optimization first as the two issues seem unrelated.]
> -- 
> The basic idea of the patch I am working on/testing is as follows. The 
> ExplicitColumnTracker currently returns "INCLUDE" to the ScanQueryMatcher if 
> the KV needs to be included and then if done, only in the next call it 
> returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
> when ExplicitColumnTracker knows it is done with a particular column/row, the 
> patch attempts to combine the INCLUDE code and done hint into a single match 
> code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
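
The combined match codes can be illustrated with a small decision function. The enum constants are the ones named in the description; the method onWantedKeyValue and its two boolean inputs are a simplification invented for this sketch and do not mirror the real ExplicitColumnTracker signature.

```java
final class MatchCodeSketch {

    enum MatchCode {
        INCLUDE,
        SEEK_NEXT_COL,
        SEEK_NEXT_ROW,
        INCLUDE_AND_SEEK_NEXT_COL,
        INCLUDE_AND_SEEK_NEXT_ROW
    }

    // Decision for a KV that belongs to a requested column.
    // doneWithColumn:   no further versions of this column are needed.
    // lastWantedColumn: this is the final requested column in the row.
    static MatchCode onWantedKeyValue(boolean doneWithColumn, boolean lastWantedColumn) {
        if (!doneWithColumn) {
            // More versions may follow, so a plain INCLUDE is still right.
            return MatchCode.INCLUDE;
        }
        // The tracker already knows it is done: hand back the include and the
        // seek hint together instead of waiting for one more next() call.
        return lastWantedColumn
                ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println(onWantedKeyValue(false, false));
        System.out.println(onWantedKeyValue(true, false));
        System.out.println(onWantedKeyValue(true, true));
    }
}
```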





[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4488:
-

Issue Type: Sub-task  (was: Bug)
Parent: HBASE-4241

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115184#comment-13115184
 ] 

Ted Yu commented on HBASE-4492:
---

From output of build 19 above:
{code}
2011-09-25 09:35:06,876 INFO  
[RS_CLOSE_REGION-hemera.apache.org,33646,1316943298695-0] 
regionserver.HRegion(738): Closed 
tableRestart,a,1316943285423.b4692b784743bbe7c57312d8b2f8539d.
2011-09-25 09:35:06,876 DEBUG 
[RS_CLOSE_REGION-hemera.apache.org,33646,1316943298695-0] 
handler.CloseRegionHandler(142): Closed region 
tableRestart,a,1316943285423.b4692b784743bbe7c57312d8b2f8539d.
...
2011-09-25 09:35:14,609 DEBUG [Thread-1] zookeeper.ZKAssign(892): ZK RIT -> 
70236052
2011-09-25 09:35:14,609 DEBUG [Thread-1] zookeeper.ZKAssign(892): ZK RIT -> 
1028785192
...
2011-09-25 09:35:14,710 DEBUG [Thread-1] master.TestRollingRestart(325): 

TRR: Expected to find 22 but only found 3

2011-09-25 09:35:14,711 DEBUG [Thread-1] master.TestRollingRestart(325): 

TRR: Missing region: 
tableRestart,a,1316943285423.b4692b784743bbe7c57312d8b2f8539d.
{code}
blockUntilNoRIT() has these calls:
{code}
ZKAssign.blockUntilNoRIT(zkw);
master.assignmentManager.waitUntilNoRegionsInTransition(6);
{code}
We can see that master.assignmentManager.waitUntilNoRegionsInTransition() 
waited at most 100 ms, far shorter than the 60 sec limit.
Should we wait longer? I think using the NoRIT criterion alone isn't enough.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4493) book.xml - moving 2 entries to newly created RegionServer section

2011-09-26 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115175#comment-13115175
 ] 

Hudson commented on HBASE-4493:
---

Integrated in HBase-TRUNK #2260 (See 
[https://builds.apache.org/job/HBase-TRUNK/2260/])
HBASE-4493 book.xml

dmeil : 
Files : 
* /hbase/trunk/src/docbkx/book.xml


> book.xml - moving 2 entries to newly created RegionServer section
> -
>
> Key: HBASE-4493
> URL: https://issues.apache.org/jira/browse/HBASE-4493
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: book_HBASE_4493.xml.patch
>
>
> book.xml
> * Arch section.  Since RegionServer is now a top-level section under Arch, 
> moved existing BlockCache and WAL entries under RegionServer instead of being 
> under Regions.





[jira] [Commented] (HBASE-4298) Support to drain RS nodes through ZK

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115141#comment-13115141
 ] 

Ted Yu commented on HBASE-4298:
---

https://reviews.apache.org/r/2063/  for trunk and
https://reviews.apache.org/r/2064/ for 0.90

> Support to drain RS nodes through ZK
> 
>
> Key: HBASE-4298
> URL: https://issues.apache.org/jira/browse/HBASE-4298
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.90.4
> Environment: all
>Reporter: Aravind Gottipati
>Priority: Critical
>  Labels: patch
> Fix For: 0.92.0, 0.90.5
>
> Attachments: 90_hbase.patch, trunk_hbase.patch
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.
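
The effect of the draining marker on assignment can be sketched as a simple filter. This is not taken from either patch: the class and method names are hypothetical, and the two collections stand in for the children of /rs and /draining as read from ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DrainingSketch {

    // Returns the servers that may still receive new regions: every online
    // server (child of /rs) that is not also listed under /draining.
    static List<String> assignable(List<String> onlineServers, Set<String> drainingServers) {
        List<String> candidates = new ArrayList<>();
        for (String server : onlineServers) {
            if (!drainingServers.contains(server)) {
                candidates.add(server);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Example server names are made up for the demonstration.
        List<String> online = Arrays.asList("rs1,60020,1", "rs2,60020,1", "rs3,60020,1");
        Set<String> draining = new HashSet<>(Arrays.asList("rs2,60020,1"));
        System.out.println(assignable(online, draining));
    }
}
```

Regions already hosted on a draining server stay where they are in this sketch, which matches the description: draining nodes only stop receiving new regions.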





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-26 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115123#comment-13115123
 ] 

Jesse Yates commented on HBASE-4448:


Hmm, interesting idea Ted. Definitely worth looking into in another patch. 
Would definitely have to look into the performance benefits of that. 
Interestingly, it's probably going to be highly correlated to the order of tests 
being run (which is really test naming).

> HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
> instances across unit tests
> -
>
> Key: HBASE-4448
> URL: https://issues.apache.org/jira/browse/HBASE-4448
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: HBaseTestingUtilityFactory.java, 
> hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch
>
>
> Setting up and tearing down HBaseTestingUtility instances in unit tests is 
> very expensive.  On my MacBook it takes about 10 seconds to set up a 
> MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
> test classes that use this facility, that's a lot of time in the build.
> This factory assumes that the JVM is being re-used across test classes in the 
> build, otherwise this pattern won't work. 
> I don't think this is appropriate for every use, but I think it can be 
> applicable in a great many cases - especially where developers just want a 
> simple MiniCluster with 1 slave.
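
The reuse pattern can be sketched generically. This is not the attached HBaseTestingUtilityFactory.java: ReusableClusterFactory and its starter callback are hypothetical, and the type parameter T stands in for HBaseTestingUtility so the sketch runs without HBase on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class ReusableClusterFactory<T> {

    // One cached instance per slave count, shared by every test class that
    // runs in the same JVM.
    private final Map<Integer, T> bySlaves = new HashMap<>();
    private final IntFunction<T> starter;

    // 'starter' performs the expensive start-up for a given slave count.
    ReusableClusterFactory(IntFunction<T> starter) {
        this.starter = starter;
    }

    // Starts a cluster on first request and hands back the cached one after.
    synchronized T get(int slaves) {
        return bySlaves.computeIfAbsent(slaves, starter::apply);
    }

    synchronized int size() {
        return bySlaves.size();
    }

    public static void main(String[] args) {
        ReusableClusterFactory<String> factory =
                new ReusableClusterFactory<>(n -> "cluster-with-" + n + "-slaves");
        System.out.println(factory.get(1) == factory.get(1));
        System.out.println(factory.size());
    }
}
```

Because the cache is keyed only by slave count, a test that needs two distinct clusters of the same size would receive the same instance twice; handling that case needs a richer key or a lease scheme.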





[jira] [Updated] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-26 Thread Eric Yang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HBASE-4415:
-

Attachment: HBASE-4415-5.patch

Added file permission setup for key tab files.

> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415-4.patch, HBASE-4415-5.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring 
> the HBase environment and configuration, using the same pattern of 
> *-setup-conf.sh as all Hadoop related projects.  For HBase, the usage of the 
> script looks like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
>     --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
>     --hadoop-home=/usr                       Set Hadoop directory location
>     --hadoop-namenode=localhost              Set Hadoop namenode hostname
>     --hadoop-replication=3                   Set HDFS replication
>     --hbase-home=/usr                        Set HBase directory location
>     --hbase-conf=/etc/hbase                  Set HBase configuration directory location
>     --hbase-log=/var/log/hbase               Set HBase log directory location
>     --hbase-pid=/var/run/hbase               Set HBase pid directory location
>     --hbase-user=hbase                       Set HBase user
>     --java-home=/usr/java/default            Set JAVA_HOME directory location
>     --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
>     --kerberos-principal-id=_HOST            Set Kerberos principal ID
>     --keytab-dir=/etc/security/keytabs       Set keytab directory
>     --regionservers=localhost                Set regionservers hostnames
>     --zookeeper-home=/usr                    Set ZooKeeper directory location
>     --zookeeper-quorum=localhost             Set ZooKeeper Quorum
>     --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115115#comment-13115115
 ] 

Ted Yu commented on HBASE-4448:
---

I think the following method should be enhanced to deal with the test which 
uses more than one cluster whose numbers of slaves are the same:
{code}
+ protected synchronized HBaseTestingUtility getMiniClusterImpl(int slaves) 
throws Exception {
{code}
This can be done in another JIRA.

> HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
> instances across unit tests
> -
>
> Key: HBASE-4448
> URL: https://issues.apache.org/jira/browse/HBASE-4448
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: HBaseTestingUtilityFactory.java, 
> hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch
>
>
> Setting up and tearing down HBaseTestingUtility instances in unit tests is 
> very expensive.  On my MacBook it takes about 10 seconds to set up a 
> MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
> test classes that use this facility, that's a lot of time in the build.
> This factory assumes that the JVM is being re-used across test classes in the 
> build, otherwise this pattern won't work. 
> I don't think this is appropriate for every use, but I think it can be 
> applicable in a great many cases - especially where developers just want a 
> simple MiniCluster with 1 slave.





[jira] [Commented] (HBASE-4298) Support to drain RS nodes through ZK

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115112#comment-13115112
 ] 

Ted Yu commented on HBASE-4298:
---

@Aravind:
Is it possible to come up with some unit test for this feature ?

Thanks

> Support to drain RS nodes through ZK
> 
>
> Key: HBASE-4298
> URL: https://issues.apache.org/jira/browse/HBASE-4298
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.90.4
> Environment: all
>Reporter: Aravind Gottipati
>Priority: Critical
>  Labels: patch
> Fix For: 0.92.0, 0.90.5
>
> Attachments: 90_hbase.patch, trunk_hbase.patch
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.





[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-26 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115113#comment-13115113
 ] 

Kannan Muthukkaruppan commented on HBASE-4433:
--

ping. for code review.

test suite ran clean.

> avoid extra next (potentially a seek) if done with column/row
> -
>
> Key: HBASE-4433
> URL: https://issues.apache.org/jira/browse/HBASE-4433
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
>
> [Noticed this in 89, but quite likely true of trunk as well.]
> When we are done with the requested column(s) the code still does an extra 
> next() call before it realizes that it is actually done. This extra next() 
> call could potentially result in an unnecessary extra block load. This is 
> likely to be especially bad for CFs where the KVs are large blobs where each 
> KV may be occupying a block of its own. So the next() can often load a new 
> unrelated block unnecessarily.
> --
> For the simple case of reading say the top-most column in a row in a single 
> file, where each column (KV) was say a block of its own-- it seems that we 
> are reading 3 blocks, instead of 1 block!
> I am working on a simple patch and with that the number of seeks is down to 
> 2. 
> [There is still an extra seek left.  I think there were two levels of 
> extra/unnecessary next() we were doing without actually confirming that the 
> next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
> diff avoids. I think the other is at hfs.next() (at the storefile scanner 
> level) that's happening whenever an HFile scanner serves out data-- and 
> perhaps that's the additional seek that we need to avoid. But I want to 
> tackle this optimization first as the two issues seem unrelated.]
> -- 
> The basic idea of the patch I am working on/testing is as follows. The 
> ExplicitColumnTracker currently returns "INCLUDE" to the ScanQueryMatcher if 
> the KV needs to be included and then if done, only in the next call it 
> returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
> when ExplicitColumnTracker knows it is done with a particular column/row, the 
> patch attempts to combine the INCLUDE code and done hint into a single match 
> code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-26 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115105#comment-13115105
 ] 

Doug Meil commented on HBASE-4448:
--

Jesse-  sounds like we're cool on #1 and #2.  Let me look at #3.

Ted- performance changes were in the attached spreadsheet (and earlier in the 
ticket), but that won't all come with this change... this is about getting the 
utility in place and then the changes identified in the spreadsheet will come 
in later tickets.  Probably 10 minutes just for the identified changes, with 
more possible. 

> HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
> instances across unit tests
> -
>
> Key: HBASE-4448
> URL: https://issues.apache.org/jira/browse/HBASE-4448
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: HBaseTestingUtilityFactory.java, 
> hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch
>
>
> Setting up and tearing down HBaseTestingUtility instances in unit tests is 
> very expensive.  On my MacBook it takes about 10 seconds to set up a 
> MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
> test classes that use this facility, that's a lot of time in the build.
> This factory assumes that the JVM is being re-used across test classes in the 
> build, otherwise this pattern won't work. 
> I don't think this is appropriate for every use, but I think it can be 
> applicable in a great many cases - especially where developers just want a 
> simple MiniCluster with 1 slave.





[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-26 Thread Giridharan Kesavan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115085#comment-13115085
 ] 

Giridharan Kesavan commented on HBASE-4415:
---

Eric,
Could you please add the following to the hbase-setup-conf script?
set keytab directory permission to 700
set the service keytab owner to hbaseuser



> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415-4.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring 
> the HBase environment and configuration, using the same pattern of 
> *-setup-conf.sh as all Hadoop related projects.  For HBase, the usage of the 
> script looks like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
>     --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
>     --hadoop-home=/usr                       Set Hadoop directory location
>     --hadoop-namenode=localhost              Set Hadoop namenode hostname
>     --hadoop-replication=3                   Set HDFS replication
>     --hbase-home=/usr                        Set HBase directory location
>     --hbase-conf=/etc/hbase                  Set HBase configuration directory location
>     --hbase-log=/var/log/hbase               Set HBase log directory location
>     --hbase-pid=/var/run/hbase               Set HBase pid directory location
>     --hbase-user=hbase                       Set HBase user
>     --java-home=/usr/java/default            Set JAVA_HOME directory location
>     --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
>     --kerberos-principal-id=_HOST            Set Kerberos principal ID
>     --keytab-dir=/etc/security/keytabs       Set keytab directory
>     --regionservers=localhost                Set regionservers hostnames
>     --zookeeper-home=/usr                    Set ZooKeeper directory location
>     --zookeeper-quorum=localhost             Set ZooKeeper Quorum
>     --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-26 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115075#comment-13115075
 ] 

Lars Hofhansl commented on HBASE-4335:
--

Ran all tests. There are three failures locally (TestHTablePool, 
TestDistributedLogSplitting, TestAdmin), and they fail with or without my 
change.


> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.
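
The window described above can be reproduced with a sorted map standing in for .META. This is a minimal sketch, assuming a region lookup behaves like a floor search on start key; the class, the keys "a", "m", "z", and the RETRY/HOLE return values are all illustrative and are not HBase APIs.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of .META.: a sorted map from start key to region, searched by floor lookup.
// All names here are illustrative; none of this is HBase code.
final class MetaHoleSketch {
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;   // exclusive upper bound
        final boolean offline;

        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }

        boolean contains(String key) {
            return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
        }
    }

    // Returns the serving region's name, "RETRY" for an offline region,
    // or "HOLE" when the region found does not cover the key.
    static String locate(TreeMap<String, Region> meta, String key) {
        Map.Entry<String, Region> entry = meta.floorEntry(key);
        if (entry == null) {
            return "HOLE";
        }
        Region region = entry.getValue();
        if (region.offline) {
            return "RETRY";
        }
        return region.contains(key) ? region.name : "HOLE";
    }

    // Builds the intermediate state after only one daughter has been added.
    static String lookupDuringSplit(boolean firstDaughterFirst, String key) {
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("a", new Region("parent", "a", "z", true)); // step 1: parent offline
        if (firstDaughterFirst) {
            // Same start key as the parent, so this entry hides the parent.
            meta.put("a", new Region("daughterA", "a", "m", false));
        } else {
            meta.put("m", new Region("daughterB", "m", "z", false));
        }
        return locate(meta, key);
    }

    public static void main(String[] args) {
        System.out.println(lookupDuringSplit(false, "c"));
        System.out.println(lookupDuringSplit(false, "p"));
        System.out.println(lookupDuringSplit(true, "c"));
        System.out.println(lookupDuringSplit(true, "p"));
    }
}
```

In this model only the "first daughter first" ordering produces a lookup that lands on a region which does not cover the key; the other ordering yields either a usable region or a retry.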





[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK

2011-09-26 Thread Aravind Gottipati (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravind Gottipati updated HBASE-4298:
-

Attachment: trunk_hbase.patch
90_hbase.patch

Patch files for trunk and 0.90.

> Support to drain RS nodes through ZK
> 
>
> Key: HBASE-4298
> URL: https://issues.apache.org/jira/browse/HBASE-4298
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.90.4
> Environment: all
>Reporter: Aravind Gottipati
>Priority: Critical
>  Labels: patch
> Fix For: 0.92.0, 0.90.5
>
> Attachments: 90_hbase.patch, trunk_hbase.patch
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.
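
The exclusion step described above amounts to a set difference. The sketch below assumes the children of /rs and /draining have already been read into sets of server names; the class and method names are hypothetical and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: picks servers that may receive new regions.
final class DrainingSketch {
    // Online servers (children of /rs) minus servers marked under /draining.
    static List<String> assignable(Set<String> onlineServers, Set<String> drainingServers) {
        Set<String> eligible = new TreeSet<>(onlineServers); // copy; keeps output sorted
        eligible.removeAll(drainingServers);
        return new ArrayList<>(eligible);
    }

    public static void main(String[] args) {
        Set<String> online = new TreeSet<>(Arrays.asList("rs1", "rs2", "rs3"));
        Set<String> draining = new TreeSet<>(Arrays.asList("rs2"));
        System.out.println(assignable(online, draining));
    }
}
```

A draining server stays in the online set, so regions it already hosts are untouched; it is only skipped as a target for new assignments.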





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-26 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115071#comment-13115071
 ] 

Lars Hofhansl commented on HBASE-4335:
--

@Joe... Understood. :) And you are right about mocking. In this case it also 
makes sense to break execute into three parts: (1) the setup before opening the 
regions, (2) opening the regions, and (3) the post-open work. A test can then 
call 1 and 3 and mock 2 (for example, by running the DaughterOpeners serially).
@Stack... Did you want to work on the test or should I?
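
The three-part split suggested above could look roughly like the following. This is a sketch under assumptions: the class, the DaughterOpenStep interface, and the log strings are hypothetical stand-ins and do not reflect the actual SplitTransaction code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: execute() split into setup, open, and post-open phases.
final class ThreePhaseSplitSketch {
    // The middle phase is the piece a test would replace.
    interface DaughterOpenStep {
        void openDaughters(List<String> log);
    }

    private final DaughterOpenStep openStep;

    ThreePhaseSplitSketch(DaughterOpenStep openStep) {
        this.openStep = openStep;
    }

    void beforeOpen(List<String> log) {
        log.add("setup");
    }

    void afterOpen(List<String> log) {
        log.add("post-open");
    }

    List<String> execute() {
        List<String> log = new ArrayList<>();
        beforeOpen(log);             // (1) setup before opening the regions
        openStep.openDaughters(log); // (2) opening the regions
        afterOpen(log);              // (3) post-open work
        return log;
    }

    // Serial stand-in for the concurrent openers: the order is fixed by the caller.
    static DaughterOpenStep serial(String first, String second) {
        return log -> {
            log.add("open " + first);
            log.add("open " + second);
        };
    }

    public static void main(String[] args) {
        System.out.println(new ThreePhaseSplitSketch(serial("daughterB", "daughterA")).execute());
    }
}
```

With the opening step injected, a test can force either daughter ordering deterministically instead of relying on thread timing.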


> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.





[jira] [Updated] (HBASE-4494) AvroServer:: get fails with NPE on a non-existent row

2011-09-26 Thread Kay Kay (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated HBASE-4494:
---

Status: Patch Available  (was: Open)

> AvroServer:: get fails with NPE on a non-existent row
> -
>
> Key: HBASE-4494
> URL: https://issues.apache.org/jira/browse/HBASE-4494
> Project: HBase
>  Issue Type: Bug
>  Components: avro
>Affects Versions: 0.90.4
>Reporter: Kay Kay
>Assignee: Kay Kay
> Fix For: 0.90.5
>
> Attachments: HBASE-4494.patch
>
>
> Try to submit a get request to the avro gateway. 
> If the row specified for a given table does not exist, the server request 
> fails with a NPE.





[jira] [Updated] (HBASE-4494) AvroServer:: get fails with NPE on a non-existent row

2011-09-26 Thread Kay Kay (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated HBASE-4494:
---

Attachment: HBASE-4494.patch

> AvroServer:: get fails with NPE on a non-existent row
> -
>
> Key: HBASE-4494
> URL: https://issues.apache.org/jira/browse/HBASE-4494
> Project: HBase
>  Issue Type: Bug
>  Components: avro
>Affects Versions: 0.90.4
>Reporter: Kay Kay
>Assignee: Kay Kay
> Fix For: 0.90.5
>
> Attachments: HBASE-4494.patch
>
>
> Try to submit a get request to the avro gateway. 
> If the row specified for a given table does not exist, the server request 
> fails with a NPE.





[jira] [Created] (HBASE-4494) AvroServer:: get fails with NPE on a non-existent row

2011-09-26 Thread Kay Kay (Created) (JIRA)
AvroServer:: get fails with NPE on a non-existent row
-

 Key: HBASE-4494
 URL: https://issues.apache.org/jira/browse/HBASE-4494
 Project: HBase
  Issue Type: Bug
  Components: avro
Affects Versions: 0.90.4
Reporter: Kay Kay
Assignee: Kay Kay
 Fix For: 0.90.5
 Attachments: HBASE-4494.patch

Try to submit a get request to the avro gateway. 

If the row specified for a given table does not exist, the server request fails 
with a NPE.







[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115068#comment-13115068
 ] 

Ted Yu commented on HBASE-4489:
---

@Dave:
I meant that numRegions could be negative. This is minor.

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.
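
The even division of the byte space described above can be sketched with BigInteger arithmetic. This is not the code from the attached patches; the class name, the fixed key width, and the hex output format are assumptions made for illustration. The sketch returns only the interior boundaries, treating \xFF\xFF... as the upper end of the last region.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: evenly spaced split points over all keys of a fixed byte width.
final class UniformSplitSketch {
    // Returns the numRegions - 1 interior boundaries as upper-case hex strings.
    static List<String> splitPoints(int numRegions, int width) {
        if (numRegions < 2 || width < 1) {
            throw new IllegalArgumentException("need numRegions >= 2 and width >= 1");
        }
        // Largest key of the given width, e.g. 0xFFFF for width 2.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * width).subtract(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(numRegions);
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i)).divide(n);
            points.add(toHex(point, width));
        }
        return points;
    }

    // Left-pads with zeros so every key has exactly 2 * width hex digits.
    static String toHex(BigInteger value, int width) {
        String hex = value.toString(16).toUpperCase();
        StringBuilder padded = new StringBuilder();
        for (int i = hex.length(); i < 2 * width; i++) {
            padded.append('0');
        }
        return padded.append(hex).toString();
    }

    public static void main(String[] args) {
        System.out.println(splitPoints(5, 2));
    }
}
```

For five regions and two-byte keys the boundaries come out as 3333, 6666, 9999, and CCCC, matching the pattern in the description.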





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-26 Thread Joe Pallas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115066#comment-13115066
 ] 

Joe Pallas commented on HBASE-4335:
---

@Lars: My experience with testing without an artificial delay suggests it is 
not a good option if you want to see the test actually fail without the fix.  
It just doesn’t happen frequently, although it does happen.

If I were working on this code, I would consider improving unit testability by 
making MetaEditor into an interface that can be mocked.  I realize that “If I 
were working on this code …” is not very helpful, but I don’t have permission 
from my employer to make any code contributions at the moment :-(.
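
To illustrate what a mockable seam would buy, here is a minimal sketch. The MetaWriter interface and its method names are hypothetical and are not the real MetaEditor API; the ordering in split() is simply the one the issue description identifies as leading to a retry rather than a hole.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: an interface seam plus a recording test double.
final class MetaEditorMockSketch {
    // Hypothetical interface; the method names are assumptions.
    interface MetaWriter {
        void offlineParent(String parent);
        void addDaughter(String daughter);
    }

    // Test double that records calls instead of touching .META.
    static final class RecordingMetaWriter implements MetaWriter {
        final List<String> calls = new ArrayList<>();

        @Override
        public void offlineParent(String parent) {
            calls.add("offline:" + parent);
        }

        @Override
        public void addDaughter(String daughter) {
            calls.add("add:" + daughter);
        }
    }

    // Code under test depends only on the interface.
    static void split(MetaWriter meta, String parent, String first, String second) {
        meta.offlineParent(parent);
        meta.addDaughter(second); // second daughter before the first
        meta.addDaughter(first);
    }

    public static void main(String[] args) {
        RecordingMetaWriter recorder = new RecordingMetaWriter();
        split(recorder, "parent", "daughterA", "daughterB");
        System.out.println(recorder.calls);
    }
}
```

A test can then assert on the recorded call order directly, without an artificial delay and without a running cluster.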

> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115063#comment-13115063
 ] 

Ted Yu commented on HBASE-4448:
---

I haven't gone through every line of the patch.
Minor comment:
{code}
+  i = new Integer(ONE);
{code}
ONE is already an Integer, right?

I wasn't involved in the early discussion of this ticket. So I hope Jesse and 
Doug can reach agreement and create other tickets if needed.

If improvement in running time of tests by using the patch can be shown, that 
would be more convincing.

> HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
> instances across unit tests
> -
>
> Key: HBASE-4448
> URL: https://issues.apache.org/jira/browse/HBASE-4448
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: HBaseTestingUtilityFactory.java, 
> hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch
>
>
> Setting up and tearing down HBaseTestingUtility instances in unit tests is 
> very expensive.  On my MacBook it takes about 10 seconds to set up a 
> MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
> test classes that use this facility, that's a lot of time in the build.
> This factory assumes that the JVM is being re-used across test classes in the 
> build, otherwise this pattern won't work. 
> I don't think this is appropriate for every use, but I think it can be 
> applicable in a great many cases - especially where developers just want a 
> simple MiniCluster with 1 slave.
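
The reuse idea can be sketched as a small pool that runs the expensive setup only when no idle instance is available. This is not the attached HBaseTestingUtilityFactory.java; the class below is a generic, hypothetical stand-in, and like the factory it only helps when the JVM is shared across test classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative only: hands out previously created instances before building new ones.
final class ReusablePoolSketch<T> {
    private final Supplier<T> expensiveSetup;
    private final Deque<T> idle = new ArrayDeque<>();
    private int created = 0;

    ReusablePoolSketch(Supplier<T> expensiveSetup) {
        this.expensiveSetup = expensiveSetup;
    }

    // Reuses an idle instance if one exists; otherwise pays the setup cost once more.
    synchronized T acquire() {
        T instance = idle.poll();
        if (instance == null) {
            instance = expensiveSetup.get();
            created++;
        }
        return instance;
    }

    // Returns an instance to the pool instead of tearing it down.
    synchronized void release(T instance) {
        idle.push(instance);
    }

    synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        ReusablePoolSketch<Object> pool = new ReusablePoolSketch<>(Object::new);
        Object first = pool.acquire();
        pool.release(first);
        Object second = pool.acquire();
        System.out.println(first == second);
        System.out.println(pool.createdCount());
    }
}
```

Using the figures quoted above (about 10 seconds to set up and 7 to tear down), each test class that reuses an instance would avoid roughly 17 seconds of overhead.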





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-26 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115061#comment-13115061
 ] 

Jesse Yates commented on HBASE-4448:


Quick synopsis:
re:re: #1
So you are thinking that people would just use the TestingUtility if they need to 
have their own unique cluster? I guess I was thinking that there would be a 
gain by reusing those objects, but thinking about it, I doubt it (also thinking 
everything would be brokered by the Factory, but it really needn't).

Also was thinking we need to do a review of mini-cluster usage on things like 
REST - had the same hunch when I was grepping through the tests.

re:re #2
+1

re:re #3
I don't think this actually runs multiple times - run() just has a try block 
and won't loop.
I'm ok with making it simple right now; let's just make a note to complicate it 
later ;)

Overall, I'm ok with it, except for the looping in #3

> HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
> instances across unit tests
> -
>
> Key: HBASE-4448
> URL: https://issues.apache.org/jira/browse/HBASE-4448
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: HBaseTestingUtilityFactory.java, 
> hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch
>
>
> Setting up and tearing down HBaseTestingUtility instances in unit tests is 
> very expensive.  On my MacBook it takes about 10 seconds to set up a 
> MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
> test classes that use this facility, that's a lot of time in the build.
> This factory assumes that the JVM is being re-used across test classes in the 
> build, otherwise this pattern won't work. 
> I don't think this is appropriate for every use, but I think it can be 
> applicable in a great many cases - especially where developers just want a 
> simple MiniCluster with 1 slave.





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Dave Revell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115057#comment-13115057
 ] 

Dave Revell commented on HBASE-4489:


Ted: I'm not clear what you're suggesting. Do you want to see length checking 
of the arrays returned from Bytes.split()?

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115055#comment-13115055
 ] 

Jonathan Gray commented on HBASE-4492:
--

Okay, so definitely after that build.  I will dig through the log more.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115054#comment-13115054
 ] 

Ted Yu commented on HBASE-4489:
---

bq. since an MD5 hash is just a 128-bit number and can start with any digit.
The above was confirmed by the creator of MD5 hash.

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Dave Revell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115052#comment-13115052
 ] 

Dave Revell commented on HBASE-4489:


The bug in MD5StringSplit I mentioned in my earlier comment occurs in the 
definition of the variable MAXMD5 in RegionSplitter.java.

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Dave Revell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115051#comment-13115051
 ] 

Dave Revell commented on HBASE-4489:


Mingjie, I would agree with you if the existing behavior was sane, but it has 
some problems:

1. Using ASCII strings as keys is a poor choice, and to have it be a default in 
a builtin tool would send the wrong message. Since HFiles repeat the key for 
every cell in the table, small key size is very important.

2. The MD5StringSplit class contains a bug that makes the current behavior even 
less sane. It assumes that an ASCII hex representation of an MD5 hash begins 
with 0, 1, 2, 3, 4, 5, 6, or 7. This is incorrect, since an MD5 hash is just a 
128-bit number and can start with any digit. The result will be a single 
oversized region at the high end of the key space.

So as far as I can tell, the existing behavior does the wrong thing, and 
furthermore does it wrongly. We shouldn't preserve this situation. 

If I've misunderstood the situation I definitely welcome corrections.
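
Point 2 is easy to check with the JDK's own MD5 implementation. A minimal sketch follows; the class and helper names are made up for illustration, and the only library calls are the standard java.security.MessageDigest ones.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: shows that hex MD5 strings are not confined to a leading 0-7.
final class Md5FirstDigitSketch {
    // Lower-case hex MD5 of the UTF-8 bytes of the input.
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // True when the key would sort above any upper bound that starts with '7'.
    static boolean aboveSevenPrefix(String hex) {
        return Character.digit(hex.charAt(0), 16) > 7;
    }

    public static void main(String[] args) {
        String emptyHash = md5Hex("");
        System.out.println(emptyHash + " above 7...: " + aboveSevenPrefix(emptyHash));
    }
}
```

The MD5 of the empty string begins with "d", so a key like that would fall into the last region under a split scheme that stops at a leading 7.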

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115049#comment-13115049
 ] 

Ted Yu commented on HBASE-4492:
---

HBASE-4446 went into 0.92 build 17.
FYI

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Updated] (HBASE-4219) Add Per-Column Family Metrics

2011-09-26 Thread Jonathan Gray (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4219:
-

Attachment: HBASE-4219-v4.patch

This is the latest patch given to me by Nicolas and then rebased on tip of 
trunk.

Going to dig in now on why this is not working with public hadoop.

> Add Per-Column Family Metrics
> -
>
> Key: HBASE-4219
> URL: https://issues.apache.org/jira/browse/HBASE-4219
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Nicolas Spiegelberg
>Assignee: David Goode
> Fix For: 0.92.0
>
> Attachments: 4219-v2.txt, 4219-v3.txt, HBASE-4219-v4.patch, 
> HBASE-4219_percfmetrics_1.patch
>
>
> Right now, we have region server level statistics.  However, the read/write 
> flow varies a lot based on the column family involved.  We should add 
> dynamic, per column family metrics to JMX so we can track each column family 
> individually.
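
One way to picture dynamic, per column family counters is a concurrent map keyed by family and metric name. This is a sketch only: the "cf.<family>.<metric>" naming scheme and the class itself are assumptions rather than what the attached patches do, and the JMX export is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: counters created on first use, one per (column family, metric) pair.
final class PerFamilyMetricsSketch {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Hypothetical naming scheme for the counter keys.
    private static String key(String columnFamily, String metric) {
        return "cf." + columnFamily + "." + metric;
    }

    void increment(String columnFamily, String metric, long delta) {
        counters.computeIfAbsent(key(columnFamily, metric), k -> new AtomicLong())
            .addAndGet(delta);
    }

    // Unknown counters read as zero rather than failing.
    long get(String columnFamily, String metric) {
        AtomicLong counter = counters.get(key(columnFamily, metric));
        return counter == null ? 0L : counter.get();
    }

    public static void main(String[] args) {
        PerFamilyMetricsSketch metrics = new PerFamilyMetricsSketch();
        metrics.increment("info", "getCount", 2);
        metrics.increment("info", "getCount", 3);
        System.out.println(metrics.get("info", "getCount"));
    }
}
```

Because counters are created lazily, column families added after startup get their own metrics without any registration step.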





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115044#comment-13115044
 ] 

Jonathan Gray commented on HBASE-4492:
--

Or is that hudson run from after the commit of the root/meta availability 
changes?

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115043#comment-13115043
 ] 

Jonathan Gray commented on HBASE-4492:
--

Looking through this, my hope would be that the recent ROOT/META changes that 
went in will fix it.  I'll keep an eye on this.  If anyone sees it fail in 
hudson again, please holler in this jira.

Thanks for opening this Ted.

> TestRollingRestart fails intermittently
> ---
>
> Key: HBASE-4492
> URL: https://issues.apache.org/jira/browse/HBASE-4492
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
> Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
> at 
> org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
> wiped out test output file for the failed test.
> Similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Updated] (HBASE-4493) book.xml - moving 2 entries to newly created RegionServer section

2011-09-26 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4493:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> book.xml - moving 2 entries to newly created RegionServer section
> -
>
> Key: HBASE-4493
> URL: https://issues.apache.org/jira/browse/HBASE-4493
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: book_HBASE_4493.xml.patch
>
>
> book.xml
> * Arch section.  Since RegionServer is now a top-level section under Arch, 
> moved existing BlockCache and WAL entries under RegionServer instead of being 
> under Regions.





[jira] [Updated] (HBASE-4493) book.xml - moving 2 entries to newly created RegionServer section

2011-09-26 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4493:
-

Attachment: book_HBASE_4493.xml.patch

> book.xml - moving 2 entries to newly created RegionServer section
> -
>
> Key: HBASE-4493
> URL: https://issues.apache.org/jira/browse/HBASE-4493
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: book_HBASE_4493.xml.patch
>
>
> book.xml
> * Arch section.  Since RegionServer is now a top-level section under Arch, 
> moved existing BlockCache and WAL entries under RegionServer instead of being 
> under Regions.





[jira] [Updated] (HBASE-4493) book.xml - moving 2 entries to newly created RegionServer section

2011-09-26 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4493:
-

Status: Patch Available  (was: Open)

> book.xml - moving 2 entries to newly created RegionServer section
> -
>
> Key: HBASE-4493
> URL: https://issues.apache.org/jira/browse/HBASE-4493
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: book_HBASE_4493.xml.patch
>
>
> book.xml
> * Arch section.  Since RegionServer is now a top-level section under Arch, 
> moved existing BlockCache and WAL entries under RegionServer instead of being 
> under Regions.





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115039#comment-13115039
 ] 

Jonathan Gray commented on HBASE-4131:
--

Thanks Ted.  Will commit with your suggestion.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt, 
> replicationInterface4.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Created] (HBASE-4493) book.xml - moving 2 entries to newly created RegionServer section

2011-09-26 Thread Doug Meil (Created) (JIRA)
book.xml - moving 2 entries to newly created RegionServer section
-

 Key: HBASE-4493
 URL: https://issues.apache.org/jira/browse/HBASE-4493
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


book.xml
* Arch section.  Since RegionServer is now a top-level section under Arch, 
moved existing BlockCache and WAL entries under RegionServer instead of being 
under Regions.





[jira] [Commented] (HBASE-4298) Support to drain RS nodes through ZK

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115032#comment-13115032
 ] 

Ted Yu commented on HBASE-4298:
---

@Aravind:
Reading plain text version in my mailbox isn't hard at all.

Thanks for taking care of my review comments. Appreciate it.

Can you attach the two patches to this JIRA or publish them on reviewboard ?
That way you can get more helpful comments and I can run test suite over them.

Good job.

> Support to drain RS nodes through ZK
> 
>
> Key: HBASE-4298
> URL: https://issues.apache.org/jira/browse/HBASE-4298
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.90.4
> Environment: all
>Reporter: Aravind Gottipati
>Priority: Critical
>  Labels: patch
> Fix For: 0.92.0, 0.90.5
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.
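
To make the intended effect concrete, here is a minimal sketch of how an assignment step might skip draining servers. It is not taken from either patch and makes no ZooKeeper calls: the two collections stand in for the children of /rs and /draining, and `DrainingFilterSketch` and `assignable` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; server names stand in for znode children under /rs and /draining. */
class DrainingFilterSketch {

  /** Returns the online servers that are not marked as draining, preserving order. */
  static List<String> assignable(List<String> onlineServers, Set<String> drainingServers) {
    List<String> candidates = new ArrayList<>();
    for (String server : onlineServers) {
      if (!drainingServers.contains(server)) {
        candidates.add(server);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    List<String> online = Arrays.asList("rs1", "rs2", "rs3");
    Set<String> draining = new HashSet<>(Arrays.asList("rs2"));
    System.out.println(assignable(online, draining));
  }
}
```

A draining server stays online and keeps serving its current regions in this model; it is only excluded when picking a destination for new regions.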





[jira] [Commented] (HBASE-4491) HBase Locality Checker

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115017#comment-13115017
 ] 

Ted Yu commented on HBASE-4491:
---

@Liyin:
Is the followup task covered by HBASE-4191 ?

Good utility. Looking forward to it.

> HBase Locality Checker
> --
>
> Key: HBASE-4491
> URL: https://issues.apache.org/jira/browse/HBASE-4491
> Project: HBase
>  Issue Type: New Feature
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> If we run the data node and region server on the same physical machine, the 
> region server benefits when the store files for the regions it serves have a 
> local replica in the data node process.
> So for each region, there exists a best-locality region server, which has the 
> most local blocks for that region.
> The HBase Locality Checker will show how many regions are running on their 
> best-locality region server. 
> The higher the number, the more performance benefit HBase can get from 
> data locality.
> Also, there will be a follow-up task to use this region locality information 
> for region assignment. The assignment manager will prefer to assign regions to 
> their best-locality region server.





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115006#comment-13115006
 ] 

Jonathan Gray commented on HBASE-4477:
--

I think this will take a few changes to the Coprocessor API if we want to only 
use RegionObserver and WALObserver.

Andy, what should one do if additional arguments are needed to, say, the 
prePut() call?  Do we add multiple prePut() calls to the RegionObserver 
interface?  Potentially deprecate the old ones?
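
One possible shape for "add an overload and deprecate the old one" is sketched below. This is only an illustration: `ToyRegionObserver` is a made-up, heavily simplified interface, not the HBase coprocessor API, and the default methods it relies on assume Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for an observer interface; not the HBase API. */
interface ToyRegionObserver {
  /** Original hook, kept so existing implementations still compile. */
  @Deprecated
  default void prePut(String row) {}

  /** Newer overload with one extra argument; by default it falls back to the old hook. */
  default void prePut(String row, byte[] extraMetadata) {
    prePut(row);
  }
}

/** An "old" observer that only knows the original one-argument hook. */
class LegacyObserver implements ToyRegionObserver {
  final List<String> seenRows = new ArrayList<>();

  @Override
  public void prePut(String row) {
    seenRows.add(row);
  }
}

class ObserverOverloadDemo {
  public static void main(String[] args) {
    LegacyObserver legacy = new LegacyObserver();
    // The host always calls the new overload; the legacy observer is still notified.
    legacy.prePut("row-1", new byte[] {42});
    System.out.println(legacy.seenRows);
  }
}
```

The trade-off is that every added argument means another overload, so the interface grows over time unless the deprecated variants are eventually removed.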

If things are built only on Coprocessor interfaces, do people see us including 
these in some kind of coprocessor contrib or should they be out on github or 
something?

> Ability for an application to store metadata into the transaction log
> -
>
> Key: HBASE-4477
> URL: https://issues.apache.org/jira/browse/HBASE-4477
> Project: HBase
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: hlogMetadata1.txt
>
>
> MySQL allows an application to store an arbitrary blob along with each 
> transaction in its transaction logs. This JIRA is a request for a similar 
> feature in HBase.
> The use case is as follows: An application in data center A stores a blob 
> of data along with each transaction. Replication software picks up these 
> blobs from the transaction logs in A and hands them to another instance of the 
> same application running in a remote data center B. The application in B is 
> responsible for applying them to the remote HBase cluster (and also for 
> handling conflict resolution, if any).





[jira] [Updated] (HBASE-4480) Testing script to simplfy local testing

2011-09-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4480:
--

Attachment: runtest.sh

Corrected a typo in previous script.
Also log the number of iterations at which the failure occurs.

> Testing script to simplfy local testing
> ---
>
> Key: HBASE-4480
> URL: https://issues.apache.org/jira/browse/HBASE-4480
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jesse Yates
>Priority: Minor
>  Labels: test
> Attachments: runtest.sh
>
>
> As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
> http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
> script that would handle more of the finer points of running/checking our 
> test suite.
> This script should:
> (1) Allow people to determine which tests are hanging/taking a long time to 
> run
> (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
> running the whole suite that caused the failure
> (3) Allow people to specify to run just unit tests or also integration tests 
> (essentially wrapping calls to 'maven test' and 'maven verify').
> This script should just be a convenience script - running tests directly from 
> maven should not be impacted.





[jira] [Updated] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-09-26 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4465:
--

Fix Version/s: (was: 0.92.0)

> Lazy-seek optimization for StoreFile scanners
> -
>
> Key: HBASE-4465
> URL: https://issues.apache.org/jira/browse/HBASE-4465
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>  Labels: optimization, seek
> Fix For: 0.89.20100924, 0.94.0
>
>
> Previously, if we had several StoreFiles for a column family in a region, we 
> would seek in each of them and only then merge the results, even though the 
> row/column we are looking for might only be in the most recent (and the 
> smallest) file. Now we prioritize our reads from those files so that we check 
> the most recent file first. This is done by doing a "lazy seek" which 
> pretends that the next value in the StoreFile is (seekRow, seekColumn, 
> lastTimestampInStoreFile), which is earlier in the KV order than anything 
> that might actually occur in the file. So if we don't find the result in 
> earlier files, that fake KV will bubble up to the top of the KV heap and a 
> real seek will be done. This is expected to significantly reduce the amount 
> of disk IO (as of 09/22/2011 we are doing dark launch testing and 
> measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful 
> discussions on this and the idea of putting fake KVs with the highest 
> timestamp of the StoreFile in the scanner priority queue.
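
The fake-KV idea described above can be sketched with a small stand-alone model. This is not the HBase implementation: `LazySeekSketch`, `FileScanner`, `lazySeek`, and `realSeek` are hypothetical names, a "file" is just a sorted in-memory list, and the key is a single string rather than (row, column).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of lazy seek; all names here are hypothetical, not HBase classes. */
class LazySeekSketch {

  /** A key/value with a timestamp; a null value marks a fake (lazy) entry. */
  static final class KV {
    final String key;
    final long ts;
    final String value;
    KV(String key, long ts, String value) { this.key = key; this.ts = ts; this.value = value; }
    boolean isFake() { return value == null; }
  }

  /** KV order: key ascending, then newer timestamps first. */
  static int compare(KV a, KV b) {
    int c = a.key.compareTo(b.key);
    return c != 0 ? c : Long.compare(b.ts, a.ts);
  }

  /** One "store file": KVs already sorted in KV order, plus the highest timestamp in the file. */
  static final class FileScanner {
    final List<KV> kvs;
    final long maxTs;
    KV current;
    int realSeeks = 0;

    FileScanner(List<KV> sortedKvs) {
      this.kvs = sortedKvs;
      long max = Long.MIN_VALUE;
      for (KV kv : sortedKvs) max = Math.max(max, kv.ts);
      this.maxTs = max;
    }

    /** Pretend the next KV is (key, maxTs): no earlier in KV order than any real KV for key in this file. */
    void lazySeek(String key) { current = new KV(key, maxTs, null); }

    /** The expensive operation: position on the first real KV with key >= the seek key. */
    void realSeek(String key) {
      realSeeks++;
      current = null;
      for (KV kv : kvs) {
        if (kv.key.compareTo(key) >= 0) { current = kv; break; }
      }
    }
  }

  /** Returns the newest value for key, doing a real seek only when a fake KV reaches the heap top. */
  static String get(String key, List<FileScanner> scanners) {
    Comparator<FileScanner> byCurrent = (a, b) -> compare(a.current, b.current);
    PriorityQueue<FileScanner> heap = new PriorityQueue<>(byCurrent);
    for (FileScanner s : scanners) { s.lazySeek(key); heap.add(s); }
    while (!heap.isEmpty()) {
      FileScanner top = heap.poll();
      if (top.current.isFake()) {
        top.realSeek(key);                      // the fake KV bubbled up: seek for real
        if (top.current != null) heap.add(top); // re-queue with its real position
        continue;
      }
      return top.current.key.equals(key) ? top.current.value : null;
    }
    return null;
  }

  public static void main(String[] args) {
    FileScanner older = new FileScanner(Arrays.asList(new KV("a", 10, "old-a"), new KV("b", 5, "old-b")));
    FileScanner newer = new FileScanner(Arrays.asList(new KV("a", 20, "new-a")));
    String value = get("a", Arrays.asList(older, newer));
    System.out.println(value + ", real seeks on older file: " + older.realSeeks);
  }
}
```

In this model a lookup that is satisfied by the newest file never performs a real seek on the older file, which is the saving the description is after; a lookup for a key found only in the older file still works, because that file's fake KV eventually reaches the top of the heap.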





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115000#comment-13115000
 ] 

Ted Yu commented on HBASE-4131:
---

@Jonathan:
Thanks for taking Apache build seriously.
I created HBASE-4492 with reference to a failure on Jenkins.

I am +1 on patch v4 with the minor modification as mentioned @ 26/Sep/11 20:52

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt, 
> replicationInterface4.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Created] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-26 Thread Ted Yu (Created) (JIRA)
TestRollingRestart fails intermittently
---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray


I got the following when running test suite on TRUNK:
{code}
testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
Time elapsed: 300.28 sec  <<< ERROR!
java.lang.Exception: test timed out after 30 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
at 
org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
{code}
I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
wiped out test output file for the failed test.

Similar failure can be found on Jenkins:
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Updated] (HBASE-4480) Testing script to simplfy local testing

2011-09-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4480:
--

Attachment: (was: runtest.sh)

> Testing script to simplfy local testing
> ---
>
> Key: HBASE-4480
> URL: https://issues.apache.org/jira/browse/HBASE-4480
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jesse Yates
>Priority: Minor
>  Labels: test
>
> As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
> http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
> script that would handle more of the finer points of running/checking our 
> test suite.
> This script should:
> (1) Allow people to determine which tests are hanging/taking a long time to 
> run
> (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
> running the whole suite that caused the failure
> (3) Allow people to specify to run just unit tests or also integration tests 
> (essentially wrapping calls to 'maven test' and 'maven verify').
> This script should just be a convenience script - running tests directly from 
> maven should not be impacted.





[jira] [Commented] (HBASE-4491) HBase Locality Checker

2011-09-26 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115025#comment-13115025
 ] 

Liyin Tang commented on HBASE-4491:
---

@Ted: Yes, it looks like it is covered by HBASE-4191. I can follow up on HBASE-4191.

> HBase Locality Checker
> --
>
> Key: HBASE-4491
> URL: https://issues.apache.org/jira/browse/HBASE-4491
> Project: HBase
>  Issue Type: New Feature
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> If we run the data node and region server on the same physical machine, the 
> region server benefits when the store files for the regions it serves have a 
> local replica in the data node process.
> So for each region, there exists a best-locality region server, which has the 
> most local blocks for that region.
> The HBase Locality Checker will show how many regions are running on their 
> best-locality region server. 
> The higher the number, the more performance benefit HBase can get from 
> data locality.
> Also, there will be a follow-up task to use this region locality information 
> for region assignment. The assignment manager will prefer to assign regions to 
> their best-locality region server.





[jira] [Created] (HBASE-4491) HBase Locality Checker

2011-09-26 Thread Liyin Tang (Created) (JIRA)
HBase Locality Checker
--

 Key: HBASE-4491
 URL: https://issues.apache.org/jira/browse/HBASE-4491
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang


If we run the data node and region server on the same physical machine, the 
region server benefits when the store files for the regions it serves have a 
local replica in the data node process.

So for each region, there exists a best-locality region server, which has the 
most local blocks for that region.
The HBase Locality Checker will show how many regions are running on their 
best-locality region server. 
The higher the number, the more performance benefit HBase can get from data 
locality.

Also, there will be a follow-up task to use this region locality information 
for region assignment. The assignment manager will prefer to assign regions to 
their best-locality region server.
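
The count the checker reports could be computed along these lines. This is a minimal sketch under assumptions, not the planned tool: the inputs are plain maps (region to hosting server, and region to per-server local block counts), how those counts are obtained from HDFS is out of scope, and `LocalitySketch` and its method names are hypothetical. Ties are treated as "on a best server".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical locality calculation; inputs are plain maps rather than HBase or HDFS objects. */
class LocalitySketch {

  /** Highest number of local blocks any single server holds for one region (0 if none). */
  static int maxLocalBlocks(Map<String, Integer> localBlocksByServer) {
    int max = 0;
    for (int blocks : localBlocksByServer.values()) {
      max = Math.max(max, blocks);
    }
    return max;
  }

  /** Counts regions whose hosting server holds as many local blocks as the best server does. */
  static int countRegionsOnBestServer(Map<String, String> regionToHost,
                                      Map<String, Map<String, Integer>> regionToLocalBlocks) {
    int count = 0;
    for (Map.Entry<String, String> entry : regionToHost.entrySet()) {
      Map<String, Integer> blocks = regionToLocalBlocks.getOrDefault(
          entry.getKey(), Collections.<String, Integer>emptyMap());
      int onHost = blocks.getOrDefault(entry.getValue(), 0);
      if (onHost > 0 && onHost == maxLocalBlocks(blocks)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    Map<String, String> hosts = new HashMap<>();
    hosts.put("region1", "rs1");
    hosts.put("region2", "rs1");

    Map<String, Integer> region1Blocks = new HashMap<>();
    region1Blocks.put("rs1", 8);
    region1Blocks.put("rs2", 2);
    Map<String, Integer> region2Blocks = new HashMap<>();
    region2Blocks.put("rs1", 1);
    region2Blocks.put("rs2", 9);

    Map<String, Map<String, Integer>> localBlocks = new HashMap<>();
    localBlocks.put("region1", region1Blocks);
    localBlocks.put("region2", region2Blocks);

    System.out.println(countRegionsOnBestServer(hosts, localBlocks));
  }
}
```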






[jira] [Updated] (HBASE-4485) Eliminate window of missing Data

2011-09-26 Thread Amitanand Aiyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Description: 
After incorporating v11 of the 2856 fix, we discovered that we are still having 
some ACID violations.

This time, however, the problem is not about including "newer" updates but 
about missing older updates
that should be included. 

Here is what seems to be happening.

There is a race condition in the StoreScanner.getScanners()

  private List<KeyValueScanner> getScanners(Scan scan,
      final NavigableSet<byte[]> columns) throws IOException {
    // First the store file scanners
    List<StoreFileScanner> sfScanners = StoreFileScanner
        .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks,
            isGet, false);
    List<KeyValueScanner> scanners =
        new ArrayList<KeyValueScanner>(sfScanners.size()+1);

    // include only those scan files which pass all filters
    for (StoreFileScanner sfs : sfScanners) {
      if (sfs.shouldSeek(scan, columns)) {
        scanners.add(sfs);
      }
    }

    // Then the memstore scanners
    if (this.store.memstore.shouldSeek(scan)) {
      scanners.addAll(this.store.memstore.getScanners());
    }
    return scanners;
  }


If for example there is a call to Store.updateStorefiles() that happens between
the store.getStorefiles() and this.store.memstore.getScanners(); then
it is possible that there was a new HFile created, that is not seen by the
StoreScanner, and the data is not present in the Memstore.snapshot either.
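
The window can be reproduced in a small stand-alone model, along with one possible way to close it: take the file list and the memstore view under the same lock that the flush holds while swapping them. This is only a sketch under assumptions, not the attached 4485-v1.diff and not HBase code; `ToyStore`, `flush`, `openScannerViewRacy`, and `openScannerView` are hypothetical names, and the interleaving is simulated on a single thread through a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of a store: flushed "files" plus an in-memory snapshot. Hypothetical, not HBase code. */
class ToyStore {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private List<String> storeFiles = new ArrayList<>();
  private List<String> memstoreSnapshot = new ArrayList<>();

  void put(String kv) {
    lock.writeLock().lock();
    try {
      memstoreSnapshot.add(kv);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Moves the snapshot contents into the file list in one step, as seen by lock holders. */
  void flush() {
    lock.writeLock().lock();
    try {
      List<String> files = new ArrayList<>(storeFiles);
      files.addAll(memstoreSnapshot);
      storeFiles = files;
      memstoreSnapshot = new ArrayList<>();
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Racy: reads files, then memstore, with no lock; betweenSteps simulates an interleaved flush. */
  List<String> openScannerViewRacy(Runnable betweenSteps) {
    List<String> view = new ArrayList<>(storeFiles);
    betweenSteps.run();
    view.addAll(memstoreSnapshot);
    return view;
  }

  /** Reads both under the read lock, so a flush cannot slip in between the two reads. */
  List<String> openScannerView() {
    lock.readLock().lock();
    try {
      List<String> view = new ArrayList<>(storeFiles);
      view.addAll(memstoreSnapshot);
      return view;
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    ToyStore store = new ToyStore();
    store.put("row1");
    // The flush lands between the two reads, so row1 is in neither view.
    System.out.println("racy view: " + store.openScannerViewRacy(store::flush));
    System.out.println("locked view: " + store.openScannerView());
  }
}
```

In the racy variant the scanner captures the old file list, the flush then empties the snapshot, and the KeyValue is visible in neither place, which matches the missing-data symptom described above.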


  was:
After incorporating v11 of the 2856 fix, we discovered that we are still having 
some ACID violations.

This time, however, the problem is not about including "newer" updates but 
about missing older updates
that should be included. 

Here is what seems to be happening.


0 - Scanner starts scanning.

0 - MemStore.snapshot is called.

Scanner has access to kvHeap and snapshot

1-  Flush takes place. 
 1.1 KV's in the snapshot are written to the disk.
 1.2 HFile is ready. 

2   Store.updateStoreFiles() deletes the old snapshot.

 2.1 updateReaders will not be called until the end of the columnFamily 
seek.

3  For a brief window of time, scanner does not have access to certain 
KeyValues.
   a) Scanner has no longer access to the snapshot because it is flushed to the
disk. 
   b) It does not yet have access to the HFile because the updateReaders was
not called yet.



> Eliminate window of missing Data
> 
>
> Key: HBASE-4485
> URL: https://issues.apache.org/jira/browse/HBASE-4485
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.94.0
>
> Attachments: 4485-v1.diff
>
>
> After incorporating v11 of the 2856 fix, we discovered that we are still 
> having some ACID violations.
> This time, however, the problem is not about including "newer" updates but 
> about missing older updates
> that should be included. 
> Here is what seems to be happening.
> There is a race condition in the StoreScanner.getScanners()
>   private List<KeyValueScanner> getScanners(Scan scan,
>       final NavigableSet<byte[]> columns) throws IOException {
>     // First the store file scanners
>     List<StoreFileScanner> sfScanners = StoreFileScanner
>         .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks,
>             isGet, false);
>     List<KeyValueScanner> scanners =
>         new ArrayList<KeyValueScanner>(sfScanners.size()+1);
>     // include only those scan files which pass all filters
>     for (StoreFileScanner sfs : sfScanners) {
>       if (sfs.shouldSeek(scan, columns)) {
>         scanners.add(sfs);
>       }
>     }
>     // Then the memstore scanners
>     if (this.store.memstore.shouldSeek(scan)) {
>       scanners.addAll(this.store.memstore.getScanners());
>     }
>     return scanners;
>   }
> If for example there is a call to Store.updateStorefiles() that happens 
> between
> the store.getStorefiles() and this.store.memstore.getScanners(); then
> it is possible that there was a new HFile created, that is not seen by the
> StoreScanner, and the data is not present in the Memstore.snapshot either.





[jira] [Commented] (HBASE-4298) Support to drain RS nodes through ZK

2011-09-26 Thread Aravind Gottipati (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114995#comment-13114995
 ] 

Aravind Gottipati commented on HBASE-4298:
--

Well.. JIRA lost all my formatting in my last comment..  I hope it still makes 
sense.

The latest changesets are 

https://github.com/aravind/hbase/commit/46f3b58c60f4f1c81806fdad6e606badf84fc30c
 for trunk.

https://github.com/aravind/hbase/commit/e6cf9ecf78f8e0d6f46c2a77a524e6bccec45001
 for 0.90.

> Support to drain RS nodes through ZK
> 
>
> Key: HBASE-4298
> URL: https://issues.apache.org/jira/browse/HBASE-4298
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.90.4
> Environment: all
>Reporter: Aravind Gottipati
>Priority: Critical
>  Labels: patch
> Fix For: 0.92.0, 0.90.5
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.





[jira] [Commented] (HBASE-4298) Support to drain RS nodes through ZK

2011-09-26 Thread Aravind Gottipati (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114991#comment-13114991
 ] 

Aravind Gottipati commented on HBASE-4298:
--

@Ted: Thank you for the review.  I made some changes and updated my patch (in 
github).  Notes in line.

* I think we only need to log the number of draining servers.
* The javadoc should state that keys of the map are region servers.
* For DrainingServerTracker.java, please remove year.
* For nodeChildrenChanged(), please change the message in the catch block for 
IOException; it mentions a zk exception.
* Should remove the createNode rather than just comment it out.
* The serverManager calls differ between add() and remove(): one is inside the 
synchronized block, one outside.
* I think a better name might be "zookeeper.znode.draining.rs"

- I agree with all of these and they are all fixed in my latest code push (on 
github).

* I wonder if Map is needed for drainingServers because it is private and 
getDrainingServersList() only returns the keySet.
- The map isn't required, but I followed the example of onlineServers and 
serverConnections.  For code in trunk, I have changed it to an ArrayList.  A 
similar change does not work (easily) in the 0.90 branch.  Code in 
AssignmentManager uses HServerInfo in 0.90, and changing drainingServers to an 
ArrayList would mean key lookups etc.  I have left it as a Map in 0.90.

* removeServerFromDrainList / addServerToDrainList should return a boolean.
- The remove and add methods are called from DrainingServerTracker.  The 
context is a ZK callback, and the corresponding remove and add functions there 
simply return void.  I changed the code to return booleans in trunk, but left 
it as void in the 0.90 branch.  I figured the return values might actually be 
used in trunk, but I doubt they will be in 0.90.

* Unit tests..
- I will work with Stack and get the tests to you.

* Share your experience from using this in your environment.
- To reboot the cluster, we currently drain one server at a time (using the 
graceful stop shell script).  This process takes forever to go through all the 
servers.  The goal here is to enable us to drain multiple servers 
simultaneously.  Doing this by keeping track of servers externally makes the 
programming painful, and we'd have to share state somehow between different 
scripts that all aim to drain different servers.  Leaving this list in ZK and 
having HBase keep them from getting new regions seems like the right way to go 
about it.  I have tested this in a test cluster of about 14 servers.  This code 
by itself only solves one part of our problem.  The rest of it will be solved 
by command line scripts that will create nodes to be shut down under 
/draining/rs in ZK, and then move regions out from them.
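
To make the intent concrete, here is a minimal sketch of the filtering step, 
assuming the master already has the list of online servers and the set of 
server names found under the draining znode.  The class and method names are 
hypothetical and this is not the code from the patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not the code from the patch. */
public class DrainingFilter {

    /**
     * Returns the online servers that may still receive new regions,
     * i.e. those that do not appear in the draining set.
     */
    public static List<String> assignable(List<String> onlineServers,
                                          Set<String> drainingServers) {
        List<String> result = new ArrayList<>();
        for (String server : onlineServers) {
            if (!drainingServers.contains(server)) {
                result.add(server);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Server names are made up for the example.
        List<String> online = List.of("rs1,60020,1", "rs2,60020,1", "rs3,60020,1");
        Set<String> draining = Set.of("rs2,60020,1");
        System.out.println(assignable(online, draining));
    }
}
```

Several scripts can then mark different servers as draining at the same time 
without sharing any state other than the znodes themselves.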

Please let me know if you have any other questions about this stuff.


> Support to drain RS nodes through ZK
> 
>
> Key: HBASE-4298
> URL: https://issues.apache.org/jira/browse/HBASE-4298
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.90.4
> Environment: all
>Reporter: Aravind Gottipati
>Priority: Critical
>  Labels: patch
> Fix For: 0.92.0, 0.90.5
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114987#comment-13114987
 ] 

Jonathan Gray commented on HBASE-4131:
--

@Ted, so are you +1 to commit patch v4 now?

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt, 
> replicationInterface4.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-26 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114985#comment-13114985
 ] 

Jonathan Gray commented on HBASE-4131:
--

@Ted, I don't think this diff is related to TestRollingRestart in any way.  You 
might want to open a separate JIRA and put in your log file from the failed 
TestRollingRestart.  Assign to me if you'd like me to take a look.  

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt, 
> replicationInterface4.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Commented] (HBASE-3512) Coprocessors: Shell support for listing currently loaded coprocessor set

2011-09-26 Thread Mingjie Lai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114986#comment-13114986
 ] 

Mingjie Lai commented on HBASE-3512:


An update here: this issue will be resolved by the patch at HBASE-4070, which 
fixes this one as a bonus. I will close this JIRA after HBASE-4070 gets in.

> Coprocessors: Shell support for listing currently loaded coprocessor set
> 
>
> Key: HBASE-3512
> URL: https://issues.apache.org/jira/browse/HBASE-3512
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Andrew Purtell
>Assignee: Mingjie Lai
> Fix For: 0.92.0
>
>
> Add support to the shell for listing the coprocessors loaded globally on the 
> regionserver and those loaded on a per-table basis.
> Perhaps by extending the 'status' command.





[jira] [Created] (HBASE-4490) Improve TestRollingRestart to cover complex cases

2011-09-26 Thread Ted Yu (Created) (JIRA)
Improve TestRollingRestart to cover complex cases
-

 Key: HBASE-4490
 URL: https://issues.apache.org/jira/browse/HBASE-4490
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


HBASE-4455 fixed a region server rolling restart scenario where the ROOT and 
.META. regions could become invisible from AssignmentManager's point of view.
This JIRA would create integration test(s) that simulate the above scenario and 
verify that the fix in HBASE-4455 indeed works.





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Mingjie Lai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114982#comment-13114982
 ] 

Mingjie Lai commented on HBASE-4489:


Dave. 

Changing a default configuration may affect existing users:

{code}
 Class splitClass = conf.getClass(
-"split.algorithm", MD5StringSplit.class, SplitAlgorithm.class);
+"split.algorithm", UniformSplit.class, SplitAlgorithm.class);
 SplitAlgorithm splitAlgo;
{code}



> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Assigned] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HBASE-4488:


Assignee: Lars Hofhansl

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114977#comment-13114977
 ] 

Ted Yu commented on HBASE-4131:
---

I got one test failure in the suite:
{code}
testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  Time elapsed: 300.28 sec  <<< ERROR!
java.lang.Exception: test timed out after 30 milliseconds
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
        at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
{code}
It passed when run standalone.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt, 
> replicationInterface4.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114973#comment-13114973
 ] 

Ted Yu commented on HBASE-4489:
---

I looked at patch for TRUNK:
{code}
+  byte[][] splitKeysPlusEndpoints = Bytes.split(firstRowBytes, lastRowBytes,
+  numRegions-1);
{code}
It seems range checking is missing.
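
A minimal sketch of the kind of check that could go in front of that call.  The 
class and method names are hypothetical and not part of the patch, and it 
assumes row keys are compared as unsigned bytes in lexicographic order:

```java
import java.util.Arrays;

/** Hypothetical argument check for illustration; not part of the patch. */
public class SplitArgs {

    /**
     * Rejects inputs for which an even split is not meaningful:
     * null keys, fewer than two regions, or firstRow >= lastRow.
     */
    public static void check(byte[] firstRow, byte[] lastRow, int numRegions) {
        if (firstRow == null || lastRow == null) {
            throw new IllegalArgumentException("row keys must not be null");
        }
        if (numRegions < 2) {
            throw new IllegalArgumentException(
                "need at least 2 regions, got " + numRegions);
        }
        // Unsigned comparison matters: as a signed byte, 0xFF sorts before 0x00.
        if (Arrays.compareUnsigned(firstRow, lastRow) >= 0) {
            throw new IllegalArgumentException("firstRow must sort before lastRow");
        }
    }

    public static void main(String[] args) {
        check(new byte[] {0x00}, new byte[] {(byte) 0xFF}, 5);  // passes
        try {
            check(new byte[] {(byte) 0xFF}, new byte[] {0x00}, 5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```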

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Dave Revell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Revell updated HBASE-4489:
---

Affects Version/s: 0.90.4
   Status: Patch Available  (was: Open)

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Dave Revell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Revell updated HBASE-4489:
---

Attachment: HBASE-4489-trunk-v1.patch
HBASE-4489-branch0.90-v1.patch

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-26 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114964#comment-13114964
 ] 

Lars Hofhansl commented on HBASE-4335:
--

@Stack... Not sure how you can test the actual problem unless you either (1) 
instrument the code to introduce an artificial delay triggered by test code or 
(2) artificially create holes in meta. Neither is a good option IMHO.

Maybe a probabilistic test that splits a table while calling get 
concurrently would do.


> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.
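
The window described above can be reproduced in a toy model.  This is only a 
sketch: it assumes the catalog behaves like a sorted map keyed by region start 
key, so that adding the first daughter replaces the parent's entry, and all 
names in it are hypothetical.  It is not HBase code:

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a catalog keyed by region start key; not HBase code. */
public class MetaHoleSketch {

    /** A region covering the half-open key range [startKey, endKey). */
    public static final class Region {
        public final String name;
        public final String startKey;
        public final String endKey;

        public Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        public boolean contains(String key) {
            return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
        }
    }

    /** Finds the entry with the greatest start key <= key, like a catalog lookup. */
    public static Region lookup(TreeMap<String, Region> meta, String key) {
        Map.Entry<String, Region> entry = meta.floorEntry(key);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, Region> meta = new TreeMap<>();
        // Parent [a, z) splits at "m"; the first daughter lands first and
        // hides the parent because it has the same start key.
        meta.put("a", new Region("daughterA", "a", "m"));

        Region found = lookup(meta, "q");
        System.out.println(found.name + " contains q? " + found.contains("q"));

        // Once the second daughter is added, the hole closes.
        meta.put("m", new Region("daughterB", "m", "z"));
        found = lookup(meta, "q");
        System.out.println(found.name + " contains q? " + found.contains("q"));
    }
}
```

In the first lookup the client is handed a region that does not cover the key 
it asked for, which is the hole.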





[jira] [Created] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Dave Revell (Created) (JIRA)
Better key splitting in RegionSplitter
--

 Key: HBASE-4489
 URL: https://issues.apache.org/jira/browse/HBASE-4489
 Project: HBase
  Issue Type: Improvement
Reporter: Dave Revell


The RegionSplitter utility allows users to create a pre-split table from the 
command line or do a rolling split on an existing table. It supports pluggable 
split algorithms that implement the SplitAlgorithm interface. The only/default 
SplitAlgorithm is one that assumes keys fall in the range from ASCII string 
"" to ASCII string "7FFF". This is not a sane default, and seems 
useless to most users. Users are likely to be surprised by the fact that all 
the region splits occur in the byte range of ASCII characters.

A better default split algorithm would be one that evenly divides the space of 
all bytes, which is what this patch does. Making a table with five regions 
would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
\xFF\xFF.
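
As an illustration of the arithmetic (a sketch only, not the code in the patch), 
the boundaries for a fixed-width key space can be computed by dividing the 
largest key value evenly.  The class and method names are hypothetical, and it 
treats \xFF\xFF... as the end of the key space rather than as a boundary between 
two regions, which is an assumption on my part:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of evenly dividing a fixed-width byte key space. */
public class UniformSplitSketch {

    /**
     * Returns the numRegions - 1 interior boundaries, as upper-case hex,
     * for keys that are keyBytes bytes wide.
     */
    public static List<String> splitPoints(int keyBytes, int numRegions) {
        if (keyBytes < 1 || numRegions < 2) {
            throw new IllegalArgumentException("need keyBytes >= 1 and numRegions >= 2");
        }
        // Largest key value: keyBytes bytes of 0xFF.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyBytes).subtract(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(numRegions);
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i)).divide(n);
            // Zero-pad so every boundary has the full key width.
            points.add(String.format("%0" + (2 * keyBytes) + "X", point));
        }
        return points;
    }

    public static void main(String[] args) {
        // Two-byte keys, five regions.
        System.out.println(splitPoints(2, 5));
    }
}
```

For two-byte keys and five regions this gives 3333, 6666, 9999 and CCCC, which 
matches the first four values listed above.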





[jira] [Assigned] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-26 Thread Dave Revell (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Revell reassigned HBASE-4489:
--

Assignee: Dave Revell

> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Reporter: Dave Revell
>Assignee: Dave Revell
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.





[jira] [Updated] (HBASE-4480) Testing script to simplfy local testing

2011-09-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4480:
--

Attachment: runtest.sh

Script that runs multiple tests repeatedly.
Sample usage:
{code}
./runtest.sh 1 TestOperation TestAttributes
{code}

> Testing script to simplfy local testing
> ---
>
> Key: HBASE-4480
> URL: https://issues.apache.org/jira/browse/HBASE-4480
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jesse Yates
>Priority: Minor
>  Labels: test
> Attachments: runtest.sh
>
>
> As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
> http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
> script that would handle more of the finer points of running/checking our 
> test suite.
> This script should:
> (1) Allow people to determine which tests are hanging/taking a long time to 
> run
> (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
> running the whole suite that caused the failure
> (3) Allow people to specify to run just unit tests or also integration tests 
> (essentially wrapping calls to 'maven test' and 'maven verify').
> This script should just be a convenience script - running tests directly from 
> maven should not be impacted.





[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4488:
-

Description: 
While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake:
The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.


  was:
While looked at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake:
The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.



> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4488:
-

Attachment: 4488.txt

Simple fix, safe to apply :)

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looked at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114948#comment-13114948
 ] 

Lars Hofhansl commented on HBASE-4488:
--

Also removes two unused variables and an unused import.

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4488.txt
>
>
> While looked at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114941#comment-13114941
 ] 

Lars Hofhansl commented on HBASE-4488:
--

The fact that this has not tripped any test (and we have quite a lot of tests 
that flush the store during the test) might indicate that the StoreScanner used 
this way only returns false from next(...) if there are no rows left. 
Regardless, this violates the InternalScanner contract and might cause problems 
in the future.
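
To spell out the difference, here is a small sketch.  RowScanner is a 
hypothetical stand-in that assumes the contract is "fill the list with the next 
row, return true if more rows exist after this one"; it is not the HBase 
interface and this is not the attached patch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustration of the loop shape only; not the attached patch. */
public class ScannerLoopSketch {

    /** Hypothetical stand-in for the scanner contract described above. */
    public interface RowScanner {
        /** Adds the next row to out; returns true if more rows remain after it. */
        boolean next(List<String> out);
    }

    /** Builds a scanner that returns false together with the last row. */
    public static RowScanner over(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return out -> {
            if (it.hasNext()) {
                out.add(it.next());
            }
            return it.hasNext();
        };
    }

    /** Buggy shape: drops whatever the final next() call put into kvs. */
    public static List<String> drainBuggy(RowScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            flushed.addAll(kvs);
            kvs.clear();
        }
        return flushed;
    }

    /** Safe shape: always processes kvs, then checks whether to continue. */
    public static List<String> drainFixed(RowScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            flushed.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return flushed;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3");
        System.out.println("buggy: " + drainBuggy(over(rows)));
        System.out.println("fixed: " + drainFixed(over(rows)));
    }
}
```

With a scanner that only returns false on an empty final call, both loops 
behave the same, which would explain why no test has tripped.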

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
>
> While looked at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4488:
-

  Description: 
While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake:
The while(scanner.next(kvs)) is incorrect and might miss the last edits.


  was:
While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake.


Fix Version/s: 0.94.0
   0.92.0

> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) is incorrect and might miss the last edits.





[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4488:
-

Description: 
While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake:
The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.


  was:
While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake:
The while(scanner.next(kvs)) is incorrect and might miss the last edits.



> Store could miss rows during flush
> --
>
> Key: HBASE-4488
> URL: https://issues.apache.org/jira/browse/HBASE-4488
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
>
> While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
> critical mistake:
> The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.





[jira] [Created] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Created) (JIRA)
Store could miss rows during flush
--

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Priority: Critical


While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake.






[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114914#comment-13114914
 ] 

Ted Yu commented on HBASE-4131:
---

Minor comment for Replication.java:
{code}
+this.zkHelper = new ReplicationZookeeper(server, this.replicating);
+  } catch (KeeperException ke) {
+throw new IOException("Failed replication handler create", ke);
+  }
{code}
I think the exception message should include this.replicating
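
One way the suggestion might look, as a sketch only. HelperException is a hypothetical stand-in for the checked exception thrown by the ZooKeeper helper, and treating the replicating flag as an AtomicBoolean is an assumption, not something taken from the patch.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

class ReplicationInitSketch {
    /** Hypothetical stand-in for the checked exception thrown by the ZooKeeper helper. */
    static class HelperException extends Exception {
        HelperException(String message) {
            super(message);
        }
    }

    /** Wraps a helper failure in an IOException whose message records the replicating flag. */
    static IOException wrap(HelperException cause, AtomicBoolean replicating) {
        return new IOException(
            "Failed replication handler create (replicating=" + replicating.get() + ")", cause);
    }

    public static void main(String[] args) {
        IOException e = wrap(new HelperException("zk session expired"), new AtomicBoolean(true));
        System.out.println(e.getMessage());
    }
}
```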

This can be done at time of integration.

Nice work Dhruba - looking forward to the follow-on JIRA(s)

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt, 
> replicationInterface4.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114907#comment-13114907
 ] 

Ted Yu commented on HBASE-4352:
---

+1 on latest patch.
No surprise in test suite.

> Apply version of hbase-4015 to branch
> -
>
> Key: HBASE-4352
> URL: https://issues.apache.org/jira/browse/HBASE-4352
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.90.5
>
> Attachments: HBASE-4352_0.90.patch, HBASE-4352_0.90_1.patch
>
>
> Consider adding a version of hbase-4015 to 0.90.  It changes HRegionInterface, 
> so we would need to move the change to the end of the interface and then test 
> that it doesn't break rolling restart.





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114900#comment-13114900
 ] 

Ted Yu commented on HBASE-4131:
---

TestHTablePool took 4 minutes on Linux with latest patch and it passed.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt, 
> replicationInterface4.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Commented] (HBASE-4326) Tests that use HBaseTestingUtility.startMiniCluster(n) should shutdown with HBaseTestingUtility.shutdownMiniCluster.

2011-09-26 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114892#comment-13114892
 ] 

Jonathan Hsieh commented on HBASE-4326:
---

Adding an @AfterClass shutdown to TestHLog seems to make the testLogCleaning tests 
hang when it attempts to shut down.

> Tests that use HBaseTestingUtility.startMiniCluster(n) should shutdown with 
> HBaseTestingUtility.shutdownMiniCluster.
> 
>
> Key: HBASE-4326
> URL: https://issues.apache.org/jira/browse/HBASE-4326
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>
> Most tests that use mini clusters use this pattern:
> {code}
>   private final static HBaseTestingUtility UTIL = new HBaseTestingUtility();
>   @BeforeClass
>   public static void beforeClass() throws Exception {
>     UTIL.startMiniCluster(1);
>   }
>   @AfterClass
>   public static void afterClass() throws IOException {
>     UTIL.shutdownMiniCluster();
>   }
> {code}
> Some tests (like hbase-4269) use this instead:
> {code}
>   @BeforeClass
>   public static void beforeClass() throws Exception {
>     UTIL.startMiniCluster(1);
>   }
>   @AfterClass
>   public static void afterClass() throws IOException {
>     UTIL.getMiniCluster().shutdown();
>     // or UTIL.shutdownMiniHBaseCluster();
>     // and likely others.
>   }
> {code}
> There is a difference between the two shutdowns: the former deletes files 
> created during the tests while the latter does not.  The leftover state 
> (zk or hbase/mr data) may be the cause of strange inter-testcase 
> problems when full suites are run.





[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114888#comment-13114888
 ] 

jirapos...@reviews.apache.org commented on HBASE-4482:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2051/#review2071
---



src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java


Can you comment here on the change?  And should this use LOG instead of 
System.out?



src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java


add a reference to the JIRA # in this comment... and break this to two 
lines.



src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java


remove from javadoc



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


Do we need this?  At the least should be a DEBUG (seems like RS logs will 
be filled with this though, is that intended?)


- Jonathan


On 2011-09-26 19:17:34, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2051/
bq.  ---
bq.  
bq.  (Updated 2011-09-26 19:17:34)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Bugfix, kills a race condition.
bq.  
bq.  Ignore r1, that's the wrong patch.
bq.  
bq.  
bq.  This addresses bug HBASE-4482.
bq.  https://issues.apache.org/jira/browse/HBASE-4482
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java 3798a06 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java fe8b95a 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java 91b1603 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 0c06f4f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java 0814f41 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java e021780 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java 8dd5159 
bq.  
bq.  Diff: https://reviews.apache.org/r/2051/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Looped tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114880#comment-13114880
 ] 

jirapos...@reviews.apache.org commented on HBASE-4482:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2051/#review2070
---



src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java


This log would be expensive.


- Ted


On 2011-09-26 19:17:34, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2051/
bq.  ---
bq.  
bq.  (Updated 2011-09-26 19:17:34)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Bugfix, kills a race condition.
bq.  
bq.  Ignore r1, that's the wrong patch.
bq.  
bq.  
bq.  This addresses bug HBASE-4482.
bq.  https://issues.apache.org/jira/browse/HBASE-4482
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java 3798a06 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java fe8b95a 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java 91b1603 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 0c06f4f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java 0814f41 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java e021780 
bq.    src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java 8dd5159 
bq.  
bq.  Diff: https://reviews.apache.org/r/2051/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Looped tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114887#comment-13114887
 ] 

jirapos...@reviews.apache.org commented on HBASE-4482:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2051/
---

(Updated 2011-09-26 19:16:53.815702)


Review request for hbase.


Changes
---

Patch v3. Ted Yu wanted to compare the two.


Summary
---

Bugfix, kills a race condition.


This addresses bug HBASE-4482.
https://issues.apache.org/jira/browse/HBASE-4482


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java 3798a06 
  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java fe8b95a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java 91b1603 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 0c06f4f 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java 0814f41 
  src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java e021780 
  src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java 8dd5159 

Diff: https://reviews.apache.org/r/2051/diff


Testing
---

Looped tests.


Thanks,

Li



> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114885#comment-13114885
 ] 

jirapos...@reviews.apache.org commented on HBASE-4482:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2051/
---

Review request for hbase.


Summary
---

Bugfix, kills a race condition.


This addresses bug HBASE-4482.
https://issues.apache.org/jira/browse/HBASE-4482


Diffs
-

  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fe25a45 

Diff: https://reviews.apache.org/r/2051/diff


Testing
---

Looped tests.


Thanks,

Li



> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114883#comment-13114883
 ] 

jirapos...@reviews.apache.org commented on HBASE-4482:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2051/
---

(Updated 2011-09-26 19:17:34.041967)


Review request for hbase.


Summary (updated)
---

Bugfix, kills a race condition.

Ignore r1, that's the wrong patch.


This addresses bug HBASE-4482.
https://issues.apache.org/jira/browse/HBASE-4482


Diffs
-

  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java 3798a06 
  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java fe8b95a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java 91b1603 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 0c06f4f 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java 0814f41 
  src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java e021780 
  src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java 8dd5159 

Diff: https://reviews.apache.org/r/2051/diff


Testing
---

Looped tests.


Thanks,

Li



> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114884#comment-13114884
 ] 

jirapos...@reviews.apache.org commented on HBASE-4482:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2051/
---

(Updated 2011-09-26 19:16:31.129009)


Review request for hbase.


Changes
---

patch v2.


Summary
---

Bugfix, kills a race condition.


This addresses bug HBASE-4482.
https://issues.apache.org/jira/browse/HBASE-4482


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java 3798a06 
  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java fe8b95a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java 91b1603 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 0c06f4f 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
  src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java e021780 
  src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java 8dd5159 

Diff: https://reviews.apache.org/r/2051/diff


Testing
---

Looped tests.


Thanks,

Li



> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread Li Pi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114843#comment-13114843
 ] 

Li Pi commented on HBASE-4482:
--

Nevermind - this is still broken. Gotta fix something else.

> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114846#comment-13114846
 ] 

Lars Hofhansl commented on HBASE-4335:
--

@Stack... I was wondering about that too. It is checked at the beginning of 
execute and was then rechecked inside DaughterOpener, but without any additional 
locks held (so I'm not sure what additional guarantees we get).

Does it have to do with the order w.r.t. making zookeeper changes? When 
DaughterOpener is run we have already made the zk changes, and that might be the 
reason that we now have to go ahead and also write to .META. even if the 
RegionServer was stopped or is stopping... not sure...

To be safe I moved it out and kept it.

@Ted... Let me find out :)


> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.
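
The two orderings described above can be modelled with a sorted map. This is a toy sketch under simplifying assumptions: the catalog is reduced to a TreeMap keyed by start key, a daughter with the same start key simply replaces the parent row, and the region names, key values and result strings are all hypothetical. It is not how .META. rows or the client lookup are actually implemented.

```java
import java.util.Map;
import java.util.TreeMap;

class MetaHoleSketch {
    /** Simplified catalog row: a key range [startKey, endKey) plus an offline flag. */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;
        final boolean offline;

        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }

        boolean contains(String key) {
            return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
        }
    }

    /** Toy lookup: take the row with the greatest start key <= key, then classify it. */
    static String lookup(TreeMap<String, Region> meta, String key) {
        Map.Entry<String, Region> entry = meta.floorEntry(key);
        if (entry == null) {
            return "HOLE: no row";
        }
        Region r = entry.getValue();
        if (r.offline) {
            return "RETRY: " + r.name + " is offline";
        }
        if (!r.contains(key)) {
            return "HOLE: " + r.name + " does not cover " + key;
        }
        return "OK: " + r.name;
    }

    /** Parent [a, z) is offline and only the second daughter [m, z) has been added. */
    static TreeMap<String, Region> secondDaughterAddedFirst() {
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("a", new Region("parent", "a", "z", true));
        meta.put("m", new Region("daughterB", "m", "z", false));
        return meta;
    }

    /** Only the first daughter [a, m) has been added; it hides the parent row (same start key). */
    static TreeMap<String, Region> firstDaughterAddedFirst() {
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("a", new Region("parent", "a", "z", true));
        meta.put("a", new Region("daughterA", "a", "m", false));
        return meta;
    }

    public static void main(String[] args) {
        System.out.println(lookup(secondDaughterAddedFirst(), "c")); // parent offline, client retries
        System.out.println(lookup(secondDaughterAddedFirst(), "p")); // second daughter found
        System.out.println(lookup(firstDaughterAddedFirst(), "c"));  // first daughter found
        System.out.println(lookup(firstDaughterAddedFirst(), "p"));  // hole: wrong region returned
    }
}
```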





[jira] [Issue Comment Edited] (HBASE-4482) Race Condition Concerning Eviction in SlabCache

2011-09-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114824#comment-13114824
 ] 

Ted Yu edited comment on HBASE-4482 at 9/26/11 5:38 PM:


Ran tests overnight, no looping detected. Put
{noformat}
while (backingStore.putIfAbsent(blockName, scache) != null) {
  int i = 0;
  Thread.yield();
  i++;
  if (i > 1000){
    System.out.println("TEST FAILING AT REMOVE");
  }
}
{noformat}
in the two while loops.

Will put out patch v3 in a moment.
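
One caveat about the snippet as quoted: `int i = 0;` sits inside the loop body, so `i` is reset on every pass and can never exceed 1, which means the message would not print even if the loop spun indefinitely. A minimal sketch of a counter that does accumulate is below; the method name, the String key/value types and the -1 return convention are assumptions for illustration and are not taken from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SpinCounterSketch {
    /**
     * Retries putIfAbsent until the key is free, giving up after maxSpins failed attempts.
     * Returns the number of failed attempts, or -1 if the limit was exceeded.
     */
    static int putWithSpinLimit(ConcurrentMap<String, String> store,
                                String key, String value, int maxSpins) {
        int spins = 0; // declared outside the loop so it accumulates across iterations
        while (store.putIfAbsent(key, value) != null) {
            spins++;
            if (spins > maxSpins) {
                return -1;
            }
            Thread.yield();
        }
        return spins;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
        System.out.println(putWithSpinLimit(store, "block-1", "first", 1000));  // key was free
        System.out.println(putWithSpinLimit(store, "block-1", "second", 1000)); // key never freed
    }
}
```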

  was (Author: li):
Ran tests overnight, no looping detected. Put
while (backingStore.putIfAbsent(blockName, scache) != null) {
  int i = 0;
  Thread.yield();
  i++;
  if (i > 1000){
    System.out.println("TEST FAILING AT REMOVE");
  }
}
 
in the two while loops.

Will put out patch v3 in a moment.
  
> Race Condition Concerning Eviction in SlabCache
> ---
>
> Key: HBASE-4482
> URL: https://issues.apache.org/jira/browse/HBASE-4482
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Li Pi
>Assignee: Li Pi
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: hbase-4482v1.txt, hbase-4482v2.txt
>
>






[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-09-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114829#comment-13114829
 ] 

Ted Yu commented on HBASE-4335:
---

Do all unit tests pass? :-)

> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.




