[jira] [Commented] (HBASE-4468) Wrong resource name in an error massage: webapps instead of hbase-webapps

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113906#comment-13113906
 ] 

Hudson commented on HBASE-4468:
---

Integrated in HBase-TRUNK #2247 (See 
[https://builds.apache.org/job/HBase-TRUNK/2247/])
HBASE-4468 Wrong resource name in an error massage: webapps instead of 
hbase-webapps

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/InfoServer.java


> Wrong resource name in an error massage: webapps instead of hbase-webapps
> -
>
> Key: HBASE-4468
> URL: https://issues.apache.org/jira/browse/HBASE-4468
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 20110923_4468_InfoServer.patch
>
>
> org.apache.hadoop.hbase.util.InfoServer loads a resource in 'hbase-webapps' 
> but displays a message about 'webapps' when it does not find it.
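For illustration only, a minimal sketch (not the attached 20110923_4468_InfoServer.patch) of the kind of mismatch being fixed: the directory that is searched and the directory named in the error message should be the same string.

{code}
// Minimal sketch: name the directory that was actually searched in the error message.
import java.io.FileNotFoundException;
import java.net.URL;

public class WebAppDirSketch {
  private static final String WEBAPP_DIR = "hbase-webapps";  // directory actually searched

  /** Resolve the webapp on the classpath; name WEBAPP_DIR (not "webapps") on failure. */
  public static String getWebAppDir(String appName) throws FileNotFoundException {
    URL url = WebAppDirSketch.class.getClassLoader()
        .getResource(WEBAPP_DIR + "/" + appName);
    if (url == null) {
      throw new FileNotFoundException(
          WEBAPP_DIR + "/" + appName + " not found in CLASSPATH");
    }
    return url.toString();
  }
}
{code}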

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113904#comment-13113904
 ] 

Hudson commented on HBASE-4472:
---

Integrated in HBase-TRUNK #2247 (See 
[https://builds.apache.org/job/HBase-TRUNK/2247/])
HBASE-4472  MiniHBaseCluster.shutdown() doesn't work if no active master

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java


> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?
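As a rough illustration of that suggestion (class and method names below are hypothetical, not the committed JVMClusterUtil change), the shutdown path could stop every region server directly and only then join the threads:

{code}
// Hypothetical sketch: stop each region server explicitly so shutdown cannot hang
// waiting for an active master to delete the cluster status znode.
import java.util.List;

interface StoppableRegionServer {
  void stop(String why);                      // ask the server to shut down
  void join() throws InterruptedException;    // wait for its thread to exit
}

final class MiniClusterShutdownSketch {
  static void shutdown(List<StoppableRegionServer> regionServers)
      throws InterruptedException {
    for (StoppableRegionServer rs : regionServers) {
      rs.stop("mini cluster shutdown");       // request shutdown up front
    }
    for (StoppableRegionServer rs : regionServers) {
      rs.join();                              // then join, which can no longer block forever
    }
  }
}
{code}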

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4280) [replication] ReplicationSink can deadlock itself via handlers

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113905#comment-13113905
 ] 

Hudson commented on HBASE-4280:
---

Integrated in HBase-TRUNK #2247 (See 
[https://builds.apache.org/job/HBase-TRUNK/2247/])
HBASE-4280  [replication] ReplicationSink can deadlock itself via handlers

jdcryans : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> [replication] ReplicationSink can deadlock itself via handlers
> --
>
> Key: HBASE-4280
> URL: https://issues.apache.org/jira/browse/HBASE-4280
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.5
>
> Attachments: HBASE-4280-0.90.patch
>
>
> I've experienced this problem a few times: ReplicationSink calls are received 
> through the normal handlers and can potentially call back into the same server, 
> which, in certain situations, can fill up all the handlers. For example, 10 
> handlers that are all replication calls end up all trying to talk to the local 
> server at the same time.
> HRS.replicateLogEntries should have @QosPriority(priority=HIGH_QOS) to use 
> the other set of handlers.
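To illustrate the shape of that fix (the QosPriority annotation and HIGH_QOS constant below are stand-ins for HRegionServer's internal ones, not the actual HBASE-4280-0.90.patch):

{code}
// Illustrative only: a high-priority QoS annotation routes the replication RPC to the
// separate priority handler pool, so replication traffic cannot saturate the normal handlers.
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class ReplicationRpcQosSketch {
  static final int HIGH_QOS = 100;            // assumed value, for illustration

  @Retention(RetentionPolicy.RUNTIME)
  @interface QosPriority {
    int priority();
  }

  @QosPriority(priority = HIGH_QOS)
  void replicateLogEntries(Object[] entries) {
    // In the real server this hands entries to the ReplicationSink; the annotation is
    // what moves the call off the normal handler pool.
  }
}
{code}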

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4434) seek optimization: don't do eager HFile Scanner next() unless the next KV is needed

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113903#comment-13113903
 ] 

Hudson commented on HBASE-4434:
---

Integrated in HBase-TRUNK #2247 (See 
[https://builds.apache.org/job/HBase-TRUNK/2247/])
HBASE-4434 seek optimization: don't do eager HFile Scanner next() unless 
the next KV is needed

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java


> seek optimization: don't do eager HFile Scanner next() unless the next KV is 
> needed
> ---
>
> Key: HBASE-4434
> URL: https://issues.apache.org/jira/browse/HBASE-4434
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Fix For: 0.92.0
>
> Attachments: HBASE-4434.txt
>
>
> When a seek/reseek is done on StoreFileScanner, in addition to setting the 
> current KV, it also does a HFileScanner level next() ahead of time even if 
> the next KV is never actually required. This inefficiency can potentially 
> result in additional disk seeks and sub-optimal use of the block cache 
> (primarily for cases where the KVs are large and each occupies an HFile block 
> of its own).
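A minimal sketch of the lazy-next idea (interface and field names are illustrative, not the StoreFileScanner patch): after a seek, keep the current KV but defer the underlying scanner's next() until a caller actually asks for the following entry.

{code}
// Sketch only: defer the underlying advance until the next KV is really needed.
interface KvSource {
  byte[] current();           // key/value at the current position
  boolean advance();          // move to the next entry; false at end of file
}

final class LazySeekScannerSketch {
  private final KvSource source;
  private byte[] cur;

  LazySeekScannerSketch(KvSource source) {
    this.source = source;
  }

  /** Called after a seek/reseek: note the current KV, but do NOT advance eagerly. */
  void afterSeek() {
    cur = source.current();   // the eager source.advance() is what the change removes
  }

  /** Only when the caller asks for the next KV do we pay for the advance. */
  byte[] next() {
    if (!source.advance()) {
      return null;            // end of file
    }
    cur = source.current();
    return cur;
  }
}
{code}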

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113901#comment-13113901
 ] 

Hudson commented on HBASE-4449:
---

Integrated in HBase-TRUNK #2247 (See 
[https://builds.apache.org/job/HBase-TRUNK/2247/])
Added comment for clarity while reading code to review HBASE-4449

nspiegelberg : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java


> LoadIncrementalHFiles should be able to handle CFs with blooms
> --
>
> Key: HBASE-4449
> URL: https://issues.apache.org/jira/browse/HBASE-4449
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
> HBASE-4449.patch
>
>
> When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
> it will split the file at the boundary to create two store files. If the 
> store file is for a column family that has a bloom filter, then a 
> "java.lang.ArithmeticException: / by zero" will be raised because 
> ByteBloomFilter() is called with maxKeys of 0.
> The included patch assumes that the number of keys in each split child will 
> be equal to the number of keys in the parent's bloom filter (instead of 0). 
> This is an overestimate, but it's safe and easy.
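As a hedged illustration of that sizing rule (the helper below is hypothetical; the real change lives in the bulk-load/Bloom code paths):

{code}
// Sketch: size each split child's bloom filter from the parent's key count so
// ByteBloomFilter is never constructed with maxKeys == 0 (the "/ by zero" cause).
final class SplitBloomSizingSketch {
  static long bloomMaxKeysForSplitChild(long parentBloomKeyCount) {
    // Overestimates (both children get the parent's count), but that is safe and easy.
    return Math.max(parentBloomKeyCount, 1L);
  }
}
{code}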

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113902#comment-13113902
 ] 

Hudson commented on HBASE-4131:
---

Integrated in HBase-TRUNK #2247 (See 
[https://builds.apache.org/job/HBase-TRUNK/2247/])
HBASE-4131 Make the Replication Service pluggable via a standard interface 
definition; BACKED IT OUT -- WAS CAUSING TestReplication failures

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationSinkService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationSourceService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java


> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.
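A rough sketch of what such a pluggable contract could look like, inferred only from the file names in the (backed-out) commit above; the method names are assumptions, not the committed API:

{code}
// Illustrative interfaces only.
import java.io.IOException;

interface ReplicationService {
  void initialize() throws IOException;             // wire the plugin into the region server
  void startReplicationService() throws IOException;
  void stopReplicationService();
}

/** Receives edits shipped from another cluster and applies them locally. */
interface ReplicationSinkService extends ReplicationService {
  void replicateLogEntries(Object[] entries) throws IOException;
}

/** Tails the local WAL and ships edits to remote clusters. */
interface ReplicationSourceService extends ReplicationService {
  // Implementations decide how WAL entries are picked up and shipped.
}
{code}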

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-4132) Extend the WALActionsListener API to accomodate log archival

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113895#comment-13113895
 ] 

Ted Yu edited comment on HBASE-4132 at 9/24/11 5:25 AM:


Here is the addendum.
TestLogRolling#testLogRollOnPipelineRestart passed on my MacBook.
Let me see if the addendum helps on Jenkins.

  was (Author: yuzhih...@gmail.com):
Here is the addendum.
It seems that the number of times preLogRolled is called is not fixed.
  
> Extend the WALActionsListener API to accomodate log archival
> 
>
> Key: HBASE-4132
> URL: https://issues.apache.org/jira/browse/HBASE-4132
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4132.addendum, walArchive.txt, walArchive2.txt, 
> walArchive3.txt
>
>
> The WALObserver interface exposes the log roll events. It would be nice to 
> extend it to accommodate log archival events as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4132) Extend the WALActionsListener API to accomodate log archival

2011-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4132:
--

Attachment: 4132.addendum

Here is the addendum.
It seems that the number of times preLogRolled is called is not fixed.

> Extend the WALActionsListener API to accomodate log archival
> 
>
> Key: HBASE-4132
> URL: https://issues.apache.org/jira/browse/HBASE-4132
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4132.addendum, walArchive.txt, walArchive2.txt, 
> walArchive3.txt
>
>
> The WALObserver interface exposes the log roll events. It would be nice to 
> extend it to accommodate log archival events as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4478) Improve AssignmentManager.handleRegion so that it can process certain ZK state in the case of RS offline

2011-09-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113894#comment-13113894
 ] 

ramkrishna.s.vasudevan commented on HBASE-4478:
---

Even I had faced some problems due to this. +1 for this JIRA.

> Improve AssignmentManager.handleRegion so that it can process certain ZK 
> state in the case of RS offline
> 
>
> Key: HBASE-4478
> URL: https://issues.apache.org/jira/browse/HBASE-4478
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>
> Currently AssignmentManager.handleRegion skips processing of ZK event change 
> if the RS is offline. It relies on TimeoutMonitor and ServerShutdownHandler 
> to process RIT.
>   // Verify this is a known server
>   if (!serverManager.isServerOnline(sn) &&
>   !this.master.getServerName().equals(sn)) {
> LOG.warn("Attempted to handle region transition for server but " +
>   "server is not online: " + Bytes.toString(data.getRegionName()));
> return;
>   }
> For certain states like OPENED, OPENING, FAILED_OPEN, CLOSED, it can continue 
> processing even if the RS is offline.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4132) Extend the WALActionsListener API to accomodate log archival

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113893#comment-13113893
 ] 

Ted Yu commented on HBASE-4132:
---

Running TestLogRolling#testLogRollOnPipelineRestart in debugger, I got:
{code}
java.lang.AssertionError: preLogRolledCalled has size of 3
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnPipelineRestart(TestLogRolling.java:311)
{code}

The first two calls were:
{code}
TestLogRolling$1.preLogRoll(Path, Path) line: 245
HLog.rollWriter(boolean) line: 564  
LogRoller.run() line: 93
{code}
The third one was:
{code}
TestLogRolling$1.preLogRoll(Path, Path) line: 245   
HLog.rollWriter(boolean) line: 564  
TestLogRolling.testLogRollOnPipelineRestart() line: 310 
{code}
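Since the LogRoller thread can roll concurrently with the test's explicit rollWriter() call, the count of preLogRoll invocations is not deterministic. A hedged sketch of the kind of adjustment an addendum could make is to assert a lower bound instead of an exact size (the snippet is illustrative, not the attached 4132.addendum):

{code}
// Illustrative: tolerate extra preLogRoll invocations from the background LogRoller.
import static org.junit.Assert.assertTrue;
import java.util.List;

class PreLogRollAssertionSketch {
  static void checkPreLogRollCount(List<?> preLogRolledCalled, int expectedMinimum) {
    assertTrue("preLogRolledCalled has size of " + preLogRolledCalled.size(),
        preLogRolledCalled.size() >= expectedMinimum);
  }
}
{code}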

> Extend the WALActionsListener API to accomodate log archival
> 
>
> Key: HBASE-4132
> URL: https://issues.apache.org/jira/browse/HBASE-4132
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: walArchive.txt, walArchive2.txt, walArchive3.txt
>
>
> The WALObserver interface exposes the log roll events. It would be nice to 
> extend it to accommodate log archival events as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4478) Improve AssignmentManager.handleRegion so that it can process certain ZK state in the case of RS offline

2011-09-23 Thread Ming Ma (JIRA)
Improve AssignmentManager.handleRegion so that it can process certain ZK state 
in the case of RS offline


 Key: HBASE-4478
 URL: https://issues.apache.org/jira/browse/HBASE-4478
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma


Currently AssignmentManager.handleRegion skips processing of ZK event change if 
the RS is offline. It relies on TimeoutMonitor and ServerShutdownHandler to 
process RIT.

  // Verify this is a known server
  if (!serverManager.isServerOnline(sn) &&
  !this.master.getServerName().equals(sn)) {
LOG.warn("Attempted to handle region transition for server but " +
  "server is not online: " + Bytes.toString(data.getRegionName()));
return;
  }

For certain states like OPENED, OPENING, FAILED_OPEN, CLOSED, it can continue 
processing even if the RS is offline.
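A hypothetical sketch of that improvement (not the eventual patch): let handleRegion keep processing for the listed states even when the server is no longer online, instead of returning early.

{code}
// Sketch only: which transitions could still be processed when the RS is offline.
import java.util.EnumSet;

enum RegionTransitionState { OPENED, OPENING, FAILED_OPEN, CLOSED, OTHER }

final class OfflineServerTransitionSketch {
  private static final EnumSet<RegionTransitionState> PROCESS_EVEN_IF_OFFLINE =
      EnumSet.of(RegionTransitionState.OPENED, RegionTransitionState.OPENING,
                 RegionTransitionState.FAILED_OPEN, RegionTransitionState.CLOSED);

  /** Returns true if the ZK event should be skipped because the server is offline. */
  static boolean skipBecauseServerOffline(boolean serverOnline,
                                          RegionTransitionState state) {
    if (serverOnline) {
      return false;                                   // known, online server: always process
    }
    return !PROCESS_EVEN_IF_OFFLINE.contains(state);  // offline: continue only for these states
  }
}
{code}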

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113888#comment-13113888
 ] 

Ted Yu commented on HBASE-4014:
---

From Eugene:
Every test passed except TestLogRolling:
http://pastebin.com/wcsRnnhW

TestLogRolling.testLogRollOnPipelineRestart failed in TRUNK build 2246.

So there was no regression.

+1 on latest patch.

> Coprocessors: Flag the presence of coprocessors in logged exceptions
> 
>
> Key: HBASE-4014
> URL: https://issues.apache.org/jira/browse/HBASE-4014
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Andrew Purtell
>Assignee: Eugene Koontz
> Fix For: 0.92.0
>
> Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, 
> HBASE-4014.patch, HBASE-4014.patch
>
>
> For some initial triage of bug reports for core versus for deployments with 
> loaded coprocessors, we need something like the Linux kernel's taint flag, 
> and list of linked in modules that show up in the output of every OOPS, to 
> appear above or below exceptions that appear in the logs.
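Illustrative only (names are assumptions, not the attached patches): one way to surface such a "taint" marker is to append the list of loaded coprocessors to the log line that carries the exception.

{code}
// Sketch: build a suffix naming the loaded coprocessors for exception log lines.
import java.util.Collection;

final class CoprocessorTaintSketch {
  static String coprocessorSuffix(Collection<String> loadedCoprocessors) {
    if (loadedCoprocessors == null || loadedCoprocessors.isEmpty()) {
      return "";                                       // untainted: core-only deployment
    }
    return " [coprocessors loaded: " + String.join(", ", loadedCoprocessors) + "]";
  }
}
{code}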

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113887#comment-13113887
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--



bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > Great stuff!  I have some questions throughout but seems like this will 
make everything more resilient to root/meta servers failing.  Is the general 
approach to always verify / always check rather than relying on cached 
locations or values?
bq.  > 
bq.  > Have you thought about any ways that we could add some better unit tests 
around this stuff?  There's a TestRollingRestart that is obviously not good 
enough :)

Reproducing such a bug depends on the timing of events. Initially I thought perhaps 
we could inject timeouts into various places in the code. At this point, it is 
easier to just do the testing on a small cluster and eventually the bug will 
appear. Perhaps something we can work on later.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java,
 line 309
bq.  > 
bq.  >
bq.  > why log the cached META server here?  didn't we just verify that it 
was not valid?

Fixed.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java,
 line 310
bq.  > 
bq.  >
bq.  > why log the cached meta location here?  it might be confusing since 
it doesn't log that we just found this meta location was invalid

Fixed.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java,
 line 2532
bq.  > 
bq.  >
bq.  > add another * here, so: /**
bq.  > 
bq.  > that ensure this gets picked up as javadoc

Fixed.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java,
 line 2558
bq.  > 
bq.  >
bq.  > this looks like a random debug statement, what does matchZK, sn: 
server mean?

Fixed.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java,
 lines 62-69
bq.  > 
bq.  >
bq.  > why this change?  should this be rolled into the ZKNodeTracker 
rather than overriding the getData() behavior?

Fixed.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java,
 line 90
bq.  > 
bq.  >
bq.  > it seems like you're covering up for bugs in the underlying 
ZKNodeTracker... can we fix that instead?  or if it's a matter of returning a 
cached value or not, can we just add a boolean flag for refresh/nocache?

Fixed.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java,
 line 294
bq.  > 
bq.  >
bq.  > so we always verify the connection now?

Before the fix, all callers set it to "true". So there is no behavior change.


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java,
 line 368
bq.  > 
bq.  >
bq.  > why do we have two hard-coded timeouts in this area of code? :)
bq.  > 
bq.  > this code seems to always sleep 500ms at a time unless you set 
timeout=0 and then it loops every 50ms?  that doesn't seem right... i could set 
timeout to 100ms and it would sleep 500ms.  sleeping 50ms every time would be 
better but there's probably a solution with less overhead (doing remote read 
queries every 50ms in a loop)
bq.  > 
bq.  > could we just notifyAll() on metaAvailable whenever we relocate root?

I chose min(500ms, timeout) at this point, given that we might do more code 
cleanup around RootRegionTracker later. 


bq.  On 2011-09-23 08:17:29, Jonathan Gray wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java,
 line 215
bq

[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113886#comment-13113886
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--



bq.  On 2011-09-23 05:44:11, Michael Stack wrote:
bq.  > Really good stuff.  I just need a bit of it explained to me below 
because I'm being a little slow.  I think you've also nailed HBASE-3809 with 
this patch Ming (What you think?)  Good stuff.

That sounds right. HBASE-4245 could be the same as well.


bq.  On 2011-09-23 05:44:11, Michael Stack wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java,
 line 62
bq.  > 
bq.  >
bq.  > Whats this?

Fixed.


bq.  On 2011-09-23 05:44:11, Michael Stack wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java,
 line 299
bq.  > 
bq.  >
bq.  > Why this change? If we did not ask to refresh, why not return what 
we found?

No callers call the function with refresh==false, took it out for now.


bq.  On 2011-09-23 05:44:11, Michael Stack wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java,
 line 222
bq.  > 
bq.  >
bq.  > Why make this public?

Fixed.


bq.  On 2011-09-23 05:44:11, Michael Stack wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java,
 line 535
bq.  > 
bq.  >
bq.  > nit: No need of the parens

Fixed.


bq.  On 2011-09-23 05:44:11, Michael Stack wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java,
 line 215
bq.  > 
bq.  >
bq.  > Oh, here I see the don't split logs flag.
bq.  > 
bq.  > Funny.
bq.  > 
bq.  > We used to do something like this in the old days.
bq.  > 
bq.  > So, we come back in again, we don't split logs, but we are still a 
server that was carrying root or meta -- we reassign again?  I don't get how it 
works here.

If the server carries -ROOT- or .META., it will first do the log splitting and 
then submit another ServerShutdownHandler request with 
shouldSplitHLog==false.


- Ming


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/#review2035
---


On 2011-09-24 01:50:02, Ming Ma wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2007/
bq.  ---
bq.  
bq.  (Updated 2011-09-24 01:50:02)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  1. Add more logging.
bq.  2. Clean up CatalogTracker. waitForMeta waits for "timeout" value. When 
waitForMetaServerConnectionDefault is called by MetaNodeTracker, the timeout 
value is large. So it doesn't retry in case .ROOT. is updated; add the proper 
implementation for CatalogTracker.verifyMetaRegionLocation
bq.  4. Check for the latest -ROOT- and .META. region location during the 
handling of server shutdown.
bq.  5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, 
don't block and wait for .META. availability. Resubmit another 
ServerShutdownHandler for regular regions.
bq.  
bq.  
bq.  This addresses bug HBASE-4455.
bq.  https://issues.apache.org/jira/browse/HBASE-4455
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hb

[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113884#comment-13113884
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--



bq.  On 2011-09-23 03:02:39, Ted Yu wrote:
bq.  > 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java,
 line 2555
bq.  > 
bq.  >
bq.  > addressFromZK != null can be omitted - it is the condition of if 
block.

Fixed.


- Ming


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/#review2033
---


On 2011-09-24 01:50:02, Ming Ma wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2007/
bq.  ---
bq.  
bq.  (Updated 2011-09-24 01:50:02)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  1. Add more logging.
bq.  2. Clean up CatalogTracker. waitForMeta waits for "timeout" value. When 
waitForMetaServerConnectionDefault is called by MetaNodeTracker, the timeout 
value is large. So it doesn't retry in case .ROOT. is updated; add the proper 
implementation for CatalogTracker.verifyMetaRegionLocation
bq.  4. Check for the latest -ROOT- and .META. region location during the 
handling of server shutdown.
bq.  5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, 
don't block and wait for .META. availability. Resubmit another 
ServerShutdownHandler for regular regions.
bq.  
bq.  
bq.  This addresses bug HBASE-4455.
bq.  https://issues.apache.org/jira/browse/HBASE-4455
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/MasterAddressTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
 1172205 
bq.  
bq.  Diff: https://reviews.apache.org/r/2007/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Keep Master up all the time, do rolling restart of RSs like this - stop 
RS1, wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, 
start RS2, wait for 2 seconds, etc. The program can run for couple hours until 
it stops. -ROOT- and .META. are available during that time.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.



> Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
> AssignmentManager
> --
>
> Key: HBASE-4455
> URL: https://issues.apache.org/jira/browse/HBASE-4455
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0
>
>
> Keep Master up all the time, 

[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113882#comment-13113882
 ] 

Ted Yu commented on HBASE-4449:
---

See my comment @ 20/Sep/11 21:54

> LoadIncrementalHFiles should be able to handle CFs with blooms
> --
>
> Key: HBASE-4449
> URL: https://issues.apache.org/jira/browse/HBASE-4449
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
> HBASE-4449.patch
>
>
> When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
> it will split the file at the boundary to create two store files. If the 
> store file is for a column family that has a bloom filter, then a 
> "java.lang.ArithmeticException: / by zero" will be raised because 
> ByteBloomFilter() is called with maxKeys of 0.
> The included patch assumes that the number of keys in each split child will 
> be equal to the number of keys in the parent's bloom filter (instead of 0). 
> This is an overestimate, but it's safe and easy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4434) seek optimization: don't do eager HFile Scanner next() unless the next KV is needed

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113877#comment-13113877
 ] 

Hudson commented on HBASE-4434:
---

Integrated in HBase-0.92 #18 (See 
[https://builds.apache.org/job/HBase-0.92/18/])
HBASE-4434: Don't do HFile Scanner next() unless the next KV is needed

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java


> seek optimization: don't do eager HFile Scanner next() unless the next KV is 
> needed
> ---
>
> Key: HBASE-4434
> URL: https://issues.apache.org/jira/browse/HBASE-4434
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Fix For: 0.92.0
>
> Attachments: HBASE-4434.txt
>
>
> When a seek/reseek is done on StoreFileScanner, in addition to setting the 
> current KV, it also does a HFileScanner level next() ahead of time even if 
> the next KV is never actually required. This inefficiency can potentially 
> result in additional disk seeks and sub-optimal use of the block cache 
> (primarily for cases where the KVs are large and each occupies an HFile block 
> of its own).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4280) [replication] ReplicationSink can deadlock itself via handlers

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113879#comment-13113879
 ] 

Hudson commented on HBASE-4280:
---

Integrated in HBase-0.92 #18 (See 
[https://builds.apache.org/job/HBase-0.92/18/])
HBASE-4280  [replication] ReplicationSink can deadlock itself via handlers

jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> [replication] ReplicationSink can deadlock itself via handlers
> --
>
> Key: HBASE-4280
> URL: https://issues.apache.org/jira/browse/HBASE-4280
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.5
>
> Attachments: HBASE-4280-0.90.patch
>
>
> I've experienced this problem a few times: ReplicationSink calls are received 
> through the normal handlers and can potentially call back into the same server, 
> which, in certain situations, can fill up all the handlers. For example, 10 
> handlers that are all replication calls end up all trying to talk to the local 
> server at the same time.
> HRS.replicateLogEntries should have @QosPriority(priority=HIGH_QOS) to use 
> the other set of handlers.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4450) test for number of blocks read: to serve as baseline for expected blocks read and for catching regressions

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113876#comment-13113876
 ] 

Hudson commented on HBASE-4450:
---

Integrated in HBase-0.92 #18 (See 
[https://builds.apache.org/job/HBase-0.92/18/])
HBASE-4450 test for number of blocks read: to serve as baseline for 
expected blocks read and for catching regressions

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java


> test for number of blocks read: to serve as baseline for expected blocks read 
> and for catching regressions
> --
>
> Key: HBASE-4450
> URL: https://issues.apache.org/jira/browse/HBASE-4450
> Project: HBase
>  Issue Type: Test
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Attachments: HBASE-4450.txt
>
>
> Add a simple test for number of blocks read. The tests intent is to serve as 
> baseline for expected blocks read and for catching regressions. As 
> optimizations for HBase-4433 or Hbase-4434 are committed, the test would need 
> to be updated to adjust the counts for expected blocks read in various cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113878#comment-13113878
 ] 

Hudson commented on HBASE-4472:
---

Integrated in HBase-0.92 #18 (See 
[https://builds.apache.org/job/HBase-0.92/18/])
HBASE-4472  MiniHBaseCluster.shutdown() doesn't work if no active master

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java


> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4468) Wrong resource name in an error massage: webapps instead of hbase-webapps

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113880#comment-13113880
 ] 

Hudson commented on HBASE-4468:
---

Integrated in HBase-0.92 #18 (See 
[https://builds.apache.org/job/HBase-0.92/18/])
HBASE-4468 Wrong resource name in an error massage: webapps instead of 
hbase-webapps

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/InfoServer.java


> Wrong resource name in an error massage: webapps instead of hbase-webapps
> -
>
> Key: HBASE-4468
> URL: https://issues.apache.org/jira/browse/HBASE-4468
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 20110923_4468_InfoServer.patch
>
>
> org.apache.hadoop.hbase.util.InfoServer loads a resource in 'hbase-webapps' 
> but displays a message about 'webapps' when it does not find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113872#comment-13113872
 ] 

Jonathan Gray commented on HBASE-4449:
--

It looks like the test change was committed but not the change to 
LoadIncrementalHFiles?

> LoadIncrementalHFiles should be able to handle CFs with blooms
> --
>
> Key: HBASE-4449
> URL: https://issues.apache.org/jira/browse/HBASE-4449
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
> HBASE-4449.patch
>
>
> When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
> it will split the file at the boundary to create two store files. If the 
> store file is for a column family that has a bloom filter, then a 
> "java.lang.ArithmeticException: / by zero" will be raised because 
> ByteBloomFilter() is called with maxKeys of 0.
> The included patch assumes that the number of keys in each split child will 
> be equal to the number of keys in the parent's bloom filter (instead of 0). 
> This is an overestimate, but it's safe and easy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113870#comment-13113870
 ] 

Jonathan Gray commented on HBASE-4449:
--

Is this done now?

> LoadIncrementalHFiles should be able to handle CFs with blooms
> --
>
> Key: HBASE-4449
> URL: https://issues.apache.org/jira/browse/HBASE-4449
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
> HBASE-4449.patch
>
>
> When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
> it will split the file at the boundary to create two store files. If the 
> store file is for a column family that has a bloom filter, then a 
> "java.lang.ArithmeticException: / by zero" will be raised because 
> ByteBloomFilter() is called with maxKeys of 0.
> The included patch assumes that the number of keys in each split child will 
> be equal to the number of keys in the parent's bloom filter (instead of 0). 
> This is an overestimate, but it's safe and easy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113869#comment-13113869
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/
---

(Updated 2011-09-24 01:50:02.986731)


Review request for hbase.


Changes
---

Thanks folks for the review. I have fixed most of them. The comments below 
explain why the rest aren't fixed and answer some of the general questions asked.

1. I wondered if RootRegionTracker and MetaNodeTracker are really needed. 
Instead of waiting for ZK notification, checking with ZK directly should be ok. 
This won't have much impact on performance, given that most of the time there 
isn't much region movement. For now, we can keep RootRegionTracker and 
MetaNodeTracker. We can open a separate jira if that is needed.
2. Took out the "refresh" parameter in CatalogTracker.getMetaServerConnection. All 
the callers of this function call it with "true", so at this point I just took it out.
3. Per Jonathan's suggestion, modified ZooKeeperNodeTracker to add a refresh flag 
to getData and blockUntilAvailable.
4. About the question from Stack, yes, it looks like the same issue as HBASE-3809; 
it could be the same as HBASE-4245.
5. I put a more detailed description in ServerShutdownHandler.java about why we 
need to resubmit another ServerShutdownHandler request back to the thread pool if 
the server carries -ROOT- or .META.
6. Regarding Jonathan's suggestion about relying on notifyAll() from -ROOT- 
inside waitForMeta, I just fixed the timeout value issue instead, in case later 
we decide RootRegionTracker isn't that useful.
7. Regarding Stack's HLogSplitting question, if the shutdown server carries 
-ROOT- or .META., it will first do HLogSplitting, and then resubmit another 
ServerShutdownHandler request for the same server which doesn't do 
HLogSplitting.


Summary
---

1. Add more logging.
2. Clean up CatalogTracker. waitForMeta waits for "timeout" value. When 
waitForMetaServerConnectionDefault is called by MetaNodeTracker, the timeout 
value is large. So it doesn't retry in case .ROOT. is updated; add the proper 
implementation for CatalogTracker.verifyMetaRegionLocation
4. Check for the latest -ROOT- and .META. region location during the handling 
of server shutdown.
5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, don't 
block and wait for .META. availability. Resubmit another ServerShutdownHandler 
for regular regions.


This addresses bug HBASE-4455.
https://issues.apache.org/jira/browse/HBASE-4455


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/MasterAddressTracker.java
 1172205 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
 1172205 

Diff: https://reviews.apache.org/r/2007/diff


Testing
---

Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
wait for 2 seconds, stop RS2, sta

[jira] [Commented] (HBASE-4132) Extend the WALActionsListener API to accomodate log archival

2011-09-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113866#comment-13113866
 ] 

Lars Hofhansl commented on HBASE-4132:
--

Any plan to also add this to org.apache.hadoop.hbase.coprocessor.WALObserver?
I'm happy to prepare a patch for that, maybe in a different jira.
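For reference, a hedged sketch of what archival hooks next to the existing roll hooks might look like; the method names are illustrative and may differ from the committed WALActionsListener/WALObserver API.

{code}
// Illustrative listener shape only.
import org.apache.hadoop.fs.Path;

interface WalLifecycleListenerSketch {
  // Existing style of hooks: a log is about to be / has been rolled.
  void preLogRoll(Path oldPath, Path newPath);
  void postLogRoll(Path oldPath, Path newPath);

  // Proposed additions: a log file is about to be / has been moved to the archive.
  void preLogArchive(Path oldPath, Path newPath);
  void postLogArchive(Path oldPath, Path newPath);
}
{code}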


> Extend the WALActionsListener API to accomodate log archival
> 
>
> Key: HBASE-4132
> URL: https://issues.apache.org/jira/browse/HBASE-4132
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: walArchive.txt, walArchive2.txt, walArchive3.txt
>
>
> The WALObserver interface exposes the log roll events. It would be nice to 
> extend it to accommodate log archival events as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4439) Move ClientScanner out of HTable

2011-09-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4439:
-

Attachment: 4439-v1.txt

ClientScanner also copies the passed Scan now. Not pretty, but since this is 
going to do at least one roundtrip to HBase it should not be a problem.

iterator() and next(int) moved to an abstract helper class (that one makes no 
assumptions about transport, RPC, etc.).
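A minimal sketch of that copy behaviour, assuming the 0.90/0.92-era Scan API (copy constructor, getCaching/setCaching); it is not the attached 4439-v1.txt:

{code}
// Sketch: copy the caller's Scan so the scanner never mutates it.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;

class CopyingScannerSketch {
  private final Scan scan;

  CopyingScannerSketch(Scan callerScan, int tableScannerCaching) throws IOException {
    this.scan = new Scan(callerScan);             // defensive copy; cheap next to the scan's RPCs
    if (this.scan.getCaching() <= 0) {
      this.scan.setCaching(tableScannerCaching);  // caching is applied to the copy only
    }
  }
}
{code}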

> Move ClientScanner out of HTable
> 
>
> Key: HBASE-4439
> URL: https://issues.apache.org/jira/browse/HBASE-4439
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 4439-v1.txt, 4439.txt
>
>
> See HBASE-1935 for motivation.
> ClientScanner should be able to exist outside of HTable.
> While we're at it, we can also add an abstract client scanner to easy 
> development of new client side scanners (such as parallel scanners, or per 
> region scanners).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it

2011-09-23 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113859#comment-13113859
 ] 

Roman Shaposhnik commented on HBASE-4209:
-

@stack

> At least log the exception.
> What is the exception we are suppressing?  Should we at least log it here too?

Basically, the only root cause of possible exceptions here is the code in 
suppressHdfsShutdownHook() (which is quite fragile and overly aggressive in 
throwing exceptions if things don't look quite right). Now, if you look at how 
this code is executed in a non-standalone case -- HBase does bail on exceptions 
thrown from there. We can adopt the same approach in the standalone and unit 
testing case, but I wasn't sure it was the right thing to do. After all, NOT 
bailing out (and, let's say, simply logging these things) will be no worse than 
what we currently have (not calling shutdown hooks at all). On the other hand, 
if we do bail out aggressively we get a better chance of catching 
incompatibilities between the suppressHdfsShutdownHook() logic and future Hadoop 
releases during unit tests.

So perhaps -- bailing out would make the most sense of all.

> The HBase hbase-daemon.sh SIGKILLs master when stopping it
> --
>
> Key: HBASE-4209
> URL: https://issues.apache.org/jira/browse/HBASE-4209
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Attachments: HBASE-4209.patch.txt
>
>
> There's a bit of code in hbase-daemon.sh that makes HBase master being 
> SIGKILLed when stopping it rather than trying SIGTERM (like it does for other 
> daemons). When HBase is executed in a standalone mode (and the only daemon 
> you need to run is master) that causes newly created tables to go missing as 
> unflushed data is thrown out. If there was not a good reason to kill master 
> with SIGKILL perhaps we can take that special case out and rely on SIGTERM.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4439) Move ClientScanner out of HTable

2011-09-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113858#comment-13113858
 ] 

Lars Hofhansl commented on HBASE-4439:
--

Hmm... The new ClientScanner still modifies the passed Scan, so somebody using 
ClientScanner directly would still get Scan modified. In order to avoid this, 
Scan would have to be copied twice (since HTable also needs to copy it, because 
it sets its caching on it, at least until we can remove {get|set}ScannerCaching 
from HTable).


> Move ClientScanner out of HTable
> 
>
> Key: HBASE-4439
> URL: https://issues.apache.org/jira/browse/HBASE-4439
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 4439.txt
>
>
> See HBASE-1935 for motivation.
> ClientScanner should be able to exist outside of HTable.
> While we're at it, we can also add an abstract client scanner to easy 
> development of new client side scanners (such as parallel scanners, or per 
> region scanners).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-23 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4477:


Attachment: hlogMetadata1.txt

The initial design I am considering here:

1. The Put call already has a map of arbitrary attributes (keys/values). The 
application will set a special key named "_hbaseWalMetadata". The value of this 
attribute is what needs to be stored in the HLog. From this perspective, the 
format of the Put call does not need any change.

2. The regionserver will store a meta kv in the same WALEdit record as the 
transaction. This kv will have the column family name "METAFAMILY" and the column 
name "METACOL".

3. Code already exists to ignore kvs with the column family "METAFAMILY" in the 
log splitting process. The reason is that we already write such records 
to the HLog (see completeCacheFlush).
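To make point 1 concrete, a sketch of the application side, assuming the 0.92-era client API (Put.add, later addColumn, and the per-operation attribute map); the key "_hbaseWalMetadata" comes from the proposal above, everything else is an example:

{code}
// Example only: attach a replication blob to a Put via the existing attribute map.
import org.apache.hadoop.hbase.client.Put;

class WalMetadataPutSketch {
  static Put putWithWalMetadata(byte[] row, byte[] family, byte[] qualifier,
                                byte[] value, byte[] replicationBlob) {
    Put put = new Put(row);
    put.add(family, qualifier, value);                       // normal cell (0.92-era API)
    put.setAttribute("_hbaseWalMetadata", replicationBlob);  // blob the RS would copy into the WALEdit
    return put;
  }
}
{code}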


> Ability for an application to store metadata into the transaction log
> -
>
> Key: HBASE-4477
> URL: https://issues.apache.org/jira/browse/HBASE-4477
> Project: HBase
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: hlogMetadata1.txt
>
>
> mySQL allows an application to store an arbitrary blob along with each 
> transaction in its transaction logs. This JIRA is to have a similar feature 
> request for HBASE.
> The use case is as follows: An application on one data center A stores a blob 
> of data along with each transaction. A replication software picks up these 
> blobs from the transaction logs in A and hands it to another instance of the 
> same application running on a remote data center B. The application in B is 
> responsible for applying this to the remote Hbase cluster (and also handle 
> conflict resolution if any).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113854#comment-13113854
 ] 

Lars Hofhansl commented on HBASE-4344:
--

+1
This is an important fix.
Nice team work on this one :)

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v2.txt, 4344-v4.txt, 
> 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using a multiversion 
> concurrency control system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But this mechanism is not incorporated for 
> the key-values written to disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to disk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113850#comment-13113850
 ] 

dhruba borthakur commented on HBASE-4131:
-

Stack: thanks for taking the time to review this patch. I had not yet run the 
full test suite and was expecting the submitPatch request to run it (similar to 
Hadoop). I am extremely sorry that it got committed earlier than it should have.

I am working on this one now and will verify and post what I find.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-23 Thread dhruba borthakur (JIRA)
Ability for an application to store metadata into the transaction log
-

 Key: HBASE-4477
 URL: https://issues.apache.org/jira/browse/HBASE-4477
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur


MySQL allows an application to store an arbitrary blob along with each 
transaction in its transaction logs. This JIRA is a similar feature 
request for HBase.

The use case is as follows: an application in data center A stores a blob 
of data along with each transaction. Replication software picks up these 
blobs from the transaction logs in A and hands them to another instance of the 
same application running in a remote data center B. The application in B is 
responsible for applying them to the remote HBase cluster (and also handling 
any conflict resolution).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

2011-09-23 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113847#comment-13113847
 ] 

Karthik Ranganathan commented on HBASE-4463:


@Stack - we can find the exact amount of data we are writing to the dfs (only 
hfile blocks contribute to this during compactions), so adding a threshold like 
this is not too hard... but the pressure could be on disk iops (instead of 
network bandwidth), and detecting that would be hard. So we would still need to 
configure an off-peak time.

I was trying to come up with a more generic solution, but that involves setting 
up a feedback loop inside the regionserver: keep track of the max, min and 
average latencies over the last k days (this would have to be stored in META or 
some other location since it needs to persist across restarts), remove any 
spikes from the values, and, when running an aggressive compaction, make sure 
the latencies are still acceptable; otherwise, don't run aggressive compactions. 
This is much harder to get right though.
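
A rough sketch of the simpler, config-driven approach (picking the selection 
ratio from a configured off-peak window rather than from a latency feedback 
loop); the hour-window config keys below are illustrative assumptions, not 
existing HBase settings:

{code}
import java.util.Calendar;
import org.apache.hadoop.conf.Configuration;

// Sketch only: chooses the compaction selection ratio based on an off-peak
// window read from the configuration. The *.start.hour / *.end.hour keys are
// hypothetical names used here for illustration.
public class OffPeakCompactionRatio {
  public static double currentRatio(Configuration conf) {
    double peakRatio = conf.getFloat("hbase.hstore.compaction.ratio", 1.2f);
    double offPeakRatio =
        conf.getFloat("hbase.hstore.compaction.ratio.offpeak", 5.0f);
    int startHour = conf.getInt("hbase.offpeak.start.hour", -1);
    int endHour = conf.getInt("hbase.offpeak.end.hour", -1);
    if (startHour < 0 || endHour < 0) {
      return peakRatio;  // no off-peak window configured
    }
    int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
    boolean offPeak = (startHour <= endHour)
        ? (hour >= startHour && hour < endHour)
        : (hour >= startHour || hour < endHour);  // window wraps past midnight
    return offPeak ? offPeakRatio : peakRatio;
  }
}
{code}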

> Run more aggressive compactions during off peak hours
> -
>
> Key: HBASE-4463
> URL: https://issues.apache.org/jira/browse/HBASE-4463
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Karthik Ranganathan
>Assignee: Karthik Ranganathan
>
> The number of iops on the disk and the top-of-rack bandwidth utilization 
> at off-peak hours are much lower than at peak hours, depending on the 
> application usage pattern. We can use this knowledge to improve the 
> performance of the HBase cluster by increasing the compaction selection ratio to 
> a much larger value during off-peak hours than otherwise - raising 
> hbase.hstore.compaction.ratio (1.2 default) to 
> hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the 
> average number of files per store.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113846#comment-13113846
 ] 

Ted Yu commented on HBASE-4344:
---

Based on a stable TRUNK tree, all tests passed, including TestAcidGuarantees.
More testing is planned before committing, possibly on Monday.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v2.txt, 4344-v4.txt, 
> 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using a multiversion 
> concurrency control system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But this mechanism is not incorporated for 
> the key-values written to disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to disk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4476) Compactions must fail if column tracker gets columns out of order

2011-09-23 Thread Mikhail Bautin (JIRA)
Compactions must fail if column tracker gets columns out of order
-

 Key: HBASE-4476
 URL: https://issues.apache.org/jira/browse/HBASE-4476
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.92.0, 0.94.0


We found this in ScanWildcardColumnTracker:

// new col < oldcol
// if (cmp < 0) {
// WARNING: This means that very likely an edit for some other family
// was incorrectly stored into the store for this one. Continue, but
// complain.
LOG.error("ScanWildcardColumnTracker.checkColumn ran " +
"into a column actually smaller than the previous column: " +

This went under the radar in our dark-launch cluster when a column family name 
was misspelled at first and then "renamed" by renaming directories in the 
HBase storage directory tree. We ended up with inconsistent data, but 
compactions still succeeded most of the time, likely discarding part of the 
input data.
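
A minimal sketch of the direction the summary suggests (fail instead of only 
logging); the names, exception type and placement are illustrative, not the 
committed fix:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: where the tracker currently only LOG.error()s an out-of-order column,
// surface it as an error so a compaction over corrupt data fails loudly rather
// than silently dropping part of its input.
public final class ColumnOrderCheck {
  private ColumnOrderCheck() {}

  public static void failIfOutOfOrder(int cmp, byte[] bytes, int offset, int length)
      throws IOException {
    if (cmp < 0) {
      throw new IOException("Column out of order (likely an edit from another "
          + "family stored in this store): "
          + Bytes.toStringBinary(bytes, offset, length));
    }
  }
}
{code}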


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-23 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113839#comment-13113839
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

I have a very *hacky* version that I've recently used successfully to rebuild a 
.META. table with over 10k regions. It can be found here:

https://github.com/jmhsieh/hbase/tree/hbase-4377

I've also hacked the hack to backport it onto an 0.90.x branch.

To run it, build HBase and then use the following command line:

{code}
bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -base 
~/pathToHbase/hbase -details
{code}

The program will fail, telling the user about any problems it encounters. It 
only succeeds if all the info gathered from the .regioninfo files is clean after 
going through the region split calculator.

This code will take some time to clean up.  

I would like to do some refactoring of the current hbck and create an 
o.a.h.hbase.util.hbck or o.a.h.hbase.hbck package.  Any preferences or concerns 
there?

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jonathan Hsieh
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4475) When running an embedded ThriftServer, use User.runAs() to allow it to run as a separate principal from the embedding region server

2011-09-23 Thread Gary Helmling (JIRA)
When running an embedded ThriftServer, use User.runAs() to allow it to run as a 
separate principal from the embedding region server
---

 Key: HBASE-4475
 URL: https://issues.apache.org/jira/browse/HBASE-4475
 Project: HBase
  Issue Type: Improvement
  Components: security, thrift
Reporter: Gary Helmling


As discussed over in HBASE-4460, the current approach to ThriftServer 
authentication (provided in HBASE-4099) will not work in an embedded context, 
since the region server already does a login for the process.

We could make the embedded thrift server still run as a separate user, though, 
by doing something like the following:

* add a {{User.loginAndReturnUser()}} variant that delegates to 
{{UserGroupInformation.loginUserFromKeytabAndReturnUGI()}}, then returns a 
wrapping {{User}} instance
* call this method on startup for the embedded thrift server to get the thrift 
user instance
* use {{User.runAs()}} to execute the body of {{HRegionThriftServer.run()}} as 
the logged in thrift user

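A rough sketch of the steps above; {{User.runAs()}} exists today, while the 
login variant is the hypothetical method this issue proposes, so its name and 
arguments are assumptions:

{code}
import java.net.InetAddress;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.security.User;

// Sketch only. User.loginAndReturnUser(...) does not exist yet; it stands in
// for the proposed variant that delegates to
// UserGroupInformation.loginUserFromKeytabAndReturnUGI() and wraps the result.
public class EmbeddedThriftLogin {
  public static void runEmbeddedThrift(final Configuration conf) throws Exception {
    String host = InetAddress.getLocalHost().getCanonicalHostName();
    // Hypothetical call: log in from the thrift keytab/principal without
    // replacing the region server's own process login.
    User thriftUser = User.loginAndReturnUser(conf,
        "hbase.thrift.keytab.file", "hbase.thrift.kerberos.principal", host);

    thriftUser.runAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        // body of HRegionThriftServer.run() would execute here as the thrift user
        return null;
      }
    });
  }
}
{code}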

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4474) Use variable length format to store the memstoreTS

2011-09-23 Thread Ted Yu (JIRA)
Use variable length format to store the memstoreTS
--

 Key: HBASE-4474
 URL: https://issues.apache.org/jira/browse/HBASE-4474
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
 Fix For: 0.92.0


HBASE-4344 introduced memstoreTS for KeyValues.

The following suggestion was from Kannan:
We should consider using a variable-length format to store the memstoreTS on 
disk. Also, at the start of the flush, we can probably prune most of these 
timestamps to 0, since only the ones that are higher than the current read point 
for all active scanners need to be maintained at the fine-grained level. So, 
often, for a majority of the KVs, we might be able to just write a 0, and 
hence storing it in a variable-width format would be an even bigger win.
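
To illustrate the size win (using Hadoop's vlong encoding purely as a stand-in 
for whatever variable-length format is actually chosen):

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.WritableUtils;

// Illustration only: a pruned memstoreTS of 0 costs a single byte under a
// vlong-style encoding, while a fixed-width long would always cost 8 bytes.
public class VLongSizeDemo {
  static int encodedSize(long memstoreTS) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    WritableUtils.writeVLong(new DataOutputStream(bytes), memstoreTS);
    return bytes.size();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(encodedSize(0L));          // 1 byte: the common, pruned case
    System.out.println(encodedSize(123456789L));  // 5 bytes for a large, live TS
  }
}
{code}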

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-23 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113832#comment-13113832
 ] 

Gary Helmling commented on HBASE-4387:
--

The TestLogRolling failure is on the assertion:
{code}
assertTrue(preLogRolledCalled.size() == 1);
{code}

So I would suspect it's more due to HBASE-4132.  Unless there's some 
interaction between this change and that.

> Error while syncing: DFSOutputStream is closed
> --
>
> Key: HBASE-4387
> URL: https://issues.apache.org/jira/browse/HBASE-4387
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4387.txt, errors-with-context.txt
>
>
> In a billion-row load on ~25 servers, I see "error while syncing" reasonably 
> often with the error "DFSOutputStream is closed" around a roll. We have some 
> race where a roll at the same time as heavy inserts causes a problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113829#comment-13113829
 ] 

Lars Hofhansl commented on HBASE-4387:
--

Don't see how it could. I'll be on a computer soon to check it out.




> Error while syncing: DFSOutputStream is closed
> --
>
> Key: HBASE-4387
> URL: https://issues.apache.org/jira/browse/HBASE-4387
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4387.txt, errors-with-context.txt
>
>
> In a billion-row load on ~25 servers, I see "error while syncing" reasonably 
> often with the error "DFSOutputStream is closed" around a roll. We have some 
> race where a roll at the same time as heavy inserts causes a problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113828#comment-13113828
 ] 

jirapos...@reviews.apache.org commented on HBASE-4415:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1925/
---

(Updated 2011-09-23 23:17:00.742946)


Review request for hbase.


Changes
---

Patch for finding location of hadoop jar files.


Summary
---

Create a post installation script to streamline configuration tasks for HBase.

usage: /usr/sbin/hbase-setup-conf.sh 

  Optional parameters:
    --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
    --hadoop-home=/usr                       Set Hadoop directory location
    --hadoop-namenode=localhost              Set Hadoop namenode hostname
    --hadoop-replication=3                   Set HDFS replication
    --hbase-home=/usr                        Set HBase directory location
    --hbase-conf=/etc/hbase                  Set HBase configuration directory location
    --hbase-log=/var/log/hbase               Set HBase log directory location
    --hbase-pid=/var/run/hbase               Set HBase pid directory location
    --hbase-user=hbase                       Set HBase user
    --java-home=/usr/java/default            Set JAVA_HOME directory location
    --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
    --kerberos-principal-id=_HOST            Set Kerberos principal ID
    --keytab-dir=/etc/security/keytabs       Set keytab directory
    --regionservers=localhost                Set regionservers hostnames
    --zookeeper-home=/usr                    Set ZooKeeper directory location
    --zookeeper-quorum=localhost             Set ZooKeeper Quorum
    --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location


This addresses bug HBASE-4415.
https://issues.apache.org/jira/browse/HBASE-4415


Diffs (updated)
-

  /src/assembly/all.xml 1175049 
  /src/docbkx/getting_started.xml 1175049 
  /src/packages/hbase-setup-conf.sh PRE-CREATION 
  /src/packages/templates/conf/hbase-env.sh PRE-CREATION 
  /src/packages/templates/conf/hbase-site.xml PRE-CREATION 

Diff: https://reviews.apache.org/r/1925/diff


Testing
---


Thanks,

Eric



> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415-4.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring 
> the HBase environment and configuration, using the same pattern of 
> *-setup-conf.sh as other Hadoop-related projects.  For HBase, the usage of the 
> script looks like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
>     --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
>     --hadoop-home=/usr                       Set Hadoop directory location
>     --hadoop-namenode=localhost              Set Hadoop namenode hostname
>     --hadoop-replication=3                   Set HDFS replication
>     --hbase-home=/usr                        Set HBase directory location
>     --hbase-conf=/etc/hbase                  Set HBase configuration directory location
>     --hbase-log=/var/log/hbase               Set HBase log directory location
>     --hbase-pid=/var/run/hbase               Set HBase pid directory location
>     --hbase-user=hbase                       Set HBase user
>     --java-home=/usr/java/default            Set JAVA_HOME directory location
>     --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
>     --kerberos-principal-id=_HOST            Set Kerberos principal ID
>     --keytab-dir=/etc/security/keytabs       Set keytab directory
>     --regionservers=localhost                Set regionservers hostnames
>     --zookeeper-home=/usr                    Set ZooKeeper directory location
>     --zookeeper-quorum=localhost             Set ZooKeeper Quorum
>     --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4446) Rolling restart RSs scenario, regions could stay in OPENING state

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113825#comment-13113825
 ] 

Hudson commented on HBASE-4446:
---

Integrated in HBase-0.92 #17 (See 
[https://builds.apache.org/job/HBase-0.92/17/])
HBASE-4446 Rolling restart RSs scenario, regions could stay in OPENING state

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Rolling restart RSs scenario, regions could stay in OPENING state
> -
>
> Key: HBASE-4446
> URL: https://issues.apache.org/jira/browse/HBASE-4446
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0
>
> Attachments: HBASE-4446-trunk.patch
>
>
> Keep the Master up all the time and do a rolling restart of RSs like this: stop RS1, 
> wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
> RS2, wait for 2 seconds, etc. A region can sometimes just stay in OPENING state 
> even after the timeoutmonitor period.
> 2011-09-19 08:10:33,131 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: While timing out a region 
> in state OPENING, found ZK node in unexpected state: RS_ZK_REGION_FAILED_OPEN
> The issue: the RS was shut down while a region was being opened, so the region 
> was transitioned to RS_ZK_REGION_FAILED_OPEN in ZK, and the timeoutmonitor didn't 
> take care of RS_ZK_REGION_FAILED_OPEN.
> processOpeningState
> ...
>else if (dataInZNode.getEventType() != EventType.RS_ZK_REGION_OPENING &&
> LOG.warn("While timing out a region in state OPENING, "
> + "found ZK node in unexpected state: "
> + dataInZNode.getEventType());
> return;
>   }

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113823#comment-13113823
 ] 

Hudson commented on HBASE-4387:
---

Integrated in HBase-0.92 #17 (See 
[https://builds.apache.org/job/HBase-0.92/17/])
HBASE-4387 Error while syncing: DFSOutputStream is closed

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


> Error while syncing: DFSOutputStream is closed
> --
>
> Key: HBASE-4387
> URL: https://issues.apache.org/jira/browse/HBASE-4387
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4387.txt, errors-with-context.txt
>
>
> In a billion-row load on ~25 servers, I see "error while syncing" reasonably 
> often with the error "DFSOutputStream is closed" around a roll. We have some 
> race where a roll at the same time as heavy inserts causes a problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4295) rowcounter does not return the correct number of rows in certain circumstances

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113824#comment-13113824
 ] 

Hudson commented on HBASE-4295:
---

Integrated in HBase-0.92 #17 (See 
[https://builds.apache.org/job/HBase-0.92/17/])
HBASE-4295 rowcounter does not return the correct number of rows in certain 
circumstances

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapred/RowCounter.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java


> rowcounter does not return the correct number of rows in certain circumstances
> --
>
> Key: HBASE-4295
> URL: https://issues.apache.org/jira/browse/HBASE-4295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.4
>Reporter: Wing Yew Poon
>Assignee: Dave Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4295-v1.patch
>
>
> When you run
> {noformat}
> hadoop jar hbase.jar rowcounter 
> {noformat}
> the org.apache.hadoop.hbase.mapreduce.RowCounter class is run.
> The RowCounterMapper class in the RowCounter mapreduce job contains the 
> following:
> {noformat}
> @Override
> public void map(ImmutableBytesWritable row, Result values,
>   Context context)
> throws IOException {
>   for (KeyValue value: values.list()) {
> if (value.getValue().length > 0) {
>   context.getCounter(Counters.ROWS).increment(1);
>   break;
> }
>   }
> }
> {noformat}
> The intention is to go through the column values in the row, and increment 
> the ROWS counter if some value is non-empty. However, values.list() always 
> has size 1. This is because the createSubmittableJob static method uses a 
> Scan as follows:
> {noformat}
> Scan scan = new Scan();
> scan.setFilter(new FirstKeyOnlyFilter());
> {noformat}
> So the input map splits always contain just the first KV. If the column 
> corresponding to that first KV is empty, even though other columns are 
> non-empty, that row is skipped.
> This way, rowcounter can return an incorrect result.
> One way to reproduce this is to create an hbase table with two columns, say 
> f1:q1 and f2:q2. Create some (say 2) rows with empty f1:q1 but non-empty 
> f2:q2, and some (say 3) rows with empty f2:q2 and non-empty f1:q1.
> Then run rowcounter (specifying only the table but not any columns). The 
> count will be either 2 short or 3 short.
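
A minimal sketch of one possible fix (an assumption about how the mapper could 
be changed, not a statement of the committed patch): with FirstKeyOnlyFilter in 
place, each Result carries exactly one KeyValue, so the mapper can count the row 
without caring which column that KeyValue came from.

{code}
// Sketch only; replaces the map() body shown in the description above.
@Override
public void map(ImmutableBytesWritable row, Result values, Context context)
    throws IOException {
  // FirstKeyOnlyFilter guarantees at most one KeyValue per row, so any
  // non-empty Result represents exactly one row.
  if (!values.isEmpty()) {
    context.getCounter(Counters.ROWS).increment(1);
  }
}
{code}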

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113817#comment-13113817
 ] 

stack commented on HBASE-4387:
--

Does this cause TestLogRolling to fail?  
https://builds.apache.org/job/HBase-TRUNK/2246/testReport/org.apache.hadoop.hbase.regionserver.wal/TestLogRolling/testLogRollOnPipelineRestart/

I need to look at it



> Error while syncing: DFSOutputStream is closed
> --
>
> Key: HBASE-4387
> URL: https://issues.apache.org/jira/browse/HBASE-4387
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4387.txt, errors-with-context.txt
>
>
> In a billion-row load on ~25 servers, I see "error while syncing" reasonably 
> often with the error "DFSOutputStream is closed" around a roll. We have some 
> race where a roll at the same time as heavy inserts causes a problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4280) [replication] ReplicationSink can deadlock itself via handlers

2011-09-23 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-4280.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to 0.90, 0.92, trunk. Thanks for your comments Stack!

> [replication] ReplicationSink can deadlock itself via handlers
> --
>
> Key: HBASE-4280
> URL: https://issues.apache.org/jira/browse/HBASE-4280
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.5
>
> Attachments: HBASE-4280-0.90.patch
>
>
> I've experienced this problem a few times: ReplicationSink calls are received 
> through the normal handlers and can potentially call back into the same server, 
> which, in certain situations, can fill up all the handlers. For example, 10 
> handlers that are all serving replication calls may all be trying to talk to the 
> local server at the same time.
> HRS.replicateLogEntries should have @QosPriority(priority=HIGH_QOS) to use 
> the other set of handlers.
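
A sketch of what that annotation would look like on the region server's 
replication sink entry point (the signature is a best-effort sketch and the body 
is elided):

{code}
// Sketch of the suggestion above: mark the replication sink entry point so it
// is served by the priority handler pool rather than the normal handlers.
@QosPriority(priority = HIGH_QOS)
public void replicateLogEntries(final HLog.Entry[] entries) throws IOException {
  // ... existing ReplicationSink handling unchanged ...
}
{code}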

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: (was: 4344-v11.txt)

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v2.txt, 4344-v4.txt, 
> 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using a multiversion 
> concurrency control system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But this mechanism is not incorporated for 
> the key-values written to disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to disk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: 4344-v11.txt

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v2.txt, 4344-v4.txt, 
> 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using a multiversion 
> concurrency control system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But this mechanism is not incorporated for 
> the key-values written to disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to disk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4472:
--

Fix Version/s: 0.92.0

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4472:
-

Assignee: Gary Helmling

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-4472.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113806#comment-13113806
 ] 

Ted Yu commented on HBASE-4472:
---

I applied Gary's patch to 0.92 and TRUNK.

Thanks Gary and Eugene.

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-23 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HBASE-4415:
-

Attachment: HBASE-4415-4.patch

Fixed the location used to find the Hadoop jar file for both rpm and tarball deployments.

> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415-4.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring 
> the HBase environment and configuration, using the same pattern of 
> *-setup-conf.sh as other Hadoop-related projects.  For HBase, the usage of the 
> script looks like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
>     --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
>     --hadoop-home=/usr                       Set Hadoop directory location
>     --hadoop-namenode=localhost              Set Hadoop namenode hostname
>     --hadoop-replication=3                   Set HDFS replication
>     --hbase-home=/usr                        Set HBase directory location
>     --hbase-conf=/etc/hbase                  Set HBase configuration directory location
>     --hbase-log=/var/log/hbase               Set HBase log directory location
>     --hbase-pid=/var/run/hbase               Set HBase pid directory location
>     --hbase-user=hbase                       Set HBase user
>     --java-home=/usr/java/default            Set JAVA_HOME directory location
>     --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
>     --kerberos-principal-id=_HOST            Set Kerberos principal ID
>     --keytab-dir=/etc/security/keytabs       Set keytab directory
>     --regionservers=localhost                Set regionservers hostnames
>     --zookeeper-home=/usr                    Set ZooKeeper directory location
>     --zookeeper-quorum=localhost             Set ZooKeeper Quorum
>     --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-4131:
--


Backed out this patch.  Was causing TestReplication failures.  See 
https://builds.apache.org/job/HBase-TRUNK/2246/testReport/

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4131:
-

Attachment: 4131-backedout.txt

Here is what I backed out.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: 4131-backedout.txt, replicationInterface1.txt, 
> replicationInterface2.txt, replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4473) NPE when executors are down but events are still coming in

2011-09-23 Thread Jean-Daniel Cryans (JIRA)
NPE when executors are down but events are still coming in
--

 Key: HBASE-4473
 URL: https://issues.apache.org/jira/browse/HBASE-4473
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.90.5


Minor annoyance when shutting down a cluster and the master is still receiving 
events from Zookeeper:
{quote}
2011-09-22 23:53:01,552 DEBUG 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
master:6-0x3292d87deb004f Received InterruptedException, doing nothing here
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:726)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:938)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:407)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteOpenedNode(ZKAssign.java:284)
at 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler.process(OpenedRegionHandler.java:88)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
...
2011-09-22 23:53:01,558 DEBUG org.apache.hadoop.hbase.executor.ExecutorService: 
Executor service [MASTER_OPEN_REGION-sv2borg170:6] not found in {}
2011-09-22 23:53:01,558 ERROR org.apache.zookeeper.ClientCnxn: Error while 
calling watcher
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.executor.ExecutorService.submit(ExecutorService.java:220)
at 
org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:447)
at 
org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:546)
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:281)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
{quote}

It's annoying because it then spams you with a bunch of NPEs that have nothing 
to do with the reason the Master is shutting down. Googling, I saw someone else 
also hit this issue in June: http://pastebin.com/5Tqrj0nq
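
A minimal sketch of the kind of guard that would avoid the NPE; the method and 
field names below are hypothetical stand-ins for the real ExecutorService 
internals, not the committed fix:

{code}
// Hypothetical sketch, not the actual ExecutorService code: if the executor for
// this event type has already been shut down, drop the event with a warning
// instead of dereferencing null.
public void submit(EventHandler handler) {
  Executor executor = executorMap.get(handler.getEventType());
  if (executor == null) {
    LOG.warn("No executor running for " + handler.getEventType()
        + " (shutdown in progress?); dropping event");
    return;
  }
  executor.submit(handler);
}
{code}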

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: 4344-v11.txt

It enables the tests from TestAcidGuarantees, which pass.
I also introduced a constant, KEY_VALUE_VER_WITH_MEMSTORE, per my earlier 
comment.

Will run through the test suite once the TRUNK build stabilizes.


> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v2.txt, 4344-v4.txt, 
> 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using a multiversion 
> concurrency control system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But this mechanism is not incorporated for 
> the key-values written to disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to disk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4457) Starting region server on non-default info port is resulting in broken URL's in master UI

2011-09-23 Thread Praveen Patibandla (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113793#comment-13113793
 ] 

Praveen Patibandla commented on HBASE-4457:
---

Thanks Stack, I was initially thinking of replacing the HSL with a new data 
structure.

> Starting region server on non-default info port is resulting in broken URL's 
> in master UI
> -
>
> Key: HBASE-4457
> URL: https://issues.apache.org/jira/browse/HBASE-4457
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: Praveen Patibandla
>Priority: Minor
>  Labels: newbie
> Fix For: 0.92.0
>
> Attachments: 4457-V1.patch, 4457.patch
>
>
> When "hbase.regionserver.info.port" is set to a non-default port, the Master UI 
> has broken URLs in the region server table because it is hard-coded to the 
> default port.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-23 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113791#comment-13113791
 ] 

Giridharan Kesavan commented on HBASE-4415:
---

The hbase-setup-conf.sh script checks for $HADOOP_HOME/bin/hadoop and looks for 
$HADOOP_HOME/hadoop-core*.jar.

When HADOOP_HOME is set to /usr, the first condition passes but the second 
condition fails because it cannot find the hadoop-core jar, since the Hadoop rpm 
installs the core jar at /usr/share/hadoop/hadoop-core-*.jar.


> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring 
> the HBase environment and configuration, using the same pattern of 
> *-setup-conf.sh as other Hadoop-related projects.  For HBase, the usage of the 
> script looks like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
>     --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
>     --hadoop-home=/usr                       Set Hadoop directory location
>     --hadoop-namenode=localhost              Set Hadoop namenode hostname
>     --hadoop-replication=3                   Set HDFS replication
>     --hbase-home=/usr                        Set HBase directory location
>     --hbase-conf=/etc/hbase                  Set HBase configuration directory location
>     --hbase-log=/var/log/hbase               Set HBase log directory location
>     --hbase-pid=/var/run/hbase               Set HBase pid directory location
>     --hbase-user=hbase                       Set HBase user
>     --java-home=/usr/java/default            Set JAVA_HOME directory location
>     --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
>     --kerberos-principal-id=_HOST            Set Kerberos principal ID
>     --keytab-dir=/etc/security/keytabs       Set keytab directory
>     --regionservers=localhost                Set regionservers hostnames
>     --zookeeper-home=/usr                    Set ZooKeeper directory location
>     --zookeeper-quorum=localhost             Set ZooKeeper Quorum
>     --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Eugene Koontz (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113784#comment-13113784
 ] 

Eugene Koontz commented on HBASE-4472:
--

I think Gary's patch is better. It's simpler - with mine, you have the master 
modifying things outside itself (i.e. a zookeeper node) before aborting.

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4132) Extend the WALActionsListener API to accomodate log archival

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113777#comment-13113777
 ] 

Hudson commented on HBASE-4132:
---

Integrated in HBase-TRUNK #2246 (See 
[https://builds.apache.org/job/HBase-TRUNK/2246/])
HBASE-4132 Extend the WALActionsListener API to accomodate log archival

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java


> Extend the WALActionsListener API to accomodate log archival
> 
>
> Key: HBASE-4132
> URL: https://issues.apache.org/jira/browse/HBASE-4132
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: walArchive.txt, walArchive2.txt, walArchive3.txt
>
>
> The WALObserver interface exposes the log roll events. It would be nice to 
> extend it to accommodate log archival events as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4295) rowcounter does not return the correct number of rows in certain circumstances

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113776#comment-13113776
 ] 

Hudson commented on HBASE-4295:
---

Integrated in HBase-TRUNK #2246 (See 
[https://builds.apache.org/job/HBase-TRUNK/2246/])
HBASE-4295 rowcounter does not return the correct number of rows in certain 
circumstances

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/RowCounter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java


> rowcounter does not return the correct number of rows in certain circumstances
> --
>
> Key: HBASE-4295
> URL: https://issues.apache.org/jira/browse/HBASE-4295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.4
>Reporter: Wing Yew Poon
>Assignee: Dave Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4295-v1.patch
>
>
> When you run
> {noformat}
> hadoop jar hbase.jar rowcounter 
> {noformat}
> the org.apache.hadoop.hbase.mapreduce.RowCounter class is run.
> The RowCounterMapper class in the RowCounter mapreduce job contains the 
> following:
> {noformat}
> @Override
> public void map(ImmutableBytesWritable row, Result values,
>   Context context)
> throws IOException {
>   for (KeyValue value: values.list()) {
> if (value.getValue().length > 0) {
>   context.getCounter(Counters.ROWS).increment(1);
>   break;
> }
>   }
> }
> {noformat}
> The intention is to go through the column values in the row, and increment 
> the ROWS counter if some value is non-empty. However, values.list() always 
> has size 1. This is because the createSubmittableJob static method uses a 
> Scan as follows:
> {noformat}
> Scan scan = new Scan();
> scan.setFilter(new FirstKeyOnlyFilter());
> {noformat}
> So the input map splits always contain just the first KV. If the column 
> corresponding to that first KV is empty, even though other columns are 
> non-empty, that row is skipped.
> This way, rowcounter can return an incorrect result.
> One way to reproduce this is to create an hbase table with two columns, say 
> f1:q1 and f2:q2. Create some (say 2) rows with empty f1:q1 but non-empty 
> f2:q2, and some (say 3) rows with empty f2:q2 and non-empty f1:q1.
> Then run rowcounter (specifying only the table but not any columns). The 
> count will be either 2 short or 3 short.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113774#comment-13113774
 ] 

Hudson commented on HBASE-4131:
---

Integrated in HBase-TRUNK #2246 (See 
[https://builds.apache.org/job/HBase-TRUNK/2246/])
HBASE-4131 Make the Replication Service pluggable via a standard interface 
definition

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationSinkService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationSourceService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java


> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: replicationInterface1.txt, replicationInterface2.txt, 
> replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113775#comment-13113775
 ] 

Hudson commented on HBASE-4387:
---

Integrated in HBase-TRUNK #2246 (See 
[https://builds.apache.org/job/HBase-TRUNK/2246/])
HBASE-4387 Error while syncing: DFSOutputStream is closed
HBASE-4387 Error while syncing: DFSOutputStream is closed

stack : 
Files : 
* /hbase/trunk/CHANGES.txt

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


> Error while syncing: DFSOutputStream is closed
> --
>
> Key: HBASE-4387
> URL: https://issues.apache.org/jira/browse/HBASE-4387
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4387.txt, errors-with-context.txt
>
>
> In a billion-row load on ~25 servers, I see "error while syncing" reasonably 
> often with the error "DFSOutputStream is closed" around a roll. We have some 
> race where a roll at the same time as heavy inserts causes a problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113768#comment-13113768
 ] 

Jonathan Gray commented on HBASE-4461:
--

Man, I remember when I could buy your vote for $2.00!

> Expose getRowOrBefore via Thrift
> 
>
> Key: HBASE-4461
> URL: https://issues.apache.org/jira/browse/HBASE-4461
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4461-v2.patch
>
>
> In order for fat Thrift-based clients to locate region locations they need to 
> utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113766#comment-13113766
 ] 

Ted Yu commented on HBASE-4344:
---

@Amit:
{code}
+appendFileInfo(KEY_VALUE_VERSION, Bytes.toBytes(1));
{code}
I think we should introduce a constant for the 1 above.
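A minimal sketch of what that could look like (the constant name and where it lives are illustrative, not the actual patch):
{code}
// Hypothetical constant replacing the bare literal 1; name and placement are
// illustrative only.
public static final int KEY_VALUE_VER_WITH_MEMSTORE_TS = 1;

// ... later, when writing the file info:
appendFileInfo(KEY_VALUE_VERSION, Bytes.toBytes(KEY_VALUE_VER_WITH_MEMSTORE_TS));
{code}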

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 
> 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to the disk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113767#comment-13113767
 ] 

stack commented on HBASE-4461:
--

I'd say commit to TRUNK then maybe run a vote or something if you need it in 
0.92 (My vote will cost you $5.00)

> Expose getRowOrBefore via Thrift
> 
>
> Key: HBASE-4461
> URL: https://issues.apache.org/jira/browse/HBASE-4461
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4461-v2.patch
>
>
> In order for fat Thrift-based clients to locate region locations they need to 
> utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4471) HTable.close() should shut down executor pool

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113765#comment-13113765
 ] 

stack commented on HBASE-4471:
--

Not doing pool shutdown seems like a bug to me (if you look in 0.92, we seem to 
call pool shutdown -- where are you looking?)

> HTable.close() should shut down executor pool
> -
>
> Key: HBASE-4471
> URL: https://issues.apache.org/jira/browse/HBASE-4471
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.3
>Reporter: Josh Rosenblum
>
> Right now, it looks like HTable.close() is primarily concerned with flushing 
> commits. I understand the intended semantics of close to be that clients 
> should not attempt to call any other methods on that HTable instance after 
> close is called. If that's true, then close() might leave around some 
> relatively heavy resources after close() is called that can serve no further 
> purpose. In particular, the executor this.pool may have a number of threads 
> outstanding for some period of time (a minute with the default keepAliveTime 
> of 60). With the default number of threads == the number of regionservers and 
> with each thread having a 1mb stack by default on 64-bit jvms, this can be a 
> considerable amount of memory left around (in addition to any other resources 
> consumed by each thread). Is there any reason for close() not to also call 
> this.pool.shutdown() after it calls flushCommits()?
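A rough sketch of the suggestion (field and method names follow the description above; the real HTable internals may differ):
{code}
// Illustrative only -- not the actual HTable implementation.
@Override
public void close() throws IOException {
  flushCommits();            // flush any buffered mutations first
  if (this.pool != null) {
    this.pool.shutdown();    // stop accepting tasks so idle worker threads can exit
  }
}
{code}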

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4457) Starting region server on non-default info port is resulting in broken URL's in master UI

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113764#comment-13113764
 ] 

stack commented on HBASE-4457:
--

@Praveen Yeah.. so I'd think there'd be a setHSL on the combined datastructure 
into which you could add latest HSL gotten from RS.

> Starting region server on non-default info port is resulting in broken URL's 
> in master UI
> -
>
> Key: HBASE-4457
> URL: https://issues.apache.org/jira/browse/HBASE-4457
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: Praveen Patibandla
>Priority: Minor
>  Labels: newbie
> Fix For: 0.92.0
>
> Attachments: 4457-V1.patch, 4457.patch
>
>
> When  "hbase.regionserver.info.port" is set to non-default port, Master UI 
> has broken URL's in the region server table because it's hard coded to 
> default port.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113763#comment-13113763
 ] 

Ted Yu commented on HBASE-4344:
---

@Stack:
See my comment @ 23/Sep/11 13:51
Version 9 combines the changes from HBASE-4345.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v10.txt, 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 
> 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches similar by persisting the memstoreTS along 
> with the key-value when it is written to the disk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4418) Show all the hbase configuration in the web ui

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113762#comment-13113762
 ] 

stack commented on HBASE-4418:
--

@Liyin I don't believe there is one.  It came for free when we moved to later 
hadoop.  We don't have it exposed in our UI IIRC.  We need to add a link along 
the top.

> Show all the hbase configuration in the web ui
> --
>
> Key: HBASE-4418
> URL: https://issues.apache.org/jira/browse/HBASE-4418
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>
> The motivation is to show ALL the HBase configuration that takes effect at 
> runtime, in one global place.
> So we can easily know which configuration takes effect and what the value is.
> The configuration shows all the HBase and DFS configuration entries in the 
> configuration file and also includes all the HBase default settings in the 
> code, which are not in the config file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException

2011-09-23 Thread Douglas Campbell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113749#comment-13113749
 ] 

Douglas Campbell commented on HBASE-4462:
-

+1 on no retry for socket timeout.  retrying after already timing out opens the 
possibility of timing out again which could double the delay with no reward.

Another thought is to allow retries but ensure older scan iterators are 
aborted, try to notify client threads holding them to abort, and only allow the 
newest scan iterator to stay alive.  

Without knowing the code, this may be difficult, but it gets what you want: only 
one iterator over a single object, a configurable number of retries, and the 
region server is not locked up with different threads over a single scanner 
object. 


> Properly treating SocketTimeoutException
> 
>
> Key: HBASE-4462
> URL: https://issues.apache.org/jira/browse/HBASE-4462
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
> Fix For: 0.92.0
>
>
> SocketTimeoutException is currently treated like any IOE inside of 
> HCM.getRegionServerWithRetries and I think this is a problem. This method 
> should only do retries in cases where we are pretty sure the operation will 
> complete, but with STE we already waited for (by default) 60 seconds and 
> nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list 
> where it seemed like he was using the same scanner from multiple threads, but 
> actually it was just the same client doing retries while the first run didn't 
> even finish yet (that's another problem). You could see the first scanner, 
> then up to two other handlers waiting for it to finish in order to run 
> (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the 
> client deal with it, or we could retry only once.
> There's also the option of having a different behavior for get/put/icv/scan; 
> the issue with operations that modify a cell is that you don't know if the 
> operation completed or not (same when a RS dies hard after completing let's 
> say a Put but just before returning to the client).
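One way the "do not retry on socket timeout" option could look (a hedged sketch; the actual HCM retry loop has more bookkeeping than shown here):
{code}
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Illustrative retry helper: a SocketTimeoutException is rethrown instead of retried.
final class RetryHelper {
  static <T> T callWithRetries(Callable<T> callable, int maxRetries) throws IOException {
    IOException last = null;
    for (int i = 0; i < maxRetries; i++) {
      try {
        return callable.call();
      } catch (SocketTimeoutException ste) {
        throw ste;             // we already waited a full RPC timeout; give up now
      } catch (IOException ioe) {
        last = ioe;            // other IOEs keep the existing retry behaviour
      } catch (Exception e) {
        throw new IOException(e);
      }
    }
    throw last != null ? last : new IOException("no attempts were made");
  }
}
{code}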

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113742#comment-13113742
 ] 

jirapos...@reviews.apache.org commented on HBASE-4415:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1925/
---

(Updated 2011-09-23 20:22:38.162933)


Review request for hbase.


Changes
---

Updated the ability to use different keytab for HBase master and region server.


Summary
---

Create a post installation script to streamline configuration tasks for HBase.

usage: /usr/sbin/hbase-setup-conf.sh 

  Optional parameters:
  --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
  --hadoop-home=/usr                       Set Hadoop directory location
  --hadoop-namenode=localhost              Set Hadoop namenode hostname
  --hadoop-replication=3                   Set HDFS replication
  --hbase-home=/usr                        Set HBase directory location
  --hbase-conf=/etc/hbase                  Set HBase configuration directory location
  --hbase-log=/var/log/hbase               Set HBase log directory location
  --hbase-pid=/var/run/hbase               Set HBase pid directory location
  --hbase-user=hbase                       Set HBase user
  --java-home=/usr/java/default            Set JAVA_HOME directory location
  --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
  --kerberos-principal-id=_HOST            Set Kerberos principal ID
  --keytab-dir=/etc/security/keytabs       Set keytab directory
  --regionservers=localhost                Set regionservers hostnames
  --zookeeper-home=/usr                    Set ZooKeeper directory location
  --zookeeper-quorum=localhost             Set ZooKeeper Quorum
  --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location


This addresses bug HBASE-4415.
https://issues.apache.org/jira/browse/HBASE-4415


Diffs (updated)
-

  /src/assembly/all.xml 1172936 
  /src/docbkx/getting_started.xml 1172936 
  /src/packages/hbase-setup-conf.sh PRE-CREATION 
  /src/packages/templates/conf/hbase-env.sh PRE-CREATION 
  /src/packages/templates/conf/hbase-site.xml PRE-CREATION 

Diff: https://reviews.apache.org/r/1925/diff


Testing
---


Thanks,

Eric



> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring the 
> HBase environment and configuration, using the same pattern of *-setup-conf.sh 
> as other Hadoop-related projects.  For HBase, the usage of the script looks 
> like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
> --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
> --hadoop-home=/usr                       Set Hadoop directory location
> --hadoop-namenode=localhost              Set Hadoop namenode hostname
> --hadoop-replication=3                   Set HDFS replication
> --hbase-home=/usr                        Set HBase directory location
> --hbase-conf=/etc/hbase                  Set HBase configuration directory location
> --hbase-log=/var/log/hbase               Set HBase log directory location
> --hbase-pid=/var/run/hbase               Set HBase pid directory location
> --hbase-user=hbase                       Set HBase user
> --java-home=/usr/java/default            Set JAVA_HOME directory location
> --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
> --kerberos-principal-id=_HOST            Set Kerberos principal ID
> --keytab-dir=/etc/security/keytabs       Set keytab directory
> --regionservers=localhost                Set regionservers hostnames
> --zookeeper-home=/usr                    Set ZooKeeper directory location
> --zookeeper-quorum=localhost             Set ZooKeeper Quorum
> --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-23 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113737#comment-13113737
 ] 

Eric Yang commented on HBASE-4415:
--

Updated the setup script to use unique keytabs for the HBase master and region 
servers.

> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring the 
> HBase environment and configuration, using the same pattern of *-setup-conf.sh 
> as other Hadoop-related projects.  For HBase, the usage of the script looks 
> like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
> --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
> --hadoop-home=/usr                       Set Hadoop directory location
> --hadoop-namenode=localhost              Set Hadoop namenode hostname
> --hadoop-replication=3                   Set HDFS replication
> --hbase-home=/usr                        Set HBase directory location
> --hbase-conf=/etc/hbase                  Set HBase configuration directory location
> --hbase-log=/var/log/hbase               Set HBase log directory location
> --hbase-pid=/var/run/hbase               Set HBase pid directory location
> --hbase-user=hbase                       Set HBase user
> --java-home=/usr/java/default            Set JAVA_HOME directory location
> --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
> --kerberos-principal-id=_HOST            Set Kerberos principal ID
> --keytab-dir=/etc/security/keytabs       Set keytab directory
> --regionservers=localhost                Set regionservers hostnames
> --zookeeper-home=/usr                    Set ZooKeeper directory location
> --zookeeper-quorum=localhost             Set ZooKeeper Quorum
> --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-23 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HBASE-4415:
-

Attachment: HBASE-4415-3.patch

> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring the 
> HBase environment and configuration, using the same pattern of *-setup-conf.sh 
> as other Hadoop-related projects.  For HBase, the usage of the script looks 
> like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
> --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
> --hadoop-home=/usr                       Set Hadoop directory location
> --hadoop-namenode=localhost              Set Hadoop namenode hostname
> --hadoop-replication=3                   Set HDFS replication
> --hbase-home=/usr                        Set HBase directory location
> --hbase-conf=/etc/hbase                  Set HBase configuration directory location
> --hbase-log=/var/log/hbase               Set HBase log directory location
> --hbase-pid=/var/run/hbase               Set HBase pid directory location
> --hbase-user=hbase                       Set HBase user
> --java-home=/usr/java/default            Set JAVA_HOME directory location
> --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
> --kerberos-principal-id=_HOST            Set Kerberos principal ID
> --keytab-dir=/etc/security/keytabs       Set keytab directory
> --regionservers=localhost                Set regionservers hostnames
> --zookeeper-home=/usr                    Set ZooKeeper directory location
> --zookeeper-quorum=localhost             Set ZooKeeper Quorum
> --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Eugene Koontz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4472:
-

Attachment: jstack.txt

jstack of test during hang: note the TIMED_WAITING in HRegionServer.getMaster().

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch, jstack.txt
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-09-23 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113727#comment-13113727
 ] 

Giridharan Kesavan commented on HBASE-4415:
---

Can we have the option to set different keytabs for the HBase master and region servers?

> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring the 
> HBase environment and configuration, using the same pattern of *-setup-conf.sh 
> as other Hadoop-related projects.  For HBase, the usage of the script looks 
> like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
> --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
> --hadoop-home=/usr                       Set Hadoop directory location
> --hadoop-namenode=localhost              Set Hadoop namenode hostname
> --hadoop-replication=3                   Set HDFS replication
> --hbase-home=/usr                        Set HBase directory location
> --hbase-conf=/etc/hbase                  Set HBase configuration directory location
> --hbase-log=/var/log/hbase               Set HBase log directory location
> --hbase-pid=/var/run/hbase               Set HBase pid directory location
> --hbase-user=hbase                       Set HBase user
> --java-home=/usr/java/default            Set JAVA_HOME directory location
> --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
> --kerberos-principal-id=_HOST            Set Kerberos principal ID
> --keytab-dir=/etc/security/keytabs       Set keytab directory
> --regionservers=localhost                Set regionservers hostnames
> --zookeeper-home=/usr                    Set ZooKeeper directory location
> --zookeeper-quorum=localhost             Set ZooKeeper Quorum
> --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Eugene Koontz (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113724#comment-13113724
 ] 

Eugene Koontz commented on HBASE-4472:
--

HBASE-4472.delete-zk-cluster-node-on-master-abort.patch is an alternative patch 
to fix the same bug: Master will remove the cluster-status ZK node 
(/hbase/shutdown by default) when it aborts. This will prevent the 
HRegionServer from hanging in HRegionServer.getMaster().
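Roughly, the idea is (a hedged sketch; the surrounding HMaster.abort() code and exception handling are simplified):
{code}
// Illustrative sketch only.  Mark the cluster down when the master aborts so
// region servers stop waiting on the (now gone) master.
@Override
public void abort(final String msg, final Throwable t) {
  // ... existing abort logging / cleanup ...
  try {
    if (this.clusterStatusTracker != null) {
      this.clusterStatusTracker.setClusterDown();  // removes the /hbase/shutdown znode
    }
  } catch (Exception e) {
    LOG.error("Failed to delete cluster status znode on abort", e);
  }
  // ... continue with the normal abort path ...
}
{code}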

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113723#comment-13113723
 ] 

stack commented on HBASE-4470:
--

Related to HBASE-3446.  Need retry.  The HBASE-3446 fix is not backportable 
though, at least not the version being worked on there.

> ServerNotRunningException coming out of assignRootAndMeta kills the Master
> --
>
> Key: HBASE-4470
> URL: https://issues.apache.org/jira/browse/HBASE-4470
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.5
>
>
> I'm surprised we still have issues like that and I didn't get a hit while 
> googling so forgive me if there's already a jira about it.
> When the master starts it verifies the locations of root and meta before 
> assigning them, if the server is started but not running you'll get this:
> {quote}
> 2011-09-23 04:47:44,859 WARN 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
> RemoteException connecting to RS
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
> yet
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
> at $Proxy6.getProtocolVersion(Unknown Source)
> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
> at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
> at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
> at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
> at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
> {quote}
> I hit that 3-4 times this week while debugging something else. The worst is 
> that when you restart the master it sees that as a failover, but none of the 
> regions are assigned so it takes an eternity to get back fully online.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Eugene Koontz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4472:
-

Attachment: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch

Call this.clusterStatusTracker.setClusterDown() during HMaster.abort().


> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
> Attachments: HBASE-4472.delete-zk-cluster-node-on-master-abort.patch, 
> HBASE-4472.patch
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4434) seek optimization: don't do eager HFile Scanner next() unless the next KV is needed

2011-09-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4434.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]

Committed to 0.92 branch and to TRUNK.  Thanks for the sweet patch Kannan.

> seek optimization: don't do eager HFile Scanner next() unless the next KV is 
> needed
> ---
>
> Key: HBASE-4434
> URL: https://issues.apache.org/jira/browse/HBASE-4434
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Fix For: 0.92.0
>
> Attachments: HBASE-4434.txt
>
>
> When a seek/reseek is done on StoreFileScanner, in addition to setting the 
> current KV, it also does an HFileScanner-level next() ahead of time even if 
> the next KV is never actually required. This inefficiency can potentially 
> result in additional disk seeks and sub-optimal use of the block cache 
> (primarily for cases where the KVs are large and each occupies an HFile block 
> of its own).
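In rough terms, the fix defers that extra read. A toy illustration of the lazy pattern (class and method names are made up and do not mirror the real StoreFileScanner/HFileScanner code):
{code}
import java.util.Iterator;

// Toy scanner: seeking only records the current position; the underlying
// source (and possibly a new block) is read only when next() is really needed.
final class LazyScanner<KV> {
  private final Iterator<KV> source;
  private KV current;

  LazyScanner(Iterator<KV> source) { this.source = source; }

  void seekTo(KV kv) {
    current = kv;          // record the seek result, do NOT pre-fetch the next KV
  }

  KV peek() { return current; }

  KV next() {
    KV result = current;
    // Only here do we touch the underlying scanner, i.e. when the caller
    // actually asks for the following KV.
    current = source.hasNext() ? source.next() : null;
    return result;
  }
}
{code}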

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4450) test for number of blocks read: to serve as baseline for expected blocks read and for catching regressions

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113714#comment-13113714
 ] 

stack commented on HBASE-4450:
--

Committed to 0.92 branch too.

> test for number of blocks read: to serve as baseline for expected blocks read 
> and for catching regressions
> --
>
> Key: HBASE-4450
> URL: https://issues.apache.org/jira/browse/HBASE-4450
> Project: HBase
>  Issue Type: Test
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Attachments: HBASE-4450.txt
>
>
> Add a simple test for the number of blocks read. The test's intent is to serve as 
> a baseline for expected blocks read and for catching regressions. As 
> optimizations for HBASE-4433 or HBASE-4434 are committed, the test would need 
> to be updated to adjust the counts for expected blocks read in various cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4458) HBase should give actionable information when a region is compressed with a codec that is not available.

2011-09-23 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113715#comment-13113715
 ] 

Jonathan Hsieh commented on HBASE-4458:
---

From reading the description it looks like HBASE-2515 worked.  However, in 
this particular scenario, *none* of the region servers had LZO installed.  
The goal was to move to another codec and somehow during the migration some 
regions were missed.  

The symptom we saw was that regions failed to be deployed. Admittedly, we 
didn't initially look at the logs.  We tried assigning from the shell, but I 
don't recall seeing any class not found exceptions when the shell command 
failed.  We ended up figuring out the problem by running the HFile tool against 
the region data in HDFS and getting the ClassNotFoundException there.

This brings up four potential places for an error message -- in the shell, in 
hbck, in the web ui, as well as in the logs.




> HBase should give actionable information when a region is compressed with a 
> codec that is not available.
> 
>
> Key: HBASE-4458
> URL: https://issues.apache.org/jira/browse/HBASE-4458
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Jonathan Hsieh
>
> A cluster that previously used LZO codec was upgraded with the intent of 
> moving away from the codec to another.  Several regions failed to deploy 
> because the LZO codec was no longer present.  However, there was little 
> indication that this was the problem.
> Ideally, the master web UI or hbck would detect these problems, explain 
> why the region fails to deploy, and provide an actionable error message.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4450) test for number of blocks read: to serve as baseline for expected blocks read and for catching regressions

2011-09-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4450:
-

Attachment: HBASE-4450.txt

From rb

> test for number of blocks read: to serve as baseline for expected blocks read 
> and for catching regressions
> --
>
> Key: HBASE-4450
> URL: https://issues.apache.org/jira/browse/HBASE-4450
> Project: HBase
>  Issue Type: Test
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Attachments: HBASE-4450.txt
>
>
> Add a simple test for the number of blocks read. The test's intent is to serve as 
> a baseline for expected blocks read and for catching regressions. As 
> optimizations for HBASE-4433 or HBASE-4434 are committed, the test would need 
> to be updated to adjust the counts for expected blocks read in various cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4434) seek optimization: don't do eager HFile Scanner next() unless the next KV is needed

2011-09-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4434:
-

Attachment: HBASE-4434.txt

Copied from rb.

> seek optimization: don't do eager HFile Scanner next() unless the next KV is 
> needed
> ---
>
> Key: HBASE-4434
> URL: https://issues.apache.org/jira/browse/HBASE-4434
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
> Attachments: HBASE-4434.txt
>
>
> When a seek/reseek is done on StoreFileScanner, in addition to setting the 
> current KV, it also does an HFileScanner-level next() ahead of time even if 
> the next KV is never actually required. This inefficiency can potentially 
> result in additional disk seeks and sub-optimal use of the block cache 
> (primarily for cases where the KVs are large and each occupies an HFile block 
> of its own).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4434) seek optimization: don't do eager HFile Scanner next() unless the next KV is needed

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113701#comment-13113701
 ] 

jirapos...@reviews.apache.org commented on HBASE-4434:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2034/#review2056
---

Ship it!


- Michael


On 2011-09-23 05:13:36, Kannan Muthukkaruppan wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2034/
bq.  ---
bq.  
bq.  (Updated 2011-09-23 05:13:36)
bq.  
bq.  
bq.  Review request for Michael Stack, Jonathan Gray and Mikhail Bautin.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  The increase in blocks read in one sub-test case is due to HBASE-4466, 
which should be fixed soon. It is primarily an accounting issue, not an extra 
cache miss.
bq.  
bq.  
bq.  This addresses bug HBASE-4434.
bq.  https://issues.apache.org/jira/browse/HBASE-4434
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
 1174514 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
 1174514 
bq.  
bq.  Diff: https://reviews.apache.org/r/2034/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran TestBlocksRead. Will run the entire suite as well. On our 89 based 
internal branch, the tests passed without any issues.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kannan
bq.  
bq.



> seek optimization: don't do eager HFile Scanner next() unless the next KV is 
> needed
> ---
>
> Key: HBASE-4434
> URL: https://issues.apache.org/jira/browse/HBASE-4434
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
>
> When a seek/reseek is done on StoreFileScanner, in addition to setting the 
> current KV, it also does an HFileScanner-level next() ahead of time even if 
> the next KV is never actually required. This inefficiency can potentially 
> result in additional disk seeks and sub-optimal use of the block cache 
> (primarily for cases where the KVs are large and each occupies an HFile block 
> of its own).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-4472:
-

Attachment: HBASE-4472.patch

Adds an explicit call to HRegionServer.stop() in JVMClusterUtil.shutdown().
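The shape of the change, roughly (a hedged sketch; the exact JVMClusterUtil types and the stop reason string are approximations):
{code}
import java.util.List;

// Illustrative: stop each region server explicitly before joining its thread,
// so shutdown no longer depends on an active master deleting the cluster znode.
public static void shutdown(final List<JVMClusterUtil.RegionServerThread> regionservers) {
  for (JVMClusterUtil.RegionServerThread t : regionservers) {
    t.getRegionServer().stop("Shutting down mini cluster");
  }
  for (JVMClusterUtil.RegionServerThread t : regionservers) {
    try {
      t.join();   // join can no longer hang waiting on a master that already aborted
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}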

> MiniHBaseCluster.shutdown() doesn't work if no active master
> 
>
> Key: HBASE-4472
> URL: https://issues.apache.org/jira/browse/HBASE-4472
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
> Attachments: HBASE-4472.patch
>
>
> Running tests over in HBASE-4014 brought up this issue.  If the active master 
> in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
> will just hang in JVMClusterUtil.shutdown(), waiting to join each of the 
> region server threads.
> Seems like we should explicitly stop each region server instead of just 
> relying on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4472) MiniHBaseCluster.shutdown() doesn't work if no active master

2011-09-23 Thread Gary Helmling (JIRA)
MiniHBaseCluster.shutdown() doesn't work if no active master


 Key: HBASE-4472
 URL: https://issues.apache.org/jira/browse/HBASE-4472
 Project: HBase
  Issue Type: Bug
Reporter: Gary Helmling
 Attachments: HBASE-4472.patch

Running tests over in HBASE-4014 brought up this issue.  If the active master 
in a MiniHBaseCluster has aborted, then calling MiniHBaseCluster.shutdown() 
will just hang in JVMClusterUtil.shutdown(), waiting to join each of the region 
server threads.

Seems like we should explicitly stop each region server instead of just relying 
on an active master instance deleting the cluster status znode?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4467) Handle inconsistencies in Hadoop libraries naming in hbase script

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113688#comment-13113688
 ] 

stack commented on HBASE-4467:
--

+1 on patch.  If it works for you commit LarsG.

> Handle inconsistencies in Hadoop libraries naming in hbase script
> -
>
> Key: HBASE-4467
> URL: https://issues.apache.org/jira/browse/HBASE-4467
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars George
>Assignee: Lars George
>Priority: Trivial
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4467.patch
>
>
> When using a Hadoop tarball that names its libraries "hadoop-x.y.z-core" 
> instead of "hadoop-core-x.y.z", the hbase script throws errors.
> {noformat}
> $ bin/start-hbase.sh 
> ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No such file 
> or directory
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No such file 
> or directory
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> localhost: starting zookeeper, logging to 
> /projects/opensource/hbase-trunk-rw//logs/hbase-larsgeorge-zookeeper-de1-app-mbp-2.out
> localhost: /projects/opensource/hadoop-0.20.2-append
> localhost: ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No 
> such file or directory
> localhost: Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> localhost: Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
> localhost:at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> localhost:at java.security.AccessController.doPrivileged(Native Method)
> localhost:at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> localhost:at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> localhost:at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> localhost:at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> starting master, logging to 
> /projects/opensource/hbase-trunk-rw/bin/../logs/hbase-larsgeorge-master-de1-app-mbp-2.out
> /projects/opensource/hadoop-0.20.2-append
> ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No such file 
> or directory
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> localhost: starting regionserver, logging to 
> /projects/opensource/hbase-trunk-rw//logs/hbase-larsgeorge-regionserver-de1-app-mbp-2.out
> localhost: /projects/opensource/hadoop-0.20.2-append
> localhost: ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No 
> such file or directory
> localhost: Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> localhost: Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
> localhost:at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> localhost:at java.security.AccessController.doPrivileged(Native Method)
> localhost:at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> localhost:at java.lang.Cl

[jira] [Resolved] (HBASE-4468) Wrong resource name in an error massage: webapps instead of hbase-webapps

2011-09-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4468.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]

Thanks for the patch nkeywal

> Wrong resource name in an error massage: webapps instead of hbase-webapps
> -
>
> Key: HBASE-4468
> URL: https://issues.apache.org/jira/browse/HBASE-4468
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 20110923_4468_InfoServer.patch
>
>
> org.apache.hadoop.hbase.util.InfoServer loads a resource in 'hbase-webapps' 
> but displays a message about 'webapps' when it does not find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4468) Wrong resource name in an error massage: webapps instead of hbase-webapps

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113685#comment-13113685
 ] 

stack commented on HBASE-4468:
--

Applied to TRUNK and the 0.92 branch.

> Wrong resource name in an error massage: webapps instead of hbase-webapps
> -
>
> Key: HBASE-4468
> URL: https://issues.apache.org/jira/browse/HBASE-4468
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 20110923_4468_InfoServer.patch
>
>
> org.apache.hadoop.hbase.util.InfoServer loads a resource in 'hbase-webapps' 
> but displays a message about 'webapps' when it does not find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113676#comment-13113676
 ] 

stack commented on HBASE-4296:
--

We can hold off till 2600 is done.

I don't think we can call it scanMeta or do you mean scanMetaGetRowOrBefore?  
The latter we could do.

I'd think that when 2600 goes in, this method just won't make sense any more -- 
clients that find regions using this will just be broke... since the .META. 
format will have changed (endkeys rather than startkeys).  Ain't sure what to 
do about that.  It'll be at a major version transition.

> Deprecate HTable[Interface].getRowOrBefore(...)
> ---
>
> Key: HBASE-4296
> URL: https://issues.apache.org/jira/browse/HBASE-4296
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 4296.txt
>
>
> HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
> That method was created to allow our scanning of .META. (see HBASE-2600).
> Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
> performant that a user of HTable will not be aware of.
> I propose deprecating this in the public interface in 0.92 and removing it 
> from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
> will still remain as internal interface for scanning meta.
> Comments?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113674#comment-13113674
 ] 

Jonathan Gray commented on HBASE-4131:
--

Thanks stack!

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: replicationInterface1.txt, replicationInterface2.txt, 
> replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4446) Rolling restart RSs scenario, regions could stay in OPENING state

2011-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113673#comment-13113673
 ] 

Hudson commented on HBASE-4446:
---

Integrated in HBase-TRUNK #2245 (See 
[https://builds.apache.org/job/HBase-TRUNK/2245/])
HBASE-4446 Rolling restart RSs scenario, regions could stay in OPENING state

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> Rolling restart RSs scenario, regions could stay in OPENING state
> -
>
> Key: HBASE-4446
> URL: https://issues.apache.org/jira/browse/HBASE-4446
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0
>
> Attachments: HBASE-4446-trunk.patch
>
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
> wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
> RS2, wait for 2 seconds, etc. A region can sometimes just stay in OPENING state 
> even after the timeout monitor period.
> 2011-09-19 08:10:33,131 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: While timing out a region 
> in state OPENING, found ZK node in unexpected state: RS_ZK_REGION_FAILED_OPEN
> The issue: the RS was shut down while a region was being opened, so the region was 
> transitioned to RS_ZK_REGION_FAILED_OPEN in ZK. The timeout monitor didn't 
> take care of RS_ZK_REGION_FAILED_OPEN.
> processOpeningState
> ...
>else if (dataInZNode.getEventType() != EventType.RS_ZK_REGION_OPENING &&
> LOG.warn("While timing out a region in state OPENING, "
> + "found ZK node in unexpected state: "
> + dataInZNode.getEventType());
> return;
>   }
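The kind of handling the fix adds, sketched loosely (method and variable names are approximate, not the committed patch):
{code}
// Illustrative: in the timeout monitor, treat FAILED_OPEN like a timed-out
// OPENING and put the region back up for assignment instead of just warning.
EventType et = dataInZNode.getEventType();
if (et == EventType.RS_ZK_REGION_FAILED_OPEN) {
  LOG.info("Region failed to open; reassigning " + regionInfo.getRegionNameAsString());
  assign(regionInfo, true);   // force a fresh assignment attempt
} else if (et != EventType.RS_ZK_REGION_OPENING) {
  LOG.warn("While timing out a region in state OPENING, found ZK node in unexpected state: " + et);
  return;
}
{code}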

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4131:
-

   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to TRUNK

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.94.0
>
> Attachments: replicationInterface1.txt, replicationInterface2.txt, 
> replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113671#comment-13113671
 ] 

Jonathan Gray commented on HBASE-4461:
--

and I'm saving up my new features to force in 92 to try and get the 
HLog/Delayable stuff in ;)

> Expose getRowOrBefore via Thrift
> 
>
> Key: HBASE-4461
> URL: https://issues.apache.org/jira/browse/HBASE-4461
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4461-v2.patch
>
>
> In order for fat Thrift-based clients to locate region locations they need to 
> utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113669#comment-13113669
 ] 

Jonathan Gray commented on HBASE-4461:
--

Well my plan is to use it internally on 0.92 (we are porting all the changes 
necessary for our fat C++ client from our internal 90 branch).  But wherever 
you think it should go is fine.

> Expose getRowOrBefore via Thrift
> 
>
> Key: HBASE-4461
> URL: https://issues.apache.org/jira/browse/HBASE-4461
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4461-v2.patch
>
>
> In order for fat Thrift-based clients to locate region locations they need to 
> utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113670#comment-13113670
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2041/
---

Review request for hbase.


Summary
---

This patch implements access control list based authorization of HBase 
operations.  The patch depends on the currently posted patch for HBASE-2742 
(secure RPC engine).

Key parts of the implementation are:

* AccessControlLists - encapsulates storage of permission grants in a metadata 
table ("_acl_").  This differs from previous implementation where the ".META." 
table was used to store permissions.

* AccessController - 
  - implements MasterObserver and RegionObserver, performing authorization 
checks in each of the preXXX() hooks.  If authorization fails, an 
AccessDeniedException is thrown.
  - implements AccessControllerProtocol as a coprocessor endpoint to provide 
RPC methods for granting, revoking and listing permissions.

* ZKPermissionWatcher (and TableAuthManager) - synchronizes ACL entries and 
updates throughout the cluster nodes using ZK.  ACL entries are stored in 
per-table znodes as /hbase/acl/tablename.

* Additional ruby shell scripts providing the "grant", "revoke" and 
"user_permission" commands

* Support for a new OWNER attribute in HTableDescriptor.  I could separate out 
this change into a new JIRA for discussion, but I don't see it as currently 
useful outside of security.  Alternately, I could handle the OWNER attribute 
completely in AccessController without changing HTD, but that would make 
interaction via hbase shell a bit uglier.
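
As a rough sketch of the pre-hook authorization pattern described above 
(assuming the 0.92 RegionObserver API; the class name, the permission lookup, 
and the user resolution are simplified stand-ins, not the patch's actual 
AccessController):

  import java.io.IOException;
  import java.util.List;

  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
  import org.apache.hadoop.hbase.coprocessor.ObserverContext;
  import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SketchAccessController extends BaseRegionObserver {

    @Override
    public void preGet(ObserverContext<RegionCoprocessorEnvironment> ctx,
        Get get, List<KeyValue> results) throws IOException {
      byte[] table =
          ctx.getEnvironment().getRegion().getRegionInfo().getTableName();
      // Check the requesting user's READ grant on this table.  The real
      // patch resolves the RPC user, consults TableAuthManager's cached
      // ACLs, and throws AccessDeniedException on failure; a plain
      // IOException keeps this sketch self-contained.
      if (!hasReadGrant(currentUser(), table)) {
        throw new IOException("Access denied: user " + currentUser()
            + " has no READ grant on table " + Bytes.toString(table));
      }
      // Authorized: fall through and let the Get proceed normally.
    }

    // Illustrative stand-ins for TableAuthManager lookups and RPC user
    // resolution.
    private boolean hasReadGrant(String user, byte[] table) {
      return "admin".equals(user);
    }

    private String currentUser() {
      return System.getProperty("user.name", "unknown");
    }
  }

The grant/revoke/user_permission shell commands mentioned above would then 
presumably drive the corresponding coprocessor endpoint RPCs rather than writing 
to the _acl_ table directly.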


This addresses bug HBASE-3025.
https://issues.apache.org/jira/browse/HBASE-3025


Diffs
-

  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlFilter.java
 PRE-CREATION 
  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java
 PRE-CREATION 
  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
 PRE-CREATION 
  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControllerProtocol.java
 PRE-CREATION 
  security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java 
PRE-CREATION 
  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java
 PRE-CREATION 
  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/TablePermission.java
 PRE-CREATION 
  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/UserPermission.java
 PRE-CREATION 
  
security/src/main/java/org/apache/hadoop/hbase/security/rbac/ZKPermissionWatcher.java
 PRE-CREATION 
  
security/src/test/java/org/apache/hadoop/hbase/security/rbac/SecureTestUtil.java
 PRE-CREATION 
  
security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessControlFilter.java
 PRE-CREATION 
  
security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessController.java
 PRE-CREATION 
  
security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestTablePermissions.java
 PRE-CREATION 
  
security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestZKPermissionsWatcher.java
 PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 46a1a3d 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 699a5f5 
  src/main/resources/hbase-default.xml 2c8f44b 
  src/main/ruby/hbase.rb 4d27191 
  src/main/ruby/hbase/admin.rb b244ffe 
  src/main/ruby/hbase/hbase.rb beb2450 
  src/main/ruby/hbase/security.rb PRE-CREATION 
  src/main/ruby/shell.rb 9a47600 
  src/main/ruby/shell/commands.rb a352c2e 
  src/main/ruby/shell/commands/grant.rb PRE-CREATION 
  src/main/ruby/shell/commands/revoke.rb PRE-CREATION 
  src/main/ruby/shell/commands/table_permission.rb PRE-CREATION 
  src/main/ruby/shell/commands/user_permission.rb PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 4d7ee22 

Diff: https://reviews.apache.org/r/2041/diff


Testing
---


Thanks,

Gary



> Coprocessor based simple access control
> ---
>
> Key: HBASE-3025
> URL: https://issues.apache.org/jira/browse/HBASE-3025
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: Andrew Purtell
> Attachments: HBASE-3025.1.patch, HBASE-3025.2011-02-01.patch
>
>
> Thanks for the clarification, Jeff, which reminds me to edit this issue.
> Goals of this issue
> # Client access to HBase is authenticated
> # User data is private unless access has been granted
> # Access to data can be granted on a per-table or per-column-family basis. 
> Non-Goals of this issue
> Th

[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113666#comment-13113666
 ] 

stack commented on HBASE-4461:
--

+1

This for TRUNK only?

> Expose getRowOrBefore via Thrift
> 
>
> Key: HBASE-4461
> URL: https://issues.apache.org/jira/browse/HBASE-4461
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4461-v2.patch
>
>
> In order for fat Thrift-based clients to look up region locations, they need 
> to utilize the getRowOrBefore method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

2011-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113665#comment-13113665
 ] 

stack commented on HBASE-4463:
--

@Dhruba We need this too.  How would we do it?

> Run more aggressive compactions during off peak hours
> -
>
> Key: HBASE-4463
> URL: https://issues.apache.org/jira/browse/HBASE-4463
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Karthik Ranganathan
>Assignee: Karthik Ranganathan
>
> The number of IOPS on disk and the top-of-rack bandwidth utilization at 
> off-peak hours are much lower than at peak hours, depending on the 
> application usage pattern. We can use this knowledge to improve the 
> performance of the HBase cluster by raising the compaction selection ratio 
> to a much larger value during off-peak hours - using 
> hbase.hstore.compaction.ratio.offpeak (5 default) in place of 
> hbase.hstore.compaction.ratio (1.2 default) during that window. This will 
> help reduce the average number of files per store.
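
As a minimal sketch of how the ratio switch could work: the two compaction.ratio 
keys come from this issue, while the off-peak window keys, class, and method 
below are illustrative assumptions, not part of the proposal.

  import java.util.Calendar;

  import org.apache.hadoop.conf.Configuration;

  public class CompactionRatioSelector {

    // Returns the compaction file-selection ratio to use right now: the
    // normal ratio during peak hours, the more aggressive (larger) ratio
    // inside the configured off-peak window.
    public static double currentRatio(Configuration conf) {
      double peakRatio = conf.getFloat("hbase.hstore.compaction.ratio", 1.2f);
      double offPeakRatio =
          conf.getFloat("hbase.hstore.compaction.ratio.offpeak", 5.0f);
      // The window keys below are assumed for illustration; a negative
      // start or end disables off-peak handling entirely.
      int start = conf.getInt("hbase.offpeak.start.hour", -1);
      int end = conf.getInt("hbase.offpeak.end.hour", -1);
      if (start < 0 || end < 0) {
        return peakRatio;
      }
      int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
      boolean offPeak = (start <= end)
          ? (hour >= start && hour < end)   // e.g. 1 -> 5
          : (hour >= start || hour < end);  // window wrapping midnight, e.g. 23 -> 6
      return offPeak ? offPeakRatio : peakRatio;
    }
  }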

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



