[jira] [Commented] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418749#comment-13418749 ] Mohammad Tariq Iqbal commented on HBASE-6430: - Thanks a lot for the support, stack. I'll go through the link you provided. I have made the following changes, in case the attachment was ambiguous (I should have done it beforehand. My bad):
1. Addition of the 'core-site.xml' file to point out how to give the value of the 'hbase.rootdir' property so that the HMaster can contact the NameNode properly.
2. /etc/hosts file modification to avoid the loopback problem (proper DNS resolution is very important for HBase to work properly).
3. Modification of the hbase-env.sh file to enable the use of HBase's ZooKeeper.
4. Addition of the 'hbase.cluster.distributed' and 'hbase.zookeeper.property.clientPort' properties in conf/hbase-site.xml.
5. Copying hadoop-core-*.jar and commons-collections-3.2.1.jar from the HADOOP_HOME/lib folder into the HBASE_HOME/lib folder to avoid any compatibility issues between Hadoop and HBase.
Apologies for my ignorance. Many thanks.
Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
Key: HBASE-6430 URL: https://issues.apache.org/jira/browse/HBASE-6430 Project: HBase Issue Type: Improvement Reporter: Mohammad Tariq Iqbal Priority: Minor Attachments: HBASE-6430.txt
Quite often, newbies face some issues while configuring HBase in pseudo-distributed mode. I was no exception. I would like to propose some solutions for these problems which worked for me. If the community finds it appropriate, I would like to apply the patch for the same. This is the first time I am trying to do something like this, so please pardon me if I have put it in an inappropriate manner. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
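The hbase-site.xml changes described in points 1 and 4 of the comment above typically look roughly like the following. This is an illustrative sketch only; the hostname, port, and property values here are assumptions for a single-node pseudo-distributed setup, not taken from the HBASE-6430 attachment:

```xml
<!-- conf/hbase-site.xml: illustrative pseudo-distributed settings -->
<configuration>
  <!-- hbase.rootdir must match the NameNode address configured in
       Hadoop's core-site.xml (fs.default.name); hostname/port here
       are examples only. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <!-- Run HBase against HDFS rather than the local filesystem. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Port that clients use to reach the ZooKeeper ensemble. -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```

The /etc/hosts change from point 2 usually amounts to making sure the machine's hostname does not resolve to 127.0.1.1, which trips up HBase's DNS checks on some distributions.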
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418750#comment-13418750 ] Aditya Kishore commented on HBASE-6389: --- @Stack No, the current patch does not modify the way a live RS is evaluated, but it ensures that the dying RS's thread is actually dead before moving forward.
{quote}
What is the below change doing?
- conf.setInt("hbase.master.wait.on.regionservers.mintostart", numSlaves);
- conf.setInt("hbase.master.wait.on.regionservers.maxtostart", numSlaves);
+ String count = String.valueOf(numSlaves);
+ conf.setIfUnset("hbase.master.wait.on.regionservers.mintostart", count);
+ conf.setIfUnset("hbase.master.wait.on.regionservers.maxtostart", count);
{quote}
This change was to preserve the values of 'mintostart' and 'maxtostart' in the configuration if the caller of HBaseTestingUtility.startMiniHBaseCluster(int, int) has set them (which was the case with the TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS failure).
Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993. 
From 0.94.0 onwards, the Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
581   /**
582    * Wait for the region servers to report in.
583    * We will wait until one of this condition is met:
584    *  - the master is stopped
585    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *    region servers is reached
588    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *    there have been no new region server in for
590    *    'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595   throws InterruptedException {
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange+interval > now || count < minToStart)
617     ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached even if a lesser number of RSes have checked in with the Master, and the master will proceed with the region assignment among these RSes alone. As mentioned in [HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196], and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
    int minToStart = this.master.getConfiguration().
            getInt("hbase.master.wait.on.regionservers.mintostart", 1);
    int maxToStart = this.master.getConfiguration().
            getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
      maxToStart = minToStart;
    }
..
..
    while (
      !this.master.isStopped() &&
      count < maxToStart &&
      (lastCountChange+interval > now || timeout > slept || count < minToStart)
    ){
..
{code}
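The difference between the two loop conditions can be captured as pure predicates. The sketch below is for illustration only: the parameter names mirror the local variables in waitForRegionServers(), but this is not the HBase code itself.

```java
// Sketch: the master keeps waiting for region servers while the
// predicate returns true.
public class WaitPredicates {

    // Current (0.94+) condition: the timeout alone can end the wait,
    // even if fewer than minToStart region servers have checked in.
    static boolean currentKeepWaiting(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
                && slept < timeout
                && count < maxToStart
                && (lastCountChange + interval > now || count < minToStart);
    }

    // Proposed condition: minToStart is enforced irrespective of the
    // timeout; the timeout only ends the wait once the minimum quorum
    // has been reached and no new RS has checked in recently.
    static boolean proposedKeepWaiting(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
                && count < maxToStart
                && (lastCountChange + interval > now
                        || timeout > slept
                        || count < minToStart);
    }

    public static void main(String[] args) {
        // Timeout lapsed, only 1 of a required 3 RSes checked in, and no
        // recent check-in: the current condition gives up, the proposed
        // condition keeps waiting for the quorum.
        System.out.println(currentKeepWaiting(false, 5000, 4500, 1, 3, 10, 0, 1500, 100000));
        System.out.println(proposedKeepWaiting(false, 5000, 4500, 1, 3, 10, 0, 1500, 100000));
    }
}
```

With the proposed predicate, a lapsed timeout no longer ends the wait while `count < minToStart` still holds, which is exactly the quorum guarantee the issue asks for.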
[jira] [Resolved] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-6325. --- Resolution: Fixed Fix Version/s: (was: 0.90.8) Hadoop Flags: Reviewed Committed to 0.92, 0.94 and trunk. Not caring about 0.90 either.
[replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
-
Key: HBASE-6325 URL: https://issues.apache.org/jira/browse/HBASE-6325 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch
Yet another bug found during the leap second madness: it's possible to miss the registration of new region servers so that in ReplicationSourceManager.init we start the failover of a live and replicating region server. I don't think there's data loss, but the RS that's being failed over will die on:
{noformat}
2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
	at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
{noformat}
It seems to me that just refreshing {{otherRegionServers}} after getting the list of {{currentReplicators}} would be enough to fix this.
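The one-line fix suggested above — refresh the live-server list after reading the replicator list — can be sketched as follows. This is purely illustrative: `deadReplicators` is a hypothetical stand-in for the set difference that ReplicationSourceManager.init computes from its two ZooKeeper reads, not actual HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the failover decision race in ReplicationSourceManager.init.
// A replicator with no matching live server is considered dead and gets
// failed over. If the live-server list is stale (read *before* the
// replicator list), a freshly registered, live RS can be failed over.
public class FailoverSketch {

    static List<String> deadReplicators(List<String> replicators,
                                        List<String> liveServers) {
        List<String> dead = new ArrayList<>(replicators);
        dead.removeAll(liveServers);  // anything not in the live list is "dead"
        return dead;
    }

    public static void main(String[] args) {
        List<String> staleLive = List.of("rs1");          // snapshot taken too early
        List<String> replicators = List.of("rs1", "rs2"); // rs2 registered in between
        List<String> freshLive = List.of("rs1", "rs2");   // re-read after the replicators

        // Stale ordering: live rs2 is wrongly selected for failover.
        System.out.println(deadReplicators(replicators, staleLive));
        // Fixed ordering: refresh the live list after reading the replicators.
        System.out.println(deadReplicators(replicators, freshLive));
    }
}
```

The fix is thus an ordering constraint, not new logic: the live-server snapshot must be at least as fresh as the replicator snapshot it is diffed against.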
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418752#comment-13418752 ] Zhihong Ted Yu commented on HBASE-6389: --- Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/2406/console, there was still some hanging test, although I wasn't able to find which test hung.
[jira] [Commented] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master
[ https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418753#comment-13418753 ] Hadoop QA commented on HBASE-4470: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537246/HBASE-4470-v2-trunk.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//console
This message is automatically generated. 
ServerNotRunningException coming out of assignRootAndMeta kills the Master
--
Key: HBASE-4470 URL: https://issues.apache.org/jira/browse/HBASE-4470 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: Gregory Chanan Priority: Critical Fix For: 0.90.7 Attachments: HBASE-4470-90.patch, HBASE-4470-v2-90.patch, HBASE-4470-v2-92_94.patch, HBASE-4470-v2-trunk.patch
I'm surprised we still have issues like that, and I didn't get a hit while googling, so forgive me if there's already a jira about it. When the master starts it verifies the locations of root and meta before assigning them; if the server is started but not running you'll get this:
{quote}
2011-09-23 04:47:44,859 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: RemoteException connecting to RS
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
	at $Proxy6.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
{quote}
I hit that 3-4 times this week while debugging something else. The worst is that when you restart the master it sees that as a failover, but none of the regions are assigned, so it takes an eternity to get back fully online.
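A common way to harden a startup path against this kind of transient "server up but not yet serving" state is to retry the verification for a bounded period instead of letting the exception kill the master. A minimal sketch, with the caveat that the exception type, retry count, and back-off below are illustrative assumptions, not the actual HBASE-4470 patch:

```java
import java.util.concurrent.Callable;

// Sketch: retry a verification call a bounded number of times when the
// remote server is up but not yet serving, instead of aborting.
public class RetryVerify {

    // Hypothetical stand-in for ServerNotRunningException.
    static class ServerNotRunningYet extends Exception {}

    static <T> T withRetries(Callable<T> call, int attempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call();
            } catch (ServerNotRunningYet e) {
                last = e;               // transient: server still starting up
                Thread.sleep(sleepMs);  // back off before retrying
            }
        }
        throw last;  // give up only after exhausting the retry budget
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with the transient exception, then succeeds.
        boolean ok = withRetries(() -> {
            if (calls[0]++ < 2) throw new ServerNotRunningYet();
            return true;
        }, 5, 10L);
        System.out.println(ok);
    }
}
```

The key design point is that only the known-transient exception is retried; anything else still propagates, so real failures are not masked.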
[jira] [Resolved] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-6319. --- Resolution: Fixed Fix Version/s: (was: 0.90.8) Hadoop Flags: Reviewed Committed to 0.92 and 0.94, skipping 0.90 like HBASE-6325. Trunk was already fixed.
ReplicationSource can call terminate on itself and deadlock
---
Key: HBASE-6319 URL: https://issues.apache.org/jira/browse/HBASE-6319 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6319-0.92.patch
In a few places the ReplicationSource code calls terminate() on itself, which is a problem since in terminate() we wait on that thread to die.
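The deadlock pattern here is the classic self-join: a thread calls a shutdown method which in turn join()s that same thread, and the join never returns. A guard on Thread.currentThread() sidesteps it. The sketch below illustrates the pattern and the guard; it is not the HBASE-6319 patch itself:

```java
// Sketch: terminate() must not join() the thread it is called from.
public class SelfTerminate extends Thread {

    private volatile boolean stopping = false;

    @Override
    public void run() {
        // Simulate the source hitting a fatal condition and shutting
        // itself down from its own run loop.
        terminate();
    }

    public void terminate() {
        stopping = true;
        interrupt();
        // Without this guard, a self-call would join() the current
        // thread and block forever, since run() can never finish
        // while it is stuck inside join().
        if (Thread.currentThread() != this) {
            try {
                join(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SelfTerminate t = new SelfTerminate();
        t.start();
        t.join(5000);  // returns promptly because run() never self-joins
        System.out.println(!t.isAlive());
    }
}
```

An equivalent fix is to never call the blocking shutdown from the worker thread at all and instead set a stop flag; either way the invariant is the same: a thread must not wait on its own death.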
[jira] [Updated] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-5966: -- Attachment: HBASE-5966-94.patch
Attached patch for 0.94. Ran TestTableMapReduce against both the 1.0 and 2.0 hadoop profiles; both passed:
{noformat}
mvn test -PlocalTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
--- T E S T S ---
Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 188.087 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

mvn test -PlocalTests -Dhadoop.profile=2.0 -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
--- T E S T S ---
Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 167.49 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
{noformat}
MapReduce based tests broken on Hadoop 2.0.0-alpha
--
Key: HBASE-5966 URL: https://issues.apache.org/jira/browse/HBASE-5966 Project: HBase Issue Type: Bug Components: mapred, mapreduce, test Affects Versions: 0.94.0, 0.96.0 Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) Reporter: Andrew Purtell Assignee: Jimmy Xiang Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, HBASE-5966.patch, hbase-5966.patch
Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test rigging. Below is a representative error, which can be easily reproduced with:
{noformat}
mvn -PlocalTests -Psecurity \
  -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
  clean test \
  -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
{noformat}
And the result:
{noformat}
--- T E S T S ---
Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE!
--- Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce ---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE!
testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) Time elapsed: 21.935 sec ERROR!
java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
	at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
	at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
	at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
	at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
	at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at
{noformat}
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418766#comment-13418766 ] stack commented on HBASE-6389: -- @Aditya Makes sense. You got what you needed from Ted? Let us know. Thanks.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418767#comment-13418767 ] Hadoop QA commented on HBASE-6389: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537258/org.apache.hadoop.hbase.TestZooKeeper-output.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 10 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2415//console This message is automatically generated. Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, the Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies this:
{code:title=ServerManager.java (trunk rev:1360470)}
581   /**
582    * Wait for the region servers to report in.
583    * We will wait until one of this condition is met:
584    *  - the master is stopped
585    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *    region servers is reached
588    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *    there have been no new region server in for
590    *    'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595   throws InterruptedException {
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange + interval > now || count < minToStart)
617       ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached even if fewer region servers have checked in with the Master, and the Master will proceed with the region assignment among these region servers alone. As mentioned in [HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196], and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
  int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
  int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
  if (maxToStart < minToStart) {
    maxToStart = minToStart;
  }
..
..
  while (
    !this.master.isStopped() &&
    count < maxToStart &&
    (lastCountChange + interval > now || timeout > slept || count < minToStart)
    ){
..
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
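The difference between the current and proposed loop predicates can be sketched as plain boolean functions. This is a hypothetical illustration with made-up parameter values, not the actual ServerManager code:

```java
// Hypothetical sketch of the two wait-loop predicates discussed above;
// method names and the toy values in main() are illustrative only.
public class WaitConditionSketch {

    // Current (0.94+) condition: the wait also ends once 'slept' reaches
    // 'timeout', even if fewer than minToStart region servers checked in.
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
                && slept < timeout
                && count < maxToStart
                && (lastCountChange + interval > now || count < minToStart);
    }

    // Proposed condition: minToStart is enforced even after the timeout,
    // so the master keeps waiting until the quorum has checked in.
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
                && count < maxToStart
                && (lastCountChange + interval > now
                        || timeout > slept
                        || count < minToStart);
    }

    public static void main(String[] args) {
        // Timeout already lapsed (slept=5000 >= timeout=4500), but only
        // 1 of the required 3 region servers has checked in:
        System.out.println(keepWaitingCurrent(false, 5000, 4500, 1, 3, 10, 0, 1500, 10000));  // false: gives up
        System.out.println(keepWaitingProposed(false, 5000, 4500, 1, 3, 10, 0, 1500, 10000)); // true: keeps waiting
    }
}
```

With the proposed predicate, the timeout alone can no longer end the wait while the check-in count is below the configured minimum.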
[jira] [Commented] (HBASE-6393) Decouple audit event creation from storage in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418768#comment-13418768 ] Hadoop QA commented on HBASE-6393: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537256/hbase-6393-v1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 15 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//console This message is automatically generated. 
Decouple audit event creation from storage in AccessController -- Key: HBASE-6393 URL: https://issues.apache.org/jira/browse/HBASE-6393 Project: HBase Issue Type: Brainstorming Components: security Affects Versions: 0.96.0 Reporter: Marcelo Vanzin Attachments: hbase-6393-v1.patch Currently, AccessController takes care of both generating audit events (by performing access checks) and storing them (by creating a log message and writing it to the AUDITLOG logger). This makes the logging system the only way to catch audit events. It means that if someone wants to do something fancier (like writing these records to a database somewhere), they need to hack through the logging system and parse the messages generated by AccessController, which is not optimal. The attached patch decouples generation and storage by introducing a new interface, used by AccessController, to log the audit events. The current, log-based storage is kept in place so that current users won't be affected by the change. I'm filing this as an RFC at this point, so the patch is not totally clean; it's on top of HBase 0.92 (which is easier for me to test) and doesn't have any unit tests, for starters. But the changes should be very similar on trunk - I don't remember changes in this particular area of the code between those versions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
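The decoupling the patch proposes can be illustrated with a small sketch. The names (AuditEventSink, LogAuditSink, AccessChecker) and the toy policy below are assumptions for illustration, not the interface from the attached hbase-6393-v1.patch:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names and the toy access policy here are
// hypothetical, not the actual interface introduced by the patch.
public class AuditDecouplingSketch {

    // Audit events are handed to a pluggable sink instead of being
    // formatted straight into the AUDITLOG logger.
    interface AuditEventSink {
        void audit(String user, String action, String table, boolean allowed);
    }

    // Default sink preserving log-style behavior (a List stands in for
    // the logger so the sketch stays self-contained and testable).
    static class LogAuditSink implements AuditEventSink {
        final List<String> lines = new ArrayList<>();
        public void audit(String user, String action, String table, boolean allowed) {
            lines.add((allowed ? "granted" : "denied") + " " + user
                    + " " + action + " on " + table);
        }
    }

    // The access-check side only generates events; where they go is the
    // sink's concern, so a database-backed sink could be swapped in
    // without parsing log messages.
    static class AccessChecker {
        private final AuditEventSink sink;
        AccessChecker(AuditEventSink sink) { this.sink = sink; }
        boolean check(String user, String action, String table) {
            boolean allowed = "admin".equals(user); // toy policy for the sketch
            sink.audit(user, action, table, allowed);
            return allowed;
        }
    }

    public static void main(String[] args) {
        LogAuditSink sink = new LogAuditSink();
        AccessChecker checker = new AccessChecker(sink);
        checker.check("admin", "scan", "t1");
        checker.check("alice", "put", "t1");
        sink.lines.forEach(System.out::println);
    }
}
```

The design point is the same as in the issue: generation and storage meet only at the interface, so downstream consumers no longer depend on the logging system's message format.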
[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover
[ https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418769#comment-13418769 ] Jonathan Hsieh commented on HBASE-6417: --- Did you keep a copy of the hbck details before you ran the -repair option? hbck merges .META. regions if there's an old leftover - Key: HBASE-6417 URL: https://issues.apache.org/jira/browse/HBASE-6417 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.2 Attachments: hbck.log Trying to see what caused HBASE-6310, one of the things I figured is that the bad .META. row is actually one from the time that we were permitting meta splitting and that folder had just been staying there for a while. So I tried to recreate the issue with -repair and it merged my good .META. region with the one that's 3 years old that also has the same start key. I ended up with a brand new .META. region! I'll be attaching the full log in a separate file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418771#comment-13418771 ] Lars Hofhansl commented on HBASE-6389: -- I'd like to leave this with 0.94.2. Unless you think this must go into 0.94.1 Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418772#comment-13418772 ] Jimmy Xiang commented on HBASE-5966: looks good to me, will commit to 0.94 tonight if no objection. MapReduce based tests broken on Hadoop 2.0.0-alpha -- Key: HBASE-5966 URL: https://issues.apache.org/jira/browse/HBASE-5966 Project: HBase Issue Type: Bug Components: mapred, mapreduce, test Affects Versions: 0.94.0, 0.96.0 Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) Reporter: Andrew Purtell Assignee: Jimmy Xiang Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, HBASE-5966.patch, hbase-5966.patch Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test rigging. Below is a representative error, can be easily reproduced with: {noformat} mvn -PlocalTests -Psecurity \ -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ clean test \ -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce {noformat} And the result: {noformat} --- T E S T S --- Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE! --- Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE! testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) Time elapsed: 21.935 sec ERROR! 
java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at
[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover
[ https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418774#comment-13418774 ] Jean-Daniel Cryans commented on HBASE-6417: --- No, but I can reproduce. hbck merges .META. regions if there's an old leftover - Key: HBASE-6417 URL: https://issues.apache.org/jira/browse/HBASE-6417 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.2 Attachments: hbck.log -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6310) -ROOT- corruption when .META. is using the old encoding scheme
[ https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418777#comment-13418777 ] Jonathan Hsieh commented on HBASE-6310: --- hbck writes directly to .META. but I don't think it ever writes to root unless you put the -metaonly flag on. It may be possible that if there were two .META. region dirs, hbck tried to pull in the old .META. dir. This would probably write something goofy to .META though. If you just used the -repair option, it would have first tried to merge regions before modifying meta. (but also would likely have not modified ROOT). -ROOT- corruption when .META. is using the old encoding scheme -- Key: HBASE-6310 URL: https://issues.apache.org/jira/browse/HBASE-6310 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.96.0, 0.94.2 We're still working the on the root cause here, but after the leap second armageddon we had a hard time getting our 0.94 cluster back up. 
This is what we saw in the logs until the master died by itself: {noformat} 2012-07-01 23:01:52,149 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=-ROOT-, metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28, port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000 because: HRegionInfo was null or empty in -ROOT-, row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0, .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0} {noformat} (it's strange that we retry this) This was really misleading because I could see the regioninfo in a scan:
{noformat}
hbase(main):002:0> scan '-ROOT-'
ROW                    COLUMN+CELL
 .META.,,1             column=info:regioninfo, timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
 .META.,,1             column=info:server, timestamp=1341183448693, value=sfor3s40:10304
 .META.,,1             column=info:serverstartcode, timestamp=1341183448693, value=1341183444689
 .META.,,1             column=info:v, timestamp=1331755419291, value=\x00\x00
 .META.,,1259448304806 column=info:server, timestamp=1341124914705, value=sfor3s24:10304
 .META.,,1259448304806 column=info:serverstartcode, timestamp=1341124914705, value=1341124455863
{noformat}
Except that the devil is in the details: .META.,,1 is not .META.,,1259448304806. Basically something writes to .META. by directly creating the row key without caring if the row is in the old format. I did a deleteall in the shell and it fixed the issue... until some time later it was stuck again because the edits reappeared (still not sure why). This time the PostOpenDeployTasksThread were stuck in the RS trying to update .META. but there was no logging (saw it with a jstack). I deleted the row again to make it work.
I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 out, but I wouldn't recommend upgrading to 0.94 if your cluster was created before 0.89 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
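The confusing part of the scan output above is that the two .META. row keys are distinct rows even though they render almost identically. A minimal, dependency-free sketch of the comparison (illustrative only, not code from the issue):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: the new-style and legacy .META. row keys from the
// scan above are different byte arrays, so a writer that builds the new
// key directly never touches (or cleans up) the legacy row.
public class MetaRowKeys {
    public static void main(String[] args) {
        byte[] current = ".META.,,1".getBytes(StandardCharsets.UTF_8);
        byte[] legacy = ".META.,,1259448304806".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(current, legacy)); // false: two distinct rows
    }
}
```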
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418789#comment-13418789 ] Aditya Kishore commented on HBASE-6389: --- My vote was for its inclusion for 2 reasons. # This was a behavior change in 0.94.0 and I am not sure we have completely understood its impact. # In a large MSLAB-enabled cluster, I have repeatedly seen all the regions (in excess of 5K, with Σ(i=1..n)(Ri × CFi) > 8K; with MSLAB on, an RS needs 16G just to open them) being assigned to a single region server, leading it to an OOM crash and creating quite a few HBCK inconsistencies on subsequent recovery. Lastly, so far all the test failures seem to be due to errors in the test code unmasked by this change. Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3432) [hbck] Add remove table switch
[ https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418790#comment-13418790 ] Jonathan Hsieh commented on HBASE-3432: --- [~vamshi] root and meta are special regions, but regions nonetheless. They get assigned to arbitrary (possibly different) region servers, and are hit on every new client's read and write path. [~juneng603] /hbase/unassigned is where regions-in-transition information is kept. These nodes are modified as regions are being assigned to particular region servers. They coordinate the state between the master doing the assigning and the RS assignee. [hbck] Add remove table switch Key: HBASE-3432 URL: https://issues.apache.org/jira/browse/HBASE-3432 Project: HBase Issue Type: New Feature Components: util Affects Versions: 0.89.20100924 Reporter: Lars George Priority: Minor This happened before and I am not sure how the new Master improves on it (this stuff is only available between the lines or buried in some people's heads - one other thing I wish for is a better place to communicate what each patch improves). Just so we do not miss it, there is an issue where sometimes disabling large tables simply times out and the table gets stuck in limbo. From the CDH User list: {quote} On Fri, Jan 7, 2011 at 1:57 PM, Sean Sechrist ssechr...@gmail.com wrote: To get them out of META, you can just scan '.META.' for that table name, and delete those rows. We had to do that a few months ago. -Sean That did it. For the benefit of others, here's code. Beware the literal table names, run at your own peril.
{quote}
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.MetaScanner;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CleanFromMeta {

  public static class Cleaner implements MetaScanner.MetaScannerVisitor {
    public HTable meta = null;

    public Cleaner(Configuration conf) throws IOException {
      meta = new HTable(conf, ".META.");
    }

    public boolean processRow(Result rowResult) throws IOException {
      String r = new String(rowResult.getRow());
      if (r.startsWith("webtable,")) {
        meta.delete(new Delete(rowResult.getRow()));
        System.out.println("Deleting row " + rowResult);
      }
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    MetaScanner.metaScan(conf, new Cleaner(conf), Bytes.toBytes("webtable"));
  }
}
{code}
I suggest to move this into HBaseFsck. I personally do not like to have these JRuby scripts floating around that may or may not help. This should be available if a user gets stuck and knows what he is doing (they can delete from .META. anyways). Maybe a --disable-table tablename --force or so? But since disable is already in the shell, we could add a --force there? Or add a --delete-table tablename to hbck? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3432) [hbck] Add remove table switch
[ https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418792#comment-13418792 ] Jonathan Hsieh commented on HBASE-3432: --- [~juneng603] eventually, after region assignments are completed and the region is opened on the target RS, information is updated in the META table so that other clients can go to the proper RS. [hbck] Add remove table switch Key: HBASE-3432 URL: https://issues.apache.org/jira/browse/HBASE-3432 Project: HBase Issue Type: New Feature Components: util Affects Versions: 0.89.20100924 Reporter: Lars George Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418793#comment-13418793 ] Lars Hofhansl commented on HBASE-5966: -- +1 MapReduce based tests broken on Hadoop 2.0.0-alpha -- Key: HBASE-5966 URL: https://issues.apache.org/jira/browse/HBASE-5966 Project: HBase Issue Type: Bug Components: mapred, mapreduce, test Affects Versions: 0.94.0, 0.96.0 Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) Reporter: Andrew Purtell Assignee: Jimmy Xiang Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, HBASE-5966.patch, hbase-5966.patch
[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient
[ https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418797#comment-13418797 ] Hudson commented on HBASE-4956: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/]) HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob Copeland) (Revision 1363526) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Result.java

Control direct memory buffer consumption by HBaseClient --- Key: HBASE-4956 URL: https://issues.apache.org/jira/browse/HBASE-4956 Project: HBase Issue Type: New Feature Reporter: Ted Yu Assignee: Bob Copeland Fix For: 0.96.0, 0.94.1 Attachments: 4956.txt, thread_get.rb

As Jonathan explained here https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1 , the standard hbase client inadvertently consumes large amounts of direct memory. We should consider using netty for NIO-related tasks.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6312) Make BlockCache eviction thresholds configurable
[ https://issues.apache.org/jira/browse/HBASE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418798#comment-13418798 ] Hudson commented on HBASE-6312: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/]) HBASE-6312 Make BlockCache eviction thresholds configurable (Jie Huang) (Revision 1363468) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java Make BlockCache eviction thresholds configurable Key: HBASE-6312 URL: https://issues.apache.org/jira/browse/HBASE-6312 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jie Huang Assignee: Jie Huang Priority: Minor Fix For: 0.96.0 Attachments: hbase-6312.patch, hbase-6312_v2.patch, hbase-6312_v3.patch Some of our customers found that tuning the BlockCache eviction thresholds made test results different in their test environment. However, those thresholds are not configurable in the current implementation. The only way to change those values is to re-compile the HBase source code. We wonder if it is possible to make them configurable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
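The pattern the patch applies (promoting hardcoded eviction thresholds to configuration-backed values, with the old constants as the defaults) can be sketched in plain Java. This is an illustrative sketch only: the property names and default values below are hypothetical stand-ins, not the actual keys introduced by the HBase patch.

```java
import java.util.Properties;

// Sketch: eviction thresholds that were compile-time constants become
// defaults that a configuration object may override at startup.
public class ConfigurableThresholdsSketch {
    static final float DEFAULT_ACCEPTABLE_FACTOR = 0.85f;
    static final float DEFAULT_MIN_FACTOR = 0.75f;

    final float acceptableFactor;
    final float minFactor;

    ConfigurableThresholdsSketch(Properties conf) {
        // Fall back to the old constant when the key is absent.
        acceptableFactor = Float.parseFloat(
                conf.getProperty("blockcache.acceptable.factor",
                        String.valueOf(DEFAULT_ACCEPTABLE_FACTOR)));
        minFactor = Float.parseFloat(
                conf.getProperty("blockcache.min.factor",
                        String.valueOf(DEFAULT_MIN_FACTOR)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("blockcache.min.factor", "0.60");  // tune without recompiling

        ConfigurableThresholdsSketch cache = new ConfigurableThresholdsSketch(conf);
        System.out.println(cache.acceptableFactor);  // default kept
        System.out.println(cache.minFactor);         // overridden
    }
}
```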
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418799#comment-13418799 ] Hudson commented on HBASE-6325: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/]) HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363573) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive - Key: HBASE-6325 URL: https://issues.apache.org/jira/browse/HBASE-6325 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch Yet another bug found during the leap second madness, it's possible to miss the registration of new region servers so that in ReplicationSourceManager.init we start the failover of a live and replicating region server. 
I don't think there's data loss but the RS that's being failed over will die on: {noformat} 2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369 at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697) at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368) {noformat} It seems to me that just refreshing {{otherRegionServers}} after getting the list of {{currentReplicators}} would be enough to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
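The race and the suggested fix can be modeled with a small standalone sketch (plain Java sets, not HBase code). A server counts as a failover candidate when it appears under the replication znodes but not in the cached live-server list; re-reading the live list after fetching the replicators removes the false positive.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the init-time race: a region server that registered between
// the two reads looks like a dead replicator against a stale live list.
public class ReplicationInitRaceSketch {

    // A replicator absent from the live-server list is queued for failover.
    static Set<String> failoverCandidates(List<String> currentReplicators,
                                          List<String> otherRegionServers) {
        Set<String> dead = new HashSet<>(currentReplicators);
        dead.removeAll(otherRegionServers);
        return dead;
    }

    public static void main(String[] args) {
        // Live-server snapshot taken BEFORE rs2 registered...
        List<String> staleLiveList = List.of("rs1");
        // ...but rs2 already shows up as a replicator in ZooKeeper.
        List<String> replicators = List.of("rs1", "rs2");

        // Race: live rs2 is wrongly treated as dead.
        System.out.println(failoverCandidates(replicators, staleLiveList));     // [rs2]

        // Fix from the report: refresh the live list AFTER reading replicators.
        List<String> refreshedLiveList = List.of("rs1", "rs2");
        System.out.println(failoverCandidates(replicators, refreshedLiveList)); // []
    }
}
```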
[jira] [Reopened] (HBASE-6276) TestClassLoading is racy
[ https://issues.apache.org/jira/browse/HBASE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-6276: --- Assignee: (was: Andrew Purtell) TestClassLoading is racy Key: HBASE-6276 URL: https://issues.apache.org/jira/browse/HBASE-6276 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Andrew Purtell Priority: Minor Attachments: HBASE-6276-0.94.patch, HBASE-6276.patch
[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418802#comment-13418802 ] Hudson commented on HBASE-6319: --- Integrated in HBase-0.94 #343 (See [https://builds.apache.org/job/HBase-0.94/343/]) HBASE-6319 ReplicationSource can call terminate on itself and deadlock HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363570) Result = SUCCESS jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java ReplicationSource can call terminate on itself and deadlock --- Key: HBASE-6319 URL: https://issues.apache.org/jira/browse/HBASE-6319 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6319-0.92.patch In a few places in the ReplicationSource code calls terminate on itself which is a problem since in terminate() we wait on that thread to die. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
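The deadlock shape described here (a thread's terminate() joining on the thread itself) is easy to reproduce and to guard against. Below is a minimal standalone sketch, not the actual ReplicationSource code: without the current-thread check, join() inside terminate() would wait forever when the worker stops itself from its own run loop.

```java
// Sketch of the self-terminate hazard and the usual guard: only join the
// worker when terminate() is invoked from a different thread.
public class SelfTerminateSketch {

    static class Worker extends Thread {
        volatile boolean running = true;

        @Override
        public void run() {
            while (running) {
                // ... ship edits ...
                terminate();          // the source decides to stop itself
            }
        }

        void terminate() {
            running = false;
            // Without this check, join() deadlocks when terminate() is
            // called from the worker's own run loop: a thread joining
            // itself waits for its own death, which never comes.
            if (Thread.currentThread() != this) {
                try {
                    join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Worker w = new Worker();
        w.start();
        w.join(5000);                 // returns promptly thanks to the guard
        System.out.println("worker alive: " + w.isAlive());
    }
}
```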
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418803#comment-13418803 ] Hudson commented on HBASE-6325: --- Integrated in HBase-0.94 #343 (See [https://builds.apache.org/job/HBase-0.94/343/]) HBASE-6319 ReplicationSource can call terminate on itself and deadlock HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363570) Result = SUCCESS jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
[jira] [Resolved] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-5966. Resolution: Fixed Integrated to 0.94. Thank Greg for the patch, Lars for the review.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418808#comment-13418808 ] Lars Hofhansl commented on HBASE-6389: -- @Aditya: I do agree. (see my comment about how I'm sure the logic of this change is correct). It now seems, though, that it is the default timeout that is too short (4.5s). Folks with 5k regions should know to increase the minToStart parameter and the timeout. We should document that better. I can also see changing the timeout to a failure condition (as discussed above). I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, I do not want to risk delaying this further. It also seems this can use further discussion. (Sometimes it is amazing how much discussion a two line change can cause :) ) @Ted and @Stack: What do you guys think?

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt

Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior changed in 0.94.0 to address HBASE-4993. From 0.94.0 onwards, the Master will proceed as soon as the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached.

Reading the current conditions of waitForRegionServers() clarifies it:

{code:title=ServerManager.java (trunk rev:1360470)}
581   /**
582    * Wait for the region servers to report in.
583    * We will wait until one of this condition is met:
584    *  - the master is stopped
585    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *    region servers is reached
588    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *    there have been no new region server in for
590    *    'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595   throws InterruptedException {
...
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange+interval > now || count < minToStart)
617       ){
{code}

So with the current conditions, the wait will end as soon as the timeout is reached even if fewer RSes than required have checked in with the Master, and the Master will proceed with region assignment among these RSes alone. As mentioned in [HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196], and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on.

To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:

{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
  int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
  int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
  if (maxToStart < minToStart) {
    maxToStart = minToStart;
  }
..
..
  while (
    !this.master.isStopped() &&
    count < maxToStart &&
    (lastCountChange+interval > now || timeout > slept || count < minToStart)
    ){
..
{code}
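The practical difference between the current and proposed conditions can be checked with a standalone sketch (plain Java, not HBase code; variable names follow the ServerManager locals quoted above). In a scenario where the timeout has lapsed but only one of three required region servers has checked in, the current condition stops waiting while the proposed one holds out for minToStart.

```java
// Sketch contrasting the two wait-loop conditions discussed in HBASE-6389.
public class WaitConditionSketch {

    // Current 0.94+ condition: the wait ends once 'slept' reaches the
    // timeout, even if fewer than minToStart region servers checked in.
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
                                      int count, int minToStart, int maxToStart,
                                      long lastCountChange, long interval, long now) {
        return !stopped
                && slept < timeout
                && count < maxToStart
                && (lastCountChange + interval > now || count < minToStart);
    }

    // Proposed condition: the timeout can only end the wait once minToStart
    // is reached (it moves from the AND chain into the OR clause).
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
                                       int count, int minToStart, int maxToStart,
                                       long lastCountChange, long interval, long now) {
        return !stopped
                && count < maxToStart
                && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Timeout lapsed (slept == timeout == 4500ms), 1 of 3 required
        // region servers checked in, no recent check-in activity.
        boolean current = keepWaitingCurrent(false, 4500, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 10_000);
        boolean proposed = keepWaitingProposed(false, 4500, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 10_000);
        System.out.println("current keeps waiting:  " + current);   // master proceeds short-handed
        System.out.println("proposed keeps waiting: " + proposed);  // master holds for minToStart
    }
}
```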
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418808#comment-13418808 ] Lars Hofhansl edited comment on HBASE-6389 at 7/19/12 11:47 PM: @Aditya: I do agree. (see my comment about how I'm sure the logic of this change is correct). It now seems, though, that it is the default timeout that is too short (4.5s). Folks with 5k regions should know to increase the minToStart parameter and the timeout. We should document that better. I can also see to change the timeout to failure condition (as discussed above). I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, I do not want to risk delaying this further. It also seems this can use further discussion. (Sometimes it is amazing how much discussion a two line change can cause :) ) @Ted and @Stack: What do you guys think? Edit: Spelling. was (Author: lhofhansl): @Aditya: I do agree. (see my comment about how I'm sure the logic of this change is correct). It now seems, though, that it is the default timeout that is too short (4.5s). Folks with 5k regions should know to increase the minToStart parameter and the timeout. We should document that better. I can also see to change the timeout to failure condition (as discussed above). I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, I do not want to risk delaying this further. It also seems this can use further discussion. (Sometimes it is amazing how much discussion as two change can cause :) ) @Ted and @Stack: What do you guys think? 
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418812#comment-13418812 ] Aditya Kishore commented on HBASE-6389: --- @Lars Completely agree and definitely would not want to hold 0.94.1 for this. (That's why My vote *was*... :) ). Documentation can take care of this in 0.94.1
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418813#comment-13418813 ] Hudson commented on HBASE-6325: --- Integrated in HBase-TRUNK #3154 (See [https://builds.apache.org/job/HBase-TRUNK/3154/]) HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363573) Result = SUCCESS jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418815#comment-13418815 ] Lars Hofhansl commented on HBASE-6389: -- :) didn't pick up on the was Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not reached. Reading the current conditions of waitForRegionServers() clarifies it {code:title=ServerManager.java (trunk rev:1360470)} 581 /** 582 * Wait for the region servers to report in. 
583    * We will wait until one of this condition is met:
584    *  - the master is stopped
585    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *    region servers is reached
588    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *    there have been no new region server in for
590    *    'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595       throws InterruptedException {
...
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange + interval > now || count < minToStart)
617     ) {
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes have checked in with the Master, and the master will proceed with region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows {code:title=ServerManager.java}
..
/**
 * Wait for the region servers to report in.
 * We will wait until one of this condition is met:
 *  - the master is stopped
 *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 *    region servers is reached
 *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
 *    there have been no new region server in for
 *    'hbase.master.wait.on.regionservers.interval' time AND
 *    the 'hbase.master.wait.on.regionservers.timeout' is reached
 *
 * @throws InterruptedException
 */
public void waitForRegionServers(MonitoredTask status)
..
..
int minToStart = this.master.getConfiguration().
    getInt("hbase.master.wait.on.regionservers.mintostart", 1);
int maxToStart = this.master.getConfiguration().
    getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
if (maxToStart < minToStart) {
  maxToStart = minToStart;
}
..
..
while (
  !this.master.isStopped() &&
  count < maxToStart &&
  (lastCountChange + interval > now || timeout > slept || count < minToStart)
) {
..
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
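The difference between the current and the proposed loop conditions can be isolated as pure boolean predicates. The helper class and parameter layout below are assumptions made for illustration, but the boolean logic mirrors the two quoted code blocks: under the proposed condition, reaching the timeout alone no longer ends the wait while fewer than minToStart region servers have checked in.

```java
public class WaitCondition {
    // Current trunk condition: exits as soon as the timeout lapses,
    // even if fewer than minToStart region servers have checked in.
    static boolean keepWaitingOld(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    // Proposed condition: the timeout is only one disjunct, so the wait
    // continues while count < minToStart regardless of elapsed time.
    static boolean keepWaitingNew(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now
                || timeout > slept
                || count < minToStart);
    }

    public static void main(String[] args) {
        // Timeout lapsed, only 1 of the 3 required RSes checked in, no recent
        // check-in: the old condition stops waiting, the new one keeps waiting.
        boolean old = keepWaitingOld(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 10000);
        boolean proposed = keepWaitingNew(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 10000);
        System.out.println(old + " " + proposed); // false true
    }
}
```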
[jira] [Updated] (HBASE-6405) Create Hadoop compatibility modules and Metrics2 implementation of replication metrics
[ https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6405: - Resolution: Fixed Status: Resolved (was: Patch Available) Create Hadoop compatibility modules and Metrics2 implementation of replication metrics - Key: HBASE-6405 URL: https://issues.apache.org/jira/browse/HBASE-6405 Project: HBase Issue Type: Sub-task Reporter: Zhihong Ted Yu Assignee: Elliott Clark Fix For: 0.96.0 Attachments: 6405.txt, HBASE-6405-ADD.patch, hbase-6405-addendum-2-v2.patch, hbase-6405-addendum-2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6431) Some FilterList Constructors break addFilter
Alex Newman created HBASE-6431: -- Summary: Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Reporter: Alex Newman Assignee: Alex Newman Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters) and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
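The failure mode described above can be reproduced with plain JDK collections: Arrays.asList (the kind of fixed-size list a varargs constructor might plausibly store) returns a view whose add() throws, while copying into a fresh ArrayList restores a growable list. This is a generic illustration, not the actual FilterList code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedSizeListDemo {
    // Returns true if add() throws on the given list.
    static boolean addFails(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Arrays.asList returns a fixed-size view backed by the array:
        // storing it directly breaks any later addFilter()-style call.
        System.out.println(addFails(Arrays.asList("a", "b"))); // true

        // The defensive fix: copy into a growable ArrayList first.
        List<String> safe = new ArrayList<String>(Arrays.asList("a", "b"));
        System.out.println(addFails(safe)); // false
    }
}
```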
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Attachment: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Reporter: Alex Newman Assignee: Alex Newman Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters) and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6411: - Assignee: Elliott Clark (was: Alex Baranau) Status: Patch Available (was: Open) Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6411: - Attachment: HBASE-6411-0.patch Here's a working implementation of master with metrics2. It includes some tests but not a whole lot. I plan to include a lot more once I am able to inject test metricsources (HBASE-6407). It doesn't include histograms of the split size (HBASE-6409). Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Alex Baranau Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Status: Patch Available (was: Open) Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Reporter: Alex Newman Assignee: Alex Newman Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters) and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418830#comment-13418830 ] Jie Huang commented on HBASE-6429: -- Oops. I will fix those 2 failures and regenerate the patch soon. Thanks Ted. Filter with filterRow() returning true is also incompatible with scan with limit Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Attachments: hbase-6429_0_94_0.patch Currently if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
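A hedged sketch of the kind of guard the report asks for. The class and method names are hypothetical; the real check lives in HBase's scanner setup and throws IncompatibleFilterException, for which a stand-in exception is used here:

```java
public class FilterLimitCheck {
    // Illustrative stand-in for the scanner's compatibility check: a filter
    // that filters whole rows cannot be combined with a row limit, because
    // the limit may cut a row short before the filter sees all of it.
    static void checkCompatibility(boolean filterImplementsFilterRow, int limit) {
        if (limit > 0 && filterImplementsFilterRow) {
            // Stand-in for IncompatibleFilterException.
            throw new IllegalStateException(
                "Filter with filterRow() cannot be used with a scan limit");
        }
    }

    public static void main(String[] args) {
        checkCompatibility(true, 0);   // no limit: fine
        checkCompatibility(false, 5);  // limit but no row-level filtering: fine
        boolean threw = false;
        try {
            checkCompatibility(true, 5); // both set: must be rejected
        } catch (IllegalStateException e) {
            threw = true;
        }
        System.out.println(threw); // true
    }
}
```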
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Priority: Minor (was: Major) Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Reporter: Alex Newman Assignee: Alex Newman Priority: Minor Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters) and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Component/s: filters Affects Version/s: 0.92.1 0.94.0 Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.92.1, 0.94.0 Reporter: Alex Newman Assignee: Alex Newman Priority: Minor Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters) and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418842#comment-13418842 ] Hudson commented on HBASE-5966: --- Integrated in HBase-0.94 #344 (See [https://builds.apache.org/job/HBase-0.94/344/]) HBASE-5966 MapReduce based tests broken on Hadoop 2.0.0-alpha (Gregory Chanan) (Revision 1363586) Result = FAILURE jxiang : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java MapReduce based tests broken on Hadoop 2.0.0-alpha -- Key: HBASE-5966 URL: https://issues.apache.org/jira/browse/HBASE-5966 Project: HBase Issue Type: Bug Components: mapred, mapreduce, test Affects Versions: 0.94.0, 0.96.0 Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) Reporter: Andrew Purtell Assignee: Jimmy Xiang Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, HBASE-5966.patch, hbase-5966.patch Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test rigging. Below is a representative error, can be easily reproduced with: {noformat} mvn -PlocalTests -Psecurity \ -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ clean test \ -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce {noformat} And the result: {noformat} --- T E S T S --- Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE! --- Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE! testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) Time elapsed: 21.935 sec ERROR! 
java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at
[jira] [Commented] (HBASE-6386) Audit log messages do not include column family / qualifier information consistently
[ https://issues.apache.org/jira/browse/HBASE-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418844#comment-13418844 ] Marcelo Vanzin commented on HBASE-6386: --- Other methods also seem to suffer from similar issues; for example, preIncrementColumnValue does this: {code} requirePermission(TablePermission.Action.WRITE, c.getEnvironment(), Arrays.asList(new byte[][]{family})); {code} Even though there is a qualifier argument, the qualifier information never makes it to the audit log. It also kinda sucks that there's no standard family map type for all these operations, so to come up with one common type for auditing, you'd have to make copies of that data (or use ugly wrapper objects). Audit log messages do not include column family / qualifier information consistently Key: HBASE-6386 URL: https://issues.apache.org/jira/browse/HBASE-6386 Project: HBase Issue Type: Improvement Components: security Reporter: Marcelo Vanzin The code related to this issue is in AccessController.java:permissionGranted(). When creating audit logs, that method will do one of the following: * grant access, create audit log with table name only * deny access because of table permission, create audit log with table name only * deny access because of column family / qualifier permission, create audit log with specific family / qualifier So, in the case where more than one column family and/or qualifier are in the same request, there will be a loss of information. Even in the case where only one column family and/or qualifier is involved, information may be lost. It would be better if this behavior consistently included all the information in the request; regardless of access being granted or denied, and regardless of which permission caused the denial, the column family and qualifier info should be part of the audit log message. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
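One way to make the audit message consistent, as suggested above, is to render every requested family and qualifier into the log line regardless of the grant/deny outcome. A hypothetical sketch, with String stand-ins for the byte[] family map used by the real AccessController:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AuditMessage {
    // Build one audit string covering ALL families/qualifiers in the request,
    // so no information is lost whichever permission check decides the outcome.
    static String describe(String table, Map<String, List<String>> familyMap) {
        StringBuilder sb = new StringBuilder("table=").append(table);
        for (Map.Entry<String, List<String>> e : familyMap.entrySet()) {
            sb.append(", family=").append(e.getKey())
              .append(", qualifiers=").append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> fams = new LinkedHashMap<String, List<String>>();
        fams.put("cf1", Arrays.asList("q1", "q2"));
        fams.put("cf2", Arrays.asList("q3"));
        System.out.println(describe("t1", fams));
        // table=t1, family=cf1, qualifiers=[q1, q2], family=cf2, qualifiers=[q3]
    }
}
```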
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418863#comment-13418863 ] Gregory Chanan commented on HBASE-5843: --- Looks great so far, nkeywal. Some questions: {quote} 2) Kill -9 of a RS; wait for all regions to become online again: 0.92: 980s 0.96: ~13s = The 180s gap comes from HBASE-5844. For master, HBASE-5926 is not tested but should bring similar results. {quote} I'm confused as to what the 180s gap refers to. I see 980 (test 2) - 800 (test1) = 180, but that is against 0.92, which doesn't have HBASE-5970, right? Could you clarify? {quote} 3) Start of the cluster after a clean stop; wait for all regions to become online. 0.92: ~1020s 0.94: ~1023s (tested once only) 0.96: ~31s = The benefit is visible at startup = This does not come from something implemented for 0.94 {quote} Awesome.. We think this is also due to HBASE-5970 and HBASE-6109? (since I assume HBASE-5844 and HBASE-5926 do not apply in this case). {quote} 7) With 2 RS, Insert 20M simple puts; then kill -9 the second one. See how long it takes to have all the regions available. 0.92) 180s detection time+ then hangs twice out of 2 tests. 0.96) 14s (hangs once out of 3) = There's a bug {quote} Has a JIRA been filed? {quote} Test to be changed to get a real difference when we need to replay the wal. {quote} Could you clarify what you mean here? Improve HBase MTTR - Mean Time To Recover - Key: HBASE-5843 URL: https://issues.apache.org/jira/browse/HBASE-5843 Project: HBase Issue Type: Umbrella Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit The ideal target is: - failure impact client applications only by an added delay to execute a query, whatever the failure. - this delay is always inferior to 1 second. We're not going to achieve that immediately... 
Priority will be given to the most frequent issues. Short term: - software crashes - standard administrative tasks such as stop/start of a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418866#comment-13418866 ] Zhihong Ted Yu commented on HBASE-6389: --- I ran test suite with latest patch on trunk and got: {code} Running org.apache.hadoop.hbase.client.TestHCM Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec FAILURE! -- Running org.apache.hadoop.hbase.client.TestAdmin Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec FAILURE! -- Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec FAILURE! -- Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec FAILURE! {code} There was one hanging test: {code} at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183) {code} BTW what do R sub i, C and F sub i represent in the formula above ? Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. 
From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it {code:title=ServerManager.java (trunk rev:1360470)}
581   /**
582    * Wait for the region servers to report in.
583    * We will wait until one of this condition is met:
584    *  - the master is stopped
585    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *    region servers is reached
588    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *    there have been no new region server in for
590    *    'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595       throws InterruptedException {
...
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange + interval > now || count < minToStart)
617     ) {
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes have checked in with the Master, and the master will proceed with region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows {code:title=ServerManager.java}
..
/**
 * Wait for the region servers to report in.
 * We will wait until one of this condition is met:
 *  - the master is stopped
 *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 *    region servers is reached
 *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
 *    there have been no new region server in for
 *    'hbase.master.wait.on.regionservers.interval' time AND
 *    the 'hbase.master.wait.on.regionservers.timeout' is reached
 *
 * @throws InterruptedException
 */
public void waitForRegionServers(MonitoredTask status)
..
..
int minToStart = this.master.getConfiguration().
    getInt("hbase.master.wait.on.regionservers.mintostart", 1);
int maxToStart =
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6389: -- Attachment: testReplication.jstack jstack for the hanging TestReplication Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not reached. Reading the current conditions of waitForRegionServers() clarifies it {code:title=ServerManager.java (trunk rev:1360470)} 581 /** 582 * Wait for the region servers to report in. 
583    * We will wait until one of this condition is met:
584    *  - the master is stopped
585    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *    region servers is reached
588    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *    there have been no new region server in for
590    *    'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595       throws InterruptedException {
...
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange + interval > now || count < minToStart)
617     ) {
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes have checked in with the Master, and the master will proceed with region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows {code:title=ServerManager.java}
..
/**
 * Wait for the region servers to report in.
 * We will wait until one of this condition is met:
 *  - the master is stopped
 *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 *    region servers is reached
 *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
 *    there have been no new region server in for
 *    'hbase.master.wait.on.regionservers.interval' time AND
 *    the 'hbase.master.wait.on.regionservers.timeout' is reached
 *
 * @throws InterruptedException
 */
public void waitForRegionServers(MonitoredTask status)
..
..
int minToStart = this.master.getConfiguration().
    getInt("hbase.master.wait.on.regionservers.mintostart", 1);
int maxToStart = this.master.getConfiguration().
    getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
if (maxToStart < minToStart) {
  maxToStart = minToStart;
}
..
..
while (
  !this.master.isStopped() &&
  count < maxToStart &&
  (lastCountChange + interval > now || timeout > slept || count < minToStart)
) {
..
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418866#comment-13418866 ] Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 1:37 AM: I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec  FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec  FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec  FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec  FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do R~i~, C and F~i~ represent in the formula above ?

was (Author: zhi...@ebaysf.com):
I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec  FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec  FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec  FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec  FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do R sub i, C and F sub i represent in the formula above ?
Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, the Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
581 /**
582  * Wait for the region servers to report in.
583  * We will wait until one of this condition is met:
584  *  - the master is stopped
585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587  *    region servers is reached
588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589  *    there have been no new region server in for
590  *    'hbase.master.wait.on.regionservers.interval' time
591  *
592  * @throws InterruptedException
593  */
594 public void waitForRegionServers(MonitoredTask status)
595     throws InterruptedException {
612   while (
613     !this.master.isStopped() &&
614     slept < timeout &&
615     count < maxToStart &&
616     (lastCountChange + interval > now || count < minToStart)
617     ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if a smaller number of RS have checked in with the Master, and the Master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To
[jira] [Updated] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shengsheng Huang updated HBASE-6363: Attachment: HBASE-6363.2.patch Updated the patch according to @Harsh's comments. Actually, we did the patch for automation purposes; the master's HTTP /dump page contains much more information than we need. HBaseConfiguration can carry a main method that dumps XML output for debug purposes --- Key: HBASE-6363 URL: https://issues.apache.org/jira/browse/HBASE-6363 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.94.0 Reporter: Harsh J Priority: Trivial Labels: conf, newbie, noob Attachments: HBASE-6363.2.patch, HBASE-6363.patch Just like the Configuration class carries a main() method in it, that simply loads itself and writes XML out to System.out, HBaseConfiguration can use the same kind of method. That way we can do hbase org.apache.hadoop.….HBaseConfiguration to get an XML dump of things HBaseConfiguration has properly loaded. Nifty in checking app classpaths sometimes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
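The idea in this ticket, a main() that loads the configuration and prints it as XML, can be sketched with plain JDK classes. In the real patch the class would call HBaseConfiguration.create() and Configuration.writeXml(System.out); java.util.Properties stands in here only so the sketch runs without HBase on the classpath:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Pure-JDK stand-in for the proposed HBaseConfiguration.main(): load the
// effective key/value settings and dump them as XML to stdout.
public final class ConfDumpSketch {

    static String dumpAsXml(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.storeToXML(out, "effective configuration", "UTF-8");
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Example property; in HBase this would come from hbase-site.xml.
        props.setProperty("hbase.rootdir", "hdfs://localhost:9000/hbase");
        System.out.print(dumpAsXml(props));
    }
}
```

The point of the exercise is the same as in the ticket: whatever the process actually loaded from its classpath is what gets printed, which makes classpath problems visible.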
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418880#comment-13418880 ] Hudson commented on HBASE-6325: --- Integrated in HBase-0.92 #480 (See [https://builds.apache.org/job/HBase-0.92/480/])
HBASE-6319 ReplicationSource can call terminate on itself and deadlock
HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363571)

Result = FAILURE
jdcryans :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java

[replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive - Key: HBASE-6325 URL: https://issues.apache.org/jira/browse/HBASE-6325 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch Yet another bug found during the leap second madness, it's possible to miss the registration of new region servers so that in ReplicationSourceManager.init we start the failover of a live and replicating region server.
I don't think there's data loss, but the RS that's being failed over will die on:
{noformat}
2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
{noformat}
It seems to me that just refreshing {{otherRegionServers}} after getting the list of {{currentReplicators}} would be enough to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
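The fix described above, re-reading the live-server list after fetching the replicator list, boils down to computing the failover candidates as a set difference. A minimal model with illustrative names (the real logic lives in ReplicationSourceManager.init):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal model of the failover decision: a replicator znode whose owner is
// not in the list of live region servers is considered dead and failed over.
public final class FailoverSketch {

    static Set<String> failoverCandidates(Set<String> currentReplicators,
                                          Set<String> liveRegionServers) {
        Set<String> dead = new HashSet<>(currentReplicators);
        dead.removeAll(liveRegionServers);
        return dead;
    }
}
```

If the live list is read before a newly started server registers, that server appears among the replicators but not in the stale live list and is wrongly treated as dead; refreshing the live list after fetching the replicators makes the difference empty.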
[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418879#comment-13418879 ] Hudson commented on HBASE-6319: --- Integrated in HBase-0.92 #480 (See [https://builds.apache.org/job/HBase-0.92/480/])
HBASE-6319 ReplicationSource can call terminate on itself and deadlock
HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363571)

Result = FAILURE
jdcryans :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java

ReplicationSource can call terminate on itself and deadlock --- Key: HBASE-6319 URL: https://issues.apache.org/jira/browse/HBASE-6319 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6319-0.92.patch In a few places the ReplicationSource code calls terminate() on itself, which is a problem since terminate() waits on that thread to die. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
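The hazard in this ticket is a thread joining itself: terminate() waits for the worker thread to die, so a worker calling its own terminate() waits forever. A common guard, shown here as an illustrative sketch and not the actual HBASE-6319 patch, is to skip the join when terminate() runs on the thread being terminated:

```java
// Minimal model of the deadlock guard: only join the worker thread when the
// caller is a different thread. Joining yourself never returns.
public final class TerminateSketch {

    static boolean shouldJoin(Thread worker) {
        return Thread.currentThread() != worker;
    }
}
```

Under this rule, an external stop request still joins and observes the worker's death, while a self-initiated stop simply sets its stop flag and falls out of its run loop.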
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418882#comment-13418882 ] Harsh J commented on HBASE-6363: Thanks again Shengsheng. The /dump servlet is more verbose than the simple XML given by the /conf servlet. If it's just config you need, /conf is where you need to go, not /dump. But for the sake of debuggability, suggesting /dump in the javadoc does seem fine to do for HBase. I think the patch looks good. If needed, we can switch /dump with /conf (since we're discussing just configs, not env. info as well), but otherwise I think it accomplishes the goal of this report. Thanks again! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover
[ https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418887#comment-13418887 ] Jonathan Hsieh commented on HBASE-6417: --- Feels like we could add an option to not do repairs on META unless forced to. hbck merges .META. regions if there's an old leftover - Key: HBASE-6417 URL: https://issues.apache.org/jira/browse/HBASE-6417 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.2 Attachments: hbck.log Trying to see what caused HBASE-6310, one of the things I figured is that the bad .META. row is actually one from the time that we were permitting meta splitting and that folder had just been staying there for a while. So I tried to recreate the issue with -repair and it merged my good .META. region with the one that's 3 years old that also has the same start key. I ended up with a brand new .META. region! I'll be attaching the full log in a separate file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418891#comment-13418891 ] Alex Baranau commented on HBASE-6411: - Glanced over your patch. I like this way better (over the initial patch at 4050): exposing the real interface of the MetricsSource (in this case master metrics), i.e. with methods defined, not empty + hashmap.
1. What do you think about having a MasterMetricsFactory available through the compat module (created by CompatibilitySingletonFactory?) which creates the MetricsSource, like this: interface MasterMetricsFactory { MasterMetricsSource create(final String name, final String sessionId); } This way we could pass parameters and control the creation of the metrics source.
2. Independent of the above: how about removing the BaseMetricsSource interface from compat, as we don't really need it with explicit definition of metrics in sources? This way the current BaseMetricsSourceImpl could be renamed to MetricsRegistry and used via composition (as a field) in metrics sources instead of realization. Thus, creation and initialization of the sources, which might be different for each, could stay in the metrics source implementation itself, including deciding on using JvmMetricsSource (I assume not every source should create it), etc. This way they would look like normal metrics sources from the hadoop codebase, just that they would use hbase's MetricsRegistry, which allows metric removals. Thoughts?
Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
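The factory indirection floated in point 1 above can be sketched as follows. None of these types exist in HBase under these names; in the real patch the factory would be looked up through the compat module's CompatibilitySingletonFactory rather than hard-wired:

```java
// Illustrative factory-based wiring for a metrics source.
public final class MetricsFactorySketch {

    interface MasterMetricsSource {
        void incRequests(long n);   // a stand-in metric method
        long getRequests();
    }

    interface MasterMetricsFactory {
        MasterMetricsSource create(String name, String sessionId);
    }

    // Trivial implementation standing in for whatever implementation the
    // compat module would load at runtime.
    static final class SimpleSource implements MasterMetricsSource {
        private long requests;
        @Override public void incRequests(long n) { requests += n; }
        @Override public long getRequests() { return requests; }
    }

    static final MasterMetricsFactory FACTORY =
        (name, sessionId) -> new SimpleSource();
}
```

The point of the factory is that callers pass construction parameters (name, sessionId) without knowing which metrics-system-specific implementation they get back.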
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418892#comment-13418892 ] ShiXing commented on HBASE-3725: @Ted bq. I generate a region with 3 store files. The increment slows from 1810 tps to 1020 tps, it slows 43.6%. The tps figure is for incrementing the same rowkey. The performance depends on how frequently the memstore is flushed to the store file. If I run the same test case, the latest patch's performance is the same as the original's, because the incremented rowkey is always in the memstore, and we do not need to read the store file. The difference only shows for a rowkey whose value cannot be found in the memstore: it then needs one more read from the memstore, compared to the 0.92 trunk, which reads only from the store file. Note that the original's high performance comes precisely from reading only from the memstore. HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: Jonathan Gray Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch Deleted row values are sometimes used as starting points for new increments. To reproduce:
# Create a row r.
# Set column x to some default value.
# Force hbase to write that value to the file system (such as restarting the cluster).
# Delete the row.
# Call table.incrementColumnValue with some_value
# Get the row.
The returned value in the column was incremented from the old value before the row was deleted instead of being initialized to some_value.
Code to reproduce:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {

  static String tableName = "testIncrement";
  static byte[] infoCF = Bytes.toBytes("info");
  static byte[] rowKey = Bytes.toBytes("test-rowKey");
  static byte[] newInc = Bytes.toBytes("new");
  static byte[] oldInc = Bytes.toBytes("old");

  /**
   * This code reproduces a bug with increment column values in hbase
   * Usage: First run part one by passing '1' as the first arg
   *        Then restart the hbase cluster so it writes everything to disk
   *        Run part two by passing '2' as the first arg
   *
   * This will result in the old deleted data being found and used for the increment calls
   *
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    if ("1".equals(args[0]))
      partOne();
    if ("2".equals(args[0]))
      partTwo();
    if ("both".equals(args[0])) {
      partOne();
      partTwo();
    }
  }

  /**
   * Creates a table and increments a column value 10 times by 10 each time.
   * Results in a value of 100 for the column
   *
   * @throws IOException
   */
  static void partOne() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    tableDesc.addFamily(new HColumnDescriptor(infoCF));
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      admin.deleteTable(tableName);
    }
    admin.createTable(tableDesc);
    HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
    HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
    // Increment uninitialized column
    for (int j = 0; j < 10; j++) {
      table.incrementColumnValue(rowKey, infoCF, oldInc, (long) 10);
      Increment inc = new
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418866#comment-13418866 ] Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 2:53 AM: I ran test suite with latest patch on trunk and got:
{code}
Failed tests:
  testRunThriftServer[12](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:1 but was:0
  testRunThriftServer[14](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:1 but was:0
  testRunThriftServer[15](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:1 but was:0
  testRunThriftServer[16](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:1 but was:0
  testRunThriftServer[17](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:1 but was:0

Tests in error:
  testRegionCaching(org.apache.hadoop.hbase.client.TestHCM): org.apache.hadoop.hbase.UnknownRegionException: bd992463917ba68fe5389c5bf9e94a3a
  testCloseRegionThatFetchesTheHRIFromMeta(org.apache.hadoop.hbase.client.TestAdmin): -1
  testTableExists(org.apache.hadoop.hbase.catalog.TestMetaReaderEditor): org.apache.hadoop.hbase.TableNotEnabledException: testTableExists
  testRunThriftServer[11](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
  testRunThriftServer[13](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?

was (Author: zhi...@ebaysf.com):
I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec  FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec  FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec  FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec  FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418895#comment-13418895 ] Hadoop QA commented on HBASE-6389: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537286/testReplication.jstack against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2416//console This message is automatically generated. Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed as soon as the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
581   /**
582    * Wait for the region servers to report in.
583    * We will wait until one of this condition is met:
584    *   - the master is stopped
585    *   - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *   - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *     region servers is reached
588    *   - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *     there have been no new region server in for
590    *     'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595   throws InterruptedException {
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange + interval > now || count < minToStart)
617     ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached even if a smaller number of RSes have checked in with the Master, and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *   - the master is stopped
   *   - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *     region servers is reached
   *   - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *     there have been no new region server in for
   *     'hbase.master.wait.on.regionservers.interval' time AND
   *     the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
    int minToStart = this.master.getConfiguration().
        getInt("hbase.master.wait.on.regionservers.mintostart", 1);
    int maxToStart = this.master.getConfiguration().
        getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
      maxToStart = minToStart;
    }
..
..
    while (
      !this.master.isStopped() &&
      count < maxToStart &&
      (lastCountChange + interval > now || timeout > slept || count < minToStart)
    ){
..
{code}
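To make the behavioral difference concrete, the two wait conditions can be compared as pure predicates. This is a sketch with illustrative names, not code from the patch:

```java
// Sketch of the two wait predicates discussed above; a "true" result means
// the master keeps waiting for more region servers. Names are illustrative.
class WaitPredicates {
    // Current (0.94+) behavior: the wait ends as soon as the timeout lapses,
    // regardless of whether minToStart region servers have checked in.
    public static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
                && slept < timeout
                && count < maxToStart
                && (lastCountChange + interval > now || count < minToStart);
    }

    // Proposed behavior: keep waiting until minToStart is reached, even past
    // the timeout; the timeout only gates the "no new RS recently" exit.
    public static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
                && count < maxToStart
                && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }
}
```

With the current predicate, a lapsed timeout ends the wait even when fewer than minToStart servers have reported in; with the proposed one, the loop keeps waiting until the minToStart quorum is satisfied.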
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418898#comment-13418898 ] Alex Baranau commented on HBASE-6411: - Looks like you reassigned the task, so I should probably not touch the patch to avoid intersection, right? Was going to add actual metrics tests (which test metrics values changes in addition to testing factories/classes loading) and perhaps apply the 2nd point above, if it makes sense to you. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6389: -- Status: Open (was: Patch Available) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418900#comment-13418900 ] Shengsheng Huang commented on HBASE-6363: - Thanks very much for the clarification, Harsh. It seems /conf was only added to Hadoop in release 0.21 (HADOOP-6408). As we're using Hadoop v1, it didn't work on our local cluster. We would consider adding the HADOOP-6408 patch to our local Hadoop branch. After all, the servlet config dump would contain all the configuration changes made in code. Anyway, do you think it is worth having a separate servlet to dump the configuration as XML only? Or reorganizing the dump output into a more consistent format to make it easier for automatic parsing? HBaseConfiguration can carry a main method that dumps XML output for debug purposes --- Key: HBASE-6363 URL: https://issues.apache.org/jira/browse/HBASE-6363 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.94.0 Reporter: Harsh J Priority: Trivial Labels: conf, newbie, noob Attachments: HBASE-6363.2.patch, HBASE-6363.patch Just like the Configuration class carries a main() method in it, that simply loads itself and writes XML out to System.out, HBaseConfiguration can use the same kinda method. That way we can do hbase org.apache.hadoop.….HBaseConfiguration to get an XML dump of things HBaseConfiguration has properly loaded. Nifty in checking app classpaths sometimes.
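A minimal sketch of the kind of config-dumping main() being proposed, in the spirit of Hadoop's Configuration.main(). The class and helper names here are invented for illustration; this is not the HBASE-6363 patch:

```java
// Illustrative sketch of a config-dumping main(): render key/value pairs as
// a Hadoop-style <configuration> XML document. Names are invented, not HBase API.
import java.util.Map;
import java.util.TreeMap;

class ConfDump {
    // Sort keys for deterministic output, then emit one <property> per entry.
    public static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : new TreeMap<>(props).entrySet()) {
            sb.append("  <property><name>").append(e.getKey())
              .append("</name><value>").append(e.getValue())
              .append("</value></property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("hbase.rootdir", "hdfs://localhost:9000/hbase");
        System.out.print(toXml(props));
    }
}
```

A real implementation would load the properties from HBaseConfiguration.create() instead of a hand-built map, which is exactly what makes it useful for checking what landed on the classpath.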
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418904#comment-13418904 ] Zhihong Ted Yu commented on HBASE-3725: --- Looking at existing code:
{code}
private List<KeyValue> getLastIncrement(final Get get) throws IOException {
  InternalScan iscan = new InternalScan(get);
{code}
iscan was assigned at the beginning. Looks like the assignment in the else block is redundant. TestHRegion#testIncrementWithFlushAndDelete passed without that assignment. HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: Jonathan Gray Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch Deleted row values are sometimes used as starting points for new increments. To reproduce: Create a row r. Set column x to some default value. Force HBase to write that value to the file system (such as by restarting the cluster). Delete the row. Call table.incrementColumnValue with some_value. Get the row. The returned value in the column was incremented from the old value before the row was deleted instead of being initialized to some_value.
Code to reproduce:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {
  static String tableName = "testIncrement";
  static byte[] infoCF = Bytes.toBytes("info");
  static byte[] rowKey = Bytes.toBytes("test-rowKey");
  static byte[] newInc = Bytes.toBytes("new");
  static byte[] oldInc = Bytes.toBytes("old");

  /**
   * This code reproduces a bug with increment column values in hbase
   * Usage: First run part one by passing '1' as the first arg
   *        Then restart the hbase cluster so it writes everything to disk
   *        Run part two by passing '2' as the first arg
   *
   * This will result in the old deleted data being found and used for the increment calls
   *
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    if ("1".equals(args[0]))
      partOne();
    if ("2".equals(args[0]))
      partTwo();
    if ("both".equals(args[0])) {
      partOne();
      partTwo();
    }
  }

  /**
   * Creates a table and increments a column value 10 times by 10 each time.
   * Results in a value of 100 for the column
   *
   * @throws IOException
   */
  static void partOne() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    tableDesc.addFamily(new HColumnDescriptor(infoCF));
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      admin.deleteTable(tableName);
    }
    admin.createTable(tableDesc);
    HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
    HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
    // Increment uninitialized column
    for (int j = 0; j < 10; j++) {
      table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
      Increment inc = new Increment(rowKey);
      inc.addColumn(infoCF, newInc, (long)10);
      table.increment(inc);
    }
    Get get = new Get(rowKey);
    Result r = table.get(get);
    System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc)) + " old " +
[jira] [Resolved] (HBASE-6345) Utilize fault injection in testing using AspectJ
[ https://issues.apache.org/jira/browse/HBASE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu resolved HBASE-6345. --- Resolution: Won't Fix There was not enough incentive to pursue fault injection using AspectJ. Utilize fault injection in testing using AspectJ Key: HBASE-6345 URL: https://issues.apache.org/jira/browse/HBASE-6345 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu HDFS uses fault injection to test pipeline failure in addition to mock, spy. HBase uses mock, spy. But there are cases where mock, spy aren't convenient. Some example from DFSClientAspects.aj:
{code}
pointcut pipelineInitNonAppend(DataStreamer datastreamer):
  callCreateBlockOutputStream(datastreamer)
  && cflow(execution(* nextBlockOutputStream(..)))
  && within(DataStreamer);

after(DataStreamer datastreamer) returning : pipelineInitNonAppend(datastreamer) {
  LOG.info("FI: after pipelineInitNonAppend: hasError=" + datastreamer.hasError
      + " errorIndex=" + datastreamer.errorIndex);
  if (datastreamer.hasError) {
    DataTransferTest dtTest = DataTransferTestUtil.getDataTransferTest();
    if (dtTest != null)
      dtTest.fiPipelineInitErrorNonAppend.run(datastreamer.errorIndex);
  }
}
{code}
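For contrast, the kind of fault injection AspectJ weaves in at a join point can be approximated in plain Java with a wrapper around the call site. This is only an illustrative sketch; all names are invented and none of this is HDFS or HBase code:

```java
// Illustrative contrast to the AspectJ approach: injecting a pipeline fault
// at a call site via a wrapper interface, in plain Java. Names are invented.
import java.io.IOException;

interface BlockOutputStreamFactory {
    void createBlockOutputStream(int blockIndex) throws IOException;
}

class FaultInjectingFactory implements BlockOutputStreamFactory {
    private final int failAtIndex;

    FaultInjectingFactory(int failAtIndex) {
        this.failAtIndex = failAtIndex;
    }

    @Override
    public void createBlockOutputStream(int blockIndex) throws IOException {
        if (blockIndex == failAtIndex) {
            // Simulated pipeline failure, analogous to the aspect's FI hook.
            throw new IOException("FI: injected pipeline failure at block " + blockIndex);
        }
        // A real implementation would open the stream here.
    }
}
```

The trade-off the issue alludes to: the wrapper approach requires the production code to accept the injectable interface, whereas AspectJ can intercept calls the code never exposed for testing.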
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418917#comment-13418917 ] Harsh J commented on HBASE-6363: Sorry, I didn't notice 1.x didn't have it! (I checked only against my 2.x installation, and CDH3 here seems to have had it backported at some point too). Instead of working around it, I think we can rather backport it to a future v1 release, via: HADOOP-8567. HBaseConfiguration can carry a main method that dumps XML output for debug purposes --- Key: HBASE-6363 URL: https://issues.apache.org/jira/browse/HBASE-6363 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.94.0 Reporter: Harsh J Priority: Trivial Labels: conf, newbie, noob Attachments: HBASE-6363.2.patch, HBASE-6363.patch
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418919#comment-13418919 ] nkeywal commented on HBASE-5843: bq. I'm confused as to what the 180s gap refers to. I see 980 (test 2) - 800 (test1) = 180, but that is against 0.92, which doesn't have HBASE-5970, right? Could you clarify? Yes, it's because with a clean stop, the RS unregisters itself in ZK, so the recovery starts immediately. With a kill -9, the RS remains registered in ZK. So if you don't have HBASE-5844 or HBASE-5926, you wait for the ZK timeout. bq. Awesome.. We think this is also due to HBASE-5970 and HBASE-6109? Yes. bq. Has a JIRA been filed? Not yet. I'm writing specific unit tests for this, I found issues that I have not yet fully analyzed, and I need to create the jiras. Also, maybe my test was not good for this part: as I was doing the test without a datanode, it could be that the recovery was not working for this reason (I wonder if the sync works with the local file system, for example). bq. Test to be changed to get a real difference when we need to replay the wal. bq. Could you clarify what you mean here? It does not last long enough, so I won't be able to see much difference even if there is one. So I need to redo the work with a real datanode, check that it recovers, then check that I measure something meaningful. I will also redo the first tests with a DN to see if there is still a gap. Improve HBase MTTR - Mean Time To Recover - Key: HBASE-5843 URL: https://issues.apache.org/jira/browse/HBASE-5843 Project: HBase Issue Type: Umbrella Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit The ideal target is: - failures impact client applications only by an added delay to execute a query, whatever the failure. - this delay is always under 1 second. 
We're not going to achieve that immediately... Priority will be given to the most frequent issues. Short term: - software crash - standard administrative tasks such as stop/start of a cluster.
[jira] [Created] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
Francis Liu created HBASE-6432: -- Summary: HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0 ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of an HRegionServer this is bypassed and set to default, since getMaster() uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem for clients (ie within a coprocessor) using delegation tokens for authentication, since the token's service will be the correct clusterId while the TokenSelector is looking for one with service default.
[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6432: --- Attachment: HBASE-6432_94.patch A patch for 0.94 to get feedback on the approach. Things changed significantly enough in trunk to need a separate patch. I'm hoping to get this backported to 0.94 since it is needed for security. HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0 Attachments: HBASE-6432_94.patch
[jira] [Commented] (HBASE-6427) Pluggable policy for smallestReadPoint in HRegion
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418922#comment-13418922 ] Lars Hofhansl commented on HBASE-6427: -- Let me clarify what I mean by this: If I wanted to implement an MVCC-based optimistic transaction engine on top of HBase, I would naturally want to use HBase's built-in versioning (where possible). In that case it is not clear a priori how many versions to keep or for how long (i.e. specifying VERSIONS/TTL is too static). The outside engine would need to determine that. The simplest of all approaches would be to do that via the smallestReadPoint in each region, by making its determination pluggable. Pluggable policy for smallestReadPoint in HRegion - Key: HBASE-6427 URL: https://issues.apache.org/jira/browse/HBASE-6427 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Priority: Minor When implementing higher-level stores on top of HBase it is necessary to allow dynamic control over how long KVs must be kept around. Semi-static config options for ColumnFamilies (# of versions or TTL) are not sufficient. The simplest way to achieve this is to have a pluggable class to determine the smallestReadPoint for a Region. That way outside code can control which KVs to retain.
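One possible shape for such a plug-in point, as a sketch only: the interface and class names below are invented for illustration and are not HBase API:

```java
// Hypothetical sketch of a pluggable smallest-read-point policy.
// Interface and names are invented for illustration, not HBase API.
interface SmallestReadPointPolicy {
    // Return the smallest MVCC read point that must be preserved;
    // KVs older than this point become eligible for collection.
    long smallestReadPoint(long regionDefault);
}

class TransactionAwarePolicy implements SmallestReadPointPolicy {
    private final long oldestActiveTransaction;

    TransactionAwarePolicy(long oldestActiveTransaction) {
        this.oldestActiveTransaction = oldestActiveTransaction;
    }

    @Override
    public long smallestReadPoint(long regionDefault) {
        // An external transaction engine pins the read point to the oldest
        // transaction still running, so versions it may still read survive.
        return Math.min(regionDefault, oldestActiveTransaction);
    }
}
```

The design point is that the region consults the policy instead of computing the read point solely from its own scanners, letting the outside engine stretch retention dynamically rather than through static VERSIONS/TTL settings.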
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418923#comment-13418923 ] Elliott Clark commented on HBASE-6411: -- Sorry, I didn't mean to re-assign. I must have done that when submitting to Hadoop QA. Sorry, I didn't mean to step on any toes. I agree that a metrics factory or something like it could be very useful. However, like I said above, I was hoping to take a crack at using Guice to do most of the factory stuff. Still, maybe until I get that up it would be useful. On #2, I don't think removing the interface completely is really the way to go, since both the replication metrics and the region server metrics are mostly dynamic metrics; ie they aren't pre-created like the master metrics. I think it still makes sense to have a source that's mostly focused on those map-based metrics. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6428) Pluggable Compaction policies
[ https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418924#comment-13418924 ] Lars Hofhansl commented on HBASE-6428: -- Another way of looking at this is a possible policy that considers all HFiles in terms of a baseline plus changes on top of that baseline. (For the record: I am not saying that I will do this any time soon, just recording this as an idea.) Pluggable Compaction policies - Key: HBASE-6428 URL: https://issues.apache.org/jira/browse/HBASE-6428 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl For some use cases it is useful to allow more control over how KVs get compacted. For example, one could envision storing old versions of a KV in separate HFiles, which then rarely have to be touched/cached by queries querying for new data. In addition, these date-ranged HFiles can easily be used for backups while maintaining historical data. This would be a major change, allowing compactions to provide multiple targets (not just a filter).
[jira] [Updated] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6406: - Fix Version/s: (was: 0.94.2) 0.94.1 0.96.0 TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently Key: HBASE-6406 URL: https://issues.apache.org/jira/browse/HBASE-6406 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.1 Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack Looking back through the 0.94 test runs, these two tests accounted for 11 of the 34 failed tests. They should be fixed or (temporarily) disabled.
[jira] [Updated] (HBASE-5498) Secure Bulk Load
[ https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-5498: --- Attachment: HBASE-5498_draft_94.patch Laxman, here's a working patch. It incorporates HBASE-6432, which took some time to debug. I still have to address the other comments, some cleanup and TODOs. Let me know if this works for you. Secure Bulk Load Key: HBASE-5498 URL: https://issues.apache.org/jira/browse/HBASE-5498 Project: HBase Issue Type: Improvement Components: mapred, security Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0 Attachments: HBASE-5498_draft.patch, HBASE-5498_draft_94.patch Design doc: https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load Short summary: Security as it stands does not cover the bulkLoadHFiles() feature. Users calling this method will bypass ACLs. Also, loading is made more cumbersome in a secure setting because of hdfs privileges. bulkLoadHFiles() moves the data from the user's directory to the hbase directory, which would require certain write access privileges to be set. Our solution is to create a coprocessor which makes use of AuthManager to verify whether a user has write access to the table. If so, it launches an MR job as the hbase user to do the importing (ie rewrite from text to HFiles). One tricky part is that this job will have to impersonate the calling user when reading the input files. We can do this by expecting the user to pass an hdfs delegation token as part of the secureBulkLoad() coprocessor call and extending an inputformat to make use of that token. The output is written to a temporary directory accessible only by hbase and then bulkloadHFiles() is called.
[jira] [Commented] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418931#comment-13418931 ] Hadoop QA commented on HBASE-6431: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537269/0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//console This message is automatically generated. 
Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.92.1, 0.94.0 Reporter: Alex Newman Assignee: Alex Newman Priority: Minor Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result, FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters) and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly.
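The failure mode can be reproduced in miniature: a fixed-size list view such as the one returned by Arrays.asList rejects add(), while a defensive ArrayList copy accepts it. This standalone demo illustrates the bug class; it is not the FilterList code itself:

```java
// Illustrates the failure mode described above: keeping a fixed-size list
// view breaks a later add(); copying into an ArrayList does not.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedSizeListDemo {
    // Returns true if an element can be appended to the given list.
    public static boolean canAdd(List<String> filters) {
        try {
            filters.add("extra-filter");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("a", "b");   // fixed-size view over an array
        List<String> copy = new ArrayList<>(fixed);     // defensive, growable copy
        System.out.println(canAdd(fixed));  // false
        System.out.println(canAdd(copy));   // true
    }
}
```

The defensive-copy pattern is the usual fix: a constructor that stores caller-supplied lists should copy them into a mutable implementation before later mutators like addFilter rely on them.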
[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6432: --- Description: ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of an HRegionServer this is bypassed and the clusterId is left at "default", because getMaster() uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem for clients (i.e. within a coprocessor) using delegation tokens for authentication: the token's service will be the correct clusterId, while the TokenSelector is looking for one with service "default". was: ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of a HRegionServer this is bypassed and set to default since getMaster() bypasses the class which sets clusterId since it uses HBaseRPC to create the proxy directly. This becomes a problem with clients (ie within a coprocessor) using delegation tokens for authentication. Since the token's service will be the correct clusterId and while the TokenSelector is looking for one with service default. HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0 Attachments: HBASE-6432_94.patch ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of an HRegionServer this is bypassed and the clusterId is left at "default", because getMaster() uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem for clients (i.e. within a coprocessor) using delegation tokens for authentication: the token's service will be the correct clusterId, while the TokenSelector is looking for one with service "default".
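The mismatch can be modeled with a trivial lookup: a token is selected by its service string, so a token issued against the real clusterId is invisible to a selector that asks for "default". This is a hypothetical stand-in model, not the actual Hadoop TokenSelector API; the cluster id string is made up.

```java
import java.util.Map;

public class TokenSelectorSketch {
    // Hypothetical model of token selection: tokens are keyed by their
    // "service" string, and the selector looks up exactly one service name.
    static String selectToken(String wantedService, Map<String, String> tokensByService) {
        return tokensByService.get(wantedService);
    }

    public static void main(String[] args) {
        // The delegation token was issued against the real clusterId...
        Map<String, String> tokens = Map.of("real-cluster-id", "delegation-token");
        System.out.println(selectToken("real-cluster-id", tokens));
        // ...but a conf that never had clusterId set asks for "default",
        // so no token is found and authentication fails.
        System.out.println(selectToken("default", tokens));
    }
}
```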
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418933#comment-13418933 ] ShiXing commented on HBASE-3725: @Ted, the reassignment is there because no interface exists to set the iscan back to scanning both the memstore and the filestore; at the beginning, the iscan is restricted to the memstore only:
{code}
// memstore scan
iscan.checkOnlyMemStore();
{code}
HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: Jonathan Gray Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch Deleted row values are sometimes used as starting points for new increments. To reproduce: Create a row r. Set column x to some default value. Force HBase to write that value to the file system (e.g. by restarting the cluster). Delete the row. Call table.incrementColumnValue with some_value. Get the row. The returned value in the column was incremented from the old value before the row was deleted, instead of being initialized to some_value.
Code to reproduce:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {

    static String tableName = "testIncrement";
    static byte[] infoCF = Bytes.toBytes("info");
    static byte[] rowKey = Bytes.toBytes("test-rowKey");
    static byte[] newInc = Bytes.toBytes("new");
    static byte[] oldInc = Bytes.toBytes("old");

    /**
     * This code reproduces a bug with increment column values in hbase
     * Usage: First run part one by passing '1' as the first arg
     *        Then restart the hbase cluster so it writes everything to disk
     *        Run part two by passing '2' as the first arg
     *
     * This will result in the old deleted data being found and used for the increment calls
     *
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        if ("1".equals(args[0]))
            partOne();
        if ("2".equals(args[0]))
            partTwo();
        if ("both".equals(args[0])) {
            partOne();
            partTwo();
        }
    }

    /**
     * Creates a table and increments a column value 10 times by 10 each time.
     * Results in a value of 100 for the column
     *
     * @throws IOException
     */
    static void partOne() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor tableDesc = new HTableDescriptor(tableName);
        tableDesc.addFamily(new HColumnDescriptor(infoCF));
        if (admin.tableExists(tableName)) {
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        }
        admin.createTable(tableDesc);

        HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
        HTableInterface table = pool.getTable(Bytes.toBytes(tableName));

        // Increment uninitialized column
        for (int j = 0; j < 10; j++) {
            table.incrementColumnValue(rowKey, infoCF, oldInc, (long) 10);
            Increment inc = new Increment(rowKey);
            inc.addColumn(infoCF, newInc, (long) 10);
            table.increment(inc);
        }

        Get get = new Get(rowKey);
        Result r = table.get(get);
        System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc))
                + " old " + Bytes.toLong(r.getValue(infoCF, oldInc)));
    }

    /**
     * First deletes the data then increments the column 10 times by 1
{code}
[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418936#comment-13418936 ] Lars Hofhansl commented on HBASE-6406: -- TestZooKeeper.testClientSessionExpired failed again in the latest 0.94 build. Although this is not obvious from the logs, the pattern in the code is the same as in TestReplicationPeer. My initial suspicion was RecoverableZooKeeper, and that it somehow retries the operation and thereby reconnects the expired session. According to the code it does not do that, though. Somehow HBaseTestingUtil.expireSession is subject to a race. In the case of TestReplicationPeer that happened when expireSession was called before the connection was actually established. Is there a way to check whether the connection was established first and wait if it wasn't? Otherwise, I'd say we disable this test for now. TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently Key: HBASE-6406 URL: https://issues.apache.org/jira/browse/HBASE-6406 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.1 Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack Looking back through the 0.94 test runs, these two tests accounted for 11 of 34 failed tests. They should be fixed or (temporarily) disabled.
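One way to implement the wait the comment asks about is a bounded poll on a connection predicate before calling expireSession; with a real ZooKeeper handle the predicate would be something like () -> zk.getState() == States.CONNECTED. The helper below is a self-contained sketch of that idea, not HBase test code, so the simulated predicate in main is made up.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class AwaitConnection {
    // Polls a connection predicate until it holds or the timeout elapses.
    // Returns true if the connection was observed within the timeout.
    static boolean awaitConnected(BooleanSupplier isConnected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (isConnected.getAsBoolean()) {
                return true;
            }
            TimeUnit.MILLISECONDS.sleep(50); // back off between checks
        }
        return isConnected.getAsBoolean(); // one final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Simulated connection that becomes live after roughly 200 ms.
        boolean ok = awaitConnected(() -> System.currentTimeMillis() - start > 200, 2000);
        System.out.println("connected=" + ok);
    }
}
```

Calling such a helper before expireSession would close the window where the session is expired before it ever connects.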
[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Huang updated HBASE-6429: - Attachment: hbase-6429-trunk.patch 1. Prepare a patch against trunk 2. Add one more unit test case (TestFilterWithScanLimits) 3. Fix 2 unit test failures in the previous version. Filter with filterRow() returning true is also incompatible with scan with limit Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented.
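The validation the issue argues for can be sketched as a single check: reject a batched scan whenever the filter does row-level filtering, whichever filterRow overload it implements. The interfaces below are hypothetical stand-ins, not HBase classes; in HBase the corresponding signal is Filter.hasFilterRow().

```java
public class ScanLimitCheck {
    // Hypothetical stand-in for HBase's Filter: reports whether the filter
    // overrides either filterRow() or filterRow(List<KeyValue>).
    interface RowFilter {
        boolean hasFilterRow();
    }

    static class IncompatibleFilterException extends RuntimeException {
        IncompatibleFilterException(String msg) { super(msg); }
    }

    // The consistency check: a scan batch limit cannot be combined with a
    // filter that filters whole rows, because partial rows would be handed
    // to the filter.
    static void validate(int batchLimit, RowFilter filter) {
        if (batchLimit > 0 && filter != null && filter.hasFilterRow()) {
            throw new IncompatibleFilterException(
                "Cannot set batch on a scan using a filter that filters entire rows");
        }
    }

    public static void main(String[] args) {
        try {
            validate(10, () -> true); // row-filtering filter plus limit: rejected
        } catch (IncompatibleFilterException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        validate(10, () -> false); // fine: filter does no row-level filtering
        validate(0, () -> true);   // fine: no batch limit set
        System.out.println("ok");
    }
}
```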
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418940#comment-13418940 ] Shengsheng Huang commented on HBASE-6363: - Seems reasonable. My only slight concern is package dependency, because some of our customers are very reluctant to upgrade their stable hadoop deployment. A standalone patch is good to have. HBaseConfiguration can carry a main method that dumps XML output for debug purposes --- Key: HBASE-6363 URL: https://issues.apache.org/jira/browse/HBASE-6363 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.94.0 Reporter: Harsh J Priority: Trivial Labels: conf, newbie, noob Attachments: HBASE-6363.2.patch, HBASE-6363.patch Just like the Configuration class carries a main() method in it that simply loads itself and writes XML out to System.out, HBaseConfiguration can use the same kind of method. That way we can do hbase org.apache.hadoop.….HBaseConfiguration to get an XML dump of things HBaseConfiguration has properly loaded. Nifty in checking app classpaths sometimes.
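The proposed main() amounts to "load the effective configuration, serialize it as XML to stdout"; Hadoop's Configuration offers writeXml(OutputStream) for exactly this. The sketch below uses java.util.Properties.storeToXML as a stdlib stand-in so it runs without Hadoop on the classpath; the two property values are made-up examples, not defaults.

```java
import java.io.IOException;
import java.util.Properties;

public class ConfDump {
    // Analogue of the proposed HBaseConfiguration.main(): gather the
    // effective settings and dump them as XML to stdout. With Hadoop
    // available this would be HBaseConfiguration.create().writeXml(System.out).
    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.setProperty("hbase.cluster.distributed", "true");
        conf.storeToXML(System.out, "effective configuration dump");
    }
}
```

Run from the command line, this makes it easy to confirm which values actually won after all classpath resources were merged.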