[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184788#comment-13184788 ]

Hudson commented on HBASE-5163:
-------------------------------

Integrated in HBase-0.92-security #72 (See [https://builds.apache.org/job/HBase-0.92-security/72/])
HBASE-5163 TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.") (N Keywal)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java

> TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or
> hadoop QA ("The directory is already locked.")
> -----------------------------------------------------------------------
>
> Key: HBASE-5163
> URL: https://issues.apache.org/jira/browse/HBASE-5163
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 0.92.0
> Environment: all
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 5163-92.txt, 5163.patch
>
> The stack is typically:
> {noformat}
> type="java.io.IOException">java.io.IOException: Cannot lock storage
> /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3.
> The directory is already locked.
> at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
> at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467)
> at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417)
> at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460)
> at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470)
> // ...
> {noformat}
> It can be reproduced without parallelization or without executing the other
> tests in the class. It seems to fail about 5% of the time.
> This comes from the naming policy for the directories in
> MiniDFSCluster#startDataNode. It depends on the number of nodes *currently*
> in the cluster, and does not take into account previous starts/stops:
> {noformat}
> for (int i = curDatanodesNum; i < curDatanodesNum + numDataNodes; i++) {
>   if (manageDfsDirs) {
>     File dir1 = new File(data_dir, "data" + (2*i+1));
>     File dir2 = new File(data_dir, "data" + (2*i+2));
>     dir1.mkdirs();
>     dir2.mkdirs();
>     // [...]
> {noformat}
> This means that if we want to stop/start a datanode, we should always stop
> the last one; otherwise the names will conflict.
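The naming arithmetic quoted above can be checked with a small self-contained sketch. The class and method below are hypothetical (not part of MiniDFSCluster); they only reproduce the loop's index computation to show why stopping a non-last datanode leads to a collision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-implementation of the directory-naming loop quoted above:
// directory names depend only on how many nodes are *currently* running,
// not on which directories already exist on disk.
public class DataDirNaming {
    static List<String> dirsFor(int curDatanodesNum, int numDataNodes) {
        List<String> dirs = new ArrayList<>();
        for (int i = curDatanodesNum; i < curDatanodesNum + numDataNodes; i++) {
            dirs.add("data" + (2 * i + 1));
            dirs.add("data" + (2 * i + 2));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Starting 2 nodes from an empty cluster creates data1..data4.
        System.out.println(dirsFor(0, 2)); // [data1, data2, data3, data4]
        // Stop node 0 (not the last one): one node remains, still holding
        // data3/data4. Starting one more node recomputes from count = 1
        // and picks the same names, hence "The directory is already locked."
        System.out.println(dirsFor(1, 1)); // [data3, data4]
    }
}
```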
This test exhibits the behavior:
> {noformat}
> @Test
> public void testMiniDFSCluster_startDataNode() throws Exception {
>   assertTrue(dfsCluster.getDataNodes().size() == 2);
>   // Works, as we kill the last datanode, we can now start a datanode
>   dfsCluster.stopDataNode(1);
>   dfsCluster.startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>   // Fails, as it's not the last datanode, the directory will conflict on
>   // creation
>   dfsCluster.stopDataNode(0);
>   try {
>     dfsCluster.startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>     fail("There should be an exception because the directory already exists");
>   } catch (IOException e) {
>     assertTrue(e.getMessage().contains("The directory is already locked."));
>     LOG.info("Expected (!) exception caught " + e.getMessage());
>   }
>   // Works, as we kill the last datanode, we can now restart 2 datanodes
>   // This takes us back to 2 nodes
>   dfsCluster.stopDataNode(0);
>   dfsCluster.startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null);
> }
> {noformat}
> And then this behavior is randomly triggered in testLogRollOnDatanodeDeath
> because when we do
> {noformat}
> DatanodeInfo[] pipeline = getPipeline(log);
> assertTrue(pipeline.length == fs.getDefaultReplication());
> {noformat}
> and then kill the datanodes in the pipeline, we will have
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184772#comment-13184772 ]

stack commented on HBASE-5179:
------------------------------

Sure. Do what you fellas think best.

> Concurrent processing of processFaileOver and ServerShutdownHandler may
> cause region is assigned before completing split log, it would cause data loss
> -----------------------------------------------------------------------
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Attachments: 5179-90.txt, 5179-90v2.patch, 5179-v2.txt, 5179-v3.txt,
> 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch
>
> If the master's failover processing and ServerShutdownHandler's processing
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler is processing it.
> 3. The master starts to rebuildUserRegions, and RegionserverA is considered
> a dead server.
> 4. The master starts to assign the regions of RegionserverA because it is a
> dead server by step 3.
> However, while doing step 4 (assigning regions), ServerShutdownHandler may
> still be splitting the log; therefore, it may cause data loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
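A later comment in this thread discusses tracking dead servers being processed in an inProgress collection inside DeadServers. A minimal, hypothetical Java sketch of that idea (none of these names or signatures are taken from the actual patch) shows the guard the master would need before step 4: do not assign a dead server's regions while its log split is still in progress.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the guard discussed in this thread: the master
// consults a "dead servers in progress" set before assigning regions, so
// assignment cannot race with ServerShutdownHandler's log splitting.
public class DeadServers {
    private final Set<String> inProgress = ConcurrentHashMap.newKeySet();

    // ServerShutdownHandler registers the server before splitting its log...
    public void startProcessing(String serverName) {
        inProgress.add(serverName);
    }

    // ...and deregisters it once log splitting has completed.
    public void finishProcessing(String serverName) {
        inProgress.remove(serverName);
    }

    // The master's failover path checks this before assigning regions:
    // assigning while true risks opening a region whose WAL edits have not
    // yet been split out, i.e. the data loss described in the issue.
    public boolean isBeingProcessed(String serverName) {
        return inProgress.contains(serverName);
    }
}
```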
[jira] [Issue Comment Edited] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184755#comment-13184755 ]

ramkrishna.s.vasudevan edited comment on HBASE-5179 at 1/12/12 6:49 AM:
------------------------------------------------------------------------

@Ted, @Stack, @Chunhui
I think we may have to combine the change in HBASE-4748 as Chunhui suggested at 12/Jan/12 03:23. Is it ok to combine it? Because only then can the processFailOver and SSH problem be solved totally. Pls suggest. Sorry for the typo.

was (Author: ram_krish):
@Ted, @Stack, @Chunhui
I think we may have to combine the change in HBASE-4879 as Chunhui suggested at 12/Jan/12 03:23. Is it ok to combine it? Because only then can the processFailOver and SSH problem be solved totally. Pls suggest.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184764#comment-13184764 ]

chunhui shen commented on HBASE-5179:
-------------------------------------

I think so too.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184760#comment-13184760 ]

Zhihong Yu commented on HBASE-5179:
-----------------------------------

You mean hbase-4748, right? I think we should combine the two.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184755#comment-13184755 ]

ramkrishna.s.vasudevan commented on HBASE-5179:
-----------------------------------------------

@Ted, @Stack, @Chunhui
I think we may have to combine the change in HBASE-4879 as Chunhui suggested at 12/Jan/12 03:23. Is it ok to combine it? Because only then can the processFailOver and SSH problem be solved totally. Pls suggest.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184754#comment-13184754 ]

stack commented on HBASE-5179:
------------------------------

I think getDeadServersInProgress is better than getDeadServersBeingProcessed since it relates to areDeadServersInProgress (I can fix this on commit -- I would also change the name of the Collection in DeadServers so it's inProgress).

Yeah, I would be interested in the notion that we do this server checking inside ServerManager, so that when you ask for onlineServers this stuff has already been done for you... or is the thought that ServerManager need not know about 'handlers' -- that only HMaster should have to know what's running under it (a ServerManager and handlers such as ServerShutdownHandler)?
[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184741#comment-13184741 ]

Hudson commented on HBASE-5163:
-------------------------------

Integrated in HBase-0.92 #241 (See [https://builds.apache.org/job/HBase-0.92/241/])
HBASE-5163 TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.") (N Keywal)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5179:
------------------------------

Comment: was deleted

(was: -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510311/5179-90v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests. Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/744//console

This message is automatically generated.)
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184737#comment-13184737 ]

Zhihong Yu commented on HBASE-4720:
-----------------------------------

Latest patch passed unit tests.

> Implement atomic update operations (checkAndPut, checkAndDelete) for REST
> client/server
> -------------------------------------------------------------------------
>
> Key: HBASE-4720
> URL: https://issues.apache.org/jira/browse/HBASE-4720
> Project: HBase
> Issue Type: Improvement
> Reporter: Daniel Lord
> Assignee: Mubarak Seyed
> Fix For: 0.94.0
>
> Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch,
> HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, HBASE-4720.v1.patch,
> HBASE-4720.v3.patch
>
> I have several large application/HBase clusters where an application node
> will occasionally need to talk to HBase from a different cluster. In order
> to help ensure some of my consistency guarantees I have a sentinel table that
> is updated atomically as users interact with the system. This works quite
> well for the "regular" hbase client but the REST client does not implement
> the checkAndPut and checkAndDelete operations. This exposes the application
> to some race conditions that have to be worked around. It would be ideal if
> the same checkAndPut/checkAndDelete operations could be supported by the REST
> client.
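The atomic primitive being requested is a compare-and-set on a single cell. A minimal model of those semantics is sketched below; the class and method names are hypothetical (this is not HBase's client API or the REST server), and it assumes null means "cell must be absent".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of checkAndPut semantics: atomically compare
// a cell's current value against an expected value and apply the put only
// on a match. This is the guarantee the REST client lacks per the issue.
public class SentinelTable {
    private final Map<String, String> cells = new HashMap<>();

    // Returns true and applies the put only if the current value of `key`
    // equals `expected` (null expected value means the cell must be absent).
    public synchronized boolean checkAndPut(String key, String expected, String value) {
        if (!Objects.equals(cells.get(key), expected)) {
            return false; // check failed: another writer got there first
        }
        cells.put(key, value);
        return true;
    }

    public synchronized String get(String key) {
        return cells.get(key);
    }
}
```

Without this primitive on the REST path, a client must read and then write in two round trips, leaving a window for the race conditions the reporter describes.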
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184734#comment-13184734 ]

Hadoop QA commented on HBASE-5179:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510311/5179-90v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests. Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/744//console

This message is automatically generated.
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5179:
--------------------------------

Attachment: 5179-90v2.patch
[jira] [Updated] (HBASE-5178) Backport HBASE-4101 - Regionserver Deadlock
[ https://issues.apache.org/jira/browse/HBASE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5178:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-4101 - Regionserver Deadlock
> -------------------------------------------
>
> Key: HBASE-5178
> URL: https://issues.apache.org/jira/browse/HBASE-5178
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Attachments: HBASE-4101_0.90_1.patch
>
> Critical issue not merged to 0.90.
[jira] [Updated] (HBASE-5184) Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness
[ https://issues.apache.org/jira/browse/HBASE-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5184:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-5152 - Region is on service before completing initialization
> when doing rollback of split, it will affect read correctness
> ---------------------------------------------------------------------------
>
> Key: HBASE-5184
> URL: https://issues.apache.org/jira/browse/HBASE-5184
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Important issue to be merged into 0.90.
[jira] [Updated] (HBASE-5160) Backport HBASE-4397 - -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
[ https://issues.apache.org/jira/browse/HBASE-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5160:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-4397 - -ROOT-, .META. tables stay offline for too long in
> recovery phase after all RSs are shutdown at the same time
> ------------------------------------------------------------------------
>
> Key: HBASE-5160
> URL: https://issues.apache.org/jira/browse/HBASE-5160
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Backporting to 0.90.6 considering the importance of the issue.
[jira] [Updated] (HBASE-5168) Backport HBASE-5100 - Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5168:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-5100 - Rollback of split could cause closed region to be
> opened again
> -----------------------------------------------------------------------
>
> Key: HBASE-5168
> URL: https://issues.apache.org/jira/browse/HBASE-5168
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Attachments: HBASE-5100_0.90.patch
>
> Considering the importance of the defect, merging it to 0.90.6.
[jira] [Updated] (HBASE-5158) Backport HBASE-4878 - Master crash when splitting hlog may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5158: -- Fix Version/s: 0.90.6 > Backport HBASE-4878 - Master crash when splitting hlog may cause data loss > -- > > Key: HBASE-5158 > URL: https://issues.apache.org/jira/browse/HBASE-5158 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan > Fix For: 0.90.6 > > Attachments: HBASE-4878_branch90_1.patch > > > Backporting to 0.90.6 considering the importance of the issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5157) Backport HBASE-4880- Region is on service before openRegionHandler completes, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5157: -- Fix Version/s: 0.90.6 > Backport HBASE-4880- Region is on service before openRegionHandler completes, > may cause data loss > - > > Key: HBASE-5157 > URL: https://issues.apache.org/jira/browse/HBASE-5157 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan > Fix For: 0.90.6 > > Attachments: HBASE-4880_branch90_1.patch > > > Backporting to 0.90.6 considering the importance of the issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5156) Backport HBASE-4899 - Region would be assigned twice easily with continually killing server and moving region in testing environment
[ https://issues.apache.org/jira/browse/HBASE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5156: -- Fix Version/s: 0.90.6 > Backport HBASE-4899 - Region would be assigned twice easily with continually > killing server and moving region in testing environment > - > > Key: HBASE-5156 > URL: https://issues.apache.org/jira/browse/HBASE-5156 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan > Fix For: 0.90.6 > > Attachments: HBASE-4899_Branch90_1.patch > > > Need to backport to 0.90.6 considering the criticality of the issue -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184717#comment-13184717 ] Hadoop QA commented on HBASE-4720: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510309/HBASE-4720.trunk.v4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -146 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 81 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/743//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/743//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/743//console This message is automatically generated. 
> Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, HBASE-4720.v1.patch, > HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5184) Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness
Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness -- Key: HBASE-5184 URL: https://issues.apache.org/jira/browse/HBASE-5184 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Important issue to be merged into 0.90. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4748) Race between creating recovered edits for META and master assigning ROOT and META.
[ https://issues.apache.org/jira/browse/HBASE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184715#comment-13184715 ] ramkrishna.s.vasudevan commented on HBASE-4748: --- @Chunhui OK, let me check your suggestion and then upload the patch. :) Thanks. > Race between creating recovered edits for META and master assigning ROOT and > META. > -- > > Key: HBASE-4748 > URL: https://issues.apache.org/jira/browse/HBASE-4748 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > 1. Start a cluster. > 2. Alter a table > 3. Restart the master using ./hbase-daemon.sh restart master > 4. Kill the RS after master restarts. > 5. Start RS again. > 6. No table operations can be performed on the table that was altered but > admin.listTables() is able to list the altered table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mubarak Seyed updated HBASE-4720: - Attachment: HBASE-4720.trunk.v4.patch Tests were still failing for runMediumTests on trunk, but I have fixed TestRowResource. The attached file (HBASE-4720.trunk.v4.patch) is the latest patch. Thanks. {code} mvn clean test -P runMediumTests -Dtest=org.apache.hadoop.hbase.rest.* Running org.apache.hadoop.hbase.rest.TestRowResource Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.099 se {code} > Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, HBASE-4720.v1.patch, > HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184705#comment-13184705 ] Jieshan Bean commented on HBASE-5153: - @Ted, the patch for TRUNK seems very different, and I still need some time to check it. I hope I can provide it today. :) @Stack, I think ConnectionUtils is reasonable; I can add it. :) I will update the patch. Thank you all. > HConnection re-creation in HTable after HConnection abort > - > > Key: HBASE-5153 > URL: https://issues.apache.org/jira/browse/HBASE-5153 > Project: HBase > Issue Type: Bug > Components: client >Affects Versions: 0.90.4 >Reporter: Jieshan Bean >Assignee: Jieshan Bean > Fix For: 0.90.6 > > Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, > HBASE-5153.patch > > > HBASE-4893 is related to this issue. From that issue, we know that if > multiple threads share the same connection, once the connection is aborted in > one thread, the other threads will get a > "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception. > It solves the problem of the stale connection not being removed, but the > original HTable instance cannot continue to be used; the connection in HTable > should be recreated. > Actually, there are two approaches to solve this: > 1. In user code, once an IOE is caught, close the connection and re-create > the HTable instance. We can use this as a workaround. > 2. On the HBase client side, catch this exception and re-create the > connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
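Approach 1 above (catch the IOE, then rebuild the handle) can be sketched generically. The names below (ReconnectingClient, Factory, Action, callWithReconnect) are hypothetical stand-ins for HTable construction, not HBase API:

```java
import java.io.IOException;

// Generic sketch of workaround 1 from the comment above: if a call fails with
// an IOException (e.g. "HConnectionImplementation ... closed"), discard the
// handle, re-create it once, and retry. All names here are illustrative.
public class ReconnectingClient {

    interface Factory<T> { T create() throws IOException; }
    interface Action<T, R> { R apply(T t) throws IOException; }

    static <T, R> R callWithReconnect(Factory<T> factory, Action<T, R> action) {
        try {
            T handle = factory.create();          // stands in for new HTable(conf, name)
            try {
                return action.apply(handle);
            } catch (IOException e) {
                handle = factory.create();        // aborted connection: rebuild once
                return action.apply(handle);
            }
        } catch (IOException e) {
            throw new RuntimeException("re-created handle also failed", e);
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        String out = callWithReconnect(
            () -> "conn",
            c -> {
                if (calls[0]++ == 0) throw new IOException("connection closed");
                return c + ":ok";                 // succeeds after re-creation
            });
        System.out.println(out); // conn:ok
    }
}
```

A real client would also close the dead handle before re-creating it; that bookkeeping is omitted in this sketch.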
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184704#comment-13184704 ] Hudson commented on HBASE-5033: --- Integrated in HBase-TRUNK #2623 (See [https://builds.apache.org/job/HBase-TRUNK/2623/]) HBASE-5033 Differential Revision: 933 Opening/Closing store in parallel to reduce region open/close time (Liyin) tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Threads.java > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
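Items 1-4 above all follow one pattern: fan the store/store-file opens out to a thread pool, then join. A minimal sketch of that pattern, under the assumption that each open returns a result (ParallelOpen and openStores are illustrative names; the real change lives in HRegion/Store):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the open-in-parallel pattern from HBASE-5033: submit each store
// open to a bounded pool, then join all futures so any single failure still
// fails the region open. Names are hypothetical, not the patch's code.
public class ParallelOpen {

    static List<String> openStores(List<String> storeNames, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : storeNames) {
                futures.add(pool.submit(() -> name + ":open")); // stands in for store.open()
            }
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                opened.add(f.get()); // propagates any store-open failure
            }
            return opened;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("store open failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(openStores(List.of("cf1", "cf2", "cf3"), 2)); // [cf1:open, cf2:open, cf3:open]
    }
}
```

Joining the futures in submit order keeps the result deterministic even though the opens themselves run concurrently.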
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184703#comment-13184703 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510304/hbase-5179v5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.regionserver.wal.TestHLog org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/741//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/741//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/741//console This message is automatically generated. 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch, hbase-5179v5.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184701#comment-13184701 ] Hadoop QA commented on HBASE-5163: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510307/5163-92.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/742//console This message is automatically generated. > TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or > hadoop QA ("The directory is already locked.") > -- > > Key: HBASE-5163 > URL: https://issues.apache.org/jira/browse/HBASE-5163 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.92.0 > Environment: all >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: 5163-92.txt, 5163.patch > > > The stack is typically: > {noformat} > type="java.io.IOException">java.io.IOException: Cannot lock storage > /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3. > The directory is already locked. 
> at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460) > at > org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470) > // ... > {noformat} > It can be reproduced without parallelization or without executing the other > tests in the class. It seems to fail about 5% of the time. > This comes from the naming policy for the directories in > MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* > in the cluster, and does not take into account previous starts/stops: > {noformat} >for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) { > if (manageDfsDirs) { > File dir1 = new File(data_dir, "data"+(2*i+1)); > File dir2 = new File(data_dir, "data"+(2*i+2)); > dir1.mkdirs(); > dir2.mkdirs(); > // [...] > {noformat} > This means that if we want to stop/start a datanode, we should always stop > the last one; otherwise the names will conflict. 
This test exhibits the behavior: > {noformat} > @Test > public void testMiniDFSCluster_startDataNode() throws Exception { > assertTrue( dfsCluster.getDataNodes().size() == 2 ); > // Works, as we kill the last datanode, we can now start a datanode > dfsCluster.stopDataNode(1); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > // Fails, as it's not the last datanode, the directory will conflict on > // creation > dfsCluster.stopDataNode(0); > try { > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > fail("There should be an exception because the directory already > exists"); > } catch (IOException e) { > assertTrue( e.getMessage().contains("The directory is already > locked.")); > LOG.info("Expected (!) exception caught " + e.getMessage()); > } > // Works, as we kill the last datanode, we can now restart 2 datanodes > // This makes us back with 2 nodes > dfsCluster.stopDataNode(0); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null); > } > {noformat} > And then this behavior is randomly triggered in testLogRollOnDatanodeDeath > because when we do > {noformat} > DatanodeInfo[] pipeline = getPipeline(log); > assertTrue(pipeline.length == fs.getDefaultReplic
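The directory-index arithmetic quoted above can be reproduced with a standalone simulation (DirNamingSim and dirsFor are hypothetical names for illustration, not HDFS code):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of MiniDFSCluster#startDataNode's directory naming:
// names depend only on how many datanodes are running *now*, not on which
// datanodes ever ran. All class/method names here are illustrative.
public class DirNamingSim {

    // Directories a new batch of datanodes claims, given the number of
    // datanodes currently running (curDatanodesNum in the quoted HDFS loop).
    static List<String> dirsFor(int curDatanodesNum, int numDataNodes) {
        List<String> dirs = new ArrayList<>();
        for (int i = curDatanodesNum; i < curDatanodesNum + numDataNodes; i++) {
            dirs.add("data" + (2 * i + 1));
            dirs.add("data" + (2 * i + 2));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Initial cluster of two: node 0 -> data1/data2, node 1 -> data3/data4.
        System.out.println(dirsFor(0, 2)); // [data1, data2, data3, data4]

        // After stopping ONE node (whichever it was), the next start is always
        // assigned i = 1, i.e. data3/data4. That is safe only if the stopped
        // node was node 1; if node 0 was stopped, the still-running node 1
        // holds the lock on data3/data4 and the start fails.
        System.out.println(dirsFor(1, 1)); // [data3, data4]
    }
}
```

Stopping datanode 0 out of {0, 1} leaves one node running, so the replacement claims data3/data4, which the surviving node 1 still locks; stopping the last node frees exactly the names that get reused.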
[jira] [Updated] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5163: -- Affects Version/s: (was: 0.94.0) 0.92.0 Fix Version/s: 0.94.0 0.92.0 Hadoop Flags: Reviewed > TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or > hadoop QA ("The directory is already locked.") > -- > > Key: HBASE-5163 > URL: https://issues.apache.org/jira/browse/HBASE-5163 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.92.0 > Environment: all >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: 5163-92.txt, 5163.patch > > > The stack is typically: > {noformat} > type="java.io.IOException">java.io.IOException: Cannot lock storage > /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3. > The directory is already locked. > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:290) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460) > at > org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470) > // ... 
> {noformat} > It can be reproduced without parallelization or without executing the other > tests in the class. It seems to fail about 5% of the time. > This comes from the naming policy for the directories in > MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* > in the cluster, and does not take into account previous starts/stops: > {noformat} >for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) { > if (manageDfsDirs) { > File dir1 = new File(data_dir, "data"+(2*i+1)); > File dir2 = new File(data_dir, "data"+(2*i+2)); > dir1.mkdirs(); > dir2.mkdirs(); > // [...] > {noformat} > This means that it if we want to stop/start a datanode, we should always stop > the last one, if not the names will conflict. This test exhibits the behavior: > {noformat} > @Test > public void testMiniDFSCluster_startDataNode() throws Exception { > assertTrue( dfsCluster.getDataNodes().size() == 2 ); > // Works, as we kill the last datanode, we can now start a datanode > dfsCluster.stopDataNode(1); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > // Fails, as it's not the last datanode, the directory will conflict on > // creation > dfsCluster.stopDataNode(0); > try { > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > fail("There should be an exception because the directory already > exists"); > } catch (IOException e) { > assertTrue( e.getMessage().contains("The directory is already > locked.")); > LOG.info("Expected (!) 
exception caught " + e.getMessage()); > } > // Works, as we kill the last datanode, we can now restart 2 datanodes > // This makes us back with 2 nodes > dfsCluster.stopDataNode(0); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null); > } > {noformat} > And then this behavior is randomly triggered in testLogRollOnDatanodeDeath > because when we do > {noformat} > DatanodeInfo[] pipeline = getPipeline(log); > assertTrue(pipeline.length == fs.getDefaultReplication()); > {noformat} > and then kill the datanodes in the pipeline, we will have: > - most of the time: pipeline = 1 & 2, so after killing 1&2 we can start a > new datanode that will reuse the available 2's directory. > - sometimes: pipeline = 1 & 3. In this case,when we try to launch the new > datanode, it fails because it wants to use the same directory as the still > alive '2'. > Ther
[jira] [Updated] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5163: -- Attachment: 5163-92.txt Patch I would integrate to 0.92 > TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or > hadoop QA ("The directory is already locked.") > -- > > Key: HBASE-5163 > URL: https://issues.apache.org/jira/browse/HBASE-5163 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 > Environment: all >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Attachments: 5163-92.txt, 5163.patch > > > The stack is typically: > {noformat} > type="java.io.IOException">java.io.IOException: Cannot lock storage > /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3. > The directory is already locked. > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:290) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460) > at > org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470) > // ... 
> {noformat} > It can be reproduced without parallelization or without executing the other > tests in the class. It seems to fail about 5% of the time. > This comes from the naming policy for the directories in > MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* > in the cluster, and does not take into account previous starts/stops: > {noformat} >for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) { > if (manageDfsDirs) { > File dir1 = new File(data_dir, "data"+(2*i+1)); > File dir2 = new File(data_dir, "data"+(2*i+2)); > dir1.mkdirs(); > dir2.mkdirs(); > // [...] > {noformat} > This means that it if we want to stop/start a datanode, we should always stop > the last one, if not the names will conflict. This test exhibits the behavior: > {noformat} > @Test > public void testMiniDFSCluster_startDataNode() throws Exception { > assertTrue( dfsCluster.getDataNodes().size() == 2 ); > // Works, as we kill the last datanode, we can now start a datanode > dfsCluster.stopDataNode(1); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > // Fails, as it's not the last datanode, the directory will conflict on > // creation > dfsCluster.stopDataNode(0); > try { > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > fail("There should be an exception because the directory already > exists"); > } catch (IOException e) { > assertTrue( e.getMessage().contains("The directory is already > locked.")); > LOG.info("Expected (!) 
exception caught " + e.getMessage()); > } > // Works, as we kill the last datanode, we can now restart 2 datanodes > // This makes us back with 2 nodes > dfsCluster.stopDataNode(0); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null); > } > {noformat} > And then this behavior is randomly triggered in testLogRollOnDatanodeDeath > because when we do > {noformat} > DatanodeInfo[] pipeline = getPipeline(log); > assertTrue(pipeline.length == fs.getDefaultReplication()); > {noformat} > and then kill the datanodes in the pipeline, we will have: > - most of the time: pipeline = 1 & 2, so after killing 1&2 we can start a > new datanode that will reuse the available 2's directory. > - sometimes: pipeline = 1 & 3. In this case,when we try to launch the new > datanode, it fails because it wants to use the same directory as the still > alive '2'. > There are two ways of fixing the test: > 1) Fix the naming rule in MiniDFSCluster#startDataNode, for example to ensure > that the dir
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184697#comment-13184697 ] Zhihong Yu commented on HBASE-5179: --- {code} + * Class to hold dead servers list, utility querying dead server list and being + * processed dead servers by the ServerShutdownHandler. {code} The above should read 'querying dead server list and the dead servers being processed by ...'. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch, hbase-5179v5.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4748) Race between creating recovered edits for META and master assigning ROOT and META.
[ https://issues.apache.org/jira/browse/HBASE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184696#comment-13184696 ] chunhui shen commented on HBASE-4748: - Could I see the patch? Since it is quite rare, I think we should wait to call assignRootAndMeta until MetaServerShutdownHandler finishes, if one exists. > Race between creating recovered edits for META and master assigning ROOT and > META. > -- > > Key: HBASE-4748 > URL: https://issues.apache.org/jira/browse/HBASE-4748 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > 1. Start a cluster. > 2. Alter a table > 3. Restart the master using ./hbase-daemon.sh restart master > 4. Kill the RS after master restarts. > 5. Start RS again. > 6. No table operations can be performed on the table that was altered but > admin.listTables() is able to list the altered table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184694#comment-13184694 ] Mubarak Seyed commented on HBASE-4720: -- My local tests kept failing on trunk; I will fix TestRowResource. > Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5179: Attachment: hbase-5179v5.patch In patch v5, I added javadoc to explain getDeadServersBeingProcessed() and getDeadServers, and also added more documentation in DeadServer about deadServersBeingProcessed. Regarding Stack's comment that a server is either in inProgress or in the deadServers list: I think a server can be in both the processingDeadServers list and the deadServers list. The deadServers list stores only one instance per regionserver, but the processingDeadServers list may store multiple instances for one regionserver, with several start codes. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch, hbase-5179v5.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
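A minimal sketch of the distinction chunhui describes; the class and field names below are illustrative stand-ins, not the patch's actual code. The dead-servers collection keeps one entry per region server host:port, while the being-processed collection may hold several entries for the same host with different start codes:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative shape only (not the patch's code): one entry per host:port
// in the dead set, but possibly several "host:port,startcode" entries in
// flight for the same server in the handler's processing list.
public class DeadServerSketch {
    // One canonical entry per region server host:port.
    final Set<String> deadServers = new HashSet<>();
    // Entries currently being processed by ServerShutdownHandler; the same
    // host:port may appear more than once, with different start codes.
    final List<String> processingDeadServers = new ArrayList<>();

    void notifyDead(String hostPort, long startCode) {
        deadServers.add(hostPort); // set semantics dedupe by host:port
        processingDeadServers.add(hostPort + "," + startCode);
    }

    void finishedProcessing(String hostPort, long startCode) {
        processingDeadServers.remove(hostPort + "," + startCode);
    }
}
```

With this shape, a region server that dies, restarts, and dies again before the first shutdown handler completes appears once in deadServers but twice in processingDeadServers, which is exactly the case the comment argues for.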
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184671#comment-13184671 ] Hadoop QA commented on HBASE-4720: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510293/HBASE-4720.trunk.v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -146 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 81 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.rest.TestRowResource org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/740//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/740//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/740//console This message is automatically generated. 
> Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184668#comment-13184668 ] chunhui shen commented on HBASE-5179: - I agree with the renaming in patchV4. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184647#comment-13184647 ] Mubarak Seyed commented on HBASE-4720: -- This patch does not cover the following from Andrew's comments: {quote} The REST gateway does support a batch put operation, where the supplied model contains multiple rows. The request URI will contain the table name and a row key, but the row key would be ignored and should be set to something known not to exist, like "submit". (Row name in the model takes preference to whatever was supplied in the URI.) See RowResource, starting around line 160. This gives the client the option of submitting work in batch, to reduce overheads. So optionally here you could retrieve a list of rows and process them, building a response that includes the disposition of each. {quote} The [HTable.checkAndPut|http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html] and [HTable.checkAndDelete|http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html] APIs support only one row at a time. I don't think we need to support batches of checkAndPut and checkAndDelete. {quote} The URI format for requests is '/// ...' This violates that by adding, just for check-and cases, a prefix. Having a special case like that should be avoided. What about handling this in TableResource, with a query parameter? '///?check' E.g. Then you won't need CheckAndXTableResource classes. Additionally use the appropriate HTTP operations. PUT/POST for check-and-put. DELETE for check-and-delete. The spec does not forbid bodies in DELETE requests. (I am unsure if Jetty/Jersey will support it however.) {quote} We have discussed the design choices earlier (refer to earlier comments in this JIRA); Stack and Ted voted for option #2 (the /checkandput, /checkanddelete option). If I have to go back to option #1, I will have to re-work most of the changes here. 
> Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
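The check-and-put contract under discussion (compare a cell's current value, then write only on a match, atomically) can be sketched outside HBase with a ConcurrentHashMap. This is an illustration of the semantics only, not the REST gateway's or HTable's implementation, and the key format is a made-up stand-in for row/family/qualifier:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of checkAndPut's contract: atomically write newValue only if the
// current value equals expected; returns whether the write happened.
public class CheckAndPutSketch {
    private final ConcurrentHashMap<String, String> cells = new ConcurrentHashMap<>();

    boolean checkAndPut(String rowCol, String expected, String newValue) {
        if (expected == null) {
            // Check-for-absence: succeed only if no value exists yet.
            return cells.putIfAbsent(rowCol, newValue) == null;
        }
        // ConcurrentHashMap.replace(k, old, new) is an atomic compare-and-set.
        return cells.replace(rowCol, expected, newValue);
    }

    String get(String rowCol) {
        return cells.get(rowCol);
    }
}
```

The sentinel-table pattern in the description relies on exactly this guarantee: two clients racing on the same cell cannot both see their checkAndPut succeed.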
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184646#comment-13184646 ] Hadoop QA commented on HBASE-5033: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510289/HBASE-5033.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/739//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/739//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/739//console This message is automatically generated. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. 
> So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
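The parallel open scheme in items 1) and 2) above can be sketched with a plain ExecutorService. The method and the string "stores" below are stand-ins, not HBase's actual Store classes, and the bounded pool mirrors the idea of capping open/close concurrency:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: open N stores concurrently on a bounded pool instead of one
// after another, mirroring items 1) and 2) of the description.
public class ParallelStoreOpenSketch {
    static List<String> openStores(List<String> storeNames, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : storeNames) {
                // Each task stands in for opening one store; inside it, the
                // store's files could likewise be loaded on a nested pool.
                futures.add(pool.submit(() -> "opened:" + name));
            }
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                opened.add(f.get()); // blocks; propagates any open failure
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submit order keeps the result deterministic while the actual opens overlap, which is where the region open/close time is saved.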
[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mubarak Seyed updated HBASE-4720: - Attachment: HBASE-4720.trunk.v3.patch The attached file (HBASE-4720.trunk.v3.patch) contains changes for Andrew Purtell's code review comments. This patch does not cover the following from Andrew's comments: >The REST gateway does support a batch put operation, where the supplied model >contains multiple rows. The request URI will contain the table name and a row >key, but the row key would be ignored and should be set to something known not >to exist, like "submit". (Row name in the model takes preference to whatever >was supplied in the URI.) See RowResource, starting around line 160. This >gives the client the option of submitting work in batch, to reduce overheads. So optionally here you could retrieve a list of rows and process them, building a response that includes the disposition of each. HTable.checkAndPut and HTable.checkAndDelete API supports only one row at a time (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)). I don't think we need to support batch of checkAndPut and checkAndDelete. >The URI format for requests is '/// ...' This violates that by >adding, just for check-and cases, a prefix. Having a special case like that >should be avoided. What about handling this in TableResource, with a query >parameter? '///?check' E.g.Then you won't need >CheckAndXTableResource classes. Additionally use the appropriate HTTP >operations. PUT/POST for check-and-put. DELETE for check-and-delete. The spec >does not forbid bodies in DELETE requests. (I am unsure if Jetty/Jersey will >support it however.) We have discussed the design choices earlier (refer comments in the same JIRA), Stack and Ted have voted for option # 2 (/checkandput, /checkanddelete) option. 
If I have to go back to option #1, I will have to re-work most of the changes here. > Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184631#comment-13184631 ] Zhihong Yu commented on HBASE-5033: --- Integrated to TRUNK. Thanks for the patch, Liyin. Thanks for the review Lars and Kannan. Hopefully I got commit message right :-) > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5033: -- Release Note: "hbase.hstore.open.and.close.threads.max" is introduced to control the number of threads for opening/closing Store and store files. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
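The property named in the release note would be set like any other HBase configuration key. A hedged example for hbase-site.xml follows; the value shown is illustrative, not a recommended default:

```xml
<!-- hbase-site.xml: caps the thread pool used to open/close Stores and
     their store files in parallel (HBASE-5033). Example value only. -->
<property>
  <name>hbase.hstore.open.and.close.threads.max</name>
  <value>8</value>
</property>
```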
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184625#comment-13184625 ] Hadoop QA commented on HBASE-5033: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510282/5033-trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/738//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/738//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/738//console This message is automatically generated. 
> Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5179: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510266/5179-v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/735//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/735//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/735//console This message is automatically generated.) 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5179: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510164/hbase-5179.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 78 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/728//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/728//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/728//console This message is automatically generated.) 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184617#comment-13184617 ] Zhihong Yu commented on HBASE-5177: --- The patch from Phabricator cannot be applied on TRUNK: {code} 1 out of 1 hunk FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/client/HTable.java.rej patching file src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java Hunk #1 succeeded at 22 with fuzz 2 (offset 1 line). Hunk #2 FAILED at 71. Hunk #3 FAILED at 94. Hunk #4 FAILED at 4142. 3 out of 4 hunks FAILED -- saving rejects to file src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java.rej {code} Patch for 0.89-fb doesn't have to be attached here. Attaching patch for TRUNK would allow TRUNK to be in sync with 0.89-fb Cheers > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
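What a "non cached version of getRegionLocation" amounts to can be sketched as a lookup cache with a bypass flag. The class, method, and types below are illustrative stand-ins, not HTable's real internals; the lookup function stands in for the heavier scan of the meta region:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: getLocation(row, false) serves from the cache; getLocation(row,
// true) forces a fresh lookup and refreshes the cache, mirroring the
// requested non-cached variant of getRegionLocation.
public class RegionLocationCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> lookup; // stands in for a META scan

    RegionLocationCacheSketch(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    String getLocation(String row, boolean reload) {
        if (reload) {
            String fresh = lookup.apply(row); // bypass the cache entirely
            cache.put(row, fresh);            // and repair any stale entry
            return fresh;
        }
        return cache.computeIfAbsent(row, lookup);
    }
}
```

The point of the reload path is exactly the use case in the description: after a region moves, the cached answer is stale, and a caller that needs the current hosting server must be able to force the lookup without paying for a full getRegionsInfo().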
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184612#comment-13184612 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510277/5179-v4.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/737//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/737//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/737//console This message is automatically generated. 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If the master's failover processing and the ServerShutdownHandler run > concurrently, the following case may occur: > 1. The master completes splitLogAfterStartup(). > 2. RegionserverA restarts, and the ServerShutdownHandler starts processing it. > 3. The master starts rebuildUserRegions, and RegionserverA is considered a > dead server. > 4. The master starts assigning the regions of RegionserverA because step 3 marked > it as a dead server. > However, while step 4 (region assignment) runs, the ServerShutdownHandler may still > be splitting the log; therefore data loss is possible.
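The race in the quoted report (steps 1-4) comes down to a missing wait: the master must not assign a dead server's regions until log splitting for that server has finished. A minimal sketch of that synchronization with a per-server latch (the class and method names are hypothetical illustrations, not the actual HBase fix):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Sketch: assignment of a dead server's regions blocks until the
// ServerShutdownHandler has finished splitting that server's log.
// All names here are illustrative, not HBase's real API.
public class SplitLogGate {
    private final Map<String, CountDownLatch> splitDone = new ConcurrentHashMap<>();

    // ServerShutdownHandler registers the dead server before splitting its log.
    public void beginSplit(String serverName) {
        splitDone.putIfAbsent(serverName, new CountDownLatch(1));
    }

    // Called once the dead server's log is fully split.
    public void finishSplit(String serverName) {
        CountDownLatch latch = splitDone.get(serverName);
        if (latch != null) {
            latch.countDown();
        }
    }

    // True if no split is registered or the registered split completed.
    public boolean isSplitDone(String serverName) {
        CountDownLatch latch = splitDone.get(serverName);
        return latch == null || latch.getCount() == 0;
    }

    // The assignment path blocks here, closing the race between failover
    // processing and the ServerShutdownHandler.
    public void awaitSplit(String serverName) {
        CountDownLatch latch = splitDone.get(serverName);
        if (latch == null) {
            return;
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With such a gate, step 4 would call awaitSplit() before assigning, so the ordering in steps 1-3 could no longer lead to assignment racing the split.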
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: HBASE-5033.patch Resubmit the patch. Thanks Ted for correction. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: (was: HBASE-5033-apach-trunk.patch) > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184609#comment-13184609 ] Pritam Damania commented on HBASE-5177: --- @Zhihong Yu : I think Phabricator already attached the patch automatically. Do I still need to attach it separately ? > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Updated] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritam Damania updated HBASE-5177: -- Attachment: (was: getRegionLocationNonCaching89fb.patch) > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184608#comment-13184608 ] Zhihong Yu commented on HBASE-5033: --- I doubt: {code} diff --git a/src/main/java/org/apache/hadoop/hbase/HConstants.java b/src/main/java/org/apache/hadoop/hbase/HConstants.java index 5120a3c..fcb024b 100644 --- a/src/main/java/org/apache/hadoop/hbase/HConstants.java +++ b/src/main/java/org/apache/hadoop/hbase/HConstants.java {code} > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritam Damania updated HBASE-5177: -- Attachment: getRegionLocationNonCaching89fb.patch This patch is for the 89fb branch. > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch, > getRegionLocationNonCaching89fb.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184603#comment-13184603 ] Liyin Tang commented on HBASE-5033: --- Thanks Ted. BTW, I do use --no-prefix for this recently submitted patch. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184595#comment-13184595 ] Zhihong Yu commented on HBASE-5177: --- @Pritam: Can you attach the latest patch here so that Hadoop QA can run through it ? Remember to use '--no-prefix' Thanks > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184592#comment-13184592 ] Phabricator commented on HBASE-5177: tedyu has accepted the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". Thanks for the explanation. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Created] (HBASE-5183) Render the monitored tasks as a treeview
Render the monitored tasks as a treeview Key: HBASE-5183 URL: https://issues.apache.org/jira/browse/HBASE-5183 Project: HBase Issue Type: Sub-task Reporter: Zhihong Yu Andy made the suggestion here: https://issues.apache.org/jira/browse/HBASE-5174?focusedCommentId=13184571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13184571
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184588#comment-13184588 ] Phabricator commented on HBASE-5177: pritamdamania has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 That doesn't really matter in this part of the code right ? Since the region has not moved till now. Irrespective of the order of the calls, both results would be same correct ? The variables addrCache and addrNoCache refer to the type of method being invoked. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184587#comment-13184587 ] Phabricator commented on HBASE-5177: tedyu has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 So that the call on line 4162 can fetch from cache. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5033: -- Attachment: 5033-trunk.txt Resolved a conflict in HRegion.java In the future, please use --no-prefix to generate patch so that Hadoop QA can test it. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184583#comment-13184583 ] Phabricator commented on HBASE-5177: pritamdamania has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 Why do you think so ? How does the order affect this part of the code ? REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184578#comment-13184578 ] Hadoop QA commented on HBASE-2600: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510274/0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 20 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/736//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/736//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/736//console This message is automatically generated. 
> Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... 
changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
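The lookup the HBASE-2600 description argues for is easy to illustrate: with meta rows keyed by end row, locating the region containing a row is a single forward seek for the first key greater than the wanted row, instead of a backwards getClosestRowBefore walk. The sketch below models the meta scan with a TreeMap; the schema and names are illustrative assumptions, not HBase's real meta layout:

```java
import java.util.Map;
import java.util.TreeMap;

// Model of an ENDROW-keyed meta table. A forward seek (TreeMap.higherEntry,
// standing in for "open a scanner at the wanted row") finds the containing
// region in one step, because end rows are exclusive upper bounds.
public class EndRowMeta {
    // An empty end row means "last region of the table"; model it with a
    // sentinel that sorts after every real row key.
    private static final String LAST = "\uffff";
    private final TreeMap<String, String> regionsByEndRow = new TreeMap<>();

    public void addRegion(String endRow, String regionName) {
        regionsByEndRow.put(endRow.isEmpty() ? LAST : endRow, regionName);
    }

    // First entry whose (exclusive) end row is strictly greater than the
    // wanted row is the region containing that row.
    public String locate(String row) {
        Map.Entry<String, String> e = regionsByEndRow.higherEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

With STARTROW keys, the analogous one-step lookup is not possible in a plain scan, which is why the current code needs getClosestRowBefore.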
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184571#comment-13184571 ] Andrew Purtell commented on HBASE-5174: --- Render the monitored tasks as a treeview, with something like http://jquery.bassistance.de/treeview/ ? While building the tree, put entries with identical text one level down, as soon as you see something different, move back up to toplevel? Render fully collapsed? > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution.
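The coalescing proposed in HBASE-5174 amounts to run-length encoding over consecutive identical task entries. A small sketch of the idea (not the actual TaskMonitor patch; the names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Collapse runs of consecutive, identical task descriptions into a single
// entry carrying a repeat count, so 1000 identical ABORTED tasks render as
// one line instead of pushing everything else off the page.
public class TaskCoalescer {
    public static List<String> coalesce(List<String> tasks) {
        List<String> out = new ArrayList<>();
        String prev = null;
        int run = 0;
        for (String task : tasks) {
            if (task.equals(prev)) {
                run++;               // extend the current run
            } else {
                flush(out, prev, run);
                prev = task;
                run = 1;             // start a new run
            }
        }
        flush(out, prev, run);       // emit the final run
        return out;
    }

    private static void flush(List<String> out, String task, int run) {
        if (task == null) {
            return;
        }
        out.add(run > 1 ? task + " (x" + run + ")" : task);
    }
}
```

Only consecutive duplicates are merged, so interleaved distinct tasks still show in order, which matches the "coalescing consecutive aborted tasks" suggestion above.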
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5033: -- Fix Version/s: 0.94.0 > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, > D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: HBASE-5033-apach-trunk.patch 1) Based on the recent trunk; generated the patch with --no-prefix 2) The default number of threads is set to 1. 3) Performance evaluation: performance will vary across cluster environments, depending on factors such as the number of regions and the number of store files per region. A simple restart test shows single region server (22 regions) restart time decreased from 78 sec to 55 sec, roughly a 29% saving in region server restart time. Also, cluster (100 nodes) restart time decreased from 316 secs to 189 secs, saving around 40% of the restart time. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Attachments: 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, > D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
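The core idea of HBASE-5033, submitting each store open to a bounded thread pool instead of opening stores sequentially, can be sketched as follows. The Store interface and all names here are illustrative assumptions, not the real HBase API or the patch itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: open all stores of a region concurrently on a fixed-size pool,
// then wait for every open to finish before the region is considered open.
public class ParallelStoreOpener {
    public interface Store {
        String open(); // returns the store name once opened (illustrative)
    }

    public static List<String> openStores(List<Store> stores, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every open; with threads == 1 this degenerates to the
            // old sequential behavior (the patch's default).
            List<Future<String>> futures = new ArrayList<>();
            for (Store store : stores) {
                futures.add(pool.submit((Callable<String>) store::open));
            }
            // Collect results in submission order, surfacing any failure.
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    opened.add(f.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException("store open failed", e);
                }
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

The reported numbers (78 sec to 55 sec per server, 316 secs to 189 secs per cluster) are consistent with this shape: the win grows with the number of independent stores and store files that can be loaded concurrently.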
[jira] [Commented] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184565#comment-13184565 ] Hudson commented on HBASE-5182: --- Integrated in HBase-TRUNK #2622 (See [https://builds.apache.org/job/HBase-TRUNK/2622/]) HBASE-5182 TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.
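The class of bug in HBASE-5182 is easy to reproduce in miniature: a value is read from configuration but a hard-coded default is handed to the thread pool. The sketch below shows the fixed shape; the config key name and the pool sizes are assumptions, not the real TBoundedThreadPoolServer code.

```java
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative fix for the HBASE-5182 bug class: the configured keep-alive
// must actually be passed to the ThreadPoolExecutor constructor, otherwise
// the default silently wins.
public class KeepAliveConfig {
    static final int DEFAULT_KEEP_ALIVE_SEC = 60;

    static ThreadPoolExecutor newPool(Map<String, String> conf) {
        int keepAlive = Integer.parseInt(
            conf.getOrDefault("hbase.thrift.threadKeepAliveTimeSec", // assumed key
                              String.valueOf(DEFAULT_KEEP_ALIVE_SEC)));
        // The fix: pass the configured value, not the DEFAULT_* constant.
        return new ThreadPoolExecutor(
            1, 4, keepAlive, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }
}
```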
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184562#comment-13184562 ] Zhihong Yu commented on HBASE-5174: --- I think the MonitoredTask display should be placed under region server section. > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution.
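The coalescing proposed above amounts to collapsing runs of identical consecutive ABORTED entries into one line with a repeat count. A minimal sketch, assuming a simplified Task value type rather than HBase's actual MonitoredTask:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the HBASE-5174 proposal: instead of rendering 1000 identical
// "ABORTED Flushing ..." rows, render one row per run with an "(xN)" suffix.
public class TaskCoalescer {
    record Task(String status, String description) {}

    static List<String> coalesce(List<Task> tasks) {
        List<String> out = new ArrayList<>();
        Task prev = null;
        int run = 0;
        for (Task t : tasks) {
            // Only ABORTED repeats are coalesced; other states render verbatim.
            boolean sameAbortedRun = prev != null
                && "ABORTED".equals(t.status())
                && t.equals(prev);
            if (sameAbortedRun) {
                run++;
            } else {
                if (prev != null) out.add(render(prev, run));
                prev = t;
                run = 1;
            }
        }
        if (prev != null) out.add(render(prev, run));
        return out;
    }

    static String render(Task t, int count) {
        String line = t.status() + " " + t.description();
        return count > 1 ? line + " (x" + count + ")" : line;
    }
}
```

With this, the 1000 aborted flush tasks in the example would collapse to a single "ABORTED Flushing ... (x1000)" row above the region list.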
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184561#comment-13184561 ] Phabricator commented on HBASE-5177: tedyu has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 I think this call should be placed before the call on line 4162. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
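The API shape requested in HBASE-5177 is a location lookup with a flag that bypasses the client cache. A loose sketch, with a Function standing in for the real META read; the class and method names here are illustrative, not HTable's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Toy model of a client-side location cache with a non-caching lookup path.
public class LocationCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> metaLookup; // stand-in for a META read

    LocationCache(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    // useCache=false forces a fresh META read and refreshes the cache,
    // which is what you want right after a region has moved.
    String getRegionLocation(String row, boolean useCache) {
        if (useCache) {
            return cache.computeIfAbsent(row, metaLookup);
        }
        String fresh = metaLookup.apply(row);
        cache.put(row, fresh);
        return fresh;
    }
}
```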
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: (was: HBASE-5033-apach-trunk.patch) > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Attachments: 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, > D933.4.patch, D933.5.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5179: -- Attachment: 5179-v4.txt Adopted the getDeadServersBeingProcessed() method name. Also made it package private. Waiting for Chunhui's feedback about Stack's comments. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If the master's failover processing and ServerShutdownHandler's processing > happen concurrently, the following case may appear: > 1. The master completes splitLogAfterStartup(). > 2. RegionserverA restarts, and ServerShutdownHandler is processing it. > 3. The master starts to rebuildUserRegions, and RegionserverA is considered a > dead server. > 4. The master starts to assign RegionserverA's regions because it is a dead > server per step 3. > However, while doing step 4 (assigning regions), ServerShutdownHandler may > still be splitting logs, which can cause data loss.
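The fix direction discussed above amounts to the master refusing to assign a region while its last host's logs are still being split. A minimal sketch: the getDeadServersBeingProcessed() name comes from the comment, but the rest of this class is illustrative, not the actual patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the HBASE-5179 guard: track dead servers whose WALs are still
// being split, and gate assignment on that set.
public class AssignmentGuard {
    private final Set<String> deadServersBeingProcessed = ConcurrentHashMap.newKeySet();

    void startProcessing(String server)  { deadServersBeingProcessed.add(server); }
    void finishProcessing(String server) { deadServersBeingProcessed.remove(server); }

    // Assign only after log splitting for the region's last host has finished,
    // so the region cannot serve reads/writes before its WAL edits are replayed.
    boolean canAssign(String lastHostingServer) {
        return !deadServersBeingProcessed.contains(lastHostingServer);
    }
}
```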
[jira] [Commented] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184551#comment-13184551 ] Zhihong Yu commented on HBASE-5181: --- The message is certainly detailed :-) Please remember to replace '/hbase' with the value of zookeeper.znode.parent > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Assignee: Mubarak Seyed >Priority: Minor > Labels: noob > > When master fail-over happens, if we have number of RITs under > /hbase/unassigned and if we have stale znode(s) (encoded region names) under > /hbase/unassigned, we are getting > {code} > 2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master > startup proceeding: master failover > 2011-12-30 10:27:36,002 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to > process 1717 regions in transition > 2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.ArrayIndexOutOfBoundsException: -256
> at org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148)
> at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105)
> at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
> at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
> at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743)
> at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262)
> at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223)
> at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
> {code}
> and there is no clue on how to clean-up the stale znode(s) from unassigned
> using zkCli.sh (del /hbase/unassigned/). It would be good if
> we include the bad region name in the IOException from
> RegionTransitionData.readFields().
> {code}
> @Override
> public void readFields(DataInput in) throws IOException {
>   // the event type byte
>   eventType = EventType.values()[in.readShort()];
>   // the timestamp
>   stamp = in.readLong();
>   // the encoded name of the region being transitioned
>   regionName = Bytes.readByteArray(in);
>   // remaining fields are optional so prefixed with boolean
>   // the name of the regionserver sending the data
>   if (in.readBoolean()) {
>     byte [] versionedBytes = Bytes.readByteArray(in);
>     this.origin = ServerName.parseVersionedServerName(versionedBytes);
>   }
>   if (in.readBoolean()) {
>     this.payload = Bytes.readByteArray(in);
>   }
> }
> {code}
> If the code execution has survived until regionName then we can include the
> regionName in the IOException with an error message to clean-up the stale znode(s)
> under /hbase/unassigned. -- This message is automatically generated by JIRA.
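The improvement suggested in HBASE-5181 can be sketched outside HBase: once the region name has been read, wrap any later parse failure in an IOException that names the offending znode. The wire format below is a simplified stand-in (readUTF instead of Bytes.readByteArray), so field layout and message wording are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative version of the readFields idea: a parse error after the
// region name is known gets wrapped with the znode the operator should check.
public class RegionTransitionParse {

    static String readAndValidate(byte[] bytes) throws IOException {
        DataInput in = new DataInputStream(new ByteArrayInputStream(bytes));
        in.readShort();                  // event type (unused in this sketch)
        in.readLong();                   // timestamp (unused in this sketch)
        String regionName = in.readUTF(); // stand-in for Bytes.readByteArray
        try {
            // optional origin-server field, prefixed with a boolean
            if (in.readBoolean()) {
                in.readUTF();
            }
        } catch (IOException | RuntimeException e) {
            throw new IOException("Corrupt transition data for region '"
                + regionName + "'; consider deleting the stale znode under "
                + "/hbase/unassigned", e);
        }
        return regionName;
    }
}
```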
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184547#comment-13184547 ] Zhihong Yu commented on HBASE-5174: --- Looks like I didn't take State of MonitoredTask into account. Personally I think seeing the latest status for a MonitoredTask is fine. To dig deeper, log is always the place to check. Map>> is easy to confuse a few people reading the code :-) > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution.
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184545#comment-13184545 ] jirapos...@reviews.apache.org commented on HBASE-2600: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3466/ --- Review request for hbase and Michael Stack. Summary --- This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 "32-bit encoding of regionnames waaay too susceptible to hash clashes" proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... 
so we'd need hbase-2531 to be in place before we could do this change. This addresses bug HBASE-2600. https://issues.apache.org/jira/browse/HBASE-2600 Diffs - src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 74cb821 src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 133759d src/main/java/org/apache/hadoop/hbase/KeyValue.java be7e2d8 src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java e5e60a8 src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 88c381f src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 99f90b2 src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f0c6828 src/main/java/org/apache/hadoop/hbase/rest/RegionsResource.java bf85bc1 src/main/java/org/apache/hadoop/hbase/rest/model/TableRegionModel.java 67e7a04 src/test/java/org/apache/hadoop/hbase/TestKeyValue.java dc4ee8d src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java 5f97167 src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 6e1211b src/test/java/org/apache/hadoop/hbase/rest/TestStatusResource.java cffdcb6 src/test/java/org/apache/hadoop/hbase/rest/model/TestTableRegionModel.java b6f0ab5 Diff: https://reviews.apache.org/r/3466/diff Testing --- Unit tests started table. Tests in error: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD: Table 'TestTable we searched for the StartKey: TestTable ,, startKey lastChar's int value: 32 with the stopKey: TestTable#,, stopRow lastChar's int value: 35 with parentTable:.META. I need to know how to update/recreate the tar ball which is the source for that test. 
Thanks, Alex > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If w
[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-2600: --- Status: Patch Available (was: Open) > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. 
If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
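The core argument above — that end-row keys turn region lookup into a natural forward seek while start-row keys need the backward getClosestRowBefore walk — can be modeled with a sorted map. This is a toy sketch of the idea, not HBase code; region names and the end-exclusive convention are the only assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the HBASE-2600 proposal: META keyed by each region's END row.
// A region [start, end) contains row iff start <= row < end, so the owning
// region is the FIRST entry whose end key is strictly greater than the row —
// exactly what a forward scan (or TreeMap.higherEntry) finds in one seek.
public class EndKeyMeta {

    static String locate(TreeMap<String, String> metaByEndKey, String row) {
        Map.Entry<String, String> e = metaByEndKey.higherEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

With start-row keys the same lookup needs the last entry at or before the row (a floorEntry, i.e. a backward walk in a scan-oriented store), which is the "perverse, expensive" access the proposal removes.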
[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META
[ https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184542#comment-13184542 ] Alex Newman commented on HBASE-4616: https://reviews.apache.org/r/3466/ > Update hregion encoded name to reduce logic and prevent region collisions in > META > - > > Key: HBASE-4616 > URL: https://issues.apache.org/jira/browse/HBASE-4616 > Project: HBase > Issue Type: Umbrella >Reporter: Alex Newman >Assignee: Alex Newman > Attachments: HBASE-4616-v2.patch, HBASE-4616-v3.patch, > HBASE-4616.patch
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184543#comment-13184543 ] Alex Newman commented on HBASE-2600: There's lots of discussion https://issues.apache.org/jira/browse/HBASE-4616 as well > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. 
HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-2600: --- Attachment: 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. 
HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-2600: --- Issue Type: Bug (was: Sub-task) Parent: (was: HBASE-4616) > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. 
If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
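The lookup advantage described in HBASE-2600 can be sketched with a plain sorted map standing in for the meta table (illustrative only, not the HBase API): with meta rows keyed by tablename+endRow, finding the region containing a row is a forward seek for the first meta row sorting strictly after the search key (endRow being exclusive), which maps directly onto an HBase scan, instead of the backward getClosestRowBefore walk that startRow keying forces.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: meta rows keyed by tablename+endRow (TreeMap as a stand-in
// for the meta table). Region location becomes "first entry whose key
// is strictly greater than tablename+row" -- a forward seek, i.e. the
// natural scan primitive.
public class MetaByEndRow {
  static final TreeMap<String, String> meta = new TreeMap<>();

  static {
    // two regions of table t1: [<start>, "bbb") and ["bbb", "mmm")
    meta.put("t1,bbb", "region-A");
    meta.put("t1,mmm", "region-B");
  }

  static String regionContaining(String table, String row) {
    // endRow is exclusive, so use higherEntry (strictly greater):
    // a row equal to some region's endRow belongs to the next region
    Map.Entry<String, String> e = meta.higherEntry(table + "," + row);
    return e == null ? null : e.getValue();
  }

  public static void main(String[] args) {
    System.out.println(regionContaining("t1", "ccc")); // prints region-B
  }
}
```

A real meta table would also need a last region with an empty (sort-after-everything) endRow, and a check that the found entry belongs to the same table; the sketch omits both.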
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184536#comment-13184536 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510266/5179-v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/735//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/735//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/735//console This message is automatically generated. 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
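The race in steps 1-4 above suggests a guard of roughly this shape (class and method names here are illustrative, not the actual HBase code): failover assignment must skip regions of any server whose ServerShutdownHandler is still splitting its logs, otherwise a region can be assigned before its edits are replayed.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the guard implied by the race description: track servers
// whose log splitting is still in flight, and only allow assignment of
// their regions once splitting has completed.
public class FailoverGuard {
  private final Set<String> deadServersInProgress = new HashSet<>();

  public synchronized void splitLogStarted(String server) {
    deadServersInProgress.add(server);
  }

  public synchronized void splitLogFinished(String server) {
    deadServersInProgress.remove(server);
  }

  public synchronized boolean safeToAssignRegionsOf(String server) {
    // assigning while splitting is in progress risks data loss
    return !deadServersInProgress.contains(server);
  }
}
```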
[jira] [Commented] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184534#comment-13184534 ] Hadoop QA commented on HBASE-5182: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510265/hbase-5182.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/734//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/734//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/734//console This message is automatically generated. > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead. 
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184533#comment-13184533 ] stack commented on HBASE-5179: -- bq. I think the reason Chunhui introduced a new Set for the dead servers being processed is that DeadServer is supposed to remember dead servers Yeah, I seem to remember such a need but I'd think we should doc' it up some more in DeadServer so next person in here looking at code has a chance figuring whats up. On v3: {code} getDeadServersUnderProcessing {code} is still public and I think it should be named getDeadServersBeingProcessed ... or BeingHandled... or better so it matches areDeadServersInProgress, getDeadServersInProgress.. they are in the process of being made into DeadServers!!! (and there is missing javadoc explaining what this method is at least relative to getDeadServers -- that its servers that are going through ServerShutdownHandler processing). Does this method need to be in the Interface for ServerManager (The less in the Interface the better)? knownServers should be onlineServers which makes me think that this check for DeadServersInProgress should be made inside in ServerManager so that what comes out of getOnlineServers has already had the InProgress servers stripped? Do you think we need that the new Collection deadServersUnderProcessing should instead be called inProgress... and a server is in either inProgress or its in the deadServers list? On remove, it gets moved (under synchronize) from one list to the other. 
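Stack's review comment sketches a two-collection DeadServer: a server is either still being processed or already a recorded dead server, and on completion it moves from one set to the other under a single lock. A minimal sketch of that bookkeeping (illustrative names, not the real DeadServer class):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the "inProgress vs deadServers" split suggested in review:
// a server lives in exactly one of the two sets, and the move between
// them happens atomically under the object lock.
public class DeadServers {
  private final Set<String> inProgress = new HashSet<>();
  private final Set<String> dead = new HashSet<>();

  public synchronized void startProcessing(String server) {
    inProgress.add(server);
  }

  public synchronized void finishProcessing(String server) {
    // atomic move: a server is never visible in both sets or neither
    inProgress.remove(server);
    dead.add(server);
  }

  public synchronized boolean areDeadServersInProgress() {
    return !inProgress.isEmpty();
  }

  public synchronized boolean isDead(String server) {
    return dead.contains(server);
  }
}
```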
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184532#comment-13184532 ] Todd Lipcon commented on HBASE-5174: There's no guarantee that Object.hashCode() is unique - just that it's usually unique. Would rather coalesce by actual identity (WeakIdentityHashMap?) or by some string (eg region id) than use hashcode. > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
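Todd's point above is that Object.hashCode() is not guaranteed unique, so coalescing should key on actual identity or an explicit string such as the region name. A minimal sketch of string-keyed coalescing (illustrative, not the TaskMonitor implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: coalesce repeated aborted tasks by an explicit key (e.g. the
// region name) instead of Object.hashCode(). 1000 identical aborts
// collapse to one displayed entry with a repeat count.
public class AbortedTaskCoalescer {
  // key -> number of times this aborted task repeated
  private final Map<String, Integer> aborted = new LinkedHashMap<>();

  public void recordAborted(String key) {
    aborted.merge(key, 1, Integer::sum); // fold repeats into a count
  }

  public int repeatsOf(String key) {
    return aborted.getOrDefault(key, 0);
  }

  public int distinctAborted() {
    return aborted.size(); // what the TaskMonitor UI would list
  }
}
```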
[jira] [Updated] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5177: --- Attachment: HBASE-5177.D1197.2.patch pritamdamania updated the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". Reviewers: Kannan, nspiegelberg, JIRA 1) Addressing Ted's comment. REVISION DETAIL https://reviews.facebook.net/D1197 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/client/HTable.java src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
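The cached/uncached split under discussion can be sketched as a `reload` flag that bypasses and refreshes a client-side cache, with the existing method delegating with `reload=false` (a stand-in sketch; the real API is HTable.getRegionLocation and its meta lookup is far heavier than this map):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a cached vs non-cached region location lookup. The counter
// exists only to make the caching behaviour observable in the example.
public class RegionLocator {
  private final Map<String, String> cache = new HashMap<>();
  int metaLookups = 0;

  private String lookupInMeta(String row) {
    metaLookups++;                    // authoritative, expensive path
    return "server-for-" + row;      // stand-in for a real meta scan
  }

  public String getRegionLocation(String row, boolean reload) {
    if (!reload && cache.containsKey(row)) {
      return cache.get(row);         // cheap cached path
    }
    String loc = lookupInMeta(row);  // reload forces a fresh lookup
    cache.put(row, loc);
    return loc;
  }

  // the pre-existing method keeps its behaviour by delegating,
  // matching the reviewer's point that it is the reload=false case
  public String getRegionLocation(String row) {
    return getRegionLocation(row, false);
  }
}
```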
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184520#comment-13184520 ] Jean-Daniel Cryans commented on HBASE-5174: --- Same as in HBASE-5136, I think we need to know something was aborted. Overwriting it will make it seem that nothing wrong's happening. Then add coalescing to make sure you only have 1 aborted and not a flood. > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1
[jira] [Commented] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184514#comment-13184514 ] Mubarak Seyed commented on HBASE-5181: -- Is there any suggestion on error message? How about throw new IOException("There could be a stale region-in-transition in ZK." + " The bad region is " + Bytes.toString(regionName) + ". Try deleting the region-in-transition using 'del /hbase/unassigned/" + Bytes.toString(regionName) + "' command over a ZK connection (in zkCli.sh)", ioe); > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Assignee: Mubarak Seyed >Priority: Minor > Labels: noob > > When master fail-over happens, if we have number of RITs under > /hbase/unassigned and if we have stale znode(s) (encoded region names) under > /hbase/unassigned, we are getting > {code} > 2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master > startup proceeding: master failover > 2011-12-30 10:27:36,002 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to > process 1717 regions in transition > 2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.ArrayIndexOutOfBoundsException: -256 > at > org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148) > > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105) > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) > at > org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) > > at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743) > at > org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262) > > at > org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223) > > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) > {code} > and there is no clue on how to clean-up the stale znode(s) from unassigned > using zkCli.sh (del /hbase/unassigned/). It would be good if > we include the bad region name in IOException from > RegionTransitionData.readFields(). > {code} > @Override > public void readFields(DataInput in) throws IOException { > // the event type byte > eventType = EventType.values()[in.readShort()]; > // the timestamp > stamp = in.readLong(); > // the encoded name of the region being transitioned > regionName = Bytes.readByteArray(in); > // remaining fields are optional so prefixed with boolean > // the name of the regionserver sending the data > if (in.readBoolean()) { > byte [] versionedBytes = Bytes.readByteArray(in); > this.origin = ServerName.parseVersionedServerName(versionedBytes); > } > if (in.readBoolean()) { > this.payload = Bytes.readByteArray(in); > } > } > {code} > If the code execution has survived until regionName then we can include the > regionName in IOException with error message to clean-up the stale znode(s) > under /hbase/unassigned. -- This message is automatically generated by JIRA. 
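The improvement proposed above — surfacing the bad region name once parsing has gotten that far — could look roughly like this (a hypothetical helper, not the actual RegionTransitionData code; the message wording and the Parser interface are both illustrative):

```java
import java.io.IOException;

// Sketch: wrap znode deserialization so a failure is rethrown with the
// region name and a zkCli.sh hint, telling the operator which stale
// /hbase/unassigned znode to delete.
public class ZnodeParser {
  public interface Parser {
    void parse(byte[] data) throws IOException;
  }

  public static void parseWithContext(String regionName, byte[] data,
      Parser p) throws IOException {
    try {
      p.parse(data);
    } catch (RuntimeException e) {
      // e.g. the ArrayIndexOutOfBoundsException seen in the stack trace
      throw new IOException("Failed to parse region-in-transition znode"
          + " for region " + regionName + "; it may be stale. Try"
          + " 'delete /hbase/unassigned/" + regionName
          + "' over a ZK connection (zkCli.sh)", e);
    }
  }
}
```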
[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry
[ https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184507#comment-13184507 ] Zhihong Yu commented on HBASE-5136: --- Then this JIRA depends on HBASE-5174. Please comment on my proposal there. The patch in this JIRA is just specialized version of my proposal for HBASE-5174. > Redundant MonitoredTask instances in case of distributed log splitting retry > > > Key: HBASE-5136 > URL: https://issues.apache.org/jira/browse/HBASE-5136 > Project: HBase > Issue Type: Task >Reporter: Zhihong Yu >Assignee: Zhihong Yu > Attachments: 5136.txt > > > In case of log splitting retry, the following code would be executed multiple > times: > {code} > public long splitLogDistributed(final List logDirs) throws > IOException { > MonitoredTask status = TaskMonitor.get().createStatus( > "Doing distributed log split in " + logDirs); > {code} > leading to multiple MonitoredTask instances. > User may get confused by multiple distributed log splitting entries for the > same region server on master UI -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
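The fix direction for the redundant-status problem above can be sketched as: create the MonitoredTask once, outside the retry loop, and update it on each retry instead of recreating it (illustrative stand-ins below, not the real SplitLogManager/TaskMonitor code):

```java
// Sketch: one status object per logical split, reused across retries,
// so the master UI shows a single entry instead of one per attempt.
public class SplitRetry {
  static int statusesCreated = 0;

  static class Status { // stand-in for MonitoredTask
    Status(String msg) { statusesCreated++; }
    void setStatus(String msg) { /* update in place, not recreate */ }
  }

  static void splitWithRetries(int attempts) {
    // created once, before retrying -- the fix sketched here moves
    // TaskMonitor.get().createStatus(...) out of the retried path
    Status status = new Status("Doing distributed log split");
    for (int i = 0; i < attempts; i++) {
      status.setStatus("attempt " + (i + 1)); // reuse on each retry
    }
  }
}
```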
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184503#comment-13184503 ] Phabricator commented on HBASE-5177: tedyu has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/client/HTable.java:268 When reload is false, this new method becomes identical to the method on line 255. Should we deprecate the method on line 255 ? REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5053) HCM Tests leak connections
[ https://issues.apache.org/jira/browse/HBASE-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-5053. Resolution: Fixed > HCM Tests leak connections > -- > > Key: HBASE-5053 > URL: https://issues.apache.org/jira/browse/HBASE-5053 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Fix For: 0.94.0 > > Attachments: 5053.patch, 5053.v2.patch, 5053.v2.patch > > > There are simple leaks and one more complex. > The complex one comes from the fact fact > HConnectionManager.HConnectionImplementation keeps a *reference* to the > configuration used for the creation. So if this configuration is updated > later, the HConnectionKey created initially will differ from the current one. > As a consequence, the close() will not find the connection anymore in the > list, and the connection won't be deleted. > I added a warning when a close does not find the connection in the list; but > I wonder if we should not copy the HConnectionKey instead of keeping a > reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4602) Make the suite run in at least half the time
[ https://issues.apache.org/jira/browse/HBASE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-4602. Resolution: Fixed seems to be working 4 times faster now => solved > Make the suite run in at least half the time > > > Key: HBASE-4602 > URL: https://issues.apache.org/jira/browse/HBASE-4602 > Project: HBase > Issue Type: Umbrella > Environment: All. >Reporter: nkeywal >Assignee: nkeywal > Attachments: tests_list.xlsx > > > - Cutting down on the number of cluster spinups by coalescing related tests > rather than have each spin up its own cluster > - Make cluster start/stop faster > - Rewriting long-running tests so they do not need to be run on a cluster; > e.g. by instead mocking expected signals/messages > - Move long running tests out of the unit test suite to instead run as part > of the recently introduced 'integration test' step -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5030) Some tests do not close the HFile.Reader they use, leaving some file descriptors open
[ https://issues.apache.org/jira/browse/HBASE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5030: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Some tests do not close the HFile.Reader they use, leaving some file > descriptors open > - > > Key: HBASE-5030 > URL: https://issues.apache.org/jira/browse/HBASE-5030 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: nkeywal >Assignee: nkeywal >Priority: Trivial > Attachments: 5030.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-5181: - Assignee: Mubarak Seyed > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Assignee: Mubarak Seyed >Priority: Minor > Labels: noob
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184498#comment-13184498 ] Phabricator commented on HBASE-5177: pritamdamania has added reviewers to the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". Added Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184485#comment-13184485 ] Scott Chen commented on HBASE-5182: --- Wow. That's super fast. Thanks, Zhihong :) > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5180) [book] book.xml - fixed scanner example
[ https://issues.apache.org/jira/browse/HBASE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184484#comment-13184484 ] Hudson commented on HBASE-5180: --- Integrated in HBase-TRUNK #2621 (See [https://builds.apache.org/job/HBase-TRUNK/2621/]) hbase-5180 book.xml - the scanner example wasn't closing the ResultScanner. That's not good practice. > [book] book.xml - fixed scanner example > --- > > Key: HBASE-5180 > URL: https://issues.apache.org/jira/browse/HBASE-5180 > Project: HBase > Issue Type: Bug >Reporter: Doug Meil >Assignee: Doug Meil > Attachments: book_HBASE_5180.xml.patch > > > book.xml - the scanner example wasn't closing the ResultScanner! that's bad > practice.
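The fix referenced above is the standard close-the-scanner pattern. A minimal, self-contained sketch of the idea follows; the `ResultScanner` here is a local stub standing in for HBase's `org.apache.hadoop.hbase.client.ResultScanner` (which cannot be compiled without the HBase jars), so only the close-in-`finally` shape is what the book's corrected example conveys:

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

public class ScannerCloseExample {
    // Stub standing in for HBase's ResultScanner: iterable rows plus close().
    static class ResultScanner implements Closeable, Iterable<String> {
        private final List<String> rows;
        private boolean closed = false;
        ResultScanner(List<String> rows) { this.rows = rows; }
        public Iterator<String> iterator() { return rows.iterator(); }
        // In the real client, close() releases the server-side scanner lease.
        public void close() { closed = true; }
        public boolean isClosed() { return closed; }
    }

    public static void main(String[] args) {
        ResultScanner scanner = new ResultScanner(List.of("row1", "row2"));
        try {
            for (String row : scanner) {
                System.out.println(row);
            }
        } finally {
            // The point of HBASE-5180: always close the scanner, even if the
            // loop throws, so regionserver resources are not held until timeout.
            scanner.close();
        }
        System.out.println("closed=" + scanner.isClosed());
    }
}
```

On Java 7+ the same shape can be written as `try (ResultScanner scanner = ...) { ... }`, since the real `ResultScanner` is `Closeable`.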
[jira] [Commented] (HBASE-5129) book is inconsistent regarding disabling - major compaction
[ https://issues.apache.org/jira/browse/HBASE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184483#comment-13184483 ] Hudson commented on HBASE-5129: --- Integrated in HBase-TRUNK #2621 (See [https://builds.apache.org/job/HBase-TRUNK/2621/]) hbase-5129 [BOOK] configuration.xml - changed the major compaction disable instruction from Long.MAX_VALUE to 0. > book is inconsistent regarding disabling - major compaction > --- > > Key: HBASE-5129 > URL: https://issues.apache.org/jira/browse/HBASE-5129 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 0.90.1 >Reporter: Mikael Sitruk >Assignee: Doug Meil >Priority: Minor > Attachments: configuration_HBASE_5129.xml.patch > > > It seems that the book has some inconsistencies regarding the way to disable > major compactions > According to the book in chapter 2.6.1.1. HBase Default Configuration > hbase.hregion.majorcompaction - The time (in miliseconds) between 'major' > compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to > disable automated major compactions. > Default: 8640 > (http://hbase.apache.org/book.html#hbase_default_configurations) > According to the book at chapter 2.8.2.8. Managed Compactions > "A common administrative technique is to manage major compactions manually, > rather than letting HBase do it. By default, > HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick > in when you least desire it - especially on a busy system. To "turn off" > automatic major compactions set the value to Long.MAX_VALUE." > According to the code org.apache.hadoop.hbase.regionserver.Store.java, "0" is > the right answer. > (affect all documentation from 0.90.1)
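Per the code cited above (`org.apache.hadoop.hbase.regionserver.Store`), `0` — not `Long.MAX_VALUE` — is what actually disables time-based major compactions, which is the value the doc fix settles on. The corresponding `hbase-site.xml` fragment would be:

```xml
<!-- hbase-site.xml: disable automatic (time-based) major compactions.
     Major compactions can still be triggered manually, e.g. via the
     major_compact shell command, as the "Managed Compactions" section
     of the book recommends. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```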
[jira] [Assigned] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritam Damania reassigned HBASE-5177: - Assignee: Pritam Damania > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Updated] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5182: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch Scott. > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.
[jira] [Commented] (HBASE-5167) We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing that.
[ https://issues.apache.org/jira/browse/HBASE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184482#comment-13184482 ] Hudson commented on HBASE-5167: --- Integrated in HBase-TRUNK #2621 (See [https://builds.apache.org/job/HBase-TRUNK/2621/]) HBASE-5167 We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing that. stack : Files : * /hbase/trunk/bin/hbase-daemon.sh > We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing > that. > -- > > Key: HBASE-5167 > URL: https://issues.apache.org/jira/browse/HBASE-5167 > Project: HBase > Issue Type: Improvement > Components: scripts >Affects Versions: 0.92.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.94.0 > > Attachments: HBASE-5167.patch > > > HBASE-4209 changed the behavior of the scripts such that we do not kill the > daemons away anymore. We should have also changed the message shown in the > logs.
[jira] [Updated] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5182: -- Fix Version/s: 0.94.0 Hadoop Flags: Reviewed > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.
[jira] [Commented] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184478#comment-13184478 ] Mubarak Seyed commented on HBASE-5181: -- Working on corporate approval to contribute this patch. Thanks. > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Priority: Minor > Labels: noob > > When master fail-over happens, if we have number of RITs under > /hbase/unassigned and if we have stale znode(s) (encoded region names) under > /hbase/unassigned, we are getting > {code} > 2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master > startup proceeding: master failover > 2011-12-30 10:27:36,002 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to > process 1717 regions in transition > 2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.ArrayIndexOutOfBoundsException: -256 > at > org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148) > > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105) > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) > at > org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) > > at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743) > at > org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262) > > at > org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223) > > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) > {code} > and there is no clue on how to clean-up the stale znode(s) from unassigned > using zkCli.sh (del /hbase/unassigned/). It would be good if > we include the bad region name in IOException from > RegionTransitionData.readFields(). > {code} > @Override > public void readFields(DataInput in) throws IOException { > // the event type byte > eventType = EventType.values()[in.readShort()]; > // the timestamp > stamp = in.readLong(); > // the encoded name of the region being transitioned > regionName = Bytes.readByteArray(in); > // remaining fields are optional so prefixed with boolean > // the name of the regionserver sending the data > if (in.readBoolean()) { > byte [] versionedBytes = Bytes.readByteArray(in); > this.origin = ServerName.parseVersionedServerName(versionedBytes); > } > if (in.readBoolean()) { > this.payload = Bytes.readByteArray(in); > } > } > {code} > If the code execution has survived until regionName then we can include the > regionName in IOException with error message to clean-up the stale znode(s) > under /hbase/unassigned.
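The improvement proposed in HBASE-5181 above — surfacing the bad region name in the exception so the operator knows which stale znode to delete — can be sketched with plain `java.io`. The helper below is illustrative, not HBase's actual `RegionTransitionData` (the real class uses Hadoop's `Writable` machinery and a different wire format); it only demonstrates the "wrap later parse failures with the already-read region name" pattern:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadFieldsContextExample {
    // Illustrative stand-in for RegionTransitionData.readFields(): once the
    // region name has been read successfully, any later deserialization
    // failure is rethrown with that name attached, pointing the operator at
    // the stale znode under /hbase/unassigned.
    static String readTransition(byte[] znodeData) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(znodeData));
        int nameLen = in.readShort();          // 2-byte name length (toy format)
        byte[] name = new byte[nameLen];
        in.readFully(name);                    // the encoded region name
        String regionName = new String(name, StandardCharsets.UTF_8);
        try {
            in.readLong();                     // e.g. a timestamp field; fails on truncated data
        } catch (IOException e) {
            throw new IOException("Failed to parse transition data for region '"
                + regionName + "'; consider deleting the stale znode "
                + "/hbase/unassigned/" + regionName, e);
        }
        return regionName;
    }

    public static void main(String[] args) throws IOException {
        // Well-formed record: 2-byte length, 4 name bytes, 8-byte timestamp.
        byte[] good = {0, 4, 'r', '1', 'a', 'b', 0, 0, 0, 0, 0, 0, 0, 1};
        System.out.println(readTransition(good));  // prints r1ab

        // Truncated record: name present, timestamp missing.
        byte[] bad = {0, 4, 'r', '1', 'a', 'b'};
        try {
            readTransition(bad);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design point is simply ordering: because `regionName` is decoded before the optional fields, it is available as context for every subsequent failure, which is exactly the window the JIRA comment identifies ("if the code execution has survived until regionName").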
[jira] [Updated] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5182: -- Status: Patch Available (was: Open) > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.