[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184788#comment-13184788 ]

Hudson commented on HBASE-5163:
-------------------------------

Integrated in HBase-0.92-security #72 (See [https://builds.apache.org/job/HBase-0.92-security/72/])
HBASE-5163 TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.") (N Keywal)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java

> TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or
> hadoop QA ("The directory is already locked.")
> -----------------------------------------------------------------------
>
> Key: HBASE-5163
> URL: https://issues.apache.org/jira/browse/HBASE-5163
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 0.92.0
> Environment: all
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 5163-92.txt, 5163.patch
>
> The stack is typically:
> {noformat}
> type="java.io.IOException">java.io.IOException: Cannot lock storage
> /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3.
> The directory is already locked.
> at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
> at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467)
> at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417)
> at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460)
> at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470)
> // ...
> {noformat}
> It can be reproduced without parallelization or without executing the other
> tests in the class. It seems to fail about 5% of the time.
> This comes from the naming policy for the directories in
> MiniDFSCluster#startDataNode. It depends on the number of nodes *currently*
> in the cluster, and does not take into account previous starts/stops:
> {noformat}
> for (int i = curDatanodesNum; i < curDatanodesNum + numDataNodes; i++) {
>   if (manageDfsDirs) {
>     File dir1 = new File(data_dir, "data" + (2*i+1));
>     File dir2 = new File(data_dir, "data" + (2*i+2));
>     dir1.mkdirs();
>     dir2.mkdirs();
>     // [...]
> {noformat}
> This means that if we want to stop/start a datanode, we should always stop
> the last one; otherwise the names will conflict.
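The naming arithmetic quoted above can be checked with a small self-contained sketch. The class and method below are hypothetical (not part of MiniDFSCluster); they only reproduce the loop's index computation to show why stopping a non-last datanode leads to a collision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-implementation of the directory-naming loop quoted above:
// directory names depend only on how many nodes are *currently* running,
// not on which directories already exist on disk.
public class DataDirNaming {
    static List<String> dirsFor(int curDatanodesNum, int numDataNodes) {
        List<String> dirs = new ArrayList<>();
        for (int i = curDatanodesNum; i < curDatanodesNum + numDataNodes; i++) {
            dirs.add("data" + (2 * i + 1));
            dirs.add("data" + (2 * i + 2));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Starting 2 nodes from an empty cluster creates data1..data4.
        System.out.println(dirsFor(0, 2)); // [data1, data2, data3, data4]
        // Stop node 0 (not the last one): one node remains, still holding
        // data3/data4. Starting one more node recomputes from count = 1
        // and picks the same names, hence "The directory is already locked."
        System.out.println(dirsFor(1, 1)); // [data3, data4]
    }
}
```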
This test exhibits the behavior:
> {noformat}
> @Test
> public void testMiniDFSCluster_startDataNode() throws Exception {
>   assertTrue(dfsCluster.getDataNodes().size() == 2);
>   // Works, as we kill the last datanode, we can now start a datanode
>   dfsCluster.stopDataNode(1);
>   dfsCluster.startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>   // Fails, as it's not the last datanode, the directory will conflict on
>   // creation
>   dfsCluster.stopDataNode(0);
>   try {
>     dfsCluster.startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>     fail("There should be an exception because the directory already exists");
>   } catch (IOException e) {
>     assertTrue(e.getMessage().contains("The directory is already locked."));
>     LOG.info("Expected (!) exception caught " + e.getMessage());
>   }
>   // Works, as we kill the last datanode, we can now restart 2 datanodes
>   // This takes us back to 2 nodes
>   dfsCluster.stopDataNode(0);
>   dfsCluster.startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null);
> }
> {noformat}
> And then this behavior is randomly triggered in testLogRollOnDatanodeDeath
> because when we do
> {noformat}
> DatanodeInfo[] pipeline = getPipeline(log);
> assertTrue(pipeline.length == fs.getDefaultReplication());
> {noformat}
> and then kill the datanodes in the pipeline, we will have
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184772#comment-13184772 ]

stack commented on HBASE-5179:
------------------------------

Sure. Do what you fellas think best.

> Concurrent processing of processFaileOver and ServerShutdownHandler may
> cause region is assigned before completing split log, it would cause data loss
> -----------------------------------------------------------------------
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Attachments: 5179-90.txt, 5179-90v2.patch, 5179-v2.txt, 5179-v3.txt,
> 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch
>
> If the master's failover processing and ServerShutdownHandler's processing
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler is processing it.
> 3. The master starts to rebuildUserRegions, and RegionserverA is considered
> a dead server.
> 4. The master starts to assign the regions of RegionserverA because it is a
> dead server by step 3.
> However, while doing step 4 (assigning regions), ServerShutdownHandler may
> still be splitting the log; therefore, it may cause data loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
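A later comment in this thread discusses tracking dead servers being processed in an inProgress collection inside DeadServers. A minimal, hypothetical Java sketch of that idea (none of these names or signatures are taken from the actual patch) shows the guard the master would need before step 4: do not assign a dead server's regions while its log split is still in progress.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the guard discussed in this thread: the master
// consults a "dead servers in progress" set before assigning regions, so
// assignment cannot race with ServerShutdownHandler's log splitting.
public class DeadServers {
    private final Set<String> inProgress = ConcurrentHashMap.newKeySet();

    // ServerShutdownHandler registers the server before splitting its log...
    public void startProcessing(String serverName) {
        inProgress.add(serverName);
    }

    // ...and deregisters it once log splitting has completed.
    public void finishProcessing(String serverName) {
        inProgress.remove(serverName);
    }

    // The master's failover path checks this before assigning regions:
    // assigning while true risks opening a region whose WAL edits have not
    // yet been split out, i.e. the data loss described in the issue.
    public boolean isBeingProcessed(String serverName) {
        return inProgress.contains(serverName);
    }
}
```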
[jira] [Issue Comment Edited] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184755#comment-13184755 ]

ramkrishna.s.vasudevan edited comment on HBASE-5179 at 1/12/12 6:49 AM:
------------------------------------------------------------------------

@Ted, @Stack, @Chunhui
I think we may have to combine the change in HBASE-4748 as Chunhui suggested at 12/Jan/12 03:23. Is it ok to combine it? Because only then can the processFailOver and SSH problem be solved totally. Pls suggest. Sorry for the typo.

was (Author: ram_krish):
@Ted, @Stack, @Chunhui
I think we may have to combine the change in HBASE-4879 as Chunhui suggested at 12/Jan/12 03:23. Is it ok to combine it? Because only then can the processFailOver and SSH problem be solved totally. Pls suggest.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184764#comment-13184764 ]

chunhui shen commented on HBASE-5179:
-------------------------------------

I think so too.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184760#comment-13184760 ]

Zhihong Yu commented on HBASE-5179:
-----------------------------------

You mean hbase-4748, right? I think we should combine the two.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184755#comment-13184755 ]

ramkrishna.s.vasudevan commented on HBASE-5179:
-----------------------------------------------

@Ted, @Stack, @Chunhui
I think we may have to combine the change in HBASE-4879 as Chunhui suggested at 12/Jan/12 03:23. Is it ok to combine it? Because only then can the processFailOver and SSH problem be solved totally. Pls suggest.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184754#comment-13184754 ]

stack commented on HBASE-5179:
------------------------------

I think getDeadServersInProgress is better than getDeadServersBeingProcessed since it relates to areDeadServersInProgress (I can fix this on commit -- I would also change the name of the Collection in DeadServers so it's inProgress).

Yeah, I would be interested in the notion that we do this server checking inside ServerManager, so that when you ask for onlineServers this stuff has already been done for you... or is the thought that ServerManager need not know about 'handlers' -- that only HMaster should have to know what's running under it (a ServerManager and handlers such as ServerShutdownHandler)?
[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184741#comment-13184741 ]

Hudson commented on HBASE-5163:
-------------------------------

Integrated in HBase-0.92 #241 (See [https://builds.apache.org/job/HBase-0.92/241/])
HBASE-5163 TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.") (N Keywal)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5179:
------------------------------

Comment: was deleted

(was: -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510311/5179-90v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests. Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/744//console

This message is automatically generated.)
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184737#comment-13184737 ]

Zhihong Yu commented on HBASE-4720:
-----------------------------------

Latest patch passed unit tests.

> Implement atomic update operations (checkAndPut, checkAndDelete) for REST
> client/server
> -------------------------------------------------------------------------
>
> Key: HBASE-4720
> URL: https://issues.apache.org/jira/browse/HBASE-4720
> Project: HBase
> Issue Type: Improvement
> Reporter: Daniel Lord
> Assignee: Mubarak Seyed
> Fix For: 0.94.0
>
> Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch,
> HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, HBASE-4720.v1.patch,
> HBASE-4720.v3.patch
>
> I have several large application/HBase clusters where an application node
> will occasionally need to talk to HBase from a different cluster. In order
> to help ensure some of my consistency guarantees I have a sentinel table that
> is updated atomically as users interact with the system. This works quite
> well for the "regular" hbase client but the REST client does not implement
> the checkAndPut and checkAndDelete operations. This exposes the application
> to some race conditions that have to be worked around. It would be ideal if
> the same checkAndPut/checkAndDelete operations could be supported by the REST
> client.
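The atomic primitive being requested is a compare-and-set on a single cell. A minimal model of those semantics is sketched below; the class and method names are hypothetical (this is not HBase's client API or the REST server), and it assumes null means "cell must be absent".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of checkAndPut semantics: atomically compare
// a cell's current value against an expected value and apply the put only
// on a match. This is the guarantee the REST client lacks per the issue.
public class SentinelTable {
    private final Map<String, String> cells = new HashMap<>();

    // Returns true and applies the put only if the current value of `key`
    // equals `expected` (null expected value means the cell must be absent).
    public synchronized boolean checkAndPut(String key, String expected, String value) {
        if (!Objects.equals(cells.get(key), expected)) {
            return false; // check failed: another writer got there first
        }
        cells.put(key, value);
        return true;
    }

    public synchronized String get(String key) {
        return cells.get(key);
    }
}
```

Without this primitive on the REST path, a client must read and then write in two round trips, leaving a window for the race conditions the reporter describes.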
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184734#comment-13184734 ]

Hadoop QA commented on HBASE-5179:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510311/5179-90v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests. Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/744//console

This message is automatically generated.
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5179:
--------------------------------

Attachment: 5179-90v2.patch
[jira] [Updated] (HBASE-5178) Backport HBASE-4101 - Regionserver Deadlock
[ https://issues.apache.org/jira/browse/HBASE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5178:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-4101 - Regionserver Deadlock
> -------------------------------------------
>
> Key: HBASE-5178
> URL: https://issues.apache.org/jira/browse/HBASE-5178
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Attachments: HBASE-4101_0.90_1.patch
>
> Critical issue not merged to 0.90.
[jira] [Updated] (HBASE-5184) Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness
[ https://issues.apache.org/jira/browse/HBASE-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5184:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-5152 - Region is on service before completing initialization
> when doing rollback of split, it will affect read correctness
> ---------------------------------------------------------------------------
>
> Key: HBASE-5184
> URL: https://issues.apache.org/jira/browse/HBASE-5184
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Important issue to be merged into 0.90.
[jira] [Updated] (HBASE-5160) Backport HBASE-4397 - -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
[ https://issues.apache.org/jira/browse/HBASE-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5160:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-4397 - -ROOT-, .META. tables stay offline for too long in
> recovery phase after all RSs are shutdown at the same time
> ------------------------------------------------------------------------
>
> Key: HBASE-5160
> URL: https://issues.apache.org/jira/browse/HBASE-5160
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Backporting to 0.90.6 considering the importance of the issue.
[jira] [Updated] (HBASE-5168) Backport HBASE-5100 - Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5168:
------------------------------------------

Fix Version/s: 0.90.6

> Backport HBASE-5100 - Rollback of split could cause closed region to be
> opened again
> -----------------------------------------------------------------------
>
> Key: HBASE-5168
> URL: https://issues.apache.org/jira/browse/HBASE-5168
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.90.6
>
> Attachments: HBASE-5100_0.90.patch
>
> Considering the importance of the defect, merging it to 0.90.6.
[jira] [Updated] (HBASE-5158) Backport HBASE-4878 - Master crash when splitting hlog may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5158: -- Fix Version/s: 0.90.6 > Backport HBASE-4878 - Master crash when splitting hlog may cause data loss > -- > > Key: HBASE-5158 > URL: https://issues.apache.org/jira/browse/HBASE-5158 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan > Fix For: 0.90.6 > > Attachments: HBASE-4878_branch90_1.patch > > > Backporting to 0.90.6 considering the importance of the issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5157) Backport HBASE-4880- Region is on service before openRegionHandler completes, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5157: -- Fix Version/s: 0.90.6 > Backport HBASE-4880- Region is on service before openRegionHandler completes, > may cause data loss > - > > Key: HBASE-5157 > URL: https://issues.apache.org/jira/browse/HBASE-5157 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan > Fix For: 0.90.6 > > Attachments: HBASE-4880_branch90_1.patch > > > Backporting to 0.90.6 considering the importance of the issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5156) Backport HBASE-4899 - Region would be assigned twice easily with continually killing server and moving region in testing environment
[ https://issues.apache.org/jira/browse/HBASE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5156: -- Fix Version/s: 0.90.6 > Backport HBASE-4899 - Region would be assigned twice easily with continually > killing server and moving region in testing environment > - > > Key: HBASE-5156 > URL: https://issues.apache.org/jira/browse/HBASE-5156 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan > Fix For: 0.90.6 > > Attachments: HBASE-4899_Branch90_1.patch > > > Need to backport to 0.90.6 considering the criticality of the issue -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184717#comment-13184717 ] Hadoop QA commented on HBASE-4720: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510309/HBASE-4720.trunk.v4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -146 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 81 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/743//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/743//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/743//console This message is automatically generated. 
> Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, HBASE-4720.v1.patch, > HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5184) Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness
Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness -- Key: HBASE-5184 URL: https://issues.apache.org/jira/browse/HBASE-5184 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Important issue to be merged into 0.90. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4748) Race between creating recovered edits for META and master assigning ROOT and META.
[ https://issues.apache.org/jira/browse/HBASE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184715#comment-13184715 ] ramkrishna.s.vasudevan commented on HBASE-4748: --- @Chunhui OK, let me check your suggestion and then upload the patch. :) Thanks. > Race between creating recovered edits for META and master assigning ROOT and > META. > -- > > Key: HBASE-4748 > URL: https://issues.apache.org/jira/browse/HBASE-4748 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > 1. Start a cluster. > 2. Alter a table > 3. Restart the master using ./hbase-daemon.sh restart master > 4. Kill the RS after master restarts. > 5. Start RS again. > 6. No table operations can be performed on the table that was altered but > admin.listTables() is able to list the altered table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mubarak Seyed updated HBASE-4720: - Attachment: HBASE-4720.trunk.v4.patch Tests were still failing for runMediumTests on trunk, but I have fixed TestRowResource. The attached file (HBASE-4720.trunk.v4.patch) is the latest patch. Thanks. {code} mvn clean test -P runMediumTests -Dtest=org.apache.hadoop.hbase.rest.* Running org.apache.hadoop.hbase.rest.TestRowResource Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.099 se {code} > Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, HBASE-4720.v1.patch, > HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184705#comment-13184705 ] Jieshan Bean commented on HBASE-5153: - @Ted, the patch for TRUNK seems very different, and I still need some time to check it. I hope I can provide it today. :) @Stack, I think ConnectionUtils is reasonable; I can add it. :) I will update the patch. Thank you all. > HConnection re-creation in HTable after HConnection abort > - > > Key: HBASE-5153 > URL: https://issues.apache.org/jira/browse/HBASE-5153 > Project: HBase > Issue Type: Bug > Components: client >Affects Versions: 0.90.4 >Reporter: Jieshan Bean >Assignee: Jieshan Bean > Fix For: 0.90.6 > > Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, > HBASE-5153.patch > > > HBASE-4893 is related to this issue. From that issue, we know that if > multiple threads share the same connection, once the connection is aborted in > one thread, the other threads will get a > "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception. > It solves the problem of the stale connection not being removed, but the > original HTable instance cannot continue to be used; the connection in HTable > should be recreated. > Actually, there are two approaches to solve this: > 1. In user code, once an IOE is caught, close the connection and re-create > the HTable instance. We can use this as a workaround. > 2. On the HBase client side, catch this exception and re-create the > connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
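Approach 1 above (catch the IOE, then rebuild the handle) can be sketched generically. The names below (ReconnectingClient, Factory, Action, callWithReconnect) are hypothetical stand-ins for HTable construction, not HBase API:

```java
import java.io.IOException;

// Generic sketch of workaround 1 from the comment above: if a call fails with
// an IOException (e.g. "HConnectionImplementation ... closed"), discard the
// handle, re-create it once, and retry. All names here are illustrative.
public class ReconnectingClient {

    interface Factory<T> { T create() throws IOException; }
    interface Action<T, R> { R apply(T t) throws IOException; }

    static <T, R> R callWithReconnect(Factory<T> factory, Action<T, R> action) {
        try {
            T handle = factory.create();          // stands in for new HTable(conf, name)
            try {
                return action.apply(handle);
            } catch (IOException e) {
                handle = factory.create();        // aborted connection: rebuild once
                return action.apply(handle);
            }
        } catch (IOException e) {
            throw new RuntimeException("re-created handle also failed", e);
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        String out = callWithReconnect(
            () -> "conn",
            c -> {
                if (calls[0]++ == 0) throw new IOException("connection closed");
                return c + ":ok";                 // succeeds after re-creation
            });
        System.out.println(out); // conn:ok
    }
}
```

A real client would also close the dead handle before re-creating it; that bookkeeping is omitted in this sketch.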
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184704#comment-13184704 ] Hudson commented on HBASE-5033: --- Integrated in HBase-TRUNK #2623 (See [https://builds.apache.org/job/HBase-TRUNK/2623/]) HBASE-5033 Differential Revision: 933 Opening/Closing store in parallel to reduce region open/close time (Liyin) tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Threads.java > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
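Items 1-4 above all follow one pattern: fan the store/store-file opens out to a thread pool, then join. A minimal sketch of that pattern, under the assumption that each open returns a result (ParallelOpen and openStores are illustrative names; the real change lives in HRegion/Store):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the open-in-parallel pattern from HBASE-5033: submit each store
// open to a bounded pool, then join all futures so any single failure still
// fails the region open. Names are hypothetical, not the patch's code.
public class ParallelOpen {

    static List<String> openStores(List<String> storeNames, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : storeNames) {
                futures.add(pool.submit(() -> name + ":open")); // stands in for store.open()
            }
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                opened.add(f.get()); // propagates any store-open failure
            }
            return opened;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("store open failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(openStores(List.of("cf1", "cf2", "cf3"), 2)); // [cf1:open, cf2:open, cf3:open]
    }
}
```

Joining the futures in submit order keeps the result deterministic even though the opens themselves run concurrently.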
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184703#comment-13184703 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510304/hbase-5179v5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.regionserver.wal.TestHLog org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/741//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/741//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/741//console This message is automatically generated. 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch, hbase-5179v5.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184701#comment-13184701 ] Hadoop QA commented on HBASE-5163: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510307/5163-92.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/742//console This message is automatically generated. > TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or > hadoop QA ("The directory is already locked.") > -- > > Key: HBASE-5163 > URL: https://issues.apache.org/jira/browse/HBASE-5163 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.92.0 > Environment: all >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: 5163-92.txt, 5163.patch > > > The stack is typically: > {noformat} > type="java.io.IOException">java.io.IOException: Cannot lock storage > /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3. > The directory is already locked. 
> at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460) > at > org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470) > // ... > {noformat} > It can be reproduced without parallelization or without executing the other > tests in the class. It seems to fail about 5% of the time. > This comes from the naming policy for the directories in > MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* > in the cluster, and does not take into account previous starts/stops: > {noformat} >for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) { > if (manageDfsDirs) { > File dir1 = new File(data_dir, "data"+(2*i+1)); > File dir2 = new File(data_dir, "data"+(2*i+2)); > dir1.mkdirs(); > dir2.mkdirs(); > // [...] > {noformat} > This means that if we want to stop/start a datanode, we should always stop > the last one; otherwise the names will conflict. 
This test exhibits the behavior: > {noformat} > @Test > public void testMiniDFSCluster_startDataNode() throws Exception { > assertTrue( dfsCluster.getDataNodes().size() == 2 ); > // Works, as we kill the last datanode, we can now start a datanode > dfsCluster.stopDataNode(1); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > // Fails, as it's not the last datanode, the directory will conflict on > // creation > dfsCluster.stopDataNode(0); > try { > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > fail("There should be an exception because the directory already > exists"); > } catch (IOException e) { > assertTrue( e.getMessage().contains("The directory is already > locked.")); > LOG.info("Expected (!) exception caught " + e.getMessage()); > } > // Works, as we kill the last datanode, we can now restart 2 datanodes > // This makes us back with 2 nodes > dfsCluster.stopDataNode(0); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null); > } > {noformat} > And then this behavior is randomly triggered in testLogRollOnDatanodeDeath > because when we do > {noformat} > DatanodeInfo[] pipeline = getPipeline(log); > assertTrue(pipeline.length == fs.getDefaultReplic
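The directory-index arithmetic quoted above can be reproduced with a standalone simulation (DirNamingSim and dirsFor are hypothetical names for illustration, not HDFS code):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of MiniDFSCluster#startDataNode's directory naming:
// names depend only on how many datanodes are running *now*, not on which
// datanodes ever ran. All class/method names here are illustrative.
public class DirNamingSim {

    // Directories a new batch of datanodes claims, given the number of
    // datanodes currently running (curDatanodesNum in the quoted HDFS loop).
    static List<String> dirsFor(int curDatanodesNum, int numDataNodes) {
        List<String> dirs = new ArrayList<>();
        for (int i = curDatanodesNum; i < curDatanodesNum + numDataNodes; i++) {
            dirs.add("data" + (2 * i + 1));
            dirs.add("data" + (2 * i + 2));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Initial cluster of two: node 0 -> data1/data2, node 1 -> data3/data4.
        System.out.println(dirsFor(0, 2)); // [data1, data2, data3, data4]

        // After stopping ONE node (whichever it was), the next start is always
        // assigned i = 1, i.e. data3/data4. That is safe only if the stopped
        // node was node 1; if node 0 was stopped, the still-running node 1
        // holds the lock on data3/data4 and the start fails.
        System.out.println(dirsFor(1, 1)); // [data3, data4]
    }
}
```

Stopping datanode 0 out of {0, 1} leaves one node running, so the replacement claims data3/data4, which the surviving node 1 still locks; stopping the last node frees exactly the names that get reused.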
[jira] [Updated] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5163: -- Affects Version/s: (was: 0.94.0) 0.92.0 Fix Version/s: 0.94.0 0.92.0 Hadoop Flags: Reviewed > TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or > hadoop QA ("The directory is already locked.") > -- > > Key: HBASE-5163 > URL: https://issues.apache.org/jira/browse/HBASE-5163 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.92.0 > Environment: all >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: 5163-92.txt, 5163.patch > > > The stack is typically: > {noformat} > type="java.io.IOException">java.io.IOException: Cannot lock storage > /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3. > The directory is already locked. > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:290) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460) > at > org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470) > // ... 
> {noformat} > It can be reproduced without parallelization or without executing the other > tests in the class. It seems to fail about 5% of the time. > This comes from the naming policy for the directories in > MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* > in the cluster, and does not take into account previous starts/stops: > {noformat} >for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) { > if (manageDfsDirs) { > File dir1 = new File(data_dir, "data"+(2*i+1)); > File dir2 = new File(data_dir, "data"+(2*i+2)); > dir1.mkdirs(); > dir2.mkdirs(); > // [...] > {noformat} > This means that it if we want to stop/start a datanode, we should always stop > the last one, if not the names will conflict. This test exhibits the behavior: > {noformat} > @Test > public void testMiniDFSCluster_startDataNode() throws Exception { > assertTrue( dfsCluster.getDataNodes().size() == 2 ); > // Works, as we kill the last datanode, we can now start a datanode > dfsCluster.stopDataNode(1); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > // Fails, as it's not the last datanode, the directory will conflict on > // creation > dfsCluster.stopDataNode(0); > try { > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > fail("There should be an exception because the directory already > exists"); > } catch (IOException e) { > assertTrue( e.getMessage().contains("The directory is already > locked.")); > LOG.info("Expected (!) 
exception caught " + e.getMessage()); > } > // Works, as we kill the last datanode, we can now restart 2 datanodes > // This makes us back with 2 nodes > dfsCluster.stopDataNode(0); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null); > } > {noformat} > And then this behavior is randomly triggered in testLogRollOnDatanodeDeath > because when we do > {noformat} > DatanodeInfo[] pipeline = getPipeline(log); > assertTrue(pipeline.length == fs.getDefaultReplication()); > {noformat} > and then kill the datanodes in the pipeline, we will have: > - most of the time: pipeline = 1 & 2, so after killing 1&2 we can start a > new datanode that will reuse the available 2's directory. > - sometimes: pipeline = 1 & 3. In this case,when we try to launch the new > datanode, it fails because it wants to use the same directory as the still > alive '2'. > Ther
[jira] [Updated] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")
[ https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5163: -- Attachment: 5163-92.txt Patch I would integrate to 0.92 > TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or > hadoop QA ("The directory is already locked.") > -- > > Key: HBASE-5163 > URL: https://issues.apache.org/jira/browse/HBASE-5163 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 > Environment: all >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Attachments: 5163-92.txt, 5163.patch > > > The stack is typically: > {noformat} > type="java.io.IOException">java.io.IOException: Cannot lock storage > /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3. > The directory is already locked. > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:290) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417) > at > org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460) > at > org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470) > // ... 
> {noformat} > It can be reproduced without parallelization or without executing the other > tests in the class. It seems to fail about 5% of the time. > This comes from the naming policy for the directories in > MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* > in the cluster, and does not take into account previous starts/stops: > {noformat} >for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) { > if (manageDfsDirs) { > File dir1 = new File(data_dir, "data"+(2*i+1)); > File dir2 = new File(data_dir, "data"+(2*i+2)); > dir1.mkdirs(); > dir2.mkdirs(); > // [...] > {noformat} > This means that it if we want to stop/start a datanode, we should always stop > the last one, if not the names will conflict. This test exhibits the behavior: > {noformat} > @Test > public void testMiniDFSCluster_startDataNode() throws Exception { > assertTrue( dfsCluster.getDataNodes().size() == 2 ); > // Works, as we kill the last datanode, we can now start a datanode > dfsCluster.stopDataNode(1); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > // Fails, as it's not the last datanode, the directory will conflict on > // creation > dfsCluster.stopDataNode(0); > try { > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null); > fail("There should be an exception because the directory already > exists"); > } catch (IOException e) { > assertTrue( e.getMessage().contains("The directory is already > locked.")); > LOG.info("Expected (!) 
exception caught " + e.getMessage()); > } > // Works, as we kill the last datanode, we can now restart 2 datanodes > // This makes us back with 2 nodes > dfsCluster.stopDataNode(0); > dfsCluster > .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null); > } > {noformat} > And then this behavior is randomly triggered in testLogRollOnDatanodeDeath > because when we do > {noformat} > DatanodeInfo[] pipeline = getPipeline(log); > assertTrue(pipeline.length == fs.getDefaultReplication()); > {noformat} > and then kill the datanodes in the pipeline, we will have: > - most of the time: pipeline = 1 & 2, so after killing 1&2 we can start a > new datanode that will reuse the available 2's directory. > - sometimes: pipeline = 1 & 3. In this case,when we try to launch the new > datanode, it fails because it wants to use the same directory as the still > alive '2'. > There are two ways of fixing the test: > 1) Fix the naming rule in MiniDFSCluster#startDataNode, for example to ensure > that the dir
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184697#comment-13184697 ] Zhihong Yu commented on HBASE-5179: --- {code} + * Class to hold dead servers list, utility querying dead server list and being + * processed dead servers by the ServerShutdownHandler. {code} The above should read 'querying dead server list and the dead servers being processed by ...'. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch, hbase-5179v5.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4748) Race between creating recovered edits for META and master assigning ROOT and META.
[ https://issues.apache.org/jira/browse/HBASE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184696#comment-13184696 ] chunhui shen commented on HBASE-4748: - Could I see the patch? Since it is quite rare, I think we should wait to call assignRootAndMeta until MetaServerShutdownHandler finishes, if one exists. > Race between creating recovered edits for META and master assigning ROOT and > META. > -- > > Key: HBASE-4748 > URL: https://issues.apache.org/jira/browse/HBASE-4748 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > 1. Start a cluster. > 2. Alter a table > 3. Restart the master using ./hbase-daemon.sh restart master > 4. Kill the RS after master restarts. > 5. Start RS again. > 6. No table operations can be performed on the table that was altered but > admin.listTables() is able to list the altered table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184694#comment-13184694 ] Mubarak Seyed commented on HBASE-4720: -- My local tests kept failing on trunk; I will fix TestRowResource. > Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5179: Attachment: hbase-5179v5.patch In patch v5, I added javadoc to explain getDeadServersBeingProcessed() and getDeadServers, and also added more documentation in DeadServer about deadServersBeingProcessed. Regarding Stack's comment that a server is either in inProgress or in the deadServers list: I think a server can be in both the processingDeadServers list and the deadServers list. The deadServers list stores only one instance per regionserver, but the processingDeadServers list may store multiple instances for one regionserver, with several start codes. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch, hbase-5179v5.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
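A minimal sketch of the distinction chunhui describes; the class and field names below are illustrative stand-ins, not the patch's actual code. The dead-servers collection keeps one entry per region server host:port, while the being-processed collection may hold several entries for the same host with different start codes:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative shape only (not the patch's code): one entry per host:port
// in the dead set, but possibly several "host:port,startcode" entries in
// flight for the same server in the handler's processing list.
public class DeadServerSketch {
    // One canonical entry per region server host:port.
    final Set<String> deadServers = new HashSet<>();
    // Entries currently being processed by ServerShutdownHandler; the same
    // host:port may appear more than once, with different start codes.
    final List<String> processingDeadServers = new ArrayList<>();

    void notifyDead(String hostPort, long startCode) {
        deadServers.add(hostPort); // set semantics dedupe by host:port
        processingDeadServers.add(hostPort + "," + startCode);
    }

    void finishedProcessing(String hostPort, long startCode) {
        processingDeadServers.remove(hostPort + "," + startCode);
    }
}
```

With this shape, a region server that dies, restarts, and dies again before the first shutdown handler completes appears once in deadServers but twice in processingDeadServers, which is exactly the case the comment argues for.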
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184671#comment-13184671 ] Hadoop QA commented on HBASE-4720: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510293/HBASE-4720.trunk.v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -146 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 81 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.rest.TestRowResource org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/740//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/740//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/740//console This message is automatically generated. 
> Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184668#comment-13184668 ] chunhui shen commented on HBASE-5179: - I agree with the renaming in patchV4. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184647#comment-13184647 ] Mubarak Seyed commented on HBASE-4720: -- This patch does not cover the following from Andrew's comments: {quote} The REST gateway does support a batch put operation, where the supplied model contains multiple rows. The request URI will contain the table name and a row key, but the row key would be ignored and should be set to something known not to exist, like "submit". (Row name in the model takes preference to whatever was supplied in the URI.) See RowResource, starting around line 160. This gives the client the option of submitting work in batch, to reduce overheads. So optionally here you could retrieve a list of rows and process them, building a response that includes the disposition of each. {quote} The [HTable.checkAndPut|http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html] and [HTable.checkAndDelete|http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html] APIs support only one row at a time. I don't think we need to support batches of checkAndPut and checkAndDelete. {quote} The URI format for requests is '/// ...' This violates that by adding, just for check-and cases, a prefix. Having a special case like that should be avoided. What about handling this in TableResource, with a query parameter? '///?check' E.g. Then you won't need CheckAndXTableResource classes. Additionally use the appropriate HTTP operations. PUT/POST for check-and-put. DELETE for check-and-delete. The spec does not forbid bodies in DELETE requests. (I am unsure if Jetty/Jersey will support it however.) {quote} We have discussed the design choices earlier (refer to earlier comments in this JIRA); Stack and Ted voted for option #2 (the /checkandput, /checkanddelete option). If I have to go back to option #1, I will have to re-work most of the changes here. 
> Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
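The check-and-put contract under discussion (compare a cell's current value, then write only on a match, atomically) can be sketched outside HBase with a ConcurrentHashMap. This is an illustration of the semantics only, not the REST gateway's or HTable's implementation, and the key format is a made-up stand-in for row/family/qualifier:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of checkAndPut's contract: atomically write newValue only if the
// current value equals expected; returns whether the write happened.
public class CheckAndPutSketch {
    private final ConcurrentHashMap<String, String> cells = new ConcurrentHashMap<>();

    boolean checkAndPut(String rowCol, String expected, String newValue) {
        if (expected == null) {
            // Check-for-absence: succeed only if no value exists yet.
            return cells.putIfAbsent(rowCol, newValue) == null;
        }
        // ConcurrentHashMap.replace(k, old, new) is an atomic compare-and-set.
        return cells.replace(rowCol, expected, newValue);
    }

    String get(String rowCol) {
        return cells.get(rowCol);
    }
}
```

The sentinel-table pattern in the description relies on exactly this guarantee: two clients racing on the same cell cannot both see their checkAndPut succeed.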
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184646#comment-13184646 ] Hadoop QA commented on HBASE-5033: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510289/HBASE-5033.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/739//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/739//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/739//console This message is automatically generated. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. 
> So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
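The parallel open scheme in items 1) and 2) above can be sketched with a plain ExecutorService. The method and the string "stores" below are stand-ins, not HBase's actual Store classes, and the bounded pool mirrors the idea of capping open/close concurrency:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: open N stores concurrently on a bounded pool instead of one
// after another, mirroring items 1) and 2) of the description.
public class ParallelStoreOpenSketch {
    static List<String> openStores(List<String> storeNames, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : storeNames) {
                // Each task stands in for opening one store; inside it, the
                // store's files could likewise be loaded on a nested pool.
                futures.add(pool.submit(() -> "opened:" + name));
            }
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                opened.add(f.get()); // blocks; propagates any open failure
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submit order keeps the result deterministic while the actual opens overlap, which is where the region open/close time is saved.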
[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mubarak Seyed updated HBASE-4720: - Attachment: HBASE-4720.trunk.v3.patch The attached file (HBASE-4720.trunk.v3.patch) contains changes for Andrew Purtell's code review comments. This patch does not cover the following from Andrew's comments: >The REST gateway does support a batch put operation, where the supplied model >contains multiple rows. The request URI will contain the table name and a row >key, but the row key would be ignored and should be set to something known not >to exist, like "submit". (Row name in the model takes preference to whatever >was supplied in the URI.) See RowResource, starting around line 160. This >gives the client the option of submitting work in batch, to reduce overheads. So optionally here you could retrieve a list of rows and process them, building a response that includes the disposition of each. HTable.checkAndPut and HTable.checkAndDelete API supports only one row at a time (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)). I don't think we need to support batch of checkAndPut and checkAndDelete. >The URI format for requests is '/// ...' This violates that by >adding, just for check-and cases, a prefix. Having a special case like that >should be avoided. What about handling this in TableResource, with a query >parameter? '///?check' E.g.Then you won't need >CheckAndXTableResource classes. Additionally use the appropriate HTTP >operations. PUT/POST for check-and-put. DELETE for check-and-delete. The spec >does not forbid bodies in DELETE requests. (I am unsure if Jetty/Jersey will >support it however.) We have discussed the design choices earlier (refer comments in the same JIRA), Stack and Ted have voted for option # 2 (/checkandput, /checkanddelete) option. 
If I have to go back to option #1, I will have to re-work most of the changes here. > Implement atomic update operations (checkAndPut, checkAndDelete) for REST > client/server > > > Key: HBASE-4720 > URL: https://issues.apache.org/jira/browse/HBASE-4720 > Project: HBase > Issue Type: Improvement >Reporter: Daniel Lord >Assignee: Mubarak Seyed > Fix For: 0.94.0 > > Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, > HBASE-4720.trunk.v3.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch > > > I have several large application/HBase clusters where an application node > will occasionally need to talk to HBase from a different cluster. In order > to help ensure some of my consistency guarantees I have a sentinel table that > is updated atomically as users interact with the system. This works quite > well for the "regular" hbase client but the REST client does not implement > the checkAndPut and checkAndDelete operations. This exposes the application > to some race conditions that have to be worked around. It would be ideal if > the same checkAndPut/checkAndDelete operations could be supported by the REST > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184631#comment-13184631 ] Zhihong Yu commented on HBASE-5033: --- Integrated to TRUNK. Thanks for the patch, Liyin. Thanks for the review Lars and Kannan. Hopefully I got commit message right :-) > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5033: -- Release Note: "hbase.hstore.open.and.close.threads.max" is introduced to control the number of threads for opening/closing Store and store files. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
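The property named in the release note would be set like any other HBase configuration key. A hedged example for hbase-site.xml follows; the value shown is illustrative, not a recommended default:

```xml
<!-- hbase-site.xml: caps the thread pool used to open/close Stores and
     their store files in parallel (HBASE-5033). Example value only. -->
<property>
  <name>hbase.hstore.open.and.close.threads.max</name>
  <value>8</value>
</property>
```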
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184625#comment-13184625 ] Hadoop QA commented on HBASE-5033: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510282/5033-trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/738//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/738//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/738//console This message is automatically generated. 
> Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5179: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510266/5179-v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/735//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/735//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/735//console This message is automatically generated.) 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5179: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510164/hbase-5179.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 78 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/728//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/728//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/728//console This message is automatically generated.) 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184617#comment-13184617 ] Zhihong Yu commented on HBASE-5177: --- The patch from Phabricator cannot be applied on TRUNK: {code} 1 out of 1 hunk FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/client/HTable.java.rej patching file src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java Hunk #1 succeeded at 22 with fuzz 2 (offset 1 line). Hunk #2 FAILED at 71. Hunk #3 FAILED at 94. Hunk #4 FAILED at 4142. 3 out of 4 hunks FAILED -- saving rejects to file src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java.rej {code} Patch for 0.89-fb doesn't have to be attached here. Attaching patch for TRUNK would allow TRUNK to be in sync with 0.89-fb Cheers > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
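What a "non cached version of getRegionLocation" amounts to can be sketched as a lookup cache with a bypass flag. The class, method, and types below are illustrative stand-ins, not HTable's real internals; the lookup function stands in for the heavier scan of the meta region:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: getLocation(row, false) serves from the cache; getLocation(row,
// true) forces a fresh lookup and refreshes the cache, mirroring the
// requested non-cached variant of getRegionLocation.
public class RegionLocationCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> lookup; // stands in for a META scan

    RegionLocationCacheSketch(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    String getLocation(String row, boolean reload) {
        if (reload) {
            String fresh = lookup.apply(row); // bypass the cache entirely
            cache.put(row, fresh);            // and repair any stale entry
            return fresh;
        }
        return cache.computeIfAbsent(row, lookup);
    }
}
```

The point of the reload path is exactly the use case in the description: after a region moves, the cached answer is stale, and a caller that needs the current hosting server must be able to force the lookup without paying for a full getRegionsInfo().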
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184612#comment-13184612 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510277/5179-v4.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/737//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/737//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/737//console This message is automatically generated. 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If the master's failover processing and the ServerShutdownHandler run > concurrently, the following case may occur: > 1. The master completes splitLogAfterStartup(). > 2. RegionserverA restarts, and the ServerShutdownHandler starts processing it. > 3. The master starts rebuildUserRegions, and RegionserverA is considered a > dead server. > 4. The master starts assigning the regions of RegionserverA because step 3 marked > it as a dead server. > However, while step 4 (region assignment) runs, the ServerShutdownHandler may still > be splitting the log; therefore data loss is possible.
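The race in the quoted report (steps 1-4) comes down to a missing wait: the master must not assign a dead server's regions until log splitting for that server has finished. A minimal sketch of that synchronization with a per-server latch (the class and method names are hypothetical illustrations, not the actual HBase fix):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Sketch: assignment of a dead server's regions blocks until the
// ServerShutdownHandler has finished splitting that server's log.
// All names here are illustrative, not HBase's real API.
public class SplitLogGate {
    private final Map<String, CountDownLatch> splitDone = new ConcurrentHashMap<>();

    // ServerShutdownHandler registers the dead server before splitting its log.
    public void beginSplit(String serverName) {
        splitDone.putIfAbsent(serverName, new CountDownLatch(1));
    }

    // Called once the dead server's log is fully split.
    public void finishSplit(String serverName) {
        CountDownLatch latch = splitDone.get(serverName);
        if (latch != null) {
            latch.countDown();
        }
    }

    // True if no split is registered or the registered split completed.
    public boolean isSplitDone(String serverName) {
        CountDownLatch latch = splitDone.get(serverName);
        return latch == null || latch.getCount() == 0;
    }

    // The assignment path blocks here, closing the race between failover
    // processing and the ServerShutdownHandler.
    public void awaitSplit(String serverName) {
        CountDownLatch latch = splitDone.get(serverName);
        if (latch == null) {
            return;
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With such a gate, step 4 would call awaitSplit() before assigning, so the ordering in steps 1-3 could no longer lead to assignment racing the split.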
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: HBASE-5033.patch Resubmit the patch. Thanks Ted for correction. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: (was: HBASE-5033-apach-trunk.patch) > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184609#comment-13184609 ] Pritam Damania commented on HBASE-5177: --- @Zhihong Yu : I think Phabricator already attached the patch automatically. Do I still need to attach it separately ? > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Updated] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritam Damania updated HBASE-5177: -- Attachment: (was: getRegionLocationNonCaching89fb.patch) > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184608#comment-13184608 ] Zhihong Yu commented on HBASE-5033: --- I doubt: {code} diff --git a/src/main/java/org/apache/hadoop/hbase/HConstants.java b/src/main/java/org/apache/hadoop/hbase/HConstants.java index 5120a3c..fcb024b 100644 --- a/src/main/java/org/apache/hadoop/hbase/HConstants.java +++ b/src/main/java/org/apache/hadoop/hbase/HConstants.java {code} > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritam Damania updated HBASE-5177: -- Attachment: getRegionLocationNonCaching89fb.patch This patch is for the 89fb branch. > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch, > getRegionLocationNonCaching89fb.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184603#comment-13184603 ] Liyin Tang commented on HBASE-5033: --- Thanks Ted. BTW, I do use --no-prefix for this recently submitted patch. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184595#comment-13184595 ] Zhihong Yu commented on HBASE-5177: --- @Pritam: Can you attach the latest patch here so that Hadoop QA can run through it ? Remember to use '--no-prefix' Thanks > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184592#comment-13184592 ] Phabricator commented on HBASE-5177: tedyu has accepted the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". Thanks for the explanation. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Created] (HBASE-5183) Render the monitored tasks as a treeview
Render the monitored tasks as a treeview Key: HBASE-5183 URL: https://issues.apache.org/jira/browse/HBASE-5183 Project: HBase Issue Type: Sub-task Reporter: Zhihong Yu Andy made the suggestion here: https://issues.apache.org/jira/browse/HBASE-5174?focusedCommentId=13184571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13184571
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184588#comment-13184588 ] Phabricator commented on HBASE-5177: pritamdamania has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 That doesn't really matter in this part of the code right ? Since the region has not moved till now. Irrespective of the order of the calls, both results would be same correct ? The variables addrCache and addrNoCache refer to the type of method being invoked. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184587#comment-13184587 ] Phabricator commented on HBASE-5177: tedyu has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 So that the call on line 4162 can fetch from cache. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5033: -- Attachment: 5033-trunk.txt Resolved a conflict in HRegion.java In the future, please use --no-prefix to generate patch so that Hadoop QA can test it. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, > D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184583#comment-13184583 ] Phabricator commented on HBASE-5177: pritamdamania has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 Why do you think so ? How does the order affect this part of the code ? REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184578#comment-13184578 ] Hadoop QA commented on HBASE-2600: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510274/0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 20 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/736//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/736//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/736//console This message is automatically generated. 
> Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... 
changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
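The lookup the HBASE-2600 description argues for is easy to illustrate: with meta rows keyed by end row, locating the region containing a row is a single forward seek for the first key greater than the wanted row, instead of a backwards getClosestRowBefore walk. The sketch below models the meta scan with a TreeMap; the schema and names are illustrative assumptions, not HBase's real meta layout:

```java
import java.util.Map;
import java.util.TreeMap;

// Model of an ENDROW-keyed meta table. A forward seek (TreeMap.higherEntry,
// standing in for "open a scanner at the wanted row") finds the containing
// region in one step, because end rows are exclusive upper bounds.
public class EndRowMeta {
    // An empty end row means "last region of the table"; model it with a
    // sentinel that sorts after every real row key.
    private static final String LAST = "\uffff";
    private final TreeMap<String, String> regionsByEndRow = new TreeMap<>();

    public void addRegion(String endRow, String regionName) {
        regionsByEndRow.put(endRow.isEmpty() ? LAST : endRow, regionName);
    }

    // First entry whose (exclusive) end row is strictly greater than the
    // wanted row is the region containing that row.
    public String locate(String row) {
        Map.Entry<String, String> e = regionsByEndRow.higherEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

With STARTROW keys, the analogous one-step lookup is not possible in a plain scan, which is why the current code needs getClosestRowBefore.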
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184571#comment-13184571 ] Andrew Purtell commented on HBASE-5174: --- Render the monitored tasks as a treeview, with something like http://jquery.bassistance.de/treeview/ ? While building the tree, put entries with identical text one level down, as soon as you see something different, move back up to toplevel? Render fully collapsed? > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution.
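The coalescing proposed in HBASE-5174 amounts to run-length encoding over consecutive identical task entries. A small sketch of the idea (not the actual TaskMonitor patch; the names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Collapse runs of consecutive, identical task descriptions into a single
// entry carrying a repeat count, so 1000 identical ABORTED tasks render as
// one line instead of pushing everything else off the page.
public class TaskCoalescer {
    public static List<String> coalesce(List<String> tasks) {
        List<String> out = new ArrayList<>();
        String prev = null;
        int run = 0;
        for (String task : tasks) {
            if (task.equals(prev)) {
                run++;               // extend the current run
            } else {
                flush(out, prev, run);
                prev = task;
                run = 1;             // start a new run
            }
        }
        flush(out, prev, run);       // emit the final run
        return out;
    }

    private static void flush(List<String> out, String task, int run) {
        if (task == null) {
            return;
        }
        out.add(run > 1 ? task + " (x" + run + ")" : task);
    }
}
```

Only consecutive duplicates are merged, so interleaved distinct tasks still show in order, which matches the "coalescing consecutive aborted tasks" suggestion above.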
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5033: -- Fix Version/s: 0.94.0 > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Fix For: 0.94.0 > > Attachments: 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, > D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: HBASE-5033-apach-trunk.patch 1) Based on the recent trunk; generated the patch with --no-prefix 2) The default number of threads is set to 1. 3) Performance evaluation: performance will vary across cluster environments, depending on factors such as the number of regions and the number of store files per region. A simple restart test shows single region server (22 regions) restart time decreased from 78 sec to 55 sec, roughly a 29% saving in region server restart time. Also, cluster (100 nodes) restart time decreased from 316 secs to 189 secs, saving around 40% of the restart time. > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Attachments: 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, > D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
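The core idea of HBASE-5033, submitting each store open to a bounded thread pool instead of opening stores sequentially, can be sketched as follows. The Store interface and all names here are illustrative assumptions, not the real HBase API or the patch itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: open all stores of a region concurrently on a fixed-size pool,
// then wait for every open to finish before the region is considered open.
public class ParallelStoreOpener {
    public interface Store {
        String open(); // returns the store name once opened (illustrative)
    }

    public static List<String> openStores(List<Store> stores, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every open; with threads == 1 this degenerates to the
            // old sequential behavior (the patch's default).
            List<Future<String>> futures = new ArrayList<>();
            for (Store store : stores) {
                futures.add(pool.submit((Callable<String>) store::open));
            }
            // Collect results in submission order, surfacing any failure.
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    opened.add(f.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException("store open failed", e);
                }
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

The reported numbers (78 sec to 55 sec per server, 316 secs to 189 secs per cluster) are consistent with this shape: the win grows with the number of independent stores and store files that can be loaded concurrently.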
[jira] [Commented] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184565#comment-13184565 ] Hudson commented on HBASE-5182: --- Integrated in HBase-TRUNK #2622 (See [https://builds.apache.org/job/HBase-TRUNK/2622/]) HBASE-5182 TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.
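The class of bug in HBASE-5182 is easy to reproduce in miniature: a value is read from configuration but a hard-coded default is handed to the thread pool. The sketch below shows the fixed shape; the config key name and the pool sizes are assumptions, not the real TBoundedThreadPoolServer code.

```java
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative fix for the HBASE-5182 bug class: the configured keep-alive
// must actually be passed to the ThreadPoolExecutor constructor, otherwise
// the default silently wins.
public class KeepAliveConfig {
    static final int DEFAULT_KEEP_ALIVE_SEC = 60;

    static ThreadPoolExecutor newPool(Map<String, String> conf) {
        int keepAlive = Integer.parseInt(
            conf.getOrDefault("hbase.thrift.threadKeepAliveTimeSec", // assumed key
                              String.valueOf(DEFAULT_KEEP_ALIVE_SEC)));
        // The fix: pass the configured value, not the DEFAULT_* constant.
        return new ThreadPoolExecutor(
            1, 4, keepAlive, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }
}
```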
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184562#comment-13184562 ] Zhihong Yu commented on HBASE-5174: --- I think the MonitoredTask display should be placed under region server section. > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution.
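The coalescing proposed above amounts to collapsing runs of identical consecutive ABORTED entries into one line with a repeat count. A minimal sketch, assuming a simplified Task value type rather than HBase's actual MonitoredTask:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the HBASE-5174 proposal: instead of rendering 1000 identical
// "ABORTED Flushing ..." rows, render one row per run with an "(xN)" suffix.
public class TaskCoalescer {
    record Task(String status, String description) {}

    static List<String> coalesce(List<Task> tasks) {
        List<String> out = new ArrayList<>();
        Task prev = null;
        int run = 0;
        for (Task t : tasks) {
            // Only ABORTED repeats are coalesced; other states render verbatim.
            boolean sameAbortedRun = prev != null
                && "ABORTED".equals(t.status())
                && t.equals(prev);
            if (sameAbortedRun) {
                run++;
            } else {
                if (prev != null) out.add(render(prev, run));
                prev = t;
                run = 1;
            }
        }
        if (prev != null) out.add(render(prev, run));
        return out;
    }

    static String render(Task t, int count) {
        String line = t.status() + " " + t.description();
        return count > 1 ? line + " (x" + count + ")" : line;
    }
}
```

With this, the 1000 aborted flush tasks in the example would collapse to a single "ABORTED Flushing ... (x1000)" row above the region list.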
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184561#comment-13184561 ] Phabricator commented on HBASE-5177: tedyu has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java:4164 I think this call should be placed before the call on line 4162. REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
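The API shape requested in HBASE-5177 is a location lookup with a flag that bypasses the client cache. A loose sketch, with a Function standing in for the real META read; the class and method names here are illustrative, not HTable's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Toy model of a client-side location cache with a non-caching lookup path.
public class LocationCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> metaLookup; // stand-in for a META read

    LocationCache(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    // useCache=false forces a fresh META read and refreshes the cache,
    // which is what you want right after a region has moved.
    String getRegionLocation(String row, boolean useCache) {
        if (useCache) {
            return cache.computeIfAbsent(row, metaLookup);
        }
        String fresh = metaLookup.apply(row);
        cache.put(row, fresh);
        return fresh;
    }
}
```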
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Attachment: (was: HBASE-5033-apach-trunk.patch) > Opening/Closing store in parallel to reduce region open/close time > -- > > Key: HBASE-5033 > URL: https://issues.apache.org/jira/browse/HBASE-5033 > Project: HBase > Issue Type: Improvement >Reporter: Liyin Tang >Assignee: Liyin Tang > Attachments: 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, > D933.4.patch, D933.5.patch > > > Region servers are opening/closing each store and each store file for every > store in sequential fashion, which may cause inefficiency to open/close > regions. > So this diff is to open/close each store in parallel in order to reduce > region open/close time. Also it would help to reduce the cluster restart time. > 1) Opening each store in parallel > 2) Loading each store file for every store in parallel > 3) Closing each store in parallel > 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5179: -- Attachment: 5179-v4.txt Adopted the getDeadServersBeingProcessed() method name. Also made it package private. Waiting for Chunhui's feedback about Stack's comments. > Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, > hbase-5179.patch > > > If the master's failover processing and ServerShutdownHandler's processing > happen concurrently, the following case may appear: > 1. The master completes splitLogAfterStartup(). > 2. RegionserverA restarts, and ServerShutdownHandler is processing it. > 3. The master starts to rebuildUserRegions, and RegionserverA is considered a > dead server. > 4. The master starts to assign RegionserverA's regions because it is a dead > server per step 3. > However, while doing step 4 (assigning regions), ServerShutdownHandler may > still be splitting logs, which can cause data loss.
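The fix direction discussed above amounts to the master refusing to assign a region while its last host's logs are still being split. A minimal sketch: the getDeadServersBeingProcessed() name comes from the comment, but the rest of this class is illustrative, not the actual patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the HBASE-5179 guard: track dead servers whose WALs are still
// being split, and gate assignment on that set.
public class AssignmentGuard {
    private final Set<String> deadServersBeingProcessed = ConcurrentHashMap.newKeySet();

    void startProcessing(String server)  { deadServersBeingProcessed.add(server); }
    void finishProcessing(String server) { deadServersBeingProcessed.remove(server); }

    // Assign only after log splitting for the region's last host has finished,
    // so the region cannot serve reads/writes before its WAL edits are replayed.
    boolean canAssign(String lastHostingServer) {
        return !deadServersBeingProcessed.contains(lastHostingServer);
    }
}
```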
[jira] [Commented] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184551#comment-13184551 ] Zhihong Yu commented on HBASE-5181: --- The message is certainly detailed :-) Please remember to replace '/hbase' with the value of zookeeper.znode.parent > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Assignee: Mubarak Seyed >Priority: Minor > Labels: noob > > When master fail-over happens, if we have number of RITs under > /hbase/unassigned and if we have stale znode(s) (encoded region names) under > /hbase/unassigned, we are getting > {code} > 2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master > startup proceeding: master failover > 2011-12-30 10:27:36,002 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to > process 1717 regions in transition > 2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.ArrayIndexOutOfBoundsException: -256
> at org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148)
> at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105)
> at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
> at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
> at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743)
> at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262)
> at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223)
> at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
> {code}
> and there is no clue on how to clean-up the stale znode(s) from unassigned
> using zkCli.sh (del /hbase/unassigned/). It would be good if
> we include the bad region name in the IOException from
> RegionTransitionData.readFields().
> {code}
> @Override
> public void readFields(DataInput in) throws IOException {
>   // the event type byte
>   eventType = EventType.values()[in.readShort()];
>   // the timestamp
>   stamp = in.readLong();
>   // the encoded name of the region being transitioned
>   regionName = Bytes.readByteArray(in);
>   // remaining fields are optional so prefixed with boolean
>   // the name of the regionserver sending the data
>   if (in.readBoolean()) {
>     byte [] versionedBytes = Bytes.readByteArray(in);
>     this.origin = ServerName.parseVersionedServerName(versionedBytes);
>   }
>   if (in.readBoolean()) {
>     this.payload = Bytes.readByteArray(in);
>   }
> }
> {code}
> If the code execution has survived until regionName then we can include the
> regionName in the IOException with an error message to clean-up the stale znode(s)
> under /hbase/unassigned. -- This message is automatically generated by JIRA.
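The improvement suggested in HBASE-5181 can be sketched outside HBase: once the region name has been read, wrap any later parse failure in an IOException that names the offending znode. The wire format below is a simplified stand-in (readUTF instead of Bytes.readByteArray), so field layout and message wording are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative version of the readFields idea: a parse error after the
// region name is known gets wrapped with the znode the operator should check.
public class RegionTransitionParse {

    static String readAndValidate(byte[] bytes) throws IOException {
        DataInput in = new DataInputStream(new ByteArrayInputStream(bytes));
        in.readShort();                  // event type (unused in this sketch)
        in.readLong();                   // timestamp (unused in this sketch)
        String regionName = in.readUTF(); // stand-in for Bytes.readByteArray
        try {
            // optional origin-server field, prefixed with a boolean
            if (in.readBoolean()) {
                in.readUTF();
            }
        } catch (IOException | RuntimeException e) {
            throw new IOException("Corrupt transition data for region '"
                + regionName + "'; consider deleting the stale znode under "
                + "/hbase/unassigned", e);
        }
        return regionName;
    }
}
```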
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184547#comment-13184547 ] Zhihong Yu commented on HBASE-5174: --- Looks like I didn't take State of MonitoredTask into account. Personally I think seeing the latest status for a MonitoredTask is fine. To dig deeper, log is always the place to check. Map>> is easy to confuse a few people reading the code :-) > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution.
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184545#comment-13184545 ] jirapos...@reviews.apache.org commented on HBASE-2600: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3466/ --- Review request for hbase and Michael Stack. Summary --- This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 "32-bit encoding of regionnames waaay too susceptible to hash clashes" proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... 
so we'd need hbase-2531 to be in place before we could do this change. This addresses bug HBASE-2600. https://issues.apache.org/jira/browse/HBASE-2600 Diffs - src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 74cb821 src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 133759d src/main/java/org/apache/hadoop/hbase/KeyValue.java be7e2d8 src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java e5e60a8 src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 88c381f src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 99f90b2 src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f0c6828 src/main/java/org/apache/hadoop/hbase/rest/RegionsResource.java bf85bc1 src/main/java/org/apache/hadoop/hbase/rest/model/TableRegionModel.java 67e7a04 src/test/java/org/apache/hadoop/hbase/TestKeyValue.java dc4ee8d src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java 5f97167 src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 6e1211b src/test/java/org/apache/hadoop/hbase/rest/TestStatusResource.java cffdcb6 src/test/java/org/apache/hadoop/hbase/rest/model/TestTableRegionModel.java b6f0ab5 Diff: https://reviews.apache.org/r/3466/diff Testing --- Unit tests started table. Tests in error: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD: Table 'TestTable we searched for the StartKey: TestTable ,, startKey lastChar's int value: 32 with the stopKey: TestTable#,, stopRow lastChar's int value: 35 with parentTable:.META. I need to know how to update/recreate the tar ball which is the source for that test. 
Thanks, Alex > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If w
[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-2600: --- Status: Patch Available (was: Open) > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. 
If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
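The core argument above — that end-row keys turn region lookup into a natural forward seek while start-row keys need the backward getClosestRowBefore walk — can be modeled with a sorted map. This is a toy sketch of the idea, not HBase code; region names and the end-exclusive convention are the only assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the HBASE-2600 proposal: META keyed by each region's END row.
// A region [start, end) contains row iff start <= row < end, so the owning
// region is the FIRST entry whose end key is strictly greater than the row —
// exactly what a forward scan (or TreeMap.higherEntry) finds in one seek.
public class EndKeyMeta {

    static String locate(TreeMap<String, String> metaByEndKey, String row) {
        Map.Entry<String, String> e = metaByEndKey.higherEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

With start-row keys the same lookup needs the last entry at or before the row (a floorEntry, i.e. a backward walk in a scan-oriented store), which is the "perverse, expensive" access the proposal removes.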
[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META
[ https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184542#comment-13184542 ] Alex Newman commented on HBASE-4616: https://reviews.apache.org/r/3466/ > Update hregion encoded name to reduce logic and prevent region collisions in > META > - > > Key: HBASE-4616 > URL: https://issues.apache.org/jira/browse/HBASE-4616 > Project: HBase > Issue Type: Umbrella >Reporter: Alex Newman >Assignee: Alex Newman > Attachments: HBASE-4616-v2.patch, HBASE-4616-v3.patch, > HBASE-4616.patch
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184543#comment-13184543 ] Alex Newman commented on HBASE-2600: There's lots of discussion https://issues.apache.org/jira/browse/HBASE-4616 as well > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. 
HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-2600: --- Attachment: 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. 
HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change.
[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-2600: --- Issue Type: Bug (was: Sub-task) Parent: (was: HBASE-4616) > Change how we do meta tables; from tablename+STARTROW+randomid to instead, > tablename+ENDROW+randomid > > > Key: HBASE-2600 > URL: https://issues.apache.org/jira/browse/HBASE-2600 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Alex Newman > Attachments: > 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, > 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch > > > This is an idea that Ryan and I have been kicking around on and off for a > while now. > If regionnames were made of tablename+endrow instead of tablename+startrow, > then in the metatables, doing a search for the region that contains the > wanted row, we'd just have to open a scanner using passed row and the first > row found by the scan would be that of the region we need (If offlined > parent, we'd have to scan to the next row). > If we redid the meta tables in this format, we'd be using an access that is > natural to hbase, a scan as opposed to the perverse, expensive > getClosestRowBefore we currently have that has to walk backward in meta > finding a containing region. > This issue is about changing the way we name regions. > If we were using scans, prewarming client cache would be near costless (as > opposed to what we'll currently have to do which is first a > getClosestRowBefore and then a scan from the closestrowbefore forward). > Converting to the new method, we'd have to run a migration on startup > changing the content in meta. > Up to this, the randomid component of a region name has been the timestamp of > region creation. HBASE-2531 "32-bit encoding of regionnames waaay > too susceptible to hash clashes" proposes changing the randomid so that it > contains actual name of the directory in the filesystem that hosts the > region. 
If we had this in place, I think it would help with the migration to > this new way of doing the meta because as is, the region name in fs is a hash > of regionname... changing the format of the regionname would mean we generate > a different hash... so we'd need hbase-2531 to be in place before we could do > this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
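The lookup advantage described in HBASE-2600 can be sketched with a plain sorted map standing in for the meta table (illustrative only, not the HBase API): with meta rows keyed by tablename+endRow, finding the region containing a row is a forward seek for the first meta row sorting strictly after the search key (endRow being exclusive), which maps directly onto an HBase scan, instead of the backward getClosestRowBefore walk that startRow keying forces.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: meta rows keyed by tablename+endRow (TreeMap as a stand-in
// for the meta table). Region location becomes "first entry whose key
// is strictly greater than tablename+row" -- a forward seek, i.e. the
// natural scan primitive.
public class MetaByEndRow {
  static final TreeMap<String, String> meta = new TreeMap<>();

  static {
    // two regions of table t1: [<start>, "bbb") and ["bbb", "mmm")
    meta.put("t1,bbb", "region-A");
    meta.put("t1,mmm", "region-B");
  }

  static String regionContaining(String table, String row) {
    // endRow is exclusive, so use higherEntry (strictly greater):
    // a row equal to some region's endRow belongs to the next region
    Map.Entry<String, String> e = meta.higherEntry(table + "," + row);
    return e == null ? null : e.getValue();
  }

  public static void main(String[] args) {
    System.out.println(regionContaining("t1", "ccc")); // prints region-B
  }
}
```

A real meta table would also need a last region with an empty (sort-after-everything) endRow, and a check that the found entry belongs to the same table; the sketch omits both.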
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184536#comment-13184536 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510266/5179-v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/735//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/735//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/735//console This message is automatically generated. 
> Concurrent processing of processFaileOver and ServerShutdownHandler may > cause region is assigned before completing split log, it would cause data loss > --- > > Key: HBASE-5179 > URL: https://issues.apache.org/jira/browse/HBASE-5179 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, hbase-5179.patch > > > If master's processing its failover and ServerShutdownHandler's processing > happen concurrently, it may appear following case. > 1.master completed splitLogAfterStartup() > 2.RegionserverA restarts, and ServerShutdownHandler is processing. > 3.master starts to rebuildUserRegions, and RegionserverA is considered as > dead server. > 4.master starts to assign regions of RegionserverA because it is a dead > server by step3. > However, when doing step4(assigning region), ServerShutdownHandler may be > doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
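The race in steps 1-4 above suggests a guard of roughly this shape (class and method names here are illustrative, not the actual HBase code): failover assignment must skip regions of any server whose ServerShutdownHandler is still splitting its logs, otherwise a region can be assigned before its edits are replayed.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the guard implied by the race description: track servers
// whose log splitting is still in flight, and only allow assignment of
// their regions once splitting has completed.
public class FailoverGuard {
  private final Set<String> deadServersInProgress = new HashSet<>();

  public synchronized void splitLogStarted(String server) {
    deadServersInProgress.add(server);
  }

  public synchronized void splitLogFinished(String server) {
    deadServersInProgress.remove(server);
  }

  public synchronized boolean safeToAssignRegionsOf(String server) {
    // assigning while splitting is in progress risks data loss
    return !deadServersInProgress.contains(server);
  }
}
```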
[jira] [Commented] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184534#comment-13184534 ] Hadoop QA commented on HBASE-5182: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510265/hbase-5182.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -147 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/734//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/734//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/734//console This message is automatically generated. > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead. 
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184533#comment-13184533 ] stack commented on HBASE-5179: -- bq. I think the reason Chunhui introduced a new Set for the dead servers being processed is that DeadServer is supposed to remember dead servers Yeah, I seem to remember such a need but I'd think we should doc' it up some more in DeadServer so next person in here looking at code has a chance figuring whats up. On v3: {code} getDeadServersUnderProcessing {code} is still public and I think it should be named getDeadServersBeingProcessed ... or BeingHandled... or better so it matches areDeadServersInProgress, getDeadServersInProgress.. they are in the process of being made into DeadServers!!! (and there is missing javadoc explaining what this method is at least relative to getDeadServers -- that its servers that are going through ServerShutdownHandler processing). Does this method need to be in the Interface for ServerManager (The less in the Interface the better)? knownServers should be onlineServers which makes me think that this check for DeadServersInProgress should be made inside in ServerManager so that what comes out of getOnlineServers has already had the InProgress servers stripped? Do you think we need that the new Collection deadServersUnderProcessing should instead be called inProgress... and a server is in either inProgress or its in the deadServers list? On remove, it gets moved (under synchronize) from one list to the other. 
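Stack's review comment sketches a two-collection DeadServer: a server is either still being processed or already a recorded dead server, and on completion it moves from one set to the other under a single lock. A minimal sketch of that bookkeeping (illustrative names, not the real DeadServer class):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the "inProgress vs deadServers" split suggested in review:
// a server lives in exactly one of the two sets, and the move between
// them happens atomically under the object lock.
public class DeadServers {
  private final Set<String> inProgress = new HashSet<>();
  private final Set<String> dead = new HashSet<>();

  public synchronized void startProcessing(String server) {
    inProgress.add(server);
  }

  public synchronized void finishProcessing(String server) {
    // atomic move: a server is never visible in both sets or neither
    inProgress.remove(server);
    dead.add(server);
  }

  public synchronized boolean areDeadServersInProgress() {
    return !inProgress.isEmpty();
  }

  public synchronized boolean isDead(String server) {
    return dead.contains(server);
  }
}
```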
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184532#comment-13184532 ] Todd Lipcon commented on HBASE-5174: There's no guarantee that Object.hashCode() is unique - just that it's usually unique. Would rather coalesce by actual identity (WeakIdentityHashMap?) or by some string (eg region id) than use hashcode. > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1 > > > Some tasks can get repeatedly canceled like flushing when splitting is going > on, in the logs it looks like this: > {noformat} > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. 
due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > 2012-01-10 19:28:29,164 DEBUG > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up > because memory above low water=1.6g > 2012-01-10 19:28:29,164 INFO > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap > pressure > 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > NOT flushing memstore for region > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, > writesEnabled=false > {noformat} > But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the > regions. Basically 1000x: > {noformat} > Tue Jan 10 19:28:29 UTC 2012 Flushing > test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec > ago) Not flushing since writes not enabled (since 31sec ago) > {noformat} > It's ugly and I'm sure some users will freak out seeing this, plus you have > to scroll down all the way to see your regions. Coalescing consecutive > aborted tasks seems like a good solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
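Todd's point above is that Object.hashCode() is not guaranteed unique, so coalescing should key on actual identity or an explicit string such as the region name. A minimal sketch of string-keyed coalescing (illustrative, not the TaskMonitor implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: coalesce repeated aborted tasks by an explicit key (e.g. the
// region name) instead of Object.hashCode(). 1000 identical aborts
// collapse to one displayed entry with a repeat count.
public class AbortedTaskCoalescer {
  // key -> number of times this aborted task repeated
  private final Map<String, Integer> aborted = new LinkedHashMap<>();

  public void recordAborted(String key) {
    aborted.merge(key, 1, Integer::sum); // fold repeats into a count
  }

  public int repeatsOf(String key) {
    return aborted.getOrDefault(key, 0);
  }

  public int distinctAborted() {
    return aborted.size(); // what the TaskMonitor UI would list
  }
}
```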
[jira] [Updated] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5177: --- Attachment: HBASE-5177.D1197.2.patch pritamdamania updated the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". Reviewers: Kannan, nspiegelberg, JIRA 1) Addressing Ted's comment. REVISION DETAIL https://reviews.facebook.net/D1197 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/client/HTable.java src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > Attachments: HBASE-5177.D1197.2.patch > > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
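The cached/uncached split under discussion can be sketched as a `reload` flag that bypasses and refreshes a client-side cache, with the existing method delegating with `reload=false` (a stand-in sketch; the real API is HTable.getRegionLocation and its meta lookup is far heavier than this map):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a cached vs non-cached region location lookup. The counter
// exists only to make the caching behaviour observable in the example.
public class RegionLocator {
  private final Map<String, String> cache = new HashMap<>();
  int metaLookups = 0;

  private String lookupInMeta(String row) {
    metaLookups++;                    // authoritative, expensive path
    return "server-for-" + row;      // stand-in for a real meta scan
  }

  public String getRegionLocation(String row, boolean reload) {
    if (!reload && cache.containsKey(row)) {
      return cache.get(row);         // cheap cached path
    }
    String loc = lookupInMeta(row);  // reload forces a fresh lookup
    cache.put(row, loc);
    return loc;
  }

  // the pre-existing method keeps its behaviour by delegating,
  // matching the reviewer's point that it is the reload=false case
  public String getRegionLocation(String row) {
    return getRegionLocation(row, false);
  }
}
```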
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184520#comment-13184520 ] Jean-Daniel Cryans commented on HBASE-5174: --- Same as in HBASE-5136, I think we need to know something was aborted. Overwriting it will make it seem that nothing wrong's happening. Then add coalescing to make sure you only have 1 aborted and not a flood. > Coalesce aborted tasks in the TaskMonitor > - > > Key: HBASE-5174 > URL: https://issues.apache.org/jira/browse/HBASE-5174 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans > Fix For: 0.94.0, 0.92.1
[jira] [Commented] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184514#comment-13184514 ] Mubarak Seyed commented on HBASE-5181: -- Is there any suggestion on error message? How about throw new IOException("There could be a stale region-in-transition in ZK." + " The bad region is " + Bytes.toString(regionName) + ". Try deleting the region-in-transition using 'del /hbase/unassigned/" + Bytes.toString(regionName) + "' command over a ZK connection (in zkCli.sh)", ioe); > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Assignee: Mubarak Seyed >Priority: Minor > Labels: noob > > When master fail-over happens, if we have number of RITs under > /hbase/unassigned and if we have stale znode(s) (encoded region names) under > /hbase/unassigned, we are getting > {code} > 2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master > startup proceeding: master failover > 2011-12-30 10:27:36,002 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to > process 1717 regions in transition > 2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.ArrayIndexOutOfBoundsException: -256 > at > org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148) > > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105) > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) > at > org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) > > at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743) > at > org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262) > > at > org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223) > > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) > {code} > and there is no clue on how to clean-up the stale znode(s) from unassigned > using zkCli.sh (del /hbase/unassigned/). It would be good if > we include the bad region name in IOException from > RegionTransitionData.readFields(). > {code} > @Override > public void readFields(DataInput in) throws IOException { > // the event type byte > eventType = EventType.values()[in.readShort()]; > // the timestamp > stamp = in.readLong(); > // the encoded name of the region being transitioned > regionName = Bytes.readByteArray(in); > // remaining fields are optional so prefixed with boolean > // the name of the regionserver sending the data > if (in.readBoolean()) { > byte [] versionedBytes = Bytes.readByteArray(in); > this.origin = ServerName.parseVersionedServerName(versionedBytes); > } > if (in.readBoolean()) { > this.payload = Bytes.readByteArray(in); > } > } > {code} > If the code execution has survived until regionName then we can include the > regionName in IOException with error message to clean-up the stale znode(s) > under /hbase/unassigned. -- This message is automatically generated by JIRA. 
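The improvement proposed above — surfacing the bad region name once parsing has gotten that far — could look roughly like this (a hypothetical helper, not the actual RegionTransitionData code; the message wording and the Parser interface are both illustrative):

```java
import java.io.IOException;

// Sketch: wrap znode deserialization so a failure is rethrown with the
// region name and a zkCli.sh hint, telling the operator which stale
// /hbase/unassigned znode to delete.
public class ZnodeParser {
  public interface Parser {
    void parse(byte[] data) throws IOException;
  }

  public static void parseWithContext(String regionName, byte[] data,
      Parser p) throws IOException {
    try {
      p.parse(data);
    } catch (RuntimeException e) {
      // e.g. the ArrayIndexOutOfBoundsException seen in the stack trace
      throw new IOException("Failed to parse region-in-transition znode"
          + " for region " + regionName + "; it may be stale. Try"
          + " 'delete /hbase/unassigned/" + regionName
          + "' over a ZK connection (zkCli.sh)", e);
    }
  }
}
```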
[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry
[ https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184507#comment-13184507 ] Zhihong Yu commented on HBASE-5136: --- Then this JIRA depends on HBASE-5174. Please comment on my proposal there. The patch in this JIRA is just specialized version of my proposal for HBASE-5174. > Redundant MonitoredTask instances in case of distributed log splitting retry > > > Key: HBASE-5136 > URL: https://issues.apache.org/jira/browse/HBASE-5136 > Project: HBase > Issue Type: Task >Reporter: Zhihong Yu >Assignee: Zhihong Yu > Attachments: 5136.txt > > > In case of log splitting retry, the following code would be executed multiple > times: > {code} > public long splitLogDistributed(final List logDirs) throws > IOException { > MonitoredTask status = TaskMonitor.get().createStatus( > "Doing distributed log split in " + logDirs); > {code} > leading to multiple MonitoredTask instances. > User may get confused by multiple distributed log splitting entries for the > same region server on master UI -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
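The fix direction for the redundant-status problem above can be sketched as: create the MonitoredTask once, outside the retry loop, and update it on each retry instead of recreating it (illustrative stand-ins below, not the real SplitLogManager/TaskMonitor code):

```java
// Sketch: one status object per logical split, reused across retries,
// so the master UI shows a single entry instead of one per attempt.
public class SplitRetry {
  static int statusesCreated = 0;

  static class Status { // stand-in for MonitoredTask
    Status(String msg) { statusesCreated++; }
    void setStatus(String msg) { /* update in place, not recreate */ }
  }

  static void splitWithRetries(int attempts) {
    // created once, before retrying -- the fix sketched here moves
    // TaskMonitor.get().createStatus(...) out of the retried path
    Status status = new Status("Doing distributed log split");
    for (int i = 0; i < attempts; i++) {
      status.setStatus("attempt " + (i + 1)); // reuse on each retry
    }
  }
}
```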
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184503#comment-13184503 ] Phabricator commented on HBASE-5177: tedyu has commented on the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/client/HTable.java:268 When reload is false, this new method becomes identical to the method on line 255. Should we deprecate the method on line 255 ? REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5053) HCM Tests leak connections
[ https://issues.apache.org/jira/browse/HBASE-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-5053. Resolution: Fixed > HCM Tests leak connections > -- > > Key: HBASE-5053 > URL: https://issues.apache.org/jira/browse/HBASE-5053 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: nkeywal >Assignee: nkeywal >Priority: Minor > Fix For: 0.94.0 > > Attachments: 5053.patch, 5053.v2.patch, 5053.v2.patch > > > There are simple leaks and one more complex. > The complex one comes from the fact fact > HConnectionManager.HConnectionImplementation keeps a *reference* to the > configuration used for the creation. So if this configuration is updated > later, the HConnectionKey created initially will differ from the current one. > As a consequence, the close() will not find the connection anymore in the > list, and the connection won't be deleted. > I added a warning when a close does not find the connection in the list; but > I wonder if we should not copy the HConnectionKey instead of keeping a > reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4602) Make the suite run in at least half the time
[ https://issues.apache.org/jira/browse/HBASE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-4602. Resolution: Fixed seems to be working 4 times faster now => solved > Make the suite run in at least half the time > > > Key: HBASE-4602 > URL: https://issues.apache.org/jira/browse/HBASE-4602 > Project: HBase > Issue Type: Umbrella > Environment: All. >Reporter: nkeywal >Assignee: nkeywal > Attachments: tests_list.xlsx > > > - Cutting down on the number of cluster spinups by coalescing related tests > rather than have each spin up its own cluster > - Make cluster start/stop faster > - Rewriting long-running tests so they do not need to be run on a cluster; > e.g. by instead mocking expected signals/messages > - Move long running tests out of the unit test suite to instead run as part > of the recently introduced 'integration test' step -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5030) Some tests do not close the HFile.Reader they use, leaving some file descriptors open
[ https://issues.apache.org/jira/browse/HBASE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5030: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Some tests do not close the HFile.Reader they use, leaving some file > descriptors open > - > > Key: HBASE-5030 > URL: https://issues.apache.org/jira/browse/HBASE-5030 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.0 >Reporter: nkeywal >Assignee: nkeywal >Priority: Trivial > Attachments: 5030.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-5181: - Assignee: Mubarak Seyed > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Assignee: Mubarak Seyed >Priority: Minor > Labels: noob
[jira] [Commented] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184498#comment-13184498 ] Phabricator commented on HBASE-5177: pritamdamania has added reviewers to the revision "HBASE-5177 [jira] Add a non-caching version of getRegionLocation.". Added Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D1197 > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184485#comment-13184485 ] Scott Chen commented on HBASE-5182: --- Wow. That's super fast. Thanks, Zhihong :) > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5180) [book] book.xml - fixed scanner example
[ https://issues.apache.org/jira/browse/HBASE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184484#comment-13184484 ] Hudson commented on HBASE-5180: --- Integrated in HBase-TRUNK #2621 (See [https://builds.apache.org/job/HBase-TRUNK/2621/]) hbase-5180 book.xml - the scanner example wasn't closing the ResultScanner. That's not good practice. > [book] book.xml - fixed scanner example > --- > > Key: HBASE-5180 > URL: https://issues.apache.org/jira/browse/HBASE-5180 > Project: HBase > Issue Type: Bug >Reporter: Doug Meil >Assignee: Doug Meil > Attachments: book_HBASE_5180.xml.patch > > > book.xml - the scanner example wasn't closing the ResultScanner! that's bad > practice.
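The fix referenced above is the standard close-the-scanner pattern. A minimal, self-contained sketch of the idea follows; the `ResultScanner` here is a local stub standing in for HBase's `org.apache.hadoop.hbase.client.ResultScanner` (which cannot be compiled without the HBase jars), so only the close-in-`finally` shape is what the book's corrected example conveys:

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

public class ScannerCloseExample {
    // Stub standing in for HBase's ResultScanner: iterable rows plus close().
    static class ResultScanner implements Closeable, Iterable<String> {
        private final List<String> rows;
        private boolean closed = false;
        ResultScanner(List<String> rows) { this.rows = rows; }
        public Iterator<String> iterator() { return rows.iterator(); }
        // In the real client, close() releases the server-side scanner lease.
        public void close() { closed = true; }
        public boolean isClosed() { return closed; }
    }

    public static void main(String[] args) {
        ResultScanner scanner = new ResultScanner(List.of("row1", "row2"));
        try {
            for (String row : scanner) {
                System.out.println(row);
            }
        } finally {
            // The point of HBASE-5180: always close the scanner, even if the
            // loop throws, so regionserver resources are not held until timeout.
            scanner.close();
        }
        System.out.println("closed=" + scanner.isClosed());
    }
}
```

On Java 7+ the same shape can be written as `try (ResultScanner scanner = ...) { ... }`, since the real `ResultScanner` is `Closeable`.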
[jira] [Commented] (HBASE-5129) book is inconsistent regarding disabling - major compaction
[ https://issues.apache.org/jira/browse/HBASE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184483#comment-13184483 ] Hudson commented on HBASE-5129: --- Integrated in HBase-TRUNK #2621 (See [https://builds.apache.org/job/HBase-TRUNK/2621/]) hbase-5129 [BOOK] configuration.xml - changed the major compaction disable instruction from Long.MAX_VALUE to 0. > book is inconsistent regarding disabling - major compaction > --- > > Key: HBASE-5129 > URL: https://issues.apache.org/jira/browse/HBASE-5129 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 0.90.1 >Reporter: Mikael Sitruk >Assignee: Doug Meil >Priority: Minor > Attachments: configuration_HBASE_5129.xml.patch > > > It seems that the book has some inconsistencies regarding the way to disable > major compactions > According to the book in chapter 2.6.1.1. HBase Default Configuration > hbase.hregion.majorcompaction - The time (in miliseconds) between 'major' > compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to > disable automated major compactions. > Default: 8640 > (http://hbase.apache.org/book.html#hbase_default_configurations) > According to the book at chapter 2.8.2.8. Managed Compactions > "A common administrative technique is to manage major compactions manually, > rather than letting HBase do it. By default, > HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick > in when you least desire it - especially on a busy system. To "turn off" > automatic major compactions set the value to Long.MAX_VALUE." > According to the code org.apache.hadoop.hbase.regionserver.Store.java, "0" is > the right answer. > (affect all documentation from 0.90.1)
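Per the code cited above (`org.apache.hadoop.hbase.regionserver.Store`), `0` — not `Long.MAX_VALUE` — is what actually disables time-based major compactions, which is the value the doc fix settles on. The corresponding `hbase-site.xml` fragment would be:

```xml
<!-- hbase-site.xml: disable automatic (time-based) major compactions.
     Major compactions can still be triggered manually, e.g. via the
     major_compact shell command, as the "Managed Compactions" section
     of the book recommends. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```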
[jira] [Assigned] (HBASE-5177) HTable needs a non cached version of getRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritam Damania reassigned HBASE-5177: - Assignee: Pritam Damania > HTable needs a non cached version of getRegionLocation > -- > > Key: HBASE-5177 > URL: https://issues.apache.org/jira/browse/HBASE-5177 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.90.4 >Reporter: Pritam Damania >Assignee: Pritam Damania >Priority: Minor > > There is a need for a non caching version of getRegionLocation > on the client side. This API is needed to quickly lookup the regionserver > that hosts a particular region without using the heavy weight > getRegionsInfo() method.
[jira] [Updated] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5182: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch Scott. > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.
[jira] [Commented] (HBASE-5167) We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing that.
[ https://issues.apache.org/jira/browse/HBASE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184482#comment-13184482 ] Hudson commented on HBASE-5167: --- Integrated in HBase-TRUNK #2621 (See [https://builds.apache.org/job/HBase-TRUNK/2621/]) HBASE-5167 We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing that. stack : Files : * /hbase/trunk/bin/hbase-daemon.sh > We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing > that. > -- > > Key: HBASE-5167 > URL: https://issues.apache.org/jira/browse/HBASE-5167 > Project: HBase > Issue Type: Improvement > Components: scripts >Affects Versions: 0.92.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.94.0 > > Attachments: HBASE-5167.patch > > > HBASE-4209 changed the behavior of the scripts such that we do not kill the > daemons away anymore. We should have also changed the message shown in the > logs.
[jira] [Updated] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5182: -- Fix Version/s: 0.94.0 Hadoop Flags: Reviewed > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.
[jira] [Commented] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184478#comment-13184478 ] Mubarak Seyed commented on HBASE-5181: -- Working on corporate approval to contribute this patch. Thanks. > Improve error message when Master fail-over happens and ZK unassigned node > contains stale znode(s) > -- > > Key: HBASE-5181 > URL: https://issues.apache.org/jira/browse/HBASE-5181 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.92.0, 0.90.5 >Reporter: Mubarak Seyed >Priority: Minor > Labels: noob > > When master fail-over happens, if we have number of RITs under > /hbase/unassigned and if we have stale znode(s) (encoded region names) under > /hbase/unassigned, we are getting > {code} > 2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master > startup proceeding: master failover > 2011-12-30 10:27:36,002 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to > process 1717 regions in transition > 2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.ArrayIndexOutOfBoundsException: -256 > at > org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148) > > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105) > at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) > at > org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) > > at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743) > at > org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262) > > at > org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223) > > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) > {code} > and there is no clue on how to clean-up the stale znode(s) from unassigned > using zkCli.sh (del /hbase/unassigned/). It would be good if > we include the bad region name in IOException from > RegionTransitionData.readFields(). > {code} > @Override > public void readFields(DataInput in) throws IOException { > // the event type byte > eventType = EventType.values()[in.readShort()]; > // the timestamp > stamp = in.readLong(); > // the encoded name of the region being transitioned > regionName = Bytes.readByteArray(in); > // remaining fields are optional so prefixed with boolean > // the name of the regionserver sending the data > if (in.readBoolean()) { > byte [] versionedBytes = Bytes.readByteArray(in); > this.origin = ServerName.parseVersionedServerName(versionedBytes); > } > if (in.readBoolean()) { > this.payload = Bytes.readByteArray(in); > } > } > {code} > If the code execution has survived until regionName then we can include the > regionName in IOException with error message to clean-up the stale znode(s) > under /hbase/unassigned.
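The improvement proposed in HBASE-5181 above — surfacing the bad region name in the exception so the operator knows which stale znode to delete — can be sketched with plain `java.io`. The helper below is illustrative, not HBase's actual `RegionTransitionData` (the real class uses Hadoop's `Writable` machinery and a different wire format); it only demonstrates the "wrap later parse failures with the already-read region name" pattern:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadFieldsContextExample {
    // Illustrative stand-in for RegionTransitionData.readFields(): once the
    // region name has been read successfully, any later deserialization
    // failure is rethrown with that name attached, pointing the operator at
    // the stale znode under /hbase/unassigned.
    static String readTransition(byte[] znodeData) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(znodeData));
        int nameLen = in.readShort();          // 2-byte name length (toy format)
        byte[] name = new byte[nameLen];
        in.readFully(name);                    // the encoded region name
        String regionName = new String(name, StandardCharsets.UTF_8);
        try {
            in.readLong();                     // e.g. a timestamp field; fails on truncated data
        } catch (IOException e) {
            throw new IOException("Failed to parse transition data for region '"
                + regionName + "'; consider deleting the stale znode "
                + "/hbase/unassigned/" + regionName, e);
        }
        return regionName;
    }

    public static void main(String[] args) throws IOException {
        // Well-formed record: 2-byte length, 4 name bytes, 8-byte timestamp.
        byte[] good = {0, 4, 'r', '1', 'a', 'b', 0, 0, 0, 0, 0, 0, 0, 1};
        System.out.println(readTransition(good));  // prints r1ab

        // Truncated record: name present, timestamp missing.
        byte[] bad = {0, 4, 'r', '1', 'a', 'b'};
        try {
            readTransition(bad);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design point is simply ordering: because `regionName` is decoded before the optional fields, it is available as context for every subsequent failure, which is exactly the window the JIRA comment identifies ("if the code execution has survived until regionName").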
[jira] [Updated] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
[ https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5182: -- Status: Patch Available (was: Open) > TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly > -- > > Key: HBASE-5182 > URL: https://issues.apache.org/jira/browse/HBASE-5182 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Scott Chen >Assignee: Scott Chen >Priority: Minor > Fix For: 0.94.0 > > Attachments: hbase-5182.txt > > > TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. > It uses the default value instead.