[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan
[ https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177912#comment-13177912 ] Hudson commented on HBASE-5112: --- Integrated in HBase-0.92-security #55 (See [https://builds.apache.org/job/HBase-0.92-security/55/]) HBASE-5112 TestReplication#queueFailover flaky due to potentially uninitialized Scan (Jimmy Xiang) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java TestReplication#queueFailover flaky due to potentially uninitialized Scan - Key: HBASE-5112 URL: https://issues.apache.org/jira/browse/HBASE-5112 Project: HBase Issue Type: Test Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5112-v2.txt, hbase-5112.patch In TestReplication#queueFailover, the second scan is not reset for each new scan. Followed scan may not be able to scan the whole table. So it cannot get all the data and the test fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177911#comment-13177911 ] Hudson commented on HBASE-5100: --- Integrated in HBase-0.92-security #55 (See [https://builds.apache.org/job/HBase-0.92-security/55/]) HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177913#comment-13177913 ] Hudson commented on HBASE-5099: --- Integrated in HBase-0.92-security #55 (See [https://builds.apache.org/job/HBase-0.92-security/55/]) HBASE-5099 ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on (Jimmy) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099-v6.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177971#comment-13177971 ] Hudson commented on HBASE-5100: --- Integrated in HBase-TRUNK #2595 (See [https://builds.apache.org/job/HBase-TRUNK/2595/]) HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE,
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177991#comment-13177991 ] Hudson commented on HBASE-5099: --- Integrated in HBase-0.92 #221 (See [https://builds.apache.org/job/HBase-0.92/221/]) HBASE-5099 ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on (Jimmy) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099-v6.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan
[ https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177990#comment-13177990 ] Hudson commented on HBASE-5112: --- Integrated in HBase-0.92 #221 (See [https://builds.apache.org/job/HBase-0.92/221/]) HBASE-5112 TestReplication#queueFailover flaky due to potentially uninitialized Scan (Jimmy Xiang) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java TestReplication#queueFailover flaky due to potentially uninitialized Scan - Key: HBASE-5112 URL: https://issues.apache.org/jira/browse/HBASE-5112 Project: HBase Issue Type: Test Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5112-v2.txt, hbase-5112.patch In TestReplication#queueFailover, the second scan is not reset for each new scan. Followed scan may not be able to scan the whole table. So it cannot get all the data and the test fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter
TestDrainingServer expects round robin region assignment but misses a config parameter -- Key: HBASE-5113 URL: https://issues.apache.org/jira/browse/HBASE-5113 Project: HBase Issue Type: Test Reporter: Zhihong Yu Fix For: 0.92.0, 0.94.0 From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/, we can see that some region server didn't have any regions assigned. {code} // Assert that every regionserver has some regions on it. {code} It turns out that BulkEnabler has an internal knob whose value is false: {code} boolean roundRobinAssignment = this.server.getConfiguration().getBoolean( hbase.master.enabletable.roundrobin, false); {code} TestDrainingServer should have set this config parameter before calling admin.enableTable(TABLENAME) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan
[ https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178004#comment-13178004 ] Hudson commented on HBASE-5112: --- Integrated in HBase-TRUNK #2596 (See [https://builds.apache.org/job/HBase-TRUNK/2596/]) HBASE-5112 TestReplication#queueFailover flaky due to potentially uninitialized Scan (Jimmy Xiang) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java TestReplication#queueFailover flaky due to potentially uninitialized Scan - Key: HBASE-5112 URL: https://issues.apache.org/jira/browse/HBASE-5112 Project: HBase Issue Type: Test Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5112-v2.txt, hbase-5112.patch In TestReplication#queueFailover, the second scan is not reset for each new scan. Followed scan may not be able to scan the whole table. So it cannot get all the data and the test fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5064: -- Hadoop Flags: Reviewed Summary: utilize surefire tests parallelization (was: use surefire tests parallelization) utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter
[ https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5113: -- Attachment: 5113.txt Looped TestDrainingServer 5 times using the patch and they all passed. TestDrainingServer expects round robin region assignment but misses a config parameter -- Key: HBASE-5113 URL: https://issues.apache.org/jira/browse/HBASE-5113 Project: HBase Issue Type: Test Reporter: Zhihong Yu Fix For: 0.92.0, 0.94.0 Attachments: 5113.txt From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/, we can see that some region server didn't have any regions assigned. {code} // Assert that every regionserver has some regions on it. {code} It turns out that BulkEnabler has an internal knob whose value is false: {code} boolean roundRobinAssignment = this.server.getConfiguration().getBoolean( hbase.master.enabletable.roundrobin, false); {code} TestDrainingServer should have set this config parameter before calling admin.enableTable(TABLENAME) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178008#comment-13178008 ] Hudson commented on HBASE-5103: --- Integrated in HBase-TRUNK-security #56 (See [https://builds.apache.org/job/HBase-TRUNK-security/56/]) HBASE-5103 Fix improper master znode deserialization (Jonathan Hsieh) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java Fix improper master znode deserialization - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178010#comment-13178010 ] Hudson commented on HBASE-5099: --- Integrated in HBase-TRUNK-security #56 (See [https://builds.apache.org/job/HBase-TRUNK-security/56/]) HBASE-5099 ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on (Jimmy) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099-v6.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178007#comment-13178007 ] Hudson commented on HBASE-5100: --- Integrated in HBase-TRUNK-security #56 (See [https://builds.apache.org/job/HBase-TRUNK-security/56/]) HBASE-5100 Rollback of split could cause closed region to be opened again (Chunhui) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE,
[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178014#comment-13178014 ] Zhihong Yu commented on HBASE-5064: --- Integrated to TRUNK. Thanks for the hard work, N. utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter
[ https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5113: -- Status: Patch Available (was: Open) TestDrainingServer expects round robin region assignment but misses a config parameter -- Key: HBASE-5113 URL: https://issues.apache.org/jira/browse/HBASE-5113 Project: HBase Issue Type: Test Reporter: Zhihong Yu Fix For: 0.92.0, 0.94.0 Attachments: 5113.txt From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/, we can see that some region server didn't have any regions assigned. {code} // Assert that every regionserver has some regions on it. {code} It turns out that BulkEnabler has an internal knob whose value is false: {code} boolean roundRobinAssignment = this.server.getConfiguration().getBoolean( hbase.master.enabletable.roundrobin, false); {code} TestDrainingServer should have set this config parameter before calling admin.enableTable(TABLENAME) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178023#comment-13178023 ] Hudson commented on HBASE-5064: --- Integrated in HBase-TRUNK #2597 (See [https://builds.apache.org/job/HBase-TRUNK/2597/]) HBASE-5064 utilize surefire tests parallelization (N Keywal) tedyu : Files : * /hbase/trunk/dev-support/test-patch.sh * /hbase/trunk/pom.xml * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java * /hbase/trunk/src/test/resources/hbase-site.xml utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178027#comment-13178027 ] Zhihong Yu commented on HBASE-5064: --- Reverted from TRUNK due to missing TestMetaReaderEditor in build 2597 I wasn't able to find out the test that caused surefire plugin to report timeout. utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22
[ https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178030#comment-13178030 ] Zhihong Yu commented on HBASE-5105: --- The two test failure should be fixed by HBASE-5112 and HBASE-5113, respectively. Integrated to 0.92 and TRUNK. Thanks for the patch, Ming. TestImportTsv failed with hadoop 0.22 - Key: HBASE-5105 URL: https://issues.apache.org/jira/browse/HBASE-5105 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0 Reporter: Ming Ma Assignee: Ming Ma Attachments: HBASE-5105-0.92.patch java.io.FileNotFoundException: File does not exist: /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
[ https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-4397: -- Hadoop Flags: Reviewed Summary: -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time (was: -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time - Key: HBASE-4397 URL: https://issues.apache.org/jira/browse/HBASE-4397 Project: HBase Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 0.92.0, 0.94.0 Attachments: HBASE-4397-0.92.patch 1. Shutdown all RSs. 2. Bring all RS back online. The -ROOT-, .META. stay in offline state until timeout monitor force assignment 30 minutes later. That is because HMaster can't find a RS to assign the tables to in assign operation. 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, trying to assign elsewhere instead; retry=0 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148) at $Proxy9.openRegion(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123) at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052 Possible fixes: 1. Have serverManager handle server online event similar to how RegionServerTracker.java calls servermanager.expireServer in the case server goes down. 2. Make timeoutMonitor handle the situation better. This is a special situation in the cluster. 30 minutes timeout can be skipped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
[ https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178031#comment-13178031 ] Zhihong Yu commented on HBASE-4397: --- Integrated to 0.92 and TRUNK. Thanks for the patch Ming. Thanks for the review, Lars. -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time - Key: HBASE-4397 URL: https://issues.apache.org/jira/browse/HBASE-4397 Project: HBase Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 0.92.0, 0.94.0 Attachments: HBASE-4397-0.92.patch 1. Shutdown all RSs. 2. Bring all RS back online. The -ROOT-, .META. stay in offline state until timeout monitor force assignment 30 minutes later. That is because HMaster can't find a RS to assign the tables to in assign operation. 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, trying to assign elsewhere instead; retry=0 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148) at $Proxy9.openRegion(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123) at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052 Possible fixes: 1. Have serverManager handle server online event similar to how RegionServerTracker.java calls servermanager.expireServer in the case server goes down. 2. Make timeoutMonitor handle the situation better. This is a special situation in the cluster. 30 minutes timeout can be skipped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178033#comment-13178033 ] Zhihong Yu commented on HBASE-4608: --- @Li: Please use '--no-prefix' to generate diff. Otherwise Hadoop QA won't be able to apply your patch. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter
[ https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178041#comment-13178041 ] Hudson commented on HBASE-5113: --- Integrated in HBase-TRUNK #2598 (See [https://builds.apache.org/job/HBase-TRUNK/2598/]) HBASE-5113 TestDrainingServer expects round robin region assignment but misses a config parameter tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java TestDrainingServer expects round robin region assignment but misses a config parameter -- Key: HBASE-5113 URL: https://issues.apache.org/jira/browse/HBASE-5113 Project: HBase Issue Type: Test Reporter: Zhihong Yu Fix For: 0.92.0, 0.94.0 Attachments: 5113.txt From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/, we can see that some region server didn't have any regions assigned. {code} // Assert that every regionserver has some regions on it. {code} It turns out that BulkEnabler has an internal knob whose value is false: {code} boolean roundRobinAssignment = this.server.getConfiguration().getBoolean( hbase.master.enabletable.roundrobin, false); {code} TestDrainingServer should have set this config parameter before calling admin.enableTable(TABLENAME) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178042#comment-13178042 ] nkeywal commented on HBASE-5064: The exact same test already timeouted in trunk in https://builds.apache.org/job/HBase-TRUNK/2588/testReport/ I don't think it's related to this patch. On Sat, Dec 31, 2011 at 4:28 PM, Zhihong Yu (Commented) (JIRA) utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5100: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Resolved as committed. Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for
[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178044#comment-13178044 ] Zhihong Yu commented on HBASE-5064: --- TestLogRolling hung in build 2588. I don't see it hanging in build 2597. Anyway we need to find out why TestMetaReaderEditor wasn't executed. utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter
[ https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178045#comment-13178045 ] Zhihong Yu commented on HBASE-5113: --- Integrated trivial patch to 0.92 and TRUNK. TestDrainingServer expects round robin region assignment but misses a config parameter -- Key: HBASE-5113 URL: https://issues.apache.org/jira/browse/HBASE-5113 Project: HBase Issue Type: Test Reporter: Zhihong Yu Fix For: 0.92.0, 0.94.0 Attachments: 5113.txt From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/, we can see that some region server didn't have any regions assigned. {code} // Assert that every regionserver has some regions on it. {code} It turns out that BulkEnabler has an internal knob whose value is false: {code} boolean roundRobinAssignment = this.server.getConfiguration().getBoolean( hbase.master.enabletable.roundrobin, false); {code} TestDrainingServer should have set this config parameter before calling admin.enableTable(TABLENAME) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
[ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178051#comment-13178051 ] Zhihong Yu commented on HBASE-5094: --- +1 on patch. After discussing with Ramkrishna, the task of handling doubly assigned regions would be done in another JIRA. The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. Key: HBASE-5094 URL: https://issues.apache.org/jira/browse/HBASE-5094 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: ramkrishna.s.vasudevan Priority: Critical Attachments: HBASE-5094_1.patch {code} RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey()); ServerName addressFromAM = this.services.getAssignmentManager() .getRegionServerOfRegion(e.getKey()); if (rit != null !rit.isClosing() !rit.isPendingClose()) { // Skip regions that were in transition unless CLOSING or // PENDING_CLOSE LOG.info(Skip assigning region + rit.toString()); } else if (addressFromAM != null !addressFromAM.equals(this.serverName)) { LOG.debug(Skip assigning region + e.getKey().getRegionNameAsString() + because it has been opened in + addressFromAM.getServerName()); } {code} In ServerShutDownHandler we try to get the address in the AM. This address is initially null because it is not yet updated after the region was opened .i.e. the CAll back after node deletion is not yet done in the master side. But removal from RIT is completed on the master side. So this will trigger a new assignment. So there is a small window between the online region is actually added in to the online list and the ServerShutdownHandler where we check the existing address in AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
[ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178063#comment-13178063 ] ramkrishna.s.vasudevan commented on HBASE-5094: --- Committed to trunk and 0.92. Thanks for the review Ted. The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. Key: HBASE-5094 URL: https://issues.apache.org/jira/browse/HBASE-5094 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: ramkrishna.s.vasudevan Priority: Critical Attachments: HBASE-5094_1.patch {code} RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey()); ServerName addressFromAM = this.services.getAssignmentManager() .getRegionServerOfRegion(e.getKey()); if (rit != null !rit.isClosing() !rit.isPendingClose()) { // Skip regions that were in transition unless CLOSING or // PENDING_CLOSE LOG.info(Skip assigning region + rit.toString()); } else if (addressFromAM != null !addressFromAM.equals(this.serverName)) { LOG.debug(Skip assigning region + e.getKey().getRegionNameAsString() + because it has been opened in + addressFromAM.getServerName()); } {code} In ServerShutDownHandler we try to get the address in the AM. This address is initially null because it is not yet updated after the region was opened .i.e. the CAll back after node deletion is not yet done in the master side. But removal from RIT is completed on the master side. So this will trigger a new assignment. So there is a small window between the online region is actually added in to the online list and the ServerShutdownHandler where we check the existing address in AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178065#comment-13178065 ] Hudson commented on HBASE-5064: --- Integrated in HBase-TRUNK #2599 (See [https://builds.apache.org/job/HBase-TRUNK/2599/]) HBASE-5064 revert - need to figure out which test caused surefire to report timeout tedyu : Files : * /hbase/trunk/dev-support/test-patch.sh * /hbase/trunk/pom.xml * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java * /hbase/trunk/src/test/resources/hbase-site.xml utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22
[ https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178067#comment-13178067 ] Hudson commented on HBASE-5105: --- Integrated in HBase-TRUNK #2599 (See [https://builds.apache.org/job/HBase-TRUNK/2599/]) HBASE-5105 TestImportTsv failed with hadoop 0.22 (Ming Ma) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java TestImportTsv failed with hadoop 0.22 - Key: HBASE-5105 URL: https://issues.apache.org/jira/browse/HBASE-5105 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0 Reporter: Ming Ma Assignee: Ming Ma Attachments: HBASE-5105-0.92.patch java.io.FileNotFoundException: File does not exist: /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
[ https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178066#comment-13178066 ] Hudson commented on HBASE-4397: --- Integrated in HBase-TRUNK #2599 (See [https://builds.apache.org/job/HBase-TRUNK/2599/]) HBASE-4397 -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time (Ming Ma) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time - Key: HBASE-4397 URL: https://issues.apache.org/jira/browse/HBASE-4397 Project: HBase Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 0.92.0, 0.94.0 Attachments: HBASE-4397-0.92.patch 1. Shutdown all RSs. 2. Bring all RS back online. The -ROOT-, .META. stay in offline state until timeout monitor force assignment 30 minutes later. That is because HMaster can't find a RS to assign the tables to in assign operation. 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, trying to assign elsewhere instead; retry=0 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148) at $Proxy9.openRegion(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123) at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052 Possible fixes: 1. Have serverManager handle server online event similar to how RegionServerTracker.java calls servermanager.expireServer in the case server goes down. 2. Make timeoutMonitor handle the situation better. This is a special situation in the cluster. 30 minutes timeout can be skipped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Pi updated HBASE-4608: - Attachment: 4608v6.txt no prefix patch HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178075#comment-13178075 ] jirapos...@reviews.apache.org commented on HBASE-4608: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2740/ --- (Updated 2011-12-31 20:19:11.951711) Review request for hbase, Eli Collins and Todd Lipcon. Summary (updated) --- HLog compression. Has unit tests and a command line tool for compressing/decompressing. This addresses bug HBase-4608. https://issues.apache.org/jira/browse/HBase-4608 Diffs - src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION Diff: https://reviews.apache.org/r/2740/diff Testing --- Thanks, Li HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan
[ https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5112: -- Attachment: org.apache.hadoop.hbase.replication.TestReplication-output.txt TestReplication still fails on Linux during a test suite run. Attached is test output. TestReplication#queueFailover flaky due to potentially uninitialized Scan - Key: HBASE-5112 URL: https://issues.apache.org/jira/browse/HBASE-5112 Project: HBase Issue Type: Test Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5112-v2.txt, hbase-5112.patch, org.apache.hadoop.hbase.replication.TestReplication-output.txt In TestReplication#queueFailover, the second scan is not reset for each new scan. Followed scan may not be able to scan the whole table. So it cannot get all the data and the test fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178079#comment-13178079 ] Lars Hofhansl commented on HBASE-4608: -- @Li: How big do you expect the in-memory dictionary to grow? I was wondering if the reading or writing process could give the compressor hints about when would be a good time to reset the dictionary (for example when memstore flush entry was found). The compressor can choose to ignore the hints and use some internal logic, or reset the dictionary when it got hinted. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan
[ https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178080#comment-13178080 ] Lars Hofhansl commented on HBASE-5112: -- There might be multiple issues. This patch is still good, I think. TestReplication#queueFailover flaky due to potentially uninitialized Scan - Key: HBASE-5112 URL: https://issues.apache.org/jira/browse/HBASE-5112 Project: HBase Issue Type: Test Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5112-v2.txt, hbase-5112.patch, org.apache.hadoop.hbase.replication.TestReplication-output.txt In TestReplication#queueFailover, the second scan is not reset for each new scan. Followed scan may not be able to scan the whole table. So it cannot get all the data and the test fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178084#comment-13178084 ] Li Pi commented on HBASE-4608: -- max size = 64k * around 100-200 bytes. Really not that big. Less than 100 megabytes. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178085#comment-13178085 ] Li Pi commented on HBASE-4608: -- I was thinking of replacing the 1-way associative with a a 127 sized LRU dictionary. should allow us to save a few bytes, and also be far more efficient with our eviction strategy. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178087#comment-13178087 ] Li Pi commented on HBASE-4608: -- Guava's mapmaker doesn't guarantee consistent eviction. You'd want to either use 2 LinkedHashMap's or your own LRU style system. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
[ https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178092#comment-13178092 ] Hudson commented on HBASE-4397: --- Integrated in HBase-TRUNK-security #57 (See [https://builds.apache.org/job/HBase-TRUNK-security/57/]) HBASE-4397 -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time (Ming Ma) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time - Key: HBASE-4397 URL: https://issues.apache.org/jira/browse/HBASE-4397 Project: HBase Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 0.92.0, 0.94.0 Attachments: HBASE-4397-0.92.patch 1. Shutdown all RSs. 2. Bring all RS back online. The -ROOT-, .META. stay in offline state until timeout monitor force assignment 30 minutes later. That is because HMaster can't find a RS to assign the tables to in assign operation. 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, trying to assign elsewhere instead; retry=0 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148) at $Proxy9.openRegion(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123) at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052 Possible fixes: 1. Have serverManager handle server online event similar to how RegionServerTracker.java calls servermanager.expireServer in the case server goes down. 2. Make timeoutMonitor handle the situation better. This is a special situation in the cluster. 30 minutes timeout can be skipped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22
[ https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178093#comment-13178093 ] Hudson commented on HBASE-5105: --- Integrated in HBase-TRUNK-security #57 (See [https://builds.apache.org/job/HBase-TRUNK-security/57/]) HBASE-5105 TestImportTsv failed with hadoop 0.22 (Ming Ma) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java TestImportTsv failed with hadoop 0.22 - Key: HBASE-5105 URL: https://issues.apache.org/jira/browse/HBASE-5105 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0 Reporter: Ming Ma Assignee: Ming Ma Attachments: HBASE-5105-0.92.patch java.io.FileNotFoundException: File does not exist: /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter
[ https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178090#comment-13178090 ] Hudson commented on HBASE-5113: --- Integrated in HBase-TRUNK-security #57 (See [https://builds.apache.org/job/HBase-TRUNK-security/57/]) HBASE-5113 TestDrainingServer expects round robin region assignment but misses a config parameter tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java TestDrainingServer expects round robin region assignment but misses a config parameter -- Key: HBASE-5113 URL: https://issues.apache.org/jira/browse/HBASE-5113 Project: HBase Issue Type: Test Reporter: Zhihong Yu Fix For: 0.92.0, 0.94.0 Attachments: 5113.txt From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/, we can see that some region server didn't have any regions assigned. {code} // Assert that every regionserver has some regions on it. {code} It turns out that BulkEnabler has an internal knob whose value is false: {code} boolean roundRobinAssignment = this.server.getConfiguration().getBoolean( hbase.master.enabletable.roundrobin, false); {code} TestDrainingServer should have set this config parameter before calling admin.enableTable(TABLENAME) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
[ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178091#comment-13178091 ] Hudson commented on HBASE-5094: --- Integrated in HBase-TRUNK-security #57 (See [https://builds.apache.org/job/HBase-TRUNK-security/57/]) HBASE-5094 The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. (Ram) ramkrishna : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. Key: HBASE-5094 URL: https://issues.apache.org/jira/browse/HBASE-5094 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: ramkrishna.s.vasudevan Priority: Critical Attachments: HBASE-5094_1.patch {code} RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey()); ServerName addressFromAM = this.services.getAssignmentManager() .getRegionServerOfRegion(e.getKey()); if (rit != null !rit.isClosing() !rit.isPendingClose()) { // Skip regions that were in transition unless CLOSING or // PENDING_CLOSE LOG.info(Skip assigning region + rit.toString()); } else if (addressFromAM != null !addressFromAM.equals(this.serverName)) { LOG.debug(Skip assigning region + e.getKey().getRegionNameAsString() + because it has been opened in + addressFromAM.getServerName()); } {code} In ServerShutDownHandler we try to get the address in the AM. This address is initially null because it is not yet updated after the region was opened .i.e. the CAll back after node deletion is not yet done in the master side. But removal from RIT is completed on the master side. So this will trigger a new assignment. So there is a small window between the online region is actually added in to the online list and the ServerShutdownHandler where we check the existing address in AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178089#comment-13178089 ] Hudson commented on HBASE-5064: --- Integrated in HBase-TRUNK-security #57 (See [https://builds.apache.org/job/HBase-TRUNK-security/57/]) HBASE-5064 revert - need to figure out which test caused surefire to report timeout HBASE-5064 utilize surefire tests parallelization (N Keywal) tedyu : Files : * /hbase/trunk/dev-support/test-patch.sh * /hbase/trunk/pom.xml * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java * /hbase/trunk/src/test/resources/hbase-site.xml tedyu : Files : * /hbase/trunk/dev-support/test-patch.sh * /hbase/trunk/pom.xml * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java * /hbase/trunk/src/test/resources/hbase-site.xml utilize surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
[ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178101#comment-13178101 ] Hudson commented on HBASE-5094: --- Integrated in HBase-TRUNK #2600 (See [https://builds.apache.org/job/HBase-TRUNK/2600/]) HBASE-5094 The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. (Ram) ramkrishna : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. Key: HBASE-5094 URL: https://issues.apache.org/jira/browse/HBASE-5094 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: ramkrishna.s.vasudevan Priority: Critical Attachments: HBASE-5094_1.patch {code} RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey()); ServerName addressFromAM = this.services.getAssignmentManager() .getRegionServerOfRegion(e.getKey()); if (rit != null !rit.isClosing() !rit.isPendingClose()) { // Skip regions that were in transition unless CLOSING or // PENDING_CLOSE LOG.info(Skip assigning region + rit.toString()); } else if (addressFromAM != null !addressFromAM.equals(this.serverName)) { LOG.debug(Skip assigning region + e.getKey().getRegionNameAsString() + because it has been opened in + addressFromAM.getServerName()); } {code} In ServerShutDownHandler we try to get the address in the AM. This address is initially null because it is not yet updated after the region was opened .i.e. the CAll back after node deletion is not yet done in the master side. But removal from RIT is completed on the master side. So this will trigger a new assignment. So there is a small window between the online region is actually added in to the online list and the ServerShutdownHandler where we check the existing address in AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan
[ https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178124#comment-13178124 ] Hadoop QA commented on HBASE-5112: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508999/org.apache.hadoop.hbase.replication.TestReplication-output.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 367 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/644//console This message is automatically generated. TestReplication#queueFailover flaky due to potentially uninitialized Scan - Key: HBASE-5112 URL: https://issues.apache.org/jira/browse/HBASE-5112 Project: HBase Issue Type: Test Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5112-v2.txt, hbase-5112.patch, org.apache.hadoop.hbase.replication.TestReplication-output.txt In TestReplication#queueFailover, the second scan is not reset for each new scan. Followed scan may not be able to scan the whole table. So it cannot get all the data and the test fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter
[ https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178123#comment-13178123 ] Hadoop QA commented on HBASE-5113: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508987/5113.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/643//console This message is automatically generated. TestDrainingServer expects round robin region assignment but misses a config parameter -- Key: HBASE-5113 URL: https://issues.apache.org/jira/browse/HBASE-5113 Project: HBase Issue Type: Test Reporter: Zhihong Yu Fix For: 0.92.0, 0.94.0 Attachments: 5113.txt From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/, we can see that some region server didn't have any regions assigned. {code} // Assert that every regionserver has some regions on it. {code} It turns out that BulkEnabler has an internal knob whose value is false: {code} boolean roundRobinAssignment = this.server.getConfiguration().getBoolean( hbase.master.enabletable.roundrobin, false); {code} TestDrainingServer should have set this config parameter before calling admin.enableTable(TABLENAME) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira