date:20111231


[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177912#comment-13177912
 ] 

Hudson commented on HBASE-5112:
---

Integrated in HBase-0.92-security #55 (See 
[https://builds.apache.org/job/HBase-0.92-security/55/])
HBASE-5112  TestReplication#queueFailover flaky due to potentially
   uninitialized Scan (Jimmy Xiang)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java


 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new 
 scan.  Followed scan may not be able to scan the whole table.
 So it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again


[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177911#comment-13177911
 ] 

Hudson commented on HBASE-5100:
---

Integrated in HBase-0.92-security #55 (See 
[https://builds.apache.org/job/HBase-0.92-security/55/])
HBASE-5100  Rollback of split could cause closed region to be opened again 
(Chunhui)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java


 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177913#comment-13177913
]

Hudson commented on HBASE-5099:
---

Integrated in HBase-0.92-security #55 (See
[https://builds.apache.org/job/HBase-0.92-security/55/])
HBASE-5099 ZK event thread waiting for root region assignment may block
server
shutdown handler for the region sever the root region was on
(Jimmy)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
*
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java

ZK event thread waiting for root region assignment may block server shutdown
handler for the region sever the root region was on

Key: HBASE-5099
URL: https://issues.apache.org/jira/browse/HBASE-5099
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.92.0, 0.94.0

Attachments: 5099.92, ZK-event-thread-waiting-for-root.png,
distributed-log-splitting-hangs.png, hbase-5099-v2.patch,
hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch,
hbase-5099-v6.patch, hbase-5099.patch

A RS died. The ServerShutdownHandler kicked in and started the logspliting.
SpliLogManager
installed the tasks asynchronously, then started to wait for them to complete.
The task znodes were not created actually. The requests were just queued.
At this time, the zookeeper connection expired. HMaster tried to recover the
expired ZK session.
During the recovery, a new zookeeper connection was created. However, this
master became the
new master again. It tried to assign root and meta.
Because the dead RS got the old root region, the master needs to wait for the
log splitting to complete.
This waiting holds the zookeeper event thread. So the async create split
task is never retried since
there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again


[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177971#comment-13177971
 ] 

Hudson commented on HBASE-5100:
---

Integrated in HBase-TRUNK #2595 (See 
[https://builds.apache.org/job/HBase-TRUNK/2595/])
HBASE-5100  Rollback of split could cause closed region to be opened again 
(Chunhui)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java


 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE,

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177991#comment-13177991
]

Hudson commented on HBASE-5099:
---

Integrated in HBase-0.92 #221 (See
[https://builds.apache.org/job/HBase-0.92/221/])
HBASE-5099 ZK event thread waiting for root region assignment may block
server
shutdown handler for the region sever the root region was on
(Jimmy)

ZK event thread waiting for root region assignment may block server shutdown
handler for the region sever the root region was on

[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan


[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177990#comment-13177990
 ] 

Hudson commented on HBASE-5112:
---

Integrated in HBase-0.92 #221 (See 
[https://builds.apache.org/job/HBase-0.92/221/])
HBASE-5112  TestReplication#queueFailover flaky due to potentially
   uninitialized Scan (Jimmy Xiang)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java


 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new 
 scan.  Followed scan may not be able to scan the whole table.
 So it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter

2011-12-31 Thread Zhihong Yu (Created) (JIRA)

TestDrainingServer expects round robin region assignment but misses a config 
parameter
--

 Key: HBASE-5113
 URL: https://issues.apache.org/jira/browse/HBASE-5113
 Project: HBase
  Issue Type: Test
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0


From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/,
 we can see that some region server didn't have any regions assigned.
{code}
// Assert that every regionserver has some regions on it.
{code}
It turns out that BulkEnabler has an internal knob whose value is false:
{code}
  boolean roundRobinAssignment = this.server.getConfiguration().getBoolean(
  hbase.master.enabletable.roundrobin, false);
{code}
TestDrainingServer should have set this config parameter before calling 
admin.enableTable(TABLENAME)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan


[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178004#comment-13178004
 ] 

Hudson commented on HBASE-5112:
---

Integrated in HBase-TRUNK #2596 (See 
[https://builds.apache.org/job/HBase-TRUNK/2596/])
HBASE-5112  TestReplication#queueFailover flaky due to potentially
   uninitialized Scan (Jimmy Xiang)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java


 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new 
 scan.  Followed scan may not be able to scan the whole table.
 So it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5064) utilize surefire tests parallelization


 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5064:
--

Hadoop Flags: Reviewed
 Summary: utilize surefire tests parallelization  (was: use surefire 
tests parallelization)

 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter


 [ 
https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5113:
--

Attachment: 5113.txt

Looped TestDrainingServer 5 times using the patch and they all passed.

 TestDrainingServer expects round robin region assignment but misses a config 
 parameter
 --

 Key: HBASE-5113
 URL: https://issues.apache.org/jira/browse/HBASE-5113
 Project: HBase
  Issue Type: Test
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0

 Attachments: 5113.txt


 From 
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/,
  we can see that some region server didn't have any regions assigned.
 {code}
 // Assert that every regionserver has some regions on it.
 {code}
 It turns out that BulkEnabler has an internal knob whose value is false:
 {code}
   boolean roundRobinAssignment = 
 this.server.getConfiguration().getBoolean(
   hbase.master.enabletable.roundrobin, false);
 {code}
 TestDrainingServer should have set this config parameter before calling 
 admin.enableTable(TABLENAME)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization


[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178008#comment-13178008
 ] 

Hudson commented on HBASE-5103:
---

Integrated in HBase-TRUNK-security #56 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/56/])
HBASE-5103  Fix improper master znode deserialization (Jonathan Hsieh)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java


 Fix improper master znode deserialization
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

[
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178010#comment-13178010
]

Hudson commented on HBASE-5099:
---

Integrated in HBase-TRUNK-security #56 (See
[https://builds.apache.org/job/HBase-TRUNK-security/56/])
HBASE-5099 ZK event thread waiting for root region assignment may block
server
shutdown handler for the region sever the root region was on
(Jimmy)

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
*
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java

ZK event thread waiting for root region assignment may block server shutdown
handler for the region sever the root region was on

[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again


[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178007#comment-13178007
 ] 

Hudson commented on HBASE-5100:
---

Integrated in HBase-TRUNK-security #56 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/56/])
HBASE-5100  Rollback of split could cause closed region to be opened again 
(Chunhui)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java


 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE,

[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization


[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178014#comment-13178014
 ] 

Zhihong Yu commented on HBASE-5064:
---

Integrated to TRUNK.

Thanks for the hard work, N.

 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter


 [ 
https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5113:
--

Status: Patch Available  (was: Open)

 TestDrainingServer expects round robin region assignment but misses a config 
 parameter
 --

 Key: HBASE-5113
 URL: https://issues.apache.org/jira/browse/HBASE-5113
 Project: HBase
  Issue Type: Test
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0

 Attachments: 5113.txt


 From 
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/,
  we can see that some region server didn't have any regions assigned.
 {code}
 // Assert that every regionserver has some regions on it.
 {code}
 It turns out that BulkEnabler has an internal knob whose value is false:
 {code}
   boolean roundRobinAssignment = 
 this.server.getConfiguration().getBoolean(
   hbase.master.enabletable.roundrobin, false);
 {code}
 TestDrainingServer should have set this config parameter before calling 
 admin.enableTable(TABLENAME)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization


[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178023#comment-13178023
 ] 

Hudson commented on HBASE-5064:
---

Integrated in HBase-TRUNK #2597 (See 
[https://builds.apache.org/job/HBase-TRUNK/2597/])
HBASE-5064 utilize surefire tests parallelization (N Keywal)

tedyu : 
Files : 
* /hbase/trunk/dev-support/test-patch.sh
* /hbase/trunk/pom.xml
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
* /hbase/trunk/src/test/resources/hbase-site.xml


 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization


[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178027#comment-13178027
 ] 

Zhihong Yu commented on HBASE-5064:
---

Reverted from TRUNK due to missing TestMetaReaderEditor in build 2597

I wasn't able to find out the test that caused surefire plugin to report 
timeout.

 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22


[ 
https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178030#comment-13178030
 ] 

Zhihong Yu commented on HBASE-5105:
---

The two test failure should be fixed by HBASE-5112 and HBASE-5113, respectively.

Integrated to 0.92 and TRUNK.

Thanks for the patch, Ming.

 TestImportTsv failed with hadoop 0.22
 -

 Key: HBASE-5105
 URL: https://issues.apache.org/jira/browse/HBASE-5105
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-5105-0.92.patch


 java.io.FileNotFoundException: File does not exist: 
 /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time


 [ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4397:
--

Hadoop Flags: Reviewed
 Summary: -ROOT-, .META. tables stay offline for too long in recovery 
phase after all RSs are shutdown at the same time  (was: -ROOT-, .META. 
tables stay offline for too long in recovery phase after all RSs are shutdown 
at the same time)

 -ROOT-, .META. tables stay offline for too long in recovery phase after all 
 RSs are shutdown at the same time
 -

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT-, .META. stay in offline state until timeout monitor force 
 assignment 30 minutes later. That is because HMaster can't find a RS to 
 assign the tables to in assign operation.
 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle server online event similar to how 
 RegionServerTracker.java calls servermanager.expireServer in the case server 
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special 
 situation in the cluster. 30 minutes timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time


[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178031#comment-13178031
 ] 

Zhihong Yu commented on HBASE-4397:
---

Integrated to 0.92 and TRUNK.

Thanks for the patch Ming.

Thanks for the review, Lars.

 -ROOT-, .META. tables stay offline for too long in recovery phase after all 
 RSs are shutdown at the same time
 -

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT-, .META. stay in offline state until timeout monitor force 
 assignment 30 minutes later. That is because HMaster can't find a RS to 
 assign the tables to in assign operation.
 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle server online event similar to how 
 RegionServerTracker.java calls servermanager.expireServer in the case server 
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special 
 situation in the cluster. 30 minutes timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4608) HLog Compression


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178033#comment-13178033
 ] 

Zhihong Yu commented on HBASE-4608:
---

@Li:
Please use '--no-prefix' to generate diff.
Otherwise Hadoop QA won't be able to apply your patch.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter

2011-12-31 Thread ramkrishna.s.vasudevan (Updated) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178041#comment-13178041
 ] 

Hudson commented on HBASE-5113:
---

Integrated in HBase-TRUNK #2598 (See 
[https://builds.apache.org/job/HBase-TRUNK/2598/])
HBASE-5113  TestDrainingServer expects round robin region assignment but 
misses a
   config parameter

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java


 TestDrainingServer expects round robin region assignment but misses a config 
 parameter
 --

 Key: HBASE-5113
 URL: https://issues.apache.org/jira/browse/HBASE-5113
 Project: HBase
  Issue Type: Test
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0

 Attachments: 5113.txt


 From 
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/,
  we can see that some region server didn't have any regions assigned.
 {code}
 // Assert that every regionserver has some regions on it.
 {code}
 It turns out that BulkEnabler has an internal knob whose value is false:
 {code}
   boolean roundRobinAssignment = 
 this.server.getConfiguration().getBoolean(
   hbase.master.enabletable.roundrobin, false);
 {code}
 TestDrainingServer should have set this config parameter before calling 
 admin.enableTable(TABLENAME)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization

2011-12-31 Thread nkeywal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178042#comment-13178042
 ] 

nkeywal commented on HBASE-5064:


The exact same test already timeouted in trunk in
https://builds.apache.org/job/HBase-TRUNK/2588/testReport/
I don't think it's related to this patch.

On Sat, Dec 31, 2011 at 4:28 PM, Zhihong Yu (Commented) (JIRA) 



 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again


 [ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5100:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Resolved as committed.

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for

[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization


[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178044#comment-13178044
 ] 

Zhihong Yu commented on HBASE-5064:
---

TestLogRolling hung in build 2588.

I don't see it hanging in build 2597.

Anyway we need to find out why TestMetaReaderEditor wasn't executed.

 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter


[ 
https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178045#comment-13178045
 ] 

Zhihong Yu commented on HBASE-5113:
---

Integrated trivial patch to 0.92 and TRUNK.

 TestDrainingServer expects round robin region assignment but misses a config 
 parameter
 --

 Key: HBASE-5113
 URL: https://issues.apache.org/jira/browse/HBASE-5113
 Project: HBase
  Issue Type: Test
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0

 Attachments: 5113.txt


 From 
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/,
  we can see that some region server didn't have any regions assigned.
 {code}
 // Assert that every regionserver has some regions on it.
 {code}
 It turns out that BulkEnabler has an internal knob whose value is false:
 {code}
   boolean roundRobinAssignment = 
 this.server.getConfiguration().getBoolean(
   hbase.master.enabletable.roundrobin, false);
 {code}
 TestDrainingServer should have set this config parameter before calling 
 admin.enableTable(TABLENAME)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

2011-12-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178051#comment-13178051
 ] 

Zhihong Yu commented on HBASE-5094:
---

+1 on patch.

After discussing with Ramkrishna, the task of handling doubly assigned regions 
would be done in another JIRA.

 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: HBASE-5094_1.patch


 {code}
 RegionState rit = 
 this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
 .getRegionServerOfRegion(e.getKey());
 if (rit != null  !rit.isClosing()  !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info(Skip assigning region  + rit.toString());
 } else if (addressFromAM != null
  !addressFromAM.equals(this.serverName)) {
   LOG.debug(Skip assigning region 
 + e.getKey().getRegionNameAsString()
 +  because it has been opened in 
 + addressFromAM.getServerName());
   }
 {code}
 In ServerShutDownHandler we try to get the address in the AM.  This address 
 is initially null because it is not yet updated after the region was opened 
 .i.e. the CAll back after node deletion is not yet done in the master side.
 But removal from RIT is completed on the master side.  So this will trigger a 
 new assignment.
 So there is a small window between the online region is actually added in to 
 the online list and the ServerShutdownHandler where we check the existing 
 address in AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.


[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178063#comment-13178063
 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
---

Committed to trunk and 0.92.

Thanks for the review Ted.

 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: HBASE-5094_1.patch


 {code}
 RegionState rit = 
 this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
 .getRegionServerOfRegion(e.getKey());
 if (rit != null  !rit.isClosing()  !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info(Skip assigning region  + rit.toString());
 } else if (addressFromAM != null
  !addressFromAM.equals(this.serverName)) {
   LOG.debug(Skip assigning region 
 + e.getKey().getRegionNameAsString()
 +  because it has been opened in 
 + addressFromAM.getServerName());
   }
 {code}
 In ServerShutDownHandler we try to get the address in the AM.  This address 
 is initially null because it is not yet updated after the region was opened 
 .i.e. the CAll back after node deletion is not yet done in the master side.
 But removal from RIT is completed on the master side.  So this will trigger a 
 new assignment.
 So there is a small window between the online region is actually added in to 
 the online list and the ServerShutdownHandler where we check the existing 
 address in AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization


[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178065#comment-13178065
 ] 

Hudson commented on HBASE-5064:
---

Integrated in HBase-TRUNK #2599 (See 
[https://builds.apache.org/job/HBase-TRUNK/2599/])
HBASE-5064 revert - need to figure out which test caused surefire to report 
timeout

tedyu : 
Files : 
* /hbase/trunk/dev-support/test-patch.sh
* /hbase/trunk/pom.xml
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
* /hbase/trunk/src/test/resources/hbase-site.xml


 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22


[ 
https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178067#comment-13178067
 ] 

Hudson commented on HBASE-5105:
---

Integrated in HBase-TRUNK #2599 (See 
[https://builds.apache.org/job/HBase-TRUNK/2599/])
HBASE-5105  TestImportTsv failed with hadoop 0.22 (Ming Ma)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java


 TestImportTsv failed with hadoop 0.22
 -

 Key: HBASE-5105
 URL: https://issues.apache.org/jira/browse/HBASE-5105
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-5105-0.92.patch


 java.io.FileNotFoundException: File does not exist: 
 /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time

2011-12-31 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178066#comment-13178066
 ] 

Hudson commented on HBASE-4397:
---

Integrated in HBase-TRUNK #2599 (See 
[https://builds.apache.org/job/HBase-TRUNK/2599/])
HBASE-4397  -ROOT-, .META. tables stay offline for too long in recovery 
phase after all RSs
   are shutdown at the same time (Ming Ma)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


 -ROOT-, .META. tables stay offline for too long in recovery phase after all 
 RSs are shutdown at the same time
 -

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT-, .META. stay in offline state until timeout monitor force 
 assignment 30 minutes later. That is because HMaster can't find a RS to 
 assign the tables to in assign operation.
 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle server online event similar to how 
 RegionServerTracker.java calls servermanager.expireServer in the case server 
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special 
 situation in the cluster. 30 minutes timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4608) HLog Compression

2011-12-31 Thread Li Pi (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4608:
-

Attachment: 4608v6.txt

no prefix patch

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4608) HLog Compression


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178075#comment-13178075
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-31 20:19:11.951711)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary (updated)
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java
 PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan


 [ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5112:
--

Attachment: org.apache.hadoop.hbase.replication.TestReplication-output.txt

TestReplication still fails on Linux during a test suite run.
Attached is test output.

 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch, 
 org.apache.hadoop.hbase.replication.TestReplication-output.txt


 In TestReplication#queueFailover, the second scan is not reset for each new 
 scan.  Followed scan may not be able to scan the whole table.
 So it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Lars Hofhansl (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178079#comment-13178079
 ] 

Lars Hofhansl commented on HBASE-4608:
--

@Li: How big do you expect the in-memory dictionary to grow?
I was wondering if the reading or writing process could give the compressor 
hints about when would be a good time to reset the dictionary (for example when 
memstore flush entry was found).
The compressor can choose to ignore the hints and use some internal logic, or 
reset the dictionary when it got hinted.


 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan

2011-12-31 Thread Lars Hofhansl (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178080#comment-13178080
 ] 

Lars Hofhansl commented on HBASE-5112:
--

There might be multiple issues. This patch is still good, I think.

 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch, 
 org.apache.hadoop.hbase.replication.TestReplication-output.txt


 In TestReplication#queueFailover, the second scan is not reset for each new 
 scan.  Followed scan may not be able to scan the whole table.
 So it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Li Pi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178084#comment-13178084
 ] 

Li Pi commented on HBASE-4608:
--

max size = 64k * around 100-200 bytes. Really not that big. Less than 100 
megabytes.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Li Pi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178085#comment-13178085
 ] 

Li Pi commented on HBASE-4608:
--

I was thinking of replacing the 1-way associative with a a 127 sized LRU 
dictionary. should allow us to save a few bytes, and also be far more efficient 
with our eviction strategy.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Li Pi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178087#comment-13178087
 ] 

Li Pi commented on HBASE-4608:
--

Guava's mapmaker doesn't guarantee consistent eviction. You'd want to either 
use 2 LinkedHashMap's or your own LRU style system.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time


[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178092#comment-13178092
 ] 

Hudson commented on HBASE-4397:
---

Integrated in HBase-TRUNK-security #57 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/57/])
HBASE-4397  -ROOT-, .META. tables stay offline for too long in recovery 
phase after all RSs
   are shutdown at the same time (Ming Ma)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


 -ROOT-, .META. tables stay offline for too long in recovery phase after all 
 RSs are shutdown at the same time
 -

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT-, .META. stay in offline state until timeout monitor force 
 assignment 30 minutes later. That is because HMaster can't find a RS to 
 assign the tables to in assign operation.
 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle server online event similar to how 
 RegionServerTracker.java calls servermanager.expireServer in the case server 
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special 
 situation in the cluster. 30 minutes timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22


[ 
https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178093#comment-13178093
 ] 

Hudson commented on HBASE-5105:
---

Integrated in HBase-TRUNK-security #57 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/57/])
HBASE-5105  TestImportTsv failed with hadoop 0.22 (Ming Ma)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java


 TestImportTsv failed with hadoop 0.22
 -

 Key: HBASE-5105
 URL: https://issues.apache.org/jira/browse/HBASE-5105
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-5105-0.92.patch


 java.io.FileNotFoundException: File does not exist: 
 /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5113) TestDrainingServer expects round robin region assignment but misses a config parameter


[ 
https://issues.apache.org/jira/browse/HBASE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178090#comment-13178090
 ] 

Hudson commented on HBASE-5113:
---

Integrated in HBase-TRUNK-security #57 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/57/])
HBASE-5113  TestDrainingServer expects round robin region assignment but 
misses a
   config parameter

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java


 TestDrainingServer expects round robin region assignment but misses a config 
 parameter
 --

 Key: HBASE-5113
 URL: https://issues.apache.org/jira/browse/HBASE-5113
 Project: HBase
  Issue Type: Test
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0

 Attachments: 5113.txt


 From 
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2595/testReport/org.apache.hadoop.hbase/TestDrainingServer/org_apache_hadoop_hbase_TestDrainingServer/,
  we can see that some region server didn't have any regions assigned.
 {code}
 // Assert that every regionserver has some regions on it.
 {code}
 It turns out that BulkEnabler has an internal knob whose value is false:
 {code}
   boolean roundRobinAssignment = 
 this.server.getConfiguration().getBoolean(
   hbase.master.enabletable.roundrobin, false);
 {code}
 TestDrainingServer should have set this config parameter before calling 
 admin.enableTable(TABLENAME)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.


[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178091#comment-13178091
 ] 

Hudson commented on HBASE-5094:
---

Integrated in HBase-TRUNK-security #57 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/57/])
HBASE-5094 The META can hold an entry for a region with a different server 
name from the one actually in the AssignmentManager thus making the region 
inaccessible. (Ram)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: HBASE-5094_1.patch


 {code}
 RegionState rit = 
 this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
 .getRegionServerOfRegion(e.getKey());
 if (rit != null  !rit.isClosing()  !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info(Skip assigning region  + rit.toString());
 } else if (addressFromAM != null
  !addressFromAM.equals(this.serverName)) {
   LOG.debug(Skip assigning region 
 + e.getKey().getRegionNameAsString()
 +  because it has been opened in 
 + addressFromAM.getServerName());
   }
 {code}
 In ServerShutDownHandler we try to get the address in the AM.  This address 
 is initially null because it is not yet updated after the region was opened 
 .i.e. the CAll back after node deletion is not yet done in the master side.
 But removal from RIT is completed on the master side.  So this will trigger a 
 new assignment.
 So there is a small window between the online region is actually added in to 
 the online list and the ServerShutdownHandler where we check the existing 
 address in AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5064) utilize surefire tests parallelization


[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178089#comment-13178089
 ] 

Hudson commented on HBASE-5064:
---

Integrated in HBase-TRUNK-security #57 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/57/])
HBASE-5064 revert - need to figure out which test caused surefire to report 
timeout
HBASE-5064 utilize surefire tests parallelization (N Keywal)

tedyu : 
Files : 
* /hbase/trunk/dev-support/test-patch.sh
* /hbase/trunk/pom.xml
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
* /hbase/trunk/src/test/resources/hbase-site.xml

tedyu : 
Files : 
* /hbase/trunk/dev-support/test-patch.sh
* /hbase/trunk/pom.xml
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
* /hbase/trunk/src/test/resources/hbase-site.xml


 utilize surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.