[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177585#comment-13177585
 ] 

Hadoop QA commented on HBASE-5100:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508916/5100-v2.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/639//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/639//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/639//console

This message is automatically generated.

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-v2.txt, hbase-5100.patch


 If the master sends a close-region request to the RS while the region's split
 transaction is happening concurrently,
 it may cause a closed region to be opened again.
 See the relevant code in SplitTransaction#createDaughters:
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some
 potential problems.
 For example, in our environment the closed parent region was rolled back
 successfully, and then started compaction and split again.
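 To make the rollback path concrete, the sketch below (hypothetical names, shape only;
 not the actual SplitTransaction source) shows how replaying the journal on rollback can
 bring the parent back online once CLOSED_PARENT_REGION has been recorded:
 {code}
 import java.util.ArrayDeque;
 import java.util.Deque;

 class RollbackSketch {
   enum JournalEntry { STARTED_SPLITTING, CLOSED_PARENT_REGION }

   static void reinitializeParent() {
     // Stand-in for this.parent.initialize(): re-opens the parent region's stores locally,
     // even though the master may already consider the region closed or moved elsewhere.
   }

   // Replay the journal in reverse order, undoing each recorded step.
   static void rollback(Deque<JournalEntry> journal) {
     while (!journal.isEmpty()) {
       switch (journal.removeLast()) {
         case CLOSED_PARENT_REGION:
           reinitializeParent();
           break;
         case STARTED_SPLITTING:
           break;
       }
     }
   }

   public static void main(String[] args) {
     Deque<JournalEntry> journal = new ArrayDeque<JournalEntry>();
     journal.add(JournalEntry.STARTED_SPLITTING);
     // The finally block above records CLOSED_PARENT_REGION even when close() failed,
     // so rollback() ends up re-initializing a region the master already closed.
     journal.add(JournalEntry.CLOSED_PARENT_REGION);
     rollback(journal);
   }
 }
 {code}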
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, 

[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-30 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-30 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-30 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v20.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

2011-12-30 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177603#comment-13177603
 ] 

Hudson commented on HBASE-5099:
---

Integrated in HBase-TRUNK #2593 (See 
[https://builds.apache.org/job/HBase-TRUNK/2593/])
HBASE-5099  ZK event thread waiting for root region assignment may block 
server
   shutdown handler for the region sever the root region was on 
(Jimmy)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java


 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region sever the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS held the old root region, the master needed to wait for the
 log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task
 request is never retried:
 there is only one event thread, and it is waiting for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5100:
--

Attachment: 5100-double-exeception.txt

Patch that covers a runtime exception coming out of parent.close(false)

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If the master sends a close-region request to the RS while the region's split
 transaction is happening concurrently,
 it may cause a closed region to be opened again.
 See the relevant code in SplitTransaction#createDaughters:
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some
 potential problems.
 For example, in our environment the closed parent region was rolled back
 successfully, and then started compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 

[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-30 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177613#comment-13177613
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508924/5064.v20.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 16 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/640//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/640//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/640//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177622#comment-13177622
 ] 

Hadoop QA commented on HBASE-5100:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508926/5100-double-exeception.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/641//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/641//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/641//console

This message is automatically generated.

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If the master sends a close-region request to the RS while the region's split
 transaction is happening concurrently,
 it may cause a closed region to be opened again.
 See the relevant code in SplitTransaction#createDaughters:
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some
 potential problems.
 For example, in our environment the closed parent region was rolled back
 successfully, and then started compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 

[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-30 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177637#comment-13177637
 ] 

nkeywal commented on HBASE-5064:


the v20 is ok for commit imho.

There are two processes by default, and 4 on hadoop-qa. It is possible to change 
the number of processes used by specifying 
-Dsurefire.secondPartThreadCount=WhatYouWant on the mvn command line.  Using 
-Dsurefire.secondPartThreadCount=1 means no parallelization.


 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2011-12-30 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177644#comment-13177644
 ] 

nkeywal commented on HBASE-4955:


We're now using 2.12-TRUNK-HBASE-2.

It's a private version, built on the 2.12 trunk (i.e. it does not contain 
everything that will be in the 2.12 final).

Surefire: Could be for Surefire 2.12. Issues to monitor are:
329 (category support): fixed, we use the official implementation from the trunk
773 (forked processes not killed after timeout): not fixed in trunk, not fixed 
in our version
786 (@Category with forkMode=always): fixed, we use the official implementation 
from the trunk
791 (incorrect elapsed time on test failure): fixed, we use the official 
implementation from the trunk
793 (incorrect time in the XML report): not fixed (reopened) in trunk, partially 
fixed in our version.
760 (does not take into account the test method): fixed, we use the official 
implementation from the trunk
798 (print immediately the test class name): not fixed in trunk, not fixed in 
our version
799 (Allow test parallelization when forkMode=always): fixed in trunk, fixed in 
our version with some minimal differences.
800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk, 
fixed in our version
806 (Ignore selection criteria when -Dtest= is specified):  not fixed in trunk, 
not fixed in our version
813 (Randomly wrong tests count and empty summary files): fixed in trunk, fixed 
in our version 

800 & 793 are the most important to monitor; they are the only ones that are fixed 
in our version but not on trunk.

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

 We currently use private versions of Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need in order to move to the official versions.
 Surefire 2.11 is just out, but, after some tests, it does not contain all
 that we need.
 JUnit. Could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk
 Surefire: Could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed in
 our version.
 760 (does not take into account the test method): fixed in trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed in trunk, not fixed in 
 our version
 799 (Allow test parallelization when forkMode=always): not fixed in trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk,
 fixed in our version
 800 & 793 are the most important to monitor; they are the only ones that are
 fixed in our version but not on trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5102) Change the default value of the property hbase.connection.per.config to false in hbase-default.xml

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5102:
--

Attachment: 5102.addendum

Addendum that removes stale connection in HBaseAdmin ctor

 Change the default value of  the property hbase.connection.per.config to 
 false in hbase-default.xml
 -

 Key: HBASE-5102
 URL: https://issues.apache.org/jira/browse/HBASE-5102
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.90.6

 Attachments: 5102.addendum, HBASE-5102.patch


 The property hbase.connection.per.config has a default value of true in
 hbase-default.xml. In HConnectionManager we try to assign false as the
 default value if no value is specified.  It would be better to make this uniform.
 As per Ted's suggestion, make it false in hbase-default.xml.
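 For illustration, a minimal sketch (assuming the Hadoop Configuration API; the class
 name is hypothetical, and this is not the actual HConnectionManager code) of why the
 two defaults can disagree:
 {code}
 import org.apache.hadoop.conf.Configuration;

 public class PerConfigDefaultSketch {
   public static void main(String[] args) {
     // Without hbase-default.xml on the classpath, the code-level default (false) wins;
     // with it, the XML default (currently true) wins, hence the proposal to align them.
     Configuration conf = new Configuration();
     boolean perConfig = conf.getBoolean("hbase.connection.per.config", false);
     System.out.println("hbase.connection.per.config = " + perConfig);
   }
 }
 {code}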

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177683#comment-13177683
 ] 

Zhihong Yu commented on HBASE-5100:
---

See discussion 'detecting presence of exception inside finally block' on 
sea...@yahoogroups.com where I polled Java developers on my proposed formation.
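For readers without access to that thread, one common form of the idea (a generic sketch 
with hypothetical names, not the actual SplitTransaction patch) is to set a flag as the 
last statement of the try block and test it in the finally block:

{code}
import java.io.IOException;
import java.util.List;

class FinallyFlagSketch {
  // Stand-in for this.parent.close(false) in SplitTransaction#createDaughters.
  static List<String> closeParent() throws IOException {
    return null;
  }

  // Stand-in for this.journal.add(JournalEntry.CLOSED_PARENT_REGION).
  static void journalClosedParent() {
  }

  static void createDaughtersSketch() throws IOException {
    boolean closed = false;
    try {
      List<String> files = closeParent();
      if (files == null) {
        throw new IOException("Failed to close region: already closed by another thread");
      }
      closed = true;             // reached only if no exception escaped the try body
    } finally {
      if (closed) {              // journal the close only when it really happened,
        journalClosedParent();   // so a rollback will not re-open the region
      }
    }
  }
}
{code}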

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If the master sends a close-region request to the RS while the region's split
 transaction is happening concurrently,
 it may cause a closed region to be opened again.
 See the relevant code in SplitTransaction#createDaughters:
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some
 potential problems.
 For example, in our environment the closed parent region was rolled back
 successfully, and then started compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an 

[jira] [Commented] (HBASE-5109) Fix TestAvroServer so that it waits properly for the modifyTable operation to complete

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177695#comment-13177695
 ] 

Zhihong Yu commented on HBASE-5109:
---

@Ming:
A loop was introduced by the following checkin:

r1186531 | stack | 2011-10-19 15:05:37 -0700 (Wed, 19 Oct 2011) | 1 line

HBASE-4621 TestAvroServer fails quite often intermittently

Is your patch still needed?
TestAvroServer hasn't failed for quite a while.

 Fix TestAvroServer so that it waits properly for the modifyTable operation to 
 complete
 --

 Key: HBASE-5109
 URL: https://issues.apache.org/jira/browse/HBASE-5109
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-5109-0.92.patch


 TestAvroServer has the following issue:
 {code}
 impl.modifyTable(tableAname, tableA);
 // It can take a while for the change to take effect. Wait here a while.
 while (impl.describeTable(tableAname) == null) {
   Threads.sleep(100);
 }
 assertTrue(impl.describeTable(tableAname).maxFileSize == 123456L);
 {code}
 impl.describeTable(tableAname) returns the default maxFileSize (256M) right away because
 modifyTable is async. Until HBASE-4328 is fixed, we can fix the test code to
 wait, say, a maximum of 5 seconds to check whether
 impl.describeTable(tableAname).maxFileSize has been updated to 123456L.
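 A possible shape for that bounded wait (a sketch reusing the names from the test above;
 the timings are illustrative):
 {code}
 impl.modifyTable(tableAname, tableA);
 // modifyTable is async: poll for up to ~5 seconds instead of asserting immediately.
 long deadline = System.currentTimeMillis() + 5000;
 while (impl.describeTable(tableAname).maxFileSize != 123456L
     && System.currentTimeMillis() < deadline) {
   Threads.sleep(100);
 }
 assertTrue(impl.describeTable(tableAname).maxFileSize == 123456L);
 {code}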

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

2011-12-30 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5099:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region sever the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS held the old root region, the master needed to wait for the
 log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task
 request is never retried:
 there is only one event thread, and it is waiting for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5110) code enhancement - remove unnecessary if-checks in every loop in HLog class

2011-12-30 Thread Mikael Sitruk (Created) (JIRA)
code enhancement - remove unnecessary if-checks in every loop in HLog class
---

 Key: HBASE-5110
 URL: https://issues.apache.org/jira/browse/HBASE-5110
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.90.4, 0.90.2, 0.90.1, 0.92.0
Reporter: Mikael Sitruk
Priority: Minor


The HLog class (method findMemstoresWithEditsEqualOrOlderThan) has an unnecessary 
if-check in every iteration of a loop.

  static byte [][] findMemstoresWithEditsEqualOrOlderThan(final long oldestWALseqid,
      final Map<byte [], Long> regionsToSeqids) {
    //  This method is static so it can be unit tested the easier.
    List<byte []> regions = null;
    for (Map.Entry<byte [], Long> e: regionsToSeqids.entrySet()) {
      if (e.getValue().longValue() <= oldestWALseqid) {
        if (regions == null) regions = new ArrayList<byte []>();
        regions.add(e.getKey());
      }
    }
    return regions == null?
      null: regions.toArray(new byte [][] {HConstants.EMPTY_BYTE_ARRAY});
  }

The following change is suggested

  static byte [][] findMemstoresWithEditsEqualOrOlderThan(final long oldestWALseqid,
      final Map<byte [], Long> regionsToSeqids) {
    //  This method is static so it can be unit tested the easier.
    List<byte []> regions = new ArrayList<byte []>();
    for (Map.Entry<byte [], Long> e: regionsToSeqids.entrySet()) {
      if (e.getValue().longValue() <= oldestWALseqid) {
        regions.add(e.getKey());
      }
    }
    return regions.size() == 0?
      null: regions.toArray(new byte [][] {HConstants.EMPTY_BYTE_ARRAY});
  }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4397) -ROOT-, .META. table stay offline for too long in the case of all RSs are shutdown at the same time

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177736#comment-13177736
 ] 

Zhihong Yu commented on HBASE-4397:
---

+1 on patch. 

 -ROOT-, .META. table stay offline for too long in the case of all RSs are 
 shutdown at the same time
 ---

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT- and .META. tables stay in the offline state until the timeout monitor
 forces assignment 30 minutes later. That is because HMaster can't find an RS to
 assign the tables to in the assign operation.
 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle the server online event, similar to how
 RegionServerTracker.java calls serverManager.expireServer in the case a server
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special
 situation in the cluster; the 30-minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5110) code enhancement - remove unnecessary if-checks in every loop in HLog class

2011-12-30 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177750#comment-13177750
 ] 

Todd Lipcon commented on HBASE-5110:


why? This isn't a hot code path...

 code enhancement - remove unnecessary if-checks in every loop in HLog class
 ---

 Key: HBASE-5110
 URL: https://issues.apache.org/jira/browse/HBASE-5110
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.90.1, 0.90.2, 0.90.4, 0.92.0
Reporter: Mikael Sitruk
Priority: Minor

 The HLog class (method findMemstoresWithEditsEqualOrOlderThan) has an
 unnecessary if-check in every iteration of a loop.
   static byte [][] findMemstoresWithEditsEqualOrOlderThan(final long oldestWALseqid,
       final Map<byte [], Long> regionsToSeqids) {
     //  This method is static so it can be unit tested the easier.
     List<byte []> regions = null;
     for (Map.Entry<byte [], Long> e: regionsToSeqids.entrySet()) {
       if (e.getValue().longValue() <= oldestWALseqid) {
         if (regions == null) regions = new ArrayList<byte []>();
         regions.add(e.getKey());
       }
     }
     return regions == null?
       null: regions.toArray(new byte [][] {HConstants.EMPTY_BYTE_ARRAY});
   }
 The following change is suggested
   static byte [][] findMemstoresWithEditsEqualOrOlderThan(final long oldestWALseqid,
       final Map<byte [], Long> regionsToSeqids) {
     //  This method is static so it can be unit tested the easier.
     List<byte []> regions = new ArrayList<byte []>();
     for (Map.Entry<byte [], Long> e: regionsToSeqids.entrySet()) {
       if (e.getValue().longValue() <= oldestWALseqid) {
         regions.add(e.getKey());
       }
     }
     return regions.size() == 0?
       null: regions.toArray(new byte [][] {HConstants.EMPTY_BYTE_ARRAY});
   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4397) -ROOT-, .META. table stay offline for too long in the case of all RSs are shutdown at the same time

2011-12-30 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177756#comment-13177756
 ] 

Lars Hofhansl commented on HBASE-4397:
--

Nice find and patch... +1

(As a sidenote... Do we have to rethink this entire ROOT and META huh hah? 
There isn't a week going by without some new bug about races between splitting 
and assignment, or the master being stuck assigning ROOT/META, or similar 
cases. There are too many players that need to be kept in sync: the FS, 
ROOT/META, ZooKeeper).


 -ROOT-, .META. table stay offline for too long in the case of all RSs are 
 shutdown at the same time
 ---

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT- and .META. tables stay in the offline state until the timeout monitor
 forces assignment 30 minutes later. That is because HMaster can't find an RS to
 assign the tables to in the assign operation.
 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle the server online event, similar to how
 RegionServerTracker.java calls serverManager.expireServer in the case a server
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special
 situation in the cluster; the 30-minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4397) -ROOT-, .META. table stay offline for too long in the case of all RSs are shutdown at the same time

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4397:
--

Status: Patch Available  (was: Open)

 -ROOT-, .META. table stay offline for too long in the case of all RSs are 
 shutdown at the same time
 ---

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT- and .META. tables stay in the offline state until the timeout monitor
 forces assignment 30 minutes later. That is because HMaster can't find an RS to
 assign the tables to in the assign operation.
 011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle the server online event, similar to how
 RegionServerTracker.java calls serverManager.expireServer in the case a server
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special
 situation in the cluster; the 30-minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177785#comment-13177785
 ] 

stack commented on HBASE-5100:
--

What's happening now in this issue?  There is a v2.  Is that now the candidate 
fix?

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If the master sends a close-region request to the RS while the region's split
 transaction is happening concurrently,
 it may cause a closed region to be opened again.
 See the relevant code in SplitTransaction#createDaughters:
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some
 potential problems.
 For example, in our environment the closed parent region was rolled back
 successfully, and then started compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 

[jira] [Commented] (HBASE-5110) code enhancement - remove unnecessary if-checks in every loop in HLog class

2011-12-30 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177841#comment-13177841
 ] 

Todd Lipcon commented on HBASE-5110:


Ah, I missed that thread... I just wanted to clarify if this is for readability 
or performance... do you see this function getting called a lot in a write 
workload? Your comments on the mailing list thread indicate that it's 
performance sensitive, but I don't see how that would be the case.

 code enhancement - remove unnecessary if-checks in every loop in HLog class
 ---

 Key: HBASE-5110
 URL: https://issues.apache.org/jira/browse/HBASE-5110
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.90.1, 0.90.2, 0.90.4, 0.92.0
Reporter: Mikael Sitruk
Priority: Minor

 The HLog class (method findMemstoresWithEditsEqualOrOlderThan) has an 
 unnecessary if-check in a loop.
   static byte [][] findMemstoresWithEditsEqualOrOlderThan(final long oldestWALseqid,
       final Map<byte [], Long> regionsToSeqids) {
     // This method is static so it can be unit tested the easier.
     List<byte []> regions = null;
     for (Map.Entry<byte [], Long> e : regionsToSeqids.entrySet()) {
       if (e.getValue().longValue() <= oldestWALseqid) {
         if (regions == null) regions = new ArrayList<byte []>();
         regions.add(e.getKey());
       }
     }
     return regions == null ?
       null : regions.toArray(new byte [][] {HConstants.EMPTY_BYTE_ARRAY});
   }
 The following change is suggested:
   static byte [][] findMemstoresWithEditsEqualOrOlderThan(final long oldestWALseqid,
       final Map<byte [], Long> regionsToSeqids) {
     // This method is static so it can be unit tested the easier.
     List<byte []> regions = new ArrayList<byte []>();
     for (Map.Entry<byte [], Long> e : regionsToSeqids.entrySet()) {
       if (e.getValue().longValue() <= oldestWALseqid) {
         regions.add(e.getKey());
       }
     }
     return regions.size() == 0 ?
       null : regions.toArray(new byte [][] {HConstants.EMPTY_BYTE_ARRAY});
   }
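If the concern is really cost, a throwaway micro-benchmark of the two loop shapes is easy to write. A rough sketch only (plain collections, not the HLog code path), and it says nothing about how often a write workload actually calls this method:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough comparison of the "lazy allocation" and "always allocate" variants.
public class FindMemstoresBench {
  static byte[][] lazy(long oldest, Map<byte[], Long> m) {
    List<byte[]> regions = null;
    for (Map.Entry<byte[], Long> e : m.entrySet()) {
      if (e.getValue() <= oldest) {
        if (regions == null) regions = new ArrayList<byte[]>();
        regions.add(e.getKey());
      }
    }
    return regions == null ? null : regions.toArray(new byte[0][]);
  }

  static byte[][] eager(long oldest, Map<byte[], Long> m) {
    List<byte[]> regions = new ArrayList<byte[]>();
    for (Map.Entry<byte[], Long> e : m.entrySet()) {
      if (e.getValue() <= oldest) {
        regions.add(e.getKey());
      }
    }
    return regions.isEmpty() ? null : regions.toArray(new byte[0][]);
  }

  public static void main(String[] args) {
    Map<byte[], Long> m = new HashMap<byte[], Long>();
    for (int i = 0; i < 1000; i++) {
      m.put(("region-" + i).getBytes(), (long) i);
    }
    for (int round = 0; round < 5; round++) {
      long t0 = System.nanoTime();
      for (int i = 0; i < 10_000; i++) lazy(500, m);
      long t1 = System.nanoTime();
      for (int i = 0; i < 10_000; i++) eager(500, m);
      long t2 = System.nanoTime();
      System.out.printf("lazy %d ms, eager %d ms%n",
          (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
  }
}
{code}
The allocation difference is likely tiny next to the cost of iterating the map, so the suggested change reads mostly as a readability cleanup.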

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Zhihong Yu (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reopened HBASE-5099:
---


0.92 Jenkins builds have failed 4 times in a row.

TestReplication#queueFailover failed in builds 217 and 218.
It failed consistently on MacBook as well.

Rolling back the patches.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.
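The underlying hazard is a generic single-thread-executor deadlock: the only event thread blocks waiting for work that can only run on that same thread. A standalone sketch of the pattern (plain Java, illustrative only, not HBase code):
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadDeadlockSketch {
  public static void main(String[] args) throws Exception {
    // One event thread, mirroring the single ZK event thread described above.
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    CountDownLatch rootAssigned = new CountDownLatch(1);

    // Task 1: "assign root" -- waits for the split/assign work to finish.
    eventThread.submit(() -> {
      try {
        boolean done = rootAssigned.await(2, TimeUnit.SECONDS);
        System.out.println(done ? "root assigned"
            : "gave up: the unblocking task never ran on this thread");
      } catch (InterruptedException ignored) {
        // not expected in this sketch
      }
    });

    // Task 2: would unblock task 1, but it sits in the queue behind
    // task 1 on the same single thread, so it only runs after the timeout.
    eventThread.submit(rootAssigned::countDown);

    eventThread.shutdown();
    eventThread.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}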

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177845#comment-13177845
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-31 00:20:40.770066)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

WritableContext makes things cleaner. Some space optimizations to make 
compression even more efficient.


Summary
---

Here's what I have so far. Things are written and should work. I need to 
rework the test cases to cover this, and put something in the config file to 
enable/disable it. Obviously this isn't ready for commit at the moment, but I 
can get those two things done pretty quickly.

The dictionary is incredibly simple at the moment; I'll come up with 
something cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java
 PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.
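As an illustration of the dictionary idea, a toy sketch only (the patch's SimpleDictionary/Compressor classes will differ in format and eviction policy): repeated byte strings such as table and cf names can be replaced by small indexes after their first occurrence.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: map frequently repeated strings (table name, region id, cf name)
// to short integer indexes so WAL entries carry the index instead of the
// full bytes after the first occurrence.
public class TinyWalDictionary {
  private final Map<String, Integer> toIndex = new HashMap<>();
  private final List<String> toEntry = new ArrayList<>();

  /** Returns the index for an entry, adding it on first sight. */
  public int encode(String entry) {
    Integer idx = toIndex.get(entry);
    if (idx == null) {
      idx = toEntry.size();
      toEntry.add(entry);
      toIndex.put(entry, idx);
    }
    return idx;
  }

  /** Looks an entry back up by index when reading the log. */
  public String decode(int index) {
    return toEntry.get(index);
  }
}
{code}
On a WAL where the same table and column family names repeat thousands of times, every repeat shrinks to a small index.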

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177846#comment-13177846
 ] 

Zhihong Yu commented on HBASE-5099:
---

I reverted 0.92 patch.

Now TestReplication passes on Mac.

Let's find out if the patch is related to replication test failure or not.

Keeping TRUNK patch in TRUNK for now since trunk build 2594 passed.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177847#comment-13177847
 ] 

Jimmy Xiang commented on HBASE-5099:


TestReplication is flaky.  But it works on my ubuntu box.
Let me take a look.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non-existing table does not throw error

2011-12-30 Thread Shrijeet Paliwal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177849#comment-13177849
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

I will update this Jira with new Patch post holidays.

 Major compaction on non-existing table does not throw error
 

 Key: HBASE-5041
 URL: https://issues.apache.org/jira/browse/HBASE-5041
 Project: HBase
  Issue Type: Bug
  Components: regionserver, shell
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
 Fix For: 0.92.0, 0.94.0, 0.90.6

 Attachments: 0001-HBASE-5041-Throw-error-if-table-does-not-exist.patch


 The following will not complain even if fubar does not exist:
 {code}
 echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
 {code}
 The downside of this defect is that a major compaction may be skipped due to
 a typo by Ops.
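The intended behavior, sketched against the 0.90-era client API (an illustration of the existence check, not the attached patch itself):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactIfExists {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String table = "fubar";
    // Fail loudly instead of silently doing nothing for a typo'd name.
    if (!admin.tableExists(table)) {
      throw new TableNotFoundException(table);
    }
    admin.majorCompact(table);
  }
}
{code}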

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177850#comment-13177850
 ] 

Zhihong Yu commented on HBASE-5099:
---

Please read through the test output of 0.92 builds 217 and 218.
With patch 5099.92, the test failure is reproducible on MacBook.

Another validation is to deploy patch 5099.92 to real clusters and see if 
replication works.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177852#comment-13177852
 ] 

Zhihong Yu commented on HBASE-4608:
---

@Li:
Do you want to submit the latest patch to Hadoop QA?

Thanks

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177860#comment-13177860
 ] 

Jimmy Xiang commented on HBASE-5099:


I tried to debug this test case, but it doesn't stop at the changes I made.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177863#comment-13177863
 ] 

Zhihong Yu commented on HBASE-5099:
---

Test scripts from HBASE-4480 would be useful in reproducing the test failure.
You can run TestReplication#queueFailover in a loop (on different OSes).

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177868#comment-13177868
 ] 

chunhui shen commented on HBASE-5100:
-

@Zhihong
I think both are ok now.
I agree to commit 5100-double-exeception.txt since it is more understandable.
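One plausible shape of the guard, as a sketch only (the committed 5100-double-exeception.txt may differ): journal CLOSED_PARENT_REGION only when the close really succeeded, so rollback never re-initializes a region that another thread already closed.
{code}
// Sketch of the createDaughters fragment, reshaped so that rollback's
// CLOSED_PARENT_REGION handling never reopens a region that some other
// thread (e.g. the master-driven close) already closed.
List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
if (hstoreFilesToSplit == null) {
  // Closed by a concurrent thread; abandon the split without journaling,
  // so rollback will not call parent.initialize().
  throw new IOException("Failed to close region: already closed by " +
      "another thread");
}
this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
{code}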

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If the master sends a close-region request to the RS while the region's split 
 transaction is happening concurrently,
 it may cause a closed region to be opened again.
 See the detailed code in SplitTransaction#createDaughters:
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes 
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some 
 potential problems.
 For example, in our environment, the closed parent region was rolled back 
 successfully, and then compaction and split started again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177873#comment-13177873
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-31 02:06:00.510532)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

fixed a failing test.


Summary
---

Here's what I have so far. Things are written and should work. I need to 
rework the test cases to cover this, and put something in the config file to 
enable/disable it. Obviously this isn't ready for commit at the moment, but I 
can get those two things done pretty quickly.

The dictionary is incredibly simple at the moment; I'll come up with 
something cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java
 PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4608) HLog Compression

2011-12-30 Thread Li Pi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4608:
-

Attachment: 4608v5.txt

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4608) HLog Compression

2011-12-30 Thread Li Pi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4608:
-

Release Note: Patch for WAL Compression.
  Status: Patch Available  (was: Open)

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177874#comment-13177874
 ] 

Li Pi commented on HBASE-4608:
--

Yup, good time to do it.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177877#comment-13177877
 ] 

Hudson commented on HBASE-5099:
---

Integrated in HBase-0.92 #219 (See 
[https://builds.apache.org/job/HBase-0.92/219/])
HBASE-5099 revert due to continuous 0.92 build failures

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java


 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177879#comment-13177879
 ] 

Zhihong Yu commented on HBASE-5100:
---

Thanks for the feedback, Chunhui.

I integrated the double exception patch into 0.92 and TRUNK.

Thanks for the initial patch, Chunhui.

Thanks for the review, Stack.

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If the master sends a close-region request to the RS while the region's split 
 transaction is happening concurrently,
 it may cause a closed region to be opened again.
 See the detailed code in SplitTransaction#createDaughters:
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes 
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some 
 potential problems.
 For example, in our environment, the closed parent region was rolled back 
 successfully, and then compaction and split started again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an 

[jira] [Created] (HBASE-5111) Upgrade zookeeper to 3.4.2 release

2011-12-30 Thread Zhihong Yu (Created) (JIRA)
Upgrade zookeeper to 3.4.2 release
--

 Key: HBASE-5111
 URL: https://issues.apache.org/jira/browse/HBASE-5111
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu


Zookeeper 3.4.2 has just been released.
We should upgrade to this release.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5111) Upgrade zookeeper to 3.4.2 release

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5111:
--

Fix Version/s: 0.94.0
   0.92.0

 Upgrade zookeeper to 3.4.2 release
 --

 Key: HBASE-5111
 URL: https://issues.apache.org/jira/browse/HBASE-5111
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0


 Zookeeper 3.4.2 has just been released.
 We should upgrade to this release.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177884#comment-13177884
 ] 

Zhihong Yu commented on HBASE-5064:
---

I got the following when running the test suite on Linux:
{code}
Failed tests:   
testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling):
 LowReplication Roller should've been disabled
  testMultipleResubmits(org.apache.hadoop.hbase.master.TestSplitLogManager): 
expected:<2> but was:<3>

Tests run: 781, Failures: 2, Errors: 0, Skipped: 9
{code}
where:
{code}
open files  (-n) 32768
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) unlimited
{code}
I think we can give v20 a chance on Jenkins.
At the moment test suite reliability is more important than speed, IMHO.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v20.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4397:
--

Fix Version/s: 0.94.0
   0.92.0
  Summary: -ROOT-, .META. tables stay offline for too long in 
recovery phase after all RSs are shutdown at the same time  (was: -ROOT-, 
.META. table stay offline for too long in the case of all RSs are shutdown at 
the same time)

 -ROOT-, .META. tables stay offline for too long in recovery phase after 
 all RSs are shutdown at the same time
 -

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4397-0.92.patch


 1. Shut down all RSs.
 2. Bring all RSs back online.
 The -ROOT- and .META. tables stay in the offline state until the timeout 
 monitor forces assignment 30 minutes later. That is because HMaster can't 
 find an RS to assign the tables to during the assign operation.
 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have ServerManager handle the server-online event, similar to how 
 RegionServerTracker.java calls ServerManager.expireServer when a server 
 goes down.
 2. Make the timeoutMonitor handle this situation better. This is a special 
 situation in the cluster, so the 30 minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5112) TestReplication#queueFailover flaky due to code error

2011-12-30 Thread Jimmy Xiang (Created) (JIRA)
TestReplication#queueFailover flaky due to code error
-

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


In TestReplication#queueFailover, the second scan is not re-created for each 
retry, so a subsequent scan may not cover the whole table.
It then cannot get all the data and the test fails.
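The general shape of the fix, as a sketch against the 0.92 client API (not the actual patch): build a fresh Scan for every retry instead of reusing one whose scanner has already advanced.
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanRetrySketch {
  /** Counts rows, building a fresh Scan for every retry. */
  static int waitForRows(HTable table, int expectedRows)
      throws IOException, InterruptedException {
    int rowsSeen = 0;
    for (int attempt = 0; attempt < 10 && rowsSeen < expectedRows; attempt++) {
      Scan scan = new Scan();              // new Scan per attempt, never reused
      ResultScanner scanner = table.getScanner(scan);
      try {
        rowsSeen = 0;
        for (Result r : scanner) {
          rowsSeen++;
        }
      } finally {
        scanner.close();
      }
      if (rowsSeen < expectedRows) {
        Thread.sleep(500);                 // let replication catch up before retrying
      }
    }
    return rowsSeen;
  }
}
{code}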

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to code error

2011-12-30 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5112:
---

Attachment: hbase-5112.patch

 TestReplication#queueFailover flaky due to code error
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not re-created for each 
 retry, so a subsequent scan may not cover the whole table.
 It then cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177886#comment-13177886
 ] 

Jimmy Xiang commented on HBASE-5099:


TestReplication#queueFailover has a bug; that's why it is flaky:

https://issues.apache.org/jira/browse/HBASE-5112

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died. The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the split tasks asynchronously, then started to wait 
 for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired. HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created, and this master 
 became the new master again. It tried to assign root and meta.
 Because the dead RS had held the old root region, the master needed to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split-task 
 request is never retried: there is only one event thread, and it is waiting 
 for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to code error

2011-12-30 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5112:
---

Status: Patch Available  (was: Open)

 TestReplication#queueFailover flaky due to code error
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not re-created for each 
 retry, so a subsequent scan may not cover the whole table.
 It then cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5055) Build against hadoop 0.22 broken

2011-12-30 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177888#comment-13177888
 ] 

Hudson commented on HBASE-5055:
---

Integrated in HBase-0.92-security #54 (See 
[https://builds.apache.org/job/HBase-0.92-security/54/])
HBASE-5055 Build against hadoop 0.22 broken - remove import of 
DFSClient.DFSInputStream (Ming Ma)

tedyu : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


 Build against hadoop 0.22 broken
 

 Key: HBASE-5055
 URL: https://issues.apache.org/jira/browse/HBASE-5055
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Zhihong Yu
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0, 0.94.0

 Attachments: 5055.txt, HBASE-5055-0.92.patch


 I got the following when compiling TRUNK against hadoop 0.22:
 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile 
 (default-compile) on project hbase: Compilation failure: Compilation failure:
 [ERROR] 
 /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[37,39]
  cannot find symbol
 [ERROR] symbol  : class DFSInputStream
 [ERROR] location: class org.apache.hadoop.hdfs.DFSClient
 [ERROR] 
 [ERROR] 
 /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[109,37]
  cannot find symbol
 [ERROR] symbol  : class DFSInputStream
 [ERROR] location: class 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.WALReader.WALReaderFSDataInputStream
 {code}
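 The commit above removes the compile-time import of the hdfs-internal class. Purely as an illustration of the pattern (not the actual HBASE-5055 change), code that needs a DFS-specific method can probe for it reflectively and fall back when it is absent, which keeps the file compiling against Hadoop 0.22:
 {code}
 import java.io.InputStream;
 import java.lang.reflect.Method;

 // Illustrative sketch only, not the committed fix: look up getFileLength()
 // reflectively on the concrete stream instead of importing
 // org.apache.hadoop.hdfs.DFSClient.DFSInputStream, which Hadoop 0.22 lacks.
 static long getFileLengthIfAvailable(InputStream wrappedStream) {
   try {
     Method m = wrappedStream.getClass().getMethod("getFileLength");
     return ((Long) m.invoke(wrappedStream)).longValue();
   } catch (Exception e) {
     return -1L; // not a DFS stream, or no such method on this Hadoop version
   }
 }
 {code}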

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization

2011-12-30 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177889#comment-13177889
 ] 

Hudson commented on HBASE-5103:
---

Integrated in HBase-0.92-security #54 (See 
[https://builds.apache.org/job/HBase-0.92-security/54/])
HBASE-5103  Fix improper master znode deserialization (Jonathan Hsieh)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java


 Fix improper master znode deserialization
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is
 created from a versioned, serialized ServerName:
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user-visible places where it is used but not deserialized
 properly.
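 Since the znode is written with sn.getVersionedBytes(), every reader must decode the versioned bytes rather than treating them as a plain server-name string. A hedged sketch of the symmetric read path (assuming a parse helper along the lines of ServerName.parseVersionedServerName; the committed change may differ):
 {code}
 import org.apache.hadoop.hbase.ServerName;
 import org.apache.hadoop.hbase.zookeeper.ZKUtil;
 import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;

 // Sketch only; assumes a ServerName.parseVersionedServerName-style helper
 // exists -- the committed HBASE-5103 change may look different.
 static ServerName readActiveMaster(ZooKeeperWatcher watcher) throws Exception {
   byte[] data = ZKUtil.getDataAndWatch(watcher, watcher.masterAddressZNode);
   return data == null ? null : ServerName.parseVersionedServerName(data);
 }
 {code}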

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to code error

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177887#comment-13177887
 ] 

Jimmy Xiang commented on HBASE-5112:


@Ted, could you please give this patch a try on your MacBook?  I could not 
reproduce the failure on my box.
I looked into the code carefully, and this fix should make the test case 
no longer flaky.

 TestReplication#queueFailover flaky due to code error
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177890#comment-13177890
 ] 

Hudson commented on HBASE-5099:
---

Integrated in HBase-0.92-security #54 (See 
[https://builds.apache.org/job/HBase-0.92-security/54/])
HBASE-5099 revert due to continuous 0.92 build failures
HBASE-5099  ZK event thread waiting for root region assignment may block server
   shutdown handler for the region server the root region was on 
(Jimmy)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java


 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 An RS died.  The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this
 master became the new master again.  It tried to assign root and meta.
 Because the dead RS had been hosting the root region, the master needed to wait
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the asynchronous creation of
 the split-log tasks is never retried: there is only one event thread, and it is
 waiting for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to code error

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5112:
--

Fix Version/s: 0.94.0
   0.92.0
   Issue Type: Test  (was: Bug)

 TestReplication#queueFailover flaky due to code error
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5112:
--

Hadoop Flags: Reviewed
 Summary: TestReplication#queueFailover flaky due to potentially 
uninitialized Scan  (was: TestReplication#queueFailover flaky due to code error)

 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to code error

2011-12-30 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177891#comment-13177891
 ] 

Lars Hofhansl commented on HBASE-5112:
--

Nice find. +1 on patch.

 TestReplication#queueFailover flaky due to code error
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan

2011-12-30 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5112:
--

Attachment: 5112-v2.txt

I propose this patch, based on Jimmy's, where the thread is set as a daemon.

 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177893#comment-13177893
 ] 

Zhihong Yu commented on HBASE-5112:
---

I looped TestReplication#queueFailover 5 times using both 5112-v2.txt and 
5099.92 - no error

I am looping TestReplication itself 5 more times.

Will integrate both 5112 and 5099 if there is no error.

Thanks for the New Year present, Jimmy.

 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan

2011-12-30 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177895#comment-13177895
 ] 

Lars Hofhansl commented on HBASE-5112:
--

+1 on v2

 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177897#comment-13177897
 ] 

Zhihong Yu commented on HBASE-5112:
---

Integrated to 0.92 and TRUNK.

Thanks for the patch, Jimmy.

Thanks for the review, Lars.

 TestReplication#queueFailover flaky due to potentially uninitialized Scan
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5112-v2.txt, hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new
 attempt.  The subsequent scan may therefore not cover the whole table,
 so it cannot get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5108) ICV puts memstore before writing WAL first -- by default; make the default be 'correct' and let better perf be optional

2011-12-30 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177899#comment-13177899
 ] 

Lars Hofhansl commented on HBASE-5108:
--

To be very precise, what happens (ICV, increment, append) is that the WAL is 
written with the lock held, but the sync request is issued after the lock is 
released.

So what could happen is that other clients see the updated value in the 
memstore (in fact they do see it right away - see HBASE-4583). Now, if the 
region server dies before the sync has executed, the clients might have based 
their logic upon uncommitted state.
We cannot roll back the memstore state for ICVs because the operation is not 
idempotent (and because, for various other reasons also explained in HBASE-4583, 
all client scanners see the updates immediately).

I am somewhat torn on this one. This failure scenario is pretty rare, and the 
performance implication of doing it 100% correctly would be significant. Maybe for 
ICVs there should be three different options: (1) write the WAL synchronously, (2) 
don't write the WAL, and (3) a new option: do a best-effort WAL write.
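
A rough sketch of the ordering described above (illustrative pseudocode with hypothetical helper names, not the actual HRegion code); the window between releasing the lock and the sync returning is where readers can act on a value that may never become durable:
{code}
// Illustrative pseudocode, hypothetical helper names -- not HRegion itself.
long increment(byte[] row, long delta) throws IOException {
  lockRow(row);
  long newValue;
  try {
    newValue = readCurrentValue(row) + delta;
    appendToWAL(row, newValue);     // WAL entry written while the row lock is held
    putIntoMemstore(row, newValue); // new value immediately visible to readers (HBASE-4583)
  } finally {
    unlockRow(row);                 // lock released BEFORE the sync...
  }
  syncWAL();                        // ...so if the RS dies before this returns, readers may
                                    // already have acted on an update that is now lost
  return newValue;
}
{code}
In these terms, option (1) would roughly mean syncing before the update becomes visible, option (2) would skip the WAL append entirely, and option (3) would keep the ordering shown above.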


 ICV puts memstore before writing WAL first -- by default; make the default be 
 'correct' and let better perf be optional
 ---

 Key: HBASE-5108
 URL: https://issues.apache.org/jira/browse/HBASE-5108
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical

 See this thread up on the list and Lars' note on the end: 
 http://search-hadoop.com/m/Y6xTRp6sxq1/%2522Help+regarding+RowLock%2522subj=Help+regarding+RowLock
 I thought it was just ICV that did the memstore put first.  This issue is
 about making the described behavior optional, with the out-of-the-box default
 going for correctness -- i.e. write the WAL first and then the memstore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5108) ICV puts memstore before writing WAL first -- by default; make the default be 'correct' and let better perf be optional

2011-12-30 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177901#comment-13177901
 ] 

Lars Hofhansl commented on HBASE-5108:
--

Where #3 would be the current behavior.

 ICV puts memstore before writing WAL first -- by default; make the default be 
 'correct' and let better perf be optional
 ---

 Key: HBASE-5108
 URL: https://issues.apache.org/jira/browse/HBASE-5108
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical

 See this thread up on the list and Lars' note on the end: 
 http://search-hadoop.com/m/Y6xTRp6sxq1/%2522Help+regarding+RowLock%2522subj=Help+regarding+RowLock
 I thought it was just ICV that did the memstore put first.  This issue is
 about making the described behavior optional, with the out-of-the-box default
 going for correctness -- i.e. write the WAL first and then the memstore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region server the root region was on

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177900#comment-13177900
 ] 

Zhihong Yu commented on HBASE-5099:
---

Integrated 5099.92 to 0.92 branch again.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region server the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 An RS died.  The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this
 master became the new master again.  It tried to assign root and meta.
 Because the dead RS had been hosting the root region, the master needed to wait
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the asynchronous creation of
 the split-log tasks is never retried: there is only one event thread, and it is
 waiting for the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-30 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177908#comment-13177908
 ] 

Hudson commented on HBASE-5100:
---

Integrated in HBase-0.92 #220 (See 
[https://builds.apache.org/job/HBase-0.92/220/])
HBASE-5100  Rollback of split could cause closed region to be opened again 
(Chunhui)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java


 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100-double-exeception.txt, 5100-v2.txt, hbase-5100.patch


 If the master's close-region request to the RS and the region's split transaction
 happen concurrently,
 it may cause a closed region to be opened again.
 See the detailed code in SplitTransaction#createDaughters
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes
 this.parent.initialize() to be called.
 Although this region is not onlined in the regionserver, it may bring some
 potential problems.
 For example, in our environment, the closed parent region was rolled back
 successfully, and then started compacting and splitting again.
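 One way to avoid re-initializing a region that some other thread closed (a sketch of the general idea only, reusing the fields from the snippet above, and not necessarily the committed patch) is to add CLOSED_PARENT_REGION to the journal only after the close in this thread actually succeeded:
 {code}
 // Sketch of the general idea, not necessarily the committed HBASE-5100 patch:
 // journal CLOSED_PARENT_REGION only when this thread really closed the parent,
 // so rollback cannot call initialize() on a region closed by someone else.
 List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
 if (hstoreFilesToSplit == null) {
   // Closed by a concurrent thread; abandon the split without touching the journal.
   throw new IOException("Failed to close region: already closed by another thread");
 }
 this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 {code}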
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE,