[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Attachment: HBASE-5120_4.patch

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
> HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy 
> and caused more troubles than it fixed", on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,08

[jira] [Updated] (HBASE-5168) Backport HBASE-5100 - Rollback of split could cause closed region to be opened again

2012-01-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5168:
--

Attachment: HBASE-5100_0.90.patch

> Backport HBASE-5100 - Rollback of split could cause closed region to be 
> opened again
> 
>
> Key: HBASE-5168
> URL: https://issues.apache.org/jira/browse/HBASE-5168
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
> Attachments: HBASE-5100_0.90.patch
>
>
> Considering the importance of the defect, merging it to 0.90.6.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5152) Region is on service before completing initialization when doing rollback of split, it will affect read correctness

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183969#comment-13183969
 ] 

Hudson commented on HBASE-5152:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5152  Region is on service before completing initialization when 
doing rollback of split,
   it will affect read correctness (Chunhui)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Region is on service before completing initialization when doing rollback of 
> split, it will affect read correctness 
> 
>
> Key: HBASE-5152
> URL: https://issues.apache.org/jira/browse/HBASE-5152
> Project: HBase
>  Issue Type: Bug
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 5152-v2.txt, hbase-5152.patch
>
>






[jira] [Commented] (HBASE-5052) The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key)

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183971#comment-13183971
 ] 

Hudson commented on HBASE-5052:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5052 The path where a dynamically loaded coprocessor jar is copied on 
the local file system depends on the region name (and implicitly, the start key)

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java


> The path where a dynamically loaded coprocessor jar is copied on the local 
> file system depends on the region name (and implicitly, the start key)
> -
>
> Key: HBASE-5052
> URL: https://issues.apache.org/jira/browse/HBASE-5052
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
>Reporter: Andrei Dragomir
>Assignee: Andrei Dragomir
> Fix For: 0.92.0
>
> Attachments: HBASE-5052.patch
>
>
> When loading a coprocessor from hdfs, the jar file gets copied to a path on 
> the local filesystem, which depends on the region name, and the region start 
> key. The name is "cleaned", but not enough, so when you have filesystem 
> unfriendly characters (/?:, etc), the coprocessor is not loaded, and an error 
> is thrown
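
One way to picture the cleaning step described above is a whitelist filter over the region name. This is only a sketch under assumptions: `LocalJarName` and `sanitize` are hypothetical names, the set of allowed characters is a guess at what is safe on common filesystems, and it is not necessarily what HBASE-5052.patch does.

```java
/** Hypothetical helper; not part of HBase. */
final class LocalJarName {
    /**
     * Replaces every character outside [A-Za-z0-9._-] with '_' so the
     * result can be used as a single path component on a local filesystem.
     */
    static String sanitize(String regionName) {
        return regionName.replaceAll("[^A-Za-z0-9._-]", "_");
    }
}
```

Two different region names can map to the same cleaned name under this scheme, so a real implementation would likely need an additional unique component in the path.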





[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183972#comment-13183972
 ] 

Hudson commented on HBASE-5121:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5121 MajorCompaction may affect scan's correctness (chunhui shen and 
Lars H)

larsh : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java


> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5121-0.92.txt, 5121-suggest.txt, 
> 5121-trunk-combined.txt, 5121.90, hbase-5121-testcase.patch, 
> hbase-5121.patch, hbase-5121v2.patch
>
>
> In our test, each row has KeyValues in two families.
> We occasionally see a problem with scan.next() when a major compaction 
> runs concurrently.
> Two consecutive client calls to scan.next() behave as follows:
> 1. The first call returns a result in which family A is null.
> 2. The second call returns a result in which family B is null.
> Both results are for the same row.
> With more families, I think the behavior would be even stranger...
> We found that the cause is that storescanner.peek() changes after a 
> major compaction if there are delete-type KeyValues.
> After this change, the PriorityQueue in the RegionScanner's heap 
> is no longer guaranteed to be sorted.
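
The failure mode described above can be reproduced outside HBase. The following is a minimal sketch, not HBase code: `Source` and `HeapOrderDemo` are hypothetical names, and a plain `java.util.PriorityQueue` stands in for the scanner heap. It shows that when an element's sort key changes while the element sits in the queue, `poll()` may no longer return the smallest element, and that removing and re-adding the element restores the ordering.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a scanner whose peek() value can change. */
final class Source {
    final String name;
    int head; // plays the role of the key returned by peek()

    Source(String name, int head) {
        this.name = name;
        this.head = head;
    }
}

final class HeapOrderDemo {
    /**
     * Queues three sources, changes the head of the smallest one, and
     * returns the name that poll() hands back first.
     *
     * @param reinsert if true, the source is removed before its head
     *                 changes and added back afterwards
     */
    static String firstAfterMutation(boolean reinsert) {
        PriorityQueue<Source> heap =
            new PriorityQueue<>(Comparator.comparingInt((Source s) -> s.head));
        Source a = new Source("A", 1);
        Source b = new Source("B", 2);
        Source c = new Source("C", 3);
        heap.add(a);
        heap.add(b);
        heap.add(c);

        if (reinsert) {
            heap.remove(a);
        }
        a.head = 10; // the key changes behind the queue's back
        if (reinsert) {
            heap.add(a);
        }
        return heap.poll().name;
    }
}
```

Without the re-insert, the queue still hands back "A" even though "B" now has the smallest head. With it, "B" comes out first.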





[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183970#comment-13183970
 ] 

Hudson commented on HBASE-5141:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5141 Memory leak in MonitoredRPCHandlerImpl -- REDO
HBASE-5141 Memory leak in MonitoredRPCHandlerImpl -- REVERT. OVER-COMMITTED.  
REVERTING ALL SO CAN REDO COMMIT
HBASE-5141 Memory leak in MonitoredRPCHandlerImpl

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java


> Memory leak in MonitoredRPCHandlerImpl
> --
>
> Key: HBASE-5141
> URL: https://issues.apache.org/jira/browse/HBASE-5141
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-5141-v2.patch, HBASE-5141.patch, Screen Shot 
> 2012-01-06 at 3.03.09 PM.png
>
>
> I got a pretty reliable way of OOME'ing my region servers. Using a big 
> payload (64MB in my case), a default heap and default number of handlers, 
> it doesn't take long before every MonitoredRPCHandlerImpl holds on to a 64MB 
> reference, and once a compaction kicks in it kills everything.
> The issue is that even after the RPC call is done, the packet still lives in 
> MonitoredRPCHandlerImpl.
> Will attach a screen shot of jprofiler's analysis in a moment.
> This is a blocker for 0.92.0; anyone using a high number of handlers and 
> biggish values will kill themselves.
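
The retention pattern described above, and one possible way out of it, can be sketched as follows. `CallMonitor` and its methods are hypothetical and do not mirror the real MonitoredRPCHandlerImpl or the attached patches. The idea is simply that the monitor drops its reference to the payload when the call completes and keeps only a cheap summary.

```java
/** Hypothetical monitor object; not the real MonitoredRPCHandlerImpl. */
final class CallMonitor {
    private byte[] packet;     // payload of the call currently in flight
    private long packetLength; // small summary that is safe to keep around

    synchronized void callStarted(byte[] payload) {
        this.packet = payload;
        this.packetLength = payload.length;
    }

    synchronized void callCompleted() {
        // Drop the reference so the payload can be garbage collected
        // even though the monitor object itself lives on.
        this.packet = null;
    }

    synchronized boolean holdsPayload() {
        return packet != null;
    }

    synchronized long lastPacketLength() {
        return packetLength;
    }
}
```

With one such monitor per handler, the memory pinned after a call finishes is a few bytes per handler rather than the full payload size.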





[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183973#comment-13183973
 ] 

Hudson commented on HBASE-5041:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5041  Major compaction on non existing table does not throw error 
(Shrijeet)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java


> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> The following does not complain even if fubar does not exist:
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside for this defect is that major compaction may be skipped due to
> a typo by Ops.
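
The behavior being asked for is an existence check before the compaction request is sent. The following sketch assumes a set of known table names and a hypothetical `CompactRequestSketch` class. It uses `IllegalArgumentException` as a placeholder and does not reproduce the HBaseAdmin change in the attached patches.

```java
import java.util.Set;

/** Hypothetical client-side guard; not the real HBaseAdmin. */
final class CompactRequestSketch {
    /**
     * Fails loudly when the table is unknown instead of silently
     * doing nothing.
     */
    static void majorCompact(Set<String> existingTables, String table) {
        if (!existingTables.contains(table)) {
            throw new IllegalArgumentException("Table not found: " + table);
        }
        // A real client would send the major compaction request here.
    }
}
```

With this guard, a mistyped table name surfaces as an error at the call site instead of a compaction that never runs.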





[jira] [Commented] (HBASE-5134) Remove getRegionServerWithoutRetries and getRegionServerWithRetries from HConnection Interface

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183974#comment-13183974
 ] 

Hudson commented on HBASE-5134:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5134 Remove getRegionServerWithoutRetries and 
getRegionServerWithRetries from HConnection Interface

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HBaseConfiguration.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java


> Remove getRegionServerWithoutRetries and getRegionServerWithRetries from 
> HConnection Interface
> --
>
> Key: HBASE-5134
> URL: https://issues.apache.org/jira/browse/HBASE-5134
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: stack
> Fix For: 0.94.0
>
> Attachments: 5134-v2.txt, 5134-v3.txt, 5134-v4.txt, 5134-v5.txt, 
> 5134-v6.txt, 5134-v6.txt
>
>
> It's broken having these meta methods in HConnection.  They take 
> ServerCallables, which themselves inevitably have HConnections.   It makes for 
> a tangle in the model and frustrates being able to do mocked implementations 
> of HConnection.  These methods better belong in something like 
> HConnectionManager, or elsewhere altogether.





[jira] [Commented] (HBASE-5172) HTableInterface should extend java.io.Closeable

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183976#comment-13183976
 ] 

Hudson commented on HBASE-5172:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5172 HTableInterface should extend java.io.Closeable

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java


> HTableInterface should extend java.io.Closeable
> ---
>
> Key: HBASE-5172
> URL: https://issues.apache.org/jira/browse/HBASE-5172
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhihong Yu
>Assignee: stack
> Fix For: 0.94.0
>
> Attachments: 5172.txt
>
>
> Ioan Eugen Stan found this issue.





[jira] [Commented] (HBASE-5173) Commit hbase-4480 findHangingTest.sh script under dev-support

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183979#comment-13183979
 ] 

Hudson commented on HBASE-5173:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5173 Commit hbase-4480 findHangingTest.sh script under dev-support

stack : 
Files : 
* /hbase/trunk/dev-support/findHangingTest.sh


> Commit hbase-4480 findHangingTest.sh script under dev-support
> -
>
> Key: HBASE-5173
> URL: https://issues.apache.org/jira/browse/HBASE-5173
> Project: HBase
>  Issue Type: Task
>Reporter: stack
> Fix For: 0.94.0
>
> Attachments: 5173.txt
>
>
> See hbase-4480 for the script from Ted





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183980#comment-13183980
 ] 

Hudson commented on HBASE-5088:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5088  addendum
HBASE-5088 A concurrency issue on SoftValueSortedMap (Jieshan Bean and Lars H)

larsh : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java

larsh : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java


> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 5088-0.90.txt, 5088-0.92-trunk-addendum.txt, 
> 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized. If we use these methods to add/delete elements, it's OK.
> But HConnectionManager#getCachedLocation uses headMap to get a view 
> of SoftValueSortedMap#internalMap. Once this view map is operated 
> on (e.g. add/delete) in other threads, a concurrency issue may 
> occur.
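
One way to avoid handing out a live view is to do the whole lookup while holding the lock, so that no `headMap` view ever escapes the synchronized class. The sketch below is an illustration under assumptions: `LocationCache` is a hypothetical class with String keys and values, and it is not the approach taken in any of the attached patches.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical cache; every access to the TreeMap happens under the lock. */
final class LocationCache {
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    synchronized void put(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    synchronized void remove(String startKey) {
        startKeyToServer.remove(startKey);
    }

    /**
     * Returns the server for the greatest start key that is less than or
     * equal to the row, or null if there is none. No view of the map is
     * returned to the caller.
     */
    synchronized String locate(String row) {
        Map.Entry<String, String> entry = startKeyToServer.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }
}
```

Because callers only ever receive a value and never a map view, concurrent `put` and `remove` calls cannot modify a structure that another thread is iterating over.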





[jira] [Commented] (HBASE-4480) Testing script to simplify local testing

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183977#comment-13183977
 ] 

Hudson commented on HBASE-4480:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5173 Commit hbase-4480 findHangingTest.sh script under dev-support


> Testing script to simplify local testing
> 
>
> Key: HBASE-4480
> URL: https://issues.apache.org/jira/browse/HBASE-4480
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jesse Yates
>Priority: Minor
>  Labels: test
> Fix For: 0.94.0
>
> Attachments: HBASE-4480.patch, HBASE-4480_v2.patch, 
> HBASE-4480_v3.patch, HBASE-4480_v4.patch, findHangingTest.sh, 
> runtest-no-npe-check.sh, runtest.sh, runtest2.sh
>
>
> As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
> http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
> script that would handle more of the finer points of running/checking our 
> test suite.
> This script should:
> (1) Allow people to determine which tests are hanging/taking a long time to 
> run
> (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
> running the whole suite that caused the failure
> (3) Allow people to specify to run just unit tests or also integration tests 
> (essentially wrapping calls to 'maven test' and 'maven verify').
> This script should just be a convenience script - running tests directly from 
> maven should not be impacted.





[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183978#comment-13183978
 ] 

Hudson commented on HBASE-5137:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-5137 MasterFileSystem.splitLog() should abort even if 
waitOnSafeMode() throws IOException(Ram & Ted)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java


> MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
> IOException
> 
>
> Key: HBASE-5137
> URL: https://issues.apache.org/jira/browse/HBASE-5137
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch
>
>
> I am not sure if this bug was already raised in JIRA.
> In our test cluster we had a scenario where the RS had gone down and 
> ServerShutdownHandler started splitLog.
> But as HDFS was down, the waitOnSafeMode check threw an IOException.
> {code}
> try {
>   // If FS is in safe mode, just wait till out of it.
>   FSUtils.waitOnSafeMode(conf,
>     conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
>   splitter.splitLog();
> } catch (OrphanHLogAfterSplitException e) {
> {code}
> We catch the exception
> {code}
> } catch (IOException e) {
>   checkFileSystem();
>   LOG.error("Failed splitting " + logDir.toString(), e);
> }
> {code}
> So the HLog split itself did not happen. We encountered about 4 regions that 
> had recently been split on the crashed RS and were lost.
> Can we abort the Master in such scenarios? Please suggest.
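
The change being requested can be sketched as follows: treat an IOException from the safe-mode wait the same as a failed split and abort, rather than logging and carrying on. `Aborter`, `FsStep` and `SplitLogSketch` are hypothetical stand-ins introduced for this illustration. They are not HBase types, and this is not the committed patch.

```java
import java.io.IOException;

/** Hypothetical stand-in for whatever can abort the master. */
interface Aborter {
    void abort(String why, Throwable cause);
}

/** Hypothetical stand-in for a filesystem step that may fail. */
interface FsStep {
    void run() throws IOException;
}

final class SplitLogSketch {
    /**
     * Runs the safe-mode wait and then the log split. If either step
     * throws IOException, the master is asked to abort.
     *
     * @return true if the split ran to completion
     */
    static boolean splitOrAbort(FsStep waitOnSafeMode, FsStep splitLog,
                                Aborter master) {
        try {
            waitOnSafeMode.run();
            splitLog.run();
            return true;
        } catch (IOException e) {
            master.abort("Failed splitting log", e);
            return false;
        }
    }
}
```

The design choice here is that a split which never ran is not silently treated as success, which is the situation that led to the lost regions in the report.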





[jira] [Commented] (HBASE-3949) Add "Master" link to RegionServer pages

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183975#comment-13183975
 ] 

Hudson commented on HBASE-3949:
---

Integrated in HBase-TRUNK #2617 (See 
[https://builds.apache.org/job/HBase-TRUNK/2617/])
HBASE-3949. Add "Master" link to RegionServer pages. Contributed by Gregory 
Chanan.

todd : 
Files : 
* 
/hbase/trunk/src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSStatusServlet.java


> Add "Master" link to RegionServer pages
> ---
>
> Key: HBASE-3949
> URL: https://issues.apache.org/jira/browse/HBASE-3949
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.90.3, 0.92.0
>Reporter: Lars George
>Assignee: Gregory Chanan
>Priority: Minor
>  Labels: noob
> Fix For: 0.94.0
>
>
> Use the ZK info where the master is to add a UI link on the top of each 
> RegionServer page. Currently you cannot navigate directly to the Master UI 
> once you are on a RS page.
> Not sure if the info port is exposed OTTOMH, but we could either use the RS 
> local config setting for that or add it to ZK to enable lookup.





[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183989#comment-13183989
 ] 

Hadoop QA commented on HBASE-5120:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510165/HBASE-5120_4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -147 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 79 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.io.hfile.TestLruBlockCache
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/726//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/726//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/726//console

This message is automatically generated.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
> HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed". On my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-11 Thread Jieshan Bean (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-5153:


Attachment: HBASE-5153-V3.patch

> HConnection re-creation in HTable after HConnection abort
> -
>
> Key: HBASE-5153
> URL: https://issues.apache.org/jira/browse/HBASE-5153
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.90.6
>
> Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
> HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. As described there, if multiple threads 
> share the same connection and that connection is aborted in one thread, the 
> other threads will get a 
> "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> That fix solved the problem that a stale connection could not be removed, but 
> the original HTable instance can no longer be used. The connection in HTable 
> should be recreated.
> There are two approaches to solving this:
> 1. In user code, on catching an IOException, close the connection and 
> re-create the HTable instance. This can be used as a workaround.
> 2. On the HBase client side, catch this exception and re-create the connection.
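
Approach 1 (the user-side workaround) can be sketched in plain Java. This is an illustration only: `TableHandle` and `TableFactory` are hypothetical stand-ins for `HTable` and for whatever code constructs it, no real HBase client calls are shown, and the retry-exactly-once policy is an assumption rather than anything the issue prescribes.

```java
import java.io.IOException;

/** Hypothetical stand-in for an HTable-like handle; not an HBase type. */
interface TableHandle {
  String get(String row) throws IOException;
  void close() throws IOException;
}

/** Hypothetical factory that builds a fresh handle on a fresh connection. */
interface TableFactory {
  TableHandle create() throws IOException;
}

/** Wraps a handle and re-creates it once when an operation throws IOException. */
class ReconnectingTable {
  private final TableFactory factory;
  private TableHandle current;

  ReconnectingTable(TableFactory factory) throws IOException {
    this.factory = factory;
    this.current = factory.create();
  }

  synchronized String get(String row) throws IOException {
    try {
      return current.get(row);
    } catch (IOException first) {
      // The shared connection may have been aborted by another thread:
      // drop the stale handle, build a new one, and retry exactly once.
      try {
        current.close();
      } catch (IOException ignored) {
        // Closing an already-broken handle is best effort.
      }
      current = factory.create();
      return current.get(row);
    }
  }
}
```

A second failure on the fresh handle propagates to the caller, so a persistent outage still surfaces as an `IOException` instead of looping.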

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-11 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184031#comment-13184031
 ] 

jirapos...@reviews.apache.org commented on HBASE-5128:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
---

(Updated 2012-01-11 12:46:37.524636)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel 
Cryans.


Changes
---

Fixed bug link.  Added JD.  

JD -- the code that is similar to merging is 

- #handleOverlapGroup
- inMeta && !inHdfs && isDeployed  (in another rev I've added an unassign and 
believe I still have the disable/delete problem).


Summary
---

I'm posting a preliminary version that I'm currently testing on real clusters. 
The tests are flaky on the 0.90 branch (so there is something async that I 
didn't synchronize properly), and there are a few more TODOs I want to knock 
out before this is ready for full review to be considered for committing. It 
has some problems I need advice on.

Problem 1:

In the unit tests, I have a few cases where I fabricate new regions and try to 
force the overlapping regions to be closed. For some of these, I cannot delete 
a table after it is repaired without causing subsequent tests to fail. I think 
this is due to a few things:

1) The disable table handler uses in-memory assignment manager state while 
delete uses in META assignment information.
2) Currently I'm using the sneaky closeRegion that purposely doesn't go through 
the master and in turn doesn't modify in-memory state, so disable uses 
out-of-date in-memory region assignments. If I use the unassign method instead, 
it sends RIT transitions to the master, which then ends up attempting to assign 
the region again, causing timing problems and transient states.

What is a good way to clear the HMaster's assignment manager's assignment data 
for particular regions, or to force it to re-read from META (without modifying 
the 0.90 HBase installations it is meant to repair)?

Problem 2:

Sometimes tests fail, reporting HOLE_IN_REGION_CHAIN and 
SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused 
with each other and something is still happening asynchronously. I think the 
cause is that the new region is being assigned and is still transitioning. 
Sound about right? To make the unit test deterministic, should hbck wait for 
these to settle, or should just the unit test wait?
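
If the unit test is the side that waits, a bounded polling helper is one option. The following is a minimal sketch assuming Java 8 or later; `SettleWaiter` is a hypothetical name, and the condition that reports whether any region is still in transition is left to the caller because no such check is shown in this review request.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition until it holds or a deadline passes. */
final class SettleWaiter {
  private SettleWaiter() {}

  /**
   * Checks the condition at least once, then re-checks every intervalMs.
   *
   * @return true if the condition became true within timeoutMs, false otherwise
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

A test would then assert on the return value before running hbck again, so a cluster that never settles fails with a clear timeout instead of an intermittent HOLE_IN_REGION_CHAIN report.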


This addresses bug HBASE-5128.
https://issues.apache.org/jira/browse/HBASE-5128


Diffs
-

  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 

Diff: https://reviews.apache.org/r/3435/diff


Testing
---

All unit tests pass sometimes.  Some fail sometimes (generally the cases that 
fabricate new regions).  

Not ready for commit.


Thanks,

jmhsieh



> [uber hbck] Enable hbck to automatically repair table integrity problems as 
> well as region consistency problems while online.
> -
>
> Key: HBASE-5128
> URL: https://issues.apache.org/jira/browse/HBASE-5128
> Project: HBase
>  Issue Type: New Feature
>  Components: hbck
>Affects Versions: 0.92.0, 0.90.5
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region 
> consistency and table integrity invariant violations.  However, with '-fix' it 
> can only automatically repair region consistency cases having to do with 
> deployment problems.  This updated version should be able to handle all cases 
> (including a new orphan regiondir case).  When complete, it will likely 
> deprecate the OfflineMetaRepair tool and subsume several open META-hole 
> related issues.
> Here's the approach (from the comment at the top of the new version of the 
> file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency 
> and
>  * table integrity.  
>  * 

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184033#comment-13184033
 ] 

Hadoop QA commented on HBASE-5153:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510179/HBASE-5153-V3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/727//console

This message is automatically generated.





[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184092#comment-13184092
 ] 

Zhihong Yu commented on HBASE-5153:
---

@Jieshan:
Can you prepare a patch for trunk ?





[jira] [Updated] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5163:
--

Summary: TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on 
Jenkins or hadoop QA ("The directory is already locked.")  (was: 
TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on central build or 
hadoop QA on trunk ("The directory is already locked."))

> TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or 
> hadoop QA ("The directory is already locked.")
> --
>
> Key: HBASE-5163
> URL: https://issues.apache.org/jira/browse/HBASE-5163
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Attachments: 5163.patch
>
>
> The stack is typically:
> {noformat}
> java.io.IOException: Cannot lock storage 
> /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3.
>  The directory is already locked.
>   at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
>   at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
>   at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460)
>   at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470)
> // ...
> {noformat}
> It can be reproduced without parallelization or without executing the other 
> tests in the class. It seems to fail about 5% of the time.
> This comes from the naming policy for the directories in 
> MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* 
> in the cluster, and does not take into account previous starts/stops:
> {noformat}
> for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) {
>   if (manageDfsDirs) {
>     File dir1 = new File(data_dir, "data"+(2*i+1));
>     File dir2 = new File(data_dir, "data"+(2*i+2));
>     dir1.mkdirs();
>     dir2.mkdirs();
>   // [...]
> {noformat}
> This means that if we want to stop/start a datanode, we should always stop 
> the last one; otherwise the names will conflict. This test exhibits the behavior:
> {noformat}
>   @Test
>   public void testMiniDFSCluster_startDataNode() throws Exception {
>     assertTrue( dfsCluster.getDataNodes().size() == 2 );
>     // Works, as we kill the last datanode, we can now start a datanode
>     dfsCluster.stopDataNode(1);
>     dfsCluster
>       .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>     // Fails, as it's not the last datanode, the directory will conflict on
>     //  creation
>     dfsCluster.stopDataNode(0);
>     try {
>       dfsCluster
>         .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>       fail("There should be an exception because the directory already exists");
>     } catch (IOException e) {
>       assertTrue( e.getMessage().contains("The directory is already locked."));
>       LOG.info("Expected (!) exception caught " + e.getMessage());
>     }
>     // Works, as we kill the last datanode, we can now restart 2 datanodes
>     // This makes us back with 2 nodes
>     dfsCluster.stopDataNode(0);
>     dfsCluster
>       .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null);
>   }
> {noformat}
> And then this behavior is randomly triggered in testLogRollOnDatanodeDeath 
> because when we do
> {noformat}
> DatanodeInfo[] pipeline = getPipeline(log);
> assertTrue(pipeline.length == fs.getDefaultReplication());
> {noformat}
> and then kill the datanodes in the pipeline, we will have:
>  - most of the time: pipeline = 1 & 2, so after killing 1&2 we can start a 
> new datanode that will reuse the available 2's directory.
>  - sometimes: pipeline = 1 & 3. In this case, when we try to launch the new 
> datanode, it fails because it want
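
The naming collision described in the quoted issue can be reproduced with a toy model that needs no Hadoop classes. This is a simplified sketch, not MiniDFSCluster itself: `DirNamingModel` is a hypothetical class, directory locks are modelled as a set of names, and only the `data(2*i+1)` / `data(2*i+2)` rule from the quoted loop is taken from the source.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the directory naming rule quoted above; not MiniDFSCluster itself. */
class DirNamingModel {
  // Each running datanode holds the two directory names it locked at start.
  private final List<List<String>> running = new ArrayList<>();
  private final Set<String> locked = new HashSet<>();

  /** Starts n datanodes, naming directories from the current node count. */
  void startDataNodes(int n) {
    int cur = running.size();
    for (int i = cur; i < cur + n; i++) {
      List<String> dirs = new ArrayList<>();
      dirs.add("data" + (2 * i + 1));
      dirs.add("data" + (2 * i + 2));
      for (String d : dirs) {
        if (locked.contains(d)) {
          throw new IllegalStateException(d + ": The directory is already locked.");
        }
      }
      locked.addAll(dirs);
      running.add(dirs);
    }
  }

  /** Stops the datanode at the given position and releases its directories. */
  void stopDataNode(int index) {
    locked.removeAll(running.remove(index));
  }

  int size() {
    return running.size();
  }
}
```

Replaying the sequence from the quoted test on this model gives the same outcome: after starting two nodes, `stopDataNode(1)` followed by `startDataNodes(1)` succeeds, while `stopDataNode(0)` followed by `startDataNodes(1)` throws, because index 1 again maps to data3 and data4, which the surviving node still holds.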

[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184113#comment-13184113
 ] 

Zhihong Yu commented on HBASE-5163:
---

Integrated to TRUNK.

Thanks for the patch, N.

Thanks for the review, Stack.

> TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or 
> hadoop QA ("The directory is already locked.")
> --
>
> Key: HBASE-5163
> URL: https://issues.apache.org/jira/browse/HBASE-5163
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Attachments: 5163.patch
>
>
> The stack is typically:
> {noformat}
> java.io.IOException: Cannot lock storage 
> /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3.
>  The directory is already locked.
>   at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
>   at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
>   at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460)
>   at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470)
> // ...
> {noformat}
> It can be reproduced without parallelization or without executing the other 
> tests in the class. It seems to fail about 5% of the time.
> This comes from the naming policy for the directories in 
> MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* 
> in the cluster, and does not take into account previous starts/stops:
> {noformat}
> for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) {
>   if (manageDfsDirs) {
>     File dir1 = new File(data_dir, "data"+(2*i+1));
>     File dir2 = new File(data_dir, "data"+(2*i+2));
>     dir1.mkdirs();
>     dir2.mkdirs();
>   // [...]
> {noformat}
> This means that if we want to stop/start a datanode, we should always stop 
> the last one; otherwise the names will conflict. This test exhibits the behavior:
> {noformat}
>   @Test
>   public void testMiniDFSCluster_startDataNode() throws Exception {
>     assertTrue( dfsCluster.getDataNodes().size() == 2 );
>     // Works, as we kill the last datanode, we can now start a datanode
>     dfsCluster.stopDataNode(1);
>     dfsCluster
>       .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>     // Fails, as it's not the last datanode, the directory will conflict on
>     //  creation
>     dfsCluster.stopDataNode(0);
>     try {
>       dfsCluster
>         .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>       fail("There should be an exception because the directory already exists");
>     } catch (IOException e) {
>       assertTrue( e.getMessage().contains("The directory is already locked."));
>       LOG.info("Expected (!) exception caught " + e.getMessage());
>     }
>     // Works, as we kill the last datanode, we can now restart 2 datanodes
>     // This makes us back with 2 nodes
>     dfsCluster.stopDataNode(0);
>     dfsCluster
>       .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null);
>   }
> {noformat}
> And then this behavior is randomly triggered in testLogRollOnDatanodeDeath 
> because when we do
> {noformat}
> DatanodeInfo[] pipeline = getPipeline(log);
> assertTrue(pipeline.length == fs.getDefaultReplication());
> {noformat}
> and then kill the datanodes in the pipeline, we will have:
>  - most of the time: pipeline = 1 & 2, so after killing 1&2 we can start a 
> new datanode that will reuse the available 2's directory.
>  - sometimes: pipeline = 1 & 3. In this case, when we try to launch the new 
> datanode, it fails because it wants to use the same directory as the still 
> alive '2'.
> There are two ways of fixing the test:
> 1) Fix the naming rule in MiniDF

[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184120#comment-13184120
 ] 

Zhihong Yu commented on HBASE-5179:
---

{code}
+  private final Set processingDeadServers = new 
HashSet();
{code}
The field name above sounds like a method name. How about naming it 
deadServersUnderProcessing? Related method names should be changed as well.

{code}
+   * Called on startup. Figures whether a fresh cluster start of we are joining
{code}
should read 'start or we are'.

For ServerManager.java and DeadServer.java:
{code}
+  public Set getProcessingDeadServers() {
+    return this.deadservers.cloneProcessingDeadServers();
+  }
{code}
The method should be called cloneDeadServersUnderProcessing().

> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts to assign the regions of RegionserverA because it was 
> marked as a dead server in step 3.
> However, while step 4 (assigning regions) is running, ServerShutdownHandler 
> may still be splitting the log. Therefore, data loss may occur.
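
The hazard in step 4 is that regions are assigned while log splitting for the same server is still in progress. One way to picture a guard is a set of dead servers still being processed, which the assignment path consults before assigning. The sketch below is not the attached patch: `DeadServerTracker` and its methods are hypothetical, server names are plain strings, and only the idea of a "dead servers under processing" set comes from the review comment above.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker of dead servers whose logs are still being split. */
class DeadServerTracker {
  private final Set<String> underProcessing = new HashSet<>();

  /** Called when shutdown handling (including log splitting) starts. */
  synchronized void startProcessing(String server) {
    underProcessing.add(server);
  }

  /** Called only after log splitting for the server has completed. */
  synchronized void finishProcessing(String server) {
    underProcessing.remove(server);
  }

  /** Assignment should be deferred while this returns false. */
  synchronized boolean safeToAssignRegionsOf(String server) {
    return !underProcessing.contains(server);
  }

  /** Returns a read-only snapshot so callers never iterate the live set. */
  synchronized Set<String> cloneDeadServersUnderProcessing() {
    return Collections.unmodifiableSet(new HashSet<>(underProcessing));
  }
}
```

In this model the failover path would skip, or retry later, any region whose last host fails `safeToAssignRegionsOf`, so assignment cannot overtake log splitting for that host.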





[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Status: Patch Available  (was: Open)





[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184147#comment-13184147
 ] 

Hudson commented on HBASE-5163:
---

Integrated in HBase-TRUNK #2618 (See 
[https://builds.apache.org/job/HBase-TRUNK/2618/])
HBASE-5163 TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on 
Jenkins or hadoop QA ("The directory is already locked.") (N Keywal)

tedyu : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java



[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184155#comment-13184155
 ] 

Hadoop QA commented on HBASE-5179:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510164/hbase-5179.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.  Please justify why no new tests are needed for this patch.  Also 
please list what manual steps were performed to verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -147 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 78 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/728//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/728//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/728//console

This message is automatically generated.

> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts assigning the regions of RegionserverA because step 3 
> marked it as a dead server.
> However, while step 4 is assigning regions, ServerShutdownHandler may still be 
> splitting the log, so data loss may result.
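The interleaving in steps 1-4 suggests one possible guard, sketched below in Java: the failover path declines to assign regions of a server whose log split has not finished. `LogSplitTracker` and its method names are hypothetical; they are not taken from HBase or from the attached patch, and the sketch only illustrates the ordering constraint.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not an HBase class: remembers which dead servers
// still have a log split in flight.
class LogSplitTracker {
  private final Set<String> splitting = ConcurrentHashMap.newKeySet();

  // ServerShutdownHandler would call this before it starts splitting logs.
  void startSplit(String serverName) {
    splitting.add(serverName);
  }

  // ...and this once the split has completed.
  void finishSplit(String serverName) {
    splitting.remove(serverName);
  }

  // Master failover would check this before assigning a dead server's regions.
  boolean safeToAssign(String serverName) {
    return !splitting.contains(serverName);
  }
}
```

Note that this only helps if the shutdown handler registers the server before the failover code reaches step 4; how the real fix establishes that ordering is not shown here.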

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-11 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184164#comment-13184164
 ] 

ramkrishna.s.vasudevan commented on HBASE-5155:
---

I could not upload the patch today because some test cases are still failing.  
I will upload it tomorrow.

> ServerShutDownHandler And Disable/Delete should not happen parallely leading 
> to recreation of regions that were deleted
> ---
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
>
> ServerShutdownHandler and the disable/delete table handlers race.  This is not 
> an issue due to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds a lot of 
> regions.
> -> A region R1 has two daughters, D1 and D2.
> -> The ServerShutdownHandler gets called, scans META and gets all the 
> user regions.
> -> In parallel, a table is disabled. (No problem in this step.)
> -> Delete table is done.
> -> The table and its regions are deleted, including R1, D1 and D2. (So META 
> is cleaned.)
> -> Now ServerShutdownHandler starts to process the dead region:
> {code}
> if (hri.isOffline() && hri.isSplit()) {
>   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
>     "; checking daughter presence");
>   fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, since the daughters D1 and D2 are missing for R1, 
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
>   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>   MetaEditor.addDaughter(catalogTracker, daughter, null);
>   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>   // there then something wonky about the split -- things will keep going
>   // but could be missing references to parent region.
>   // And assign it.
>   assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.
> After this we continue with the code below.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
>     this.services.getAssignmentManager(),
>     this.server.getCatalogTracker())) {
>   this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> When the SSH scanned META, it saw R1, D1 and D2.
> So as part of the above code, D1 and D2, which were assigned by fixupDaughters,
> are assigned again by 
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper bad-version error that kills the master.
> The more important part is that the regions that were deleted are recreated, 
> which I think is more critical.
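The two failure modes described above (assigning a region that was deleted after the META scan, and assigning the same daughter twice) can be illustrated with a small Java sketch. `DeadRegionAssigner` is a hypothetical stand-in for the shutdown handler's assign loop, with META modelled as a plain set of region names; it is not HBase code and is not the eventual patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the shutdown handler's assign loop; not HBase code.
class DeadRegionAssigner {
  private final Set<String> liveMetaRows;            // regions currently in META
  private final Set<String> assigned = new HashSet<>();

  DeadRegionAssigner(Set<String> liveMetaRows) {
    this.liveMetaRows = liveMetaRows;
  }

  // Returns true only if this call actually assigns the region.
  boolean assignIfStillPresent(String region) {
    if (!liveMetaRows.contains(region)) {
      return false;               // deleted since the earlier META scan
    }
    return assigned.add(region);  // false if it was already assigned once
  }
}
```

Re-checking against the current META contents instead of the stale scan result is the design point: the stale list is what lets deleted regions come back.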





[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184170#comment-13184170
 ] 

ramkrishna.s.vasudevan commented on HBASE-5179:
---

@Chunhui
Is this issue applicable to 0.90.6? If so, can you prepare a patch for 0.90 
also?






[jira] [Updated] (HBASE-5115) Change HBase "color" from purple to "International Orange (Engineering)"

2012-01-11 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5115:
-

Attachment: 01_orange.svg
01_orange.png

> Change HBase "color" from purple to "International Orange (Engineering)"
> 
>
> Key: HBASE-5115
> URL: https://issues.apache.org/jira/browse/HBASE-5115
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: 01_orange.png, 01_orange.svg
>
>
> See http://en.wikipedia.org/wiki/International_orange  See the bit about the 
> color of the golden gate bridge.





[jira] [Assigned] (HBASE-5115) Change HBase "color" from purple to "International Orange (Engineering)"

2012-01-11 Thread stack (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-5115:


Assignee: stack






[jira] [Commented] (HBASE-5115) Change HBase "color" from purple to "International Orange (Engineering)"

2012-01-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184171#comment-13184171
 ] 

stack commented on HBASE-5115:
--

Here is the logo done in International Orange (Engineering).

> Change HBase "color" from purple to "International Orange (Engineering)"
> 
>
> Key: HBASE-5115
> URL: https://issues.apache.org/jira/browse/HBASE-5115
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: 01_orange.png, 01_orange.svg, H_orange.png, H_orange.svg
>
>
> See http://en.wikipedia.org/wiki/International_orange  See the bit about the 
> color of the golden gate bridge.





[jira] [Updated] (HBASE-5115) Change HBase "color" from purple to "International Orange (Engineering)"

2012-01-11 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5115:
-

Attachment: H_orange.svg
H_orange.png






[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184181#comment-13184181
 ] 

ramkrishna.s.vasudevan commented on HBASE-5179:
---

@Chunhui
Can you take a look at HBASE-4748?  It is similar to this issue, but there the 
data loss was w.r.t. META, which is more critical.  It is quite rare but still 
possible.  Do you have any suggestions for that?






[jira] [Updated] (HBASE-3565) Add a metric to keep track of slow HLog appends

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-3565:
--

Status: Patch Available  (was: Open)

> Add a metric to keep track of slow HLog appends
> ---
>
> Key: HBASE-3565
> URL: https://issues.apache.org/jira/browse/HBASE-3565
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Reporter: Benoit Sigoure
>Assignee: Mubarak Seyed
>  Labels: monitoring
> Fix For: 0.94.0
>
> Attachments: HBASE-3565.trunk.v1.patch
>
>
> Whenever an edit takes too long to be written to an HLog, HBase logs a 
> warning such as this one:
> {code}
> 2011-02-23 20:03:14,703 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> IPC Server handler 21 on 60020 took 15065ms appending an edit to hlog; 
> editcount=126050
> {code}
> I would like to have a counter incremented each time this happens and this 
> counter exposed via the metrics stuff in HBase so I can collect it in my 
> monitoring system.
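A minimal Java sketch of the requested counter follows. `SlowAppendCounter`, its threshold parameter and its method names are hypothetical and are not taken from HBASE-3565.trunk.v1.patch; how the value would be registered with HBase's metrics framework is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the attached patch: counts appends that take
// longer than a configurable threshold.
class SlowAppendCounter {
  private final long thresholdMs;
  private final AtomicLong slowAppends = new AtomicLong();

  SlowAppendCounter(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  // Record one append; returns true when it was counted as slow.
  boolean record(long elapsedMs) {
    if (elapsedMs <= thresholdMs) {
      return false;
    }
    slowAppends.incrementAndGet();
    return true;
  }

  // Value a metrics reporter would publish.
  long getSlowAppends() {
    return slowAppends.get();
  }
}
```

The counter would be bumped at the same place the existing WARN is logged, so the metric and the log line stay in step.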





[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Attachment: 5179-v2.txt

Chunhui's patch for TRUNK with minor renaming.

> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5179-v2.txt, hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts assigning the regions of RegionserverA because step 3 
> marked it as a dead server.
> However, while step 4 is assigning regions, ServerShutdownHandler may still be 
> splitting the log, so data loss may result.





[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184203#comment-13184203
 ] 

ramkrishna.s.vasudevan commented on HBASE-5120:
---

Latest patch available.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
> HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy 
> and caused more troubles than it fixed", on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiti

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184205#comment-13184205
 ] 

Zhihong Yu commented on HBASE-5120:
---

Can you change LOG.debug() to LOG.error() in deleteClosingOrClosedNode() ?


[jira] [Issue Comment Edited] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184205#comment-13184205
 ] 

Zhihong Yu edited comment on HBASE-5120 at 1/11/12 5:17 PM:


Can you change LOG.debug() to LOG.error() in deleteClosingOrClosedNode() ?
{code}
+    LOG.debug("The deletion of the CLOSED node for the region "
+        + region.getEncodedName() + " returned " + deleteNode);
{code}

  was (Author: zhi...@ebaysf.com):
Can you change LOG.debug() to LOG.error() in deleteClosingOrClosedNode() ?
  

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184211#comment-13184211
 ] 

stack commented on HBASE-5153:
--

Patch looks good.  I like your addition of a specific Exception for closed 
state.

Does this have to be public Jieshan?

{code}
getRegionServerWithRetries
{code}

Same for processBatch and getRegionLocation.

If they are public, they should be in HTableInterface, but they seem to be 
implementation methods rather than something that should be part of the public 
interface.

A style nit -- i.e. not important but if you are going to redo the patch you 
might want to address it -- is that you do this in 
handleConnectionClosedException

{code}
+if (ioe instanceof ConnectionClosedException) {
{code}

and the whole method is dealing with the case where above is true.  I'd suggest 
that you might do:

{code}
if (!(ioe instanceof ConnectionClosedException)) return;
{code}

... then you save a whole indent and it's clear that the method is all about 
dealing with ConnectionClosedException.

Is it right including this in HTable?

{code}
getPauseTime
{code}

In trunk that is in a new ConnectionUtils class.  Maybe you have to do it for 
0.90?

I'm wondering if the class ConnectionClosedException needs to be public also?  
It's only used in this package, right?




> HConnection re-creation in HTable after HConnection abort
> -
>
> Key: HBASE-5153
> URL: https://issues.apache.org/jira/browse/HBASE-5153
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.90.6
>
> Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
> HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. From that issue we know that if multiple 
> threads share the same connection, once the connection is aborted in one 
> thread, the other threads will get a 
> "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solves the problem of a stale connection that can't be removed, but the 
> original HTable instance can't continue to be used. The connection in HTable 
> should be recreated.
> Actually, there are two approaches to solve this:
> 1. In user code, on catching an IOE, close the connection and re-create the 
> HTable instance. We can use this as a workaround.
> 2. On the HBase client side, catch this exception and re-create the connection.
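
The first approach (the user-code workaround) can be sketched roughly as below. This is only an illustration under stated assumptions: `Table`, `RecreateOnFailure` and `getWithOneRetry` are hypothetical names, not HBase client classes, and the factory stands in for whatever code creates the real HTable instance.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

class RecreateOnFailure {
  // Hypothetical minimal view of a table handle; NOT the HBase HTable API.
  interface Table extends Closeable {
    String get(String row) throws IOException;
  }

  private final Supplier<Table> factory; // assumed: builds a fresh handle and connection
  private Table table;

  RecreateOnFailure(Supplier<Table> factory) {
    this.factory = factory;
    this.table = factory.get();
  }

  // On an IOException, close the old handle, re-create it, and retry once.
  String getWithOneRetry(String row) throws IOException {
    try {
      return table.get(row);
    } catch (IOException first) {
      try {
        table.close();
      } catch (IOException ignored) {
        // the old handle is being discarded anyway
      }
      table = factory.get();
      return table.get(row); // a second failure propagates to the caller
    }
  }
}
```

A single retry is an arbitrary choice for the sketch; real code would probably bound the retries and pause between them.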

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3565) Add a metric to keep track of slow HLog appends

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184220#comment-13184220
 ] 

Hadoop QA commented on HBASE-3565:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12510132/HBASE-3565.trunk.v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -147 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 78 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplicationPeer
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/729//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/729//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/729//console

This message is automatically generated.

> Add a metric to keep track of slow HLog appends
> ---
>
> Key: HBASE-3565
> URL: https://issues.apache.org/jira/browse/HBASE-3565
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Reporter: Benoit Sigoure
>Assignee: Mubarak Seyed
>  Labels: monitoring
> Fix For: 0.94.0
>
> Attachments: HBASE-3565.trunk.v1.patch
>
>
> Whenever an edit takes too long to be written to an HLog, HBase logs a 
> warning such as this one:
> {code}
> 2011-02-23 20:03:14,703 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> IPC Server handler 21 on 60020 took 15065ms appending an edit to hlog; 
> editcount=126050
> {code}
> I would like to have a counter incremented each time this happens and this 
> counter exposed via the metrics stuff in HBase so I can collect it in my 
> monitoring system.
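
A minimal sketch of the requested counter, assuming a fixed threshold in milliseconds. `SlowAppendCounter`, `recordAppend` and `getSlowAppendCount` are hypothetical names for illustration; this is not the attached patch and not HBase's metrics API.

```java
import java.util.concurrent.atomic.AtomicLong;

class SlowAppendCounter {
  private final long thresholdMs; // assumed threshold; the real value is not given here
  private final AtomicLong slowAppends = new AtomicLong();

  SlowAppendCounter(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  // Call after each append with the time it took; counts only the slow ones.
  void recordAppend(long elapsedMs) {
    if (elapsedMs > thresholdMs) {
      slowAppends.incrementAndGet();
    }
  }

  // Value a metrics exporter could read periodically.
  long getSlowAppendCount() {
    return slowAppends.get();
  }
}
```

An AtomicLong keeps the increment safe when several handler threads append concurrently.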





[jira] [Commented] (HBASE-5150) Fail in a thread may not fail a test, clean up log splitting test

2012-01-11 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184221#comment-13184221
 ] 

Jimmy Xiang commented on HBASE-5150:


Those failed tests passed on my local box.

> Fail in a thread may not fail a test, clean up log splitting test
> -
>
> Key: HBASE-5150
> URL: https://issues.apache.org/jira/browse/HBASE-5150
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: hbase-5150.txt, hbase_5150_v3.patch
>
>
> This is to clean up some tests for HBASE-5081.  The Assert.fail method in a 
> separate thread will terminate the thread, but may not fail the test.
> We can use callable, so that we can get the error in getting the result. 
> Some documentation to explain the test will be helpful too.
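
The Callable idea can be sketched as follows, using only java.util.concurrent. `ThreadFailureSketch` and `runAndPropagate` are hypothetical names, and this is not the code in the attached patch. The point it shows is that Future.get() surfaces the worker's failure on the calling (test) thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadFailureSketch {
  // Runs the task on another thread and rethrows any failure on the caller's thread.
  static <T> T runAndPropagate(Callable<T> task) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = pool.submit(task);
      return future.get(); // throws ExecutionException if the task failed
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof Error) {
        throw (Error) cause; // e.g. the AssertionError raised by Assert.fail
      }
      throw new RuntimeException(cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

With a plain Thread the same AssertionError would only end the worker thread, which is the behaviour described above.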





[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Status: Patch Available  (was: Open)

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
> HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy 
> and caused more troubles than it fixed", on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHa
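
One way to picture the race in the logs above: the timeout path acts on a PENDING_CLOSE state that the closed-region handler has already cleared. The sketch below shows the general idea of re-checking the state atomically before forcing an unassign. It is not the change made in any of the attached patches, and `TransitionRaceSketch` and its methods are hypothetical names rather than AssignmentManager code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class TransitionRaceSketch {
  enum State { PENDING_CLOSE, CLOSED }

  // Hypothetical stand-in for the master's regions-in-transition map.
  private final ConcurrentMap<String, State> regionsInTransition =
      new ConcurrentHashMap<>();

  void startClose(String region) {
    regionsInTransition.put(region, State.PENDING_CLOSE);
  }

  // What the closed-region handler does when the table is being disabled.
  void handleClosed(String region) {
    regionsInTransition.remove(region);
  }

  // Timeout path: act only if the region is still PENDING_CLOSE right now.
  // Returns true when a forced unassign would be issued.
  boolean forceUnassignIfStillPending(String region) {
    boolean[] acted = {false};
    regionsInTransition.computeIfPresent(region, (name, state) -> {
      if (state == State.PENDING_CLOSE) {
        acted[0] = true; // the real code would re-send CLOSE here
      }
      return state; // keep the entry unchanged
    });
    return acted[0];
  }
}
```

Because computeIfPresent runs atomically per key on a ConcurrentHashMap, the check and the action cannot interleave with handleClosed for the same region.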

[jira] [Assigned] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5120:
-

Assignee: ramkrishna.s.vasudevan


[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Attachment: HBASE-5120_5.patch

Changed debug to error.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
> HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch
>
>

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Status: Open  (was: Patch Available)


[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184227#comment-13184227
 ] 

ramkrishna.s.vasudevan commented on HBASE-5179:
---

Patch looks good to me. I will try it out on the cluster tomorrow.


> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5179-v2.txt, hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler is processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts to assign the regions of RegionserverA because it is a 
> dead server as of step 3.
> However, during step 4 (assigning regions), ServerShutdownHandler may still be 
> splitting the log; therefore, data loss may occur.
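
The ordering constraint implied by the steps above (do not assign a dead server's regions while its log is still being split) can be sketched like this. The bookkeeping shown is an assumption for illustration only. `AssignAfterSplitSketch` and its methods are hypothetical and do not reflect how the attached patches solve the problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class AssignAfterSplitSketch {
  // Servers whose logs are still being split (hypothetical bookkeeping).
  private final Set<String> splitting = ConcurrentHashMap.newKeySet();

  void beginLogSplit(String server) {
    splitting.add(server);
  }

  void finishLogSplit(String server) {
    splitting.remove(server);
  }

  // Returns only the regions whose former server has no log split in progress.
  List<String> assignable(Map<String, String> regionToDeadServer) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> e : regionToDeadServer.entrySet()) {
      if (!splitting.contains(e.getValue())) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

Regions held back by this check would be assigned on a later pass, once finishLogSplit has been called for their server.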





[jira] [Commented] (HBASE-5150) Fail in a thread may not fail a test, clean up log splitting test

2012-01-11 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184234#comment-13184234
 ] 

Jimmy Xiang commented on HBASE-5150:


@Prakash and Ted, are you ok with this patch? I changed the 3sec wait time to 
2sec.






[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Attachment: 5179-90.txt

Chunhui's patch rebased for 0.90

> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5179-90.txt, 5179-v2.txt, hbase-5179.patch
>
>





[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184239#comment-13184239
 ] 

Hadoop QA commented on HBASE-5179:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510206/5179-v2.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -147 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 78 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.master.TestSplitLogManager
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/730//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/730//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/730//console

This message is automatically generated.





[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184240#comment-13184240
 ] 

Hadoop QA commented on HBASE-5179:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510215/5179-90.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/732//console

This message is automatically generated.





[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510215/5179-90.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/732//console

This message is automatically generated.)





[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184244#comment-13184244
 ] 

Zhihong Yu commented on HBASE-5179:
---

I ran the following on my MacBook and they passed:
{code}
mt -Dtest=TestSplitLogManager
mt -Dtest=TestAdmin#testShouldCloseTheRegionBasedOnTheEncodedRegionName
{code}





[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184251#comment-13184251
 ] 

Zhihong Yu commented on HBASE-5120:
---

+1 on patch v5.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: Zhihong Yu
> Assignee: ramkrishna.s.vasudevan
> Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
> HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed:
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused: it recreated the RIT znode, but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. This 
> is what's going on now.
> The late ZK notification that the znode was deleted (although it got recreated 
> afterwards):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.h
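
The race in the logs above comes down to a check-then-act gap: the timeout monitor decides to retry the unassign, the CLOSED event is processed in between, and the retry then recreates a znode for a region that is no longer in transition. The sketch below is a minimal illustration of closing that gap by re-checking the state under the same lock. It is not the attached patch; the class, the method names, and the in-memory map and set standing in for the regions-in-transition map and ZooKeeper are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RegionTransitionGuardSketch {
    // Region name -> transition state, standing in for the regions-in-transition map.
    private final Map<String, String> regionsInTransition = new HashMap<>();
    // Region names that currently have an unassigned znode (a stand-in for ZooKeeper).
    private final Set<String> znodes = new HashSet<>();

    /** The master sends CLOSE: the region enters PENDING_CLOSE and gets a znode. */
    synchronized void sendClose(String region) {
        regionsInTransition.put(region, "PENDING_CLOSE");
        znodes.add(region);
    }

    /** The CLOSED event arrives: the region leaves transition and its znode is deleted. */
    synchronized void handleClosed(String region) {
        regionsInTransition.remove(region);
        znodes.remove(region);
    }

    /**
     * The timeout monitor retries the unassign. The state is re-checked under the
     * same lock, so a region already processed as closed is left alone.
     */
    synchronized boolean retryUnassign(String region) {
        if (!"PENDING_CLOSE".equals(regionsInTransition.get(region))) {
            return false; // already handled; do not recreate the znode
        }
        znodes.add(region); // still pending: (re)create the CLOSING znode
        return true;
    }

    synchronized boolean hasZnode(String region) {
        return znodes.contains(region);
    }
}
```

In this toy model, calling handleClosed before retryUnassign makes the retry a no-op, so no znode is left behind for a region that has already been closed.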

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184258#comment-13184258
 ] 

Hadoop QA commented on HBASE-5120:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510211/HBASE-5120_5.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -147 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 79 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/731//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/731//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/731//console

This message is automatically generated.


[jira] [Commented] (HBASE-5139) Compute (weighted) median using AggregateProtocol

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184264#comment-13184264
 ] 

Zhihong Yu commented on HBASE-5139:
---

I am going to integrate patch v2 if there is no objection.

> Compute (weighted) median using AggregateProtocol
> -
>
> Key: HBASE-5139
> URL: https://issues.apache.org/jira/browse/HBASE-5139
> Project: HBase
> Issue Type: Sub-task
> Reporter: Zhihong Yu
> Assignee: Zhihong Yu
> Attachments: 5139-v2.txt
>
>
> Suppose cf:cq1 stores numeric values and, optionally, cf:cq2 stores weights. 
> This task finds the median value among the values of cf:cq1 (see 
> http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html).
> This can be done in two passes.
> The first pass uses AggregateProtocol, where the following tuple is 
> returned from each region:
> (partial-sum-of-values, partial-sum-of-weights)
> The start rowkey (supplied by the coprocessor framework) is used to sort 
> the tuples. This way we can determine which region (called R) contains the 
> (weighted) median. partial-sum-of-weights can be 0 if the unweighted median is 
> sought.
> The second pass scans the table, beginning with the start row of 
> region R, and computes the partial (weighted) sum until the threshold of S/2 is 
> crossed. The (weighted) median is then returned.
> However, this approach won't work if the underlying table is mutated 
> between pass one and pass two.
> In that case, sequential scanning seems to be the solution, although it is 
> slower than the above approach.
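
The two passes can be sketched in memory as follows. This is only an illustration under several assumptions that are not in the issue text: S is taken to be the total weight summed over all regions, rows are assumed to be stored in ascending value order across regions, the unweighted case is modelled by giving every row a weight of 1, and the class and method names are hypothetical. No coprocessor or AggregateProtocol call is involved; each inner list stands in for one region.

```java
import java.util.List;

final class WeightedMedianSketch {
    /** One row: a numeric value (cf:cq1) and its weight (cf:cq2). */
    static final class Row {
        final double value;
        final double weight;

        Row(double value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /** Regions are given in start-rowkey order; rows are assumed sorted by value. */
    static double weightedMedian(List<List<Row>> regions) {
        // Pass one: each region reports its partial sum of weights.
        double[] partialWeights = new double[regions.size()];
        double total = 0;
        for (int i = 0; i < regions.size(); i++) {
            for (Row row : regions.get(i)) {
                partialWeights[i] += row.weight;
            }
            total += partialWeights[i];
        }
        if (total <= 0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        double half = total / 2; // the S/2 threshold, assuming S is the total weight

        // Locate region R: the first region where the running weight reaches S/2.
        double weightBefore = 0;
        int target = 0;
        while (target < regions.size() - 1 && weightBefore + partialWeights[target] < half) {
            weightBefore += partialWeights[target];
            target++;
        }

        // Pass two: scan region R from its start row until the threshold is crossed.
        double running = weightBefore;
        for (Row row : regions.get(target)) {
            running += row.weight;
            if (running >= half) {
                return row.value;
            }
        }
        throw new IllegalStateException("threshold not reached; did the data change between passes?");
    }
}
```

For two regions holding the values 1, 2 and 3, 4, 5, each with weight 1, pass one places the threshold of 2.5 in the second region, and pass two stops at the value 3.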





[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option

2012-01-11 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184281#comment-13184281
 ] 

Harsh J commented on HBASE-4224:


[Dropping by from the dev lists; have not followed otherwise]

I'd certainly prefer reading flushAllRegions() over flushRegions(null). Could we 
instead have it as a utility function in HRegionServer rather than in 
HRegionInterface, if changing the interface is a concern?
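
A rough sketch of that idea, assuming a hypothetical RegionFlusher interface that stands in for the real region server interface (its actual signature is not shown in this thread): the interface keeps a single flushRegions method, and a utility method gives call sites the more readable name.

```java
import java.util.List;

final class FlushAllRegionsSketch {
    /** Hypothetical stand-in for the region server interface discussed above. */
    interface RegionFlusher {
        /** Flushes the named regions; in this sketch, null means "all regions". */
        void flushRegions(List<String> regionNames);
    }

    /** Utility wrapper so call sites read flushAllRegions(...) instead of flushRegions(null). */
    static void flushAllRegions(RegionFlusher flusher) {
        flusher.flushRegions(null);
    }
}
```

The interface itself stays unchanged in this arrangement; only the helper is new.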

> Need a flush by regionserver rather than by table option
> 
>
> Key: HBASE-4224
> URL: https://issues.apache.org/jira/browse/HBASE-4224
> Project: HBase
> Issue Type: Bug
> Components: shell
> Reporter: stack
> Assignee: Akash Ashok
> Attachments: HBase-4224-v2.patch, HBase-4224.patch
>
>
> This evening I needed to clean out logs on the cluster.  Logs are kept per 
> regionserver.  To let go of logs, we need to have all edits emptied from 
> memory.  The only flush is by table or by region.  We need to be able to flush 
> the regionserver.  Need to add this.





[jira] [Commented] (HBASE-4440) add an option to presplit table to PerformanceEvaluation

2012-01-11 Thread Sujee Maniyam (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184284#comment-13184284
 ] 

Sujee Maniyam commented on HBASE-4440:
--

So you are proposing that either:

1) whether or not we use the presplit option, the table has to be recreated for 
all write-mode tests.  

This changes the behavior for all write tests.   Currently the table is only 
created if it doesn't exist.

2) or pre-split should try to split the table without re-creating it.




> add an option to presplit table to PerformanceEvaluation
> 
>
> Key: HBASE-4440
> URL: https://issues.apache.org/jira/browse/HBASE-4440
> Project: HBase
> Issue Type: Improvement
> Components: util
> Reporter: Sujee Maniyam
> Assignee: Sujee Maniyam
> Priority: Minor
> Labels: benchmark
> Fix For: 0.94.0
>
> Attachments: PerformanceEvaluation.java, 
> PerformanceEvaluation_HBASE_4440.patch, 
> PerformanceEvaluation_HBASE_4440_2.patch
>
>
> PerformanceEvaluation is a quick way to 'benchmark' an HBase cluster.  The 
> current 'write*' operations do not pre-split the table.  Pre-splitting the 
> table will really boost the insert performance.
> It would be nice to have an option to enable pre-splitting the table before the 
> inserts begin.
> It would look something like:
> (a) hbase ...PerformanceEvaluation   --presplit=10 
> (b) hbase ...PerformanceEvaluation   --presplit 
> (b) will try to presplit the table on some default value (say, the number of 
> region servers).
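
As a rough illustration of what a --presplit=N option would need to compute, the sketch below derives N-1 evenly spaced split keys for a table of sequentially numbered rows. The class name, the method name, and the zero-padded 10-digit key format are assumptions made for this example; they are not taken from the attached patches.

```java
import java.util.Locale;

final class PresplitKeysSketch {
    /**
     * Returns the (regions - 1) boundary keys that divide rows 0..totalRows-1
     * into the requested number of roughly equal regions.
     */
    static String[] splitKeys(long totalRows, int regions) {
        if (regions < 1 || totalRows < regions) {
            throw new IllegalArgumentException("need at least one row per region");
        }
        String[] keys = new String[regions - 1];
        for (int i = 1; i < regions; i++) {
            long boundary = totalRows * i / regions; // first row of region i
            // Zero-padding keeps lexicographic key order equal to numeric order.
            keys[i - 1] = String.format(Locale.ROOT, "%010d", boundary);
        }
        return keys;
    }
}
```

Calling splitKeys(100, 4) yields the three boundaries 25, 50 and 75 as padded strings; the resulting keys would then be handed to table creation as the initial region boundaries.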





[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184287#comment-13184287
 ] 

stack commented on HBASE-5179:
--

Is it hard to do a test for this?





[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184286#comment-13184286
 ] 

stack commented on HBASE-5179:
--

I agree with the spirit of this class.  Good stuff Chunhui.

getDeadServersUnderProcessing is an awkward name for a method.  Should it be 
getDeadServers?  Does it need to be a public method?  It seems fine for it to be 
package private.

Is serversWithoutSplitLog a good name for a local variable?  Should it be 
deadServers, with a comment saying that dead servers are processed by 
ServerShutdownHandler, which will take care of the log splitting?

Is this right for trunk?

{code}
-  } else if (!serverManager.isServerOnline(regionLocation.getServerName())) {
+  } else if (!onlineServers.contains(regionLocation.getHostname())) {
{code}
Online servers is keyed by a ServerName, not a hostname.

What is deadServersUnderProcessing?  Does DeadServers keep a list of all 
servers that ever died?  Is that a good idea?  Shouldn't finish remove the item 
from deadservers rather than just from deadServersUnderProcessing?

Change the name of this method, cloneProcessingDeadServers.  Just call it 
getDeadServers?  That it's a clone is an internal implementation detail.










[jira] [Commented] (HBASE-3565) Add metrics to keep track of slow HLog appends

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184288#comment-13184288
 ] 

Zhihong Yu commented on HBASE-3565:
---

Integrated to TRUNK.

Thanks for the patch Mubarak.

Thanks for the review, Stack.

> Add metrics to keep track of slow HLog appends
> --
>
> Key: HBASE-3565
> URL: https://issues.apache.org/jira/browse/HBASE-3565
> Project: HBase
> Issue Type: Improvement
> Components: metrics, regionserver
> Reporter: Benoit Sigoure
> Assignee: Mubarak Seyed
> Labels: monitoring
> Fix For: 0.94.0
>
> Attachments: HBASE-3565.trunk.v1.patch
>
>
> Whenever an edit takes too long to be written to an HLog, HBase logs a 
> warning such as this one:
> {code}
> 2011-02-23 20:03:14,703 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> IPC Server handler 21 on 60020 took 15065ms appending an edit to hlog; 
> editcount=126050
> {code}
> I would like to have a counter incremented each time this happens and this 
> counter exposed via the metrics stuff in HBase so I can collect it in my 
> monitoring system.
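
A minimal sketch of such a counter, assuming a hypothetical class name and a caller-supplied threshold in milliseconds; neither is taken from HBASE-3565.trunk.v1.patch, and wiring the value into the HBase metrics system is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SlowAppendCounterSketch {
    private final long thresholdMillis;
    // AtomicLong so that concurrent handler threads can increment it safely.
    private final AtomicLong slowAppends = new AtomicLong();

    SlowAppendCounterSketch(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Call once per append with the measured duration; counts only the slow ones. */
    void recordAppend(long elapsedMillis) {
        if (elapsedMillis > thresholdMillis) {
            slowAppends.incrementAndGet();
        }
    }

    /** Current number of slow appends, suitable for exposing as a metric. */
    long slowAppendCount() {
        return slowAppends.get();
    }
}
```

With a threshold of 1000 ms, the 15065 ms append from the warning above would be counted, while a 12 ms append would not.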





[jira] [Updated] (HBASE-3565) Add metrics to keep track of slow HLog appends

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-3565:
--

Summary: Add metrics to keep track of slow HLog appends  (was: Add a metric 
to keep track of slow HLog appends)





[jira] [Issue Comment Edited] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184286#comment-13184286
 ] 

Zhihong Yu edited comment on HBASE-5179 at 1/11/12 7:07 PM:


I agree with the spirit of this class.  Good stuff Chunhui.

getDeadServersUnderProcessing is an awkward name for a method.  Should it be 
getDeadServers?  Does it need to be a public method?  Seems fine for it to be 
package private.

Is serversWithoutSplitLog a good name for a local variable?  Should it be 
deadServers with a comment saying that deadServers are processed by 
servershutdownhandler and it will be taking care of the log splitting?

Is this right -- for trunk?

{code}
-  } else if (!serverManager.isServerOnline(regionLocation.getServerName())) {
+  } else if (!onlineServers.contains(regionLocation.getHostname())) {
{code}
Online servers is keyed by a ServerName, not a hostname.

What is deadServersUnderProcessing?  Does DeadServers keep a list of all 
servers that ever died?  Is that a good idea?  Shouldn't finish remove the item 
from deadservers rather than just from deadServersUnderProcessing?

Change the name of this method, cloneProcessingDeadServers.  Just call it 
getDeadServers?  That it's a clone is an internal implementation detail.






  was (Author: stack):
I agree with the spirit of this class.  Good stuff Chunhui.

This is awkward name for a method, getDeadServersUnderProcessing.  Should it be 
getDeadServers?  Does it need to be a public method?  Seems fine that it be 
package private.

Is serversWithoutSplitLog a good name for a local variable?  Should it be 
deadServers with a comment saying that deadServers are processed by 
servershutdownhandler and it will be taking care of the log splitting?

Is this right -- for trunk?

{code}
-  } else if 
(!serverManager.isServerOnline(regionLocation.getServerName())) {
+  } else if (!onlineServers.contains(regionLocation.getHostname())) {

Online servers is keyed by a ServerName, not a hostname.

What is a deadServersUnderProcessing?  Does DeadServers keep list of all 
servers that ever died?  Is that a good idea?  Shouldn't finish remove item 
from deadservers rather than just from deadServersUnderProcessing

Change  name of this method, cloneProcessingDeadServers.  Just call it 
getDeadServers?  That its a clone is an internal implementation detail?





  
> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5179-90.txt, 5179-v2.txt, hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts to assign RegionserverA's regions because it was marked 
> as a dead server in step 3.
> However, while step 4 (assigning regions) is running, ServerShutdownHandler 
> may still be splitting the log, so data loss may result.





[jira] [Commented] (HBASE-4440) add an option to presplit table to PerformanceEvaluation

2012-01-11 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184294#comment-13184294
 ] 

Jean-Daniel Cryans commented on HBASE-4440:
---

bq. whether we use presplit option or not, table has to be recreated for all 
write-mode tests.

No, it shouldn't be different from the default behavior of not recreating the 
table.

bq. or pre-split should try to split the table without re-creating it.

It should not.

Code speaks more than words, here's what I'm using for testing 0.92:

{code}
  private boolean checkTable(HBaseAdmin admin) throws IOException {
    HTableDescriptor tableDescriptor = getTableDescriptor();
    boolean tableExists = admin.tableExists(tableDescriptor.getName());
    if (!tableExists) {
      if (this.presplitRegions > 0) {
        byte[][] splits = getSplits();
        for (int i=0; i < splits.length; i++) {
          LOG.debug(" split " + i + ": " + Bytes.toStringBinary(splits[i]));
        }
        admin.createTable(tableDescriptor, splits);
        LOG.info("Table created with " + this.presplitRegions + " splits");
      }
      else {
        admin.createTable(tableDescriptor);
        LOG.info("Table " + tableDescriptor + " created");
      }
    }
    return !tableExists;
  }
{code}

> add an option to presplit table to PerformanceEvaluation
> 
>
> Key: HBASE-4440
> URL: https://issues.apache.org/jira/browse/HBASE-4440
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Sujee Maniyam
>Assignee: Sujee Maniyam
>Priority: Minor
>  Labels: benchmark
> Fix For: 0.94.0
>
> Attachments: PerformanceEvaluation.java, 
> PerformanceEvaluation_HBASE_4440.patch, 
> PerformanceEvaluation_HBASE_4440_2.patch
>
>
> PerformanceEvaluation is a quick way to 'benchmark' an HBase cluster.  The 
> current 'write*' operations do not pre-split the table.  Pre-splitting the 
> table will really boost the insert performance.
> It would be nice to have an option to enable pre-splitting the table before 
> the inserts begin.
> It would look something like:
> (a) hbase ...PerformanceEvaluation   --presplit=10 
> (b) hbase ...PerformanceEvaluation   --presplit 
> (b) will try to presplit the table on some default value (say, the number of 
> region servers).





[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184296#comment-13184296
 ] 

Zhihong Yu commented on HBASE-5179:
---

@Stack:
The following code is for 0.90 branch:
{code}
-  } else if (!serverManager.isServerOnline(regionLocation.getServerName())) {
+  } else if (!onlineServers.contains(regionLocation.getHostname())) {
{code}

I agree that serversWithoutSplitLog isn't a very good name. It holds both 
online servers and dead servers. How about naming it knownServers ?

ServerManager.java already has:
{code}
  public Set getDeadServers() {
    return this.deadservers.clone();
  }
{code}






[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184303#comment-13184303
 ] 

Zhihong Yu commented on HBASE-5179:
---

TestRollingRestart fails in 0.90 with patch.






[jira] [Commented] (HBASE-4440) add an option to presplit table to PerformanceEvaluation

2012-01-11 Thread Sujee Maniyam (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184309#comment-13184309
 ] 

Sujee Maniyam commented on HBASE-4440:
--

I see.  Looks good.
If the table exists and the presplit option is supplied, it will have no effect.  
That might mislead the user into believing the pre-split option took effect when 
in fact it didn't.
Maybe a WARN would suffice to notify the user?






[jira] [Commented] (HBASE-4440) add an option to presplit table to PerformanceEvaluation

2012-01-11 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184311#comment-13184311
 ] 

Jean-Daniel Cryans commented on HBASE-4440:
---

We could show a WARN, but I don't think we would need more than that. In fact, 
we could always show a message when the table exists saying something like: 
"Using the existing ${tablename} which has ${X} regions". 

About the pre-splitting itself, it seems that it creates N+1 regions and the 
first one has the end key 00 so it never gets data. Not a biggie, but 
could be fixed in another jira.
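
The messaging discussed in this thread could be sketched roughly as follows. The class PresplitMessages, the method describe, and the exact wording of the warning are hypothetical; only the "Using the existing ..." text and the two "created" messages come from the comments above.

```java
// Hypothetical sketch, not the PerformanceEvaluation code: picks the log
// message for the cases discussed in this thread.
final class PresplitMessages {
  static String describe(String tableName, boolean tableExists,
                         int existingRegions, int presplitRegions) {
    if (tableExists) {
      String msg = "Using the existing " + tableName + " which has "
          + existingRegions + " regions";
      if (presplitRegions > 0) {
        // The presplit option has no effect on an existing table, so say so.
        msg = "WARN: presplit ignored. " + msg;
      }
      return msg;
    }
    if (presplitRegions > 0) {
      return "Table created with " + presplitRegions + " splits";
    }
    return "Table " + tableName + " created";
  }

  public static void main(String[] args) {
    System.out.println(describe("TestTable", true, 4, 10));
    System.out.println(describe("TestTable", false, 0, 10));
  }
}
```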






[jira] [Commented] (HBASE-5139) Compute (weighted) median using AggregateProtocol

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184320#comment-13184320
 ] 

Zhihong Yu commented on HBASE-5139:
---

Integrated to TRUNK.

> Compute (weighted) median using AggregateProtocol
> -
>
> Key: HBASE-5139
> URL: https://issues.apache.org/jira/browse/HBASE-5139
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5139-v2.txt
>
>
> Suppose cf:cq1 stores numeric values and optionally cf:cq2 stores weights. 
> This task finds out the median value among the values of cf:cq1 (See 
> http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html)
> This can be done in two passes.
> The first pass utilizes AggregateProtocol where the following tuple is 
> returned from each region:
> (partial-sum-of-values, partial-sum-of-weights)
> The start rowkey (supplied by coprocessor framework) would be used to sort 
> the tuples. This way we can determine which region (called R) contains the 
> (weighted) median. partial-sum-of-weights can be 0 if unweighted median is 
> sought
> The second pass involves scanning the table, beginning with startrow of 
> region R and computing partial (weighted) sum until the threshold of S/2 is 
> crossed. The (weighted) median is returned.
> However, this approach wouldn't work if there is mutation in the underlying 
> table between pass one and pass two.
> In that case, sequential scanning seems to be the solution which is slower 
> than the above approach.
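
The two-pass approach described above can be sketched in memory as follows. This is only an illustration under stated assumptions: it uses no HBase or coprocessor APIs, the names TwoPassWeightedMedian and Row are hypothetical, weights are non-negative integers, an unweighted median is modeled by giving every row weight 1, and rows are walked in row-key order exactly as the description does (so the result is a true median only if that order matches the value order). On a tie it returns the lower of the two middle values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory sketch of the two-pass idea; no HBase APIs are used.
final class TwoPassWeightedMedian {
  // One row: a value (cf:cq1) and a non-negative integer weight (cf:cq2).
  static final class Row {
    final double value;
    final long weight;

    Row(double value, long weight) {
      this.value = value;
      this.weight = weight;
    }
  }

  // regions: rows grouped per region, with regions ordered by start row key.
  static double median(List<List<Row>> regions) {
    if (regions.isEmpty()) {
      throw new IllegalArgumentException("no regions");
    }
    // Pass one: every region reports its partial sum of weights.
    long[] partial = new long[regions.size()];
    long total = 0;
    for (int i = 0; i < regions.size(); i++) {
      for (Row r : regions.get(i)) {
        partial[i] += r.weight;
      }
      total += partial[i];
    }
    // Find region R: the first one whose cumulative weight reaches S/2.
    // Comparing 2 * sum against S avoids a fractional S/2.
    long before = 0;
    int target = 0;
    while (target < partial.length - 1
        && 2 * (before + partial[target]) < total) {
      before += partial[target];
      target++;
    }
    // Pass two: scan region R only, continuing the running sum.
    long running = before;
    for (Row r : regions.get(target)) {
      running += r.weight;
      if (2 * running >= total) {
        return r.value;
      }
    }
    throw new IllegalStateException("no row reached the S/2 threshold");
  }

  public static void main(String[] args) {
    List<List<Row>> regions = Arrays.asList(
        Arrays.asList(new Row(1, 1), new Row(2, 1)),
        Arrays.asList(new Row(3, 1), new Row(4, 1), new Row(5, 1)));
    System.out.println(median(regions));
  }
}
```

The sketch also makes the caveat in the description visible: the partial sums from pass one are only valid for pass two if the rows do not change in between.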





[jira] [Commented] (HBASE-3565) Add metrics to keep track of slow HLog appends

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184325#comment-13184325
 ] 

Hudson commented on HBASE-3565:
---

Integrated in HBase-TRUNK #2619 (See 
[https://builds.apache.org/job/HBase-TRUNK/2619/])
HBASE-3565 Add metrics to keep track of slow HLog appends (Mubarak)

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java







[jira] [Commented] (HBASE-4440) add an option to presplit table to PerformanceEvaluation

2012-01-11 Thread Sujee Maniyam (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184329#comment-13184329
 ] 

Sujee Maniyam commented on HBASE-4440:
--

Sounds good.  I will submit a patch.

A couple of newbie logistical questions:

1) Should I create a new patch against the trunk?  The original patch is 
already committed in trunk / 0.94.

2) Do I leave the old patch attachments in the JIRA, or should I delete them (to 
reduce clutter)?

thanks JD






[jira] [Updated] (HBASE-3565) Add metrics to keep track of slow HLog appends

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-3565:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)






[jira] [Commented] (HBASE-4440) add an option to presplit table to PerformanceEvaluation

2012-01-11 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184342#comment-13184342
 ] 

Jean-Daniel Cryans commented on HBASE-4440:
---

Please create a new jira, carry over some of the conversation we had here to 
justify it, and leave this jira as it is please.

Good stuff.






[jira] [Commented] (HBASE-3565) Add metrics to keep track of slow HLog appends

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184365#comment-13184365
 ] 

Hudson commented on HBASE-3565:
---

Integrated in HBase-TRUNK-security #73 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/73/])
HBASE-3565 Add metrics to keep track of slow HLog appends (Mubarak)

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java







[jira] [Commented] (HBASE-5139) Compute (weighted) median using AggregateProtocol

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184367#comment-13184367
 ] 

Hudson commented on HBASE-5139:
---

Integrated in HBase-TRUNK-security #73 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/73/])
HBASE-5139 Compute (weighted) median using AggregateProtocol

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java







[jira] [Commented] (HBASE-5163) TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or hadoop QA ("The directory is already locked.")

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184366#comment-13184366
 ] 

Hudson commented on HBASE-5163:
---

Integrated in HBase-TRUNK-security #73 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/73/])
HBASE-5163 TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on 
Jenkins or hadoop QA ("The directory is already locked.") (N Keywal)

tedyu : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java


> TestLogRolling#testLogRollOnDatanodeDeath fails sometimes on Jenkins or 
> hadoop QA ("The directory is already locked.")
> --
>
> Key: HBASE-5163
> URL: https://issues.apache.org/jira/browse/HBASE-5163
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Attachments: 5163.patch
>
>
> The stack is typically:
> {noformat}
> java.io.IOException: Cannot lock storage /tmp/19e3e634-8980-4923-9e72-a5b900a71d63/dfscluster_32a46f7b-24ef-488f-bd33-915959e001f4/dfs/data/data3. The directory is already locked.
>   at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
>   at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
>   at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:376)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1467)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:460)
>   at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:470)
> // ...
> {noformat}
> It can be reproduced without parallelization or without executing the other 
> tests in the class. It seems to fail about 5% of the time.
> This comes from the naming policy for the directories in 
> MiniDFSCluster#startDataNode. It depends on the number of nodes *currently* 
> in the cluster, and does not take into account previous starts/stops:
> {noformat}
>     for (int i = curDatanodesNum; i < curDatanodesNum+numDataNodes; i++) {
>       if (manageDfsDirs) {
>         File dir1 = new File(data_dir, "data"+(2*i+1));
>         File dir2 = new File(data_dir, "data"+(2*i+2));
>         dir1.mkdirs();
>         dir2.mkdirs();
>         // [...]
> {noformat}
> This means that if we want to stop/start a datanode, we should always stop 
> the last one; if not, the names will conflict. This test exhibits the behavior:
> {noformat}
>   @Test
>   public void testMiniDFSCluster_startDataNode() throws Exception {
>     assertTrue( dfsCluster.getDataNodes().size() == 2 );
>     // Works, as we kill the last datanode, we can now start a datanode
>     dfsCluster.stopDataNode(1);
>     dfsCluster
>       .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>     // Fails, as it's not the last datanode, the directory will conflict on
>     //  creation
>     dfsCluster.stopDataNode(0);
>     try {
>       dfsCluster
>         .startDataNodes(TEST_UTIL.getConfiguration(), 1, true, null, null);
>       fail("There should be an exception because the directory already exists");
>     } catch (IOException e) {
>       assertTrue( e.getMessage().contains("The directory is already locked."));
>       LOG.info("Expected (!) exception caught " + e.getMessage());
>     }
>     // Works, as we kill the last datanode, we can now restart 2 datanodes
>     // This makes us back with 2 nodes
>     dfsCluster.stopDataNode(0);
>     dfsCluster
>       .startDataNodes(TEST_UTIL.getConfiguration(), 2, true, null, null);
>   }
> {noformat}
> And then this behavior is randomly triggered in testLogRollOnDatanodeDeath 
> because when we do
> {noformat}
> DatanodeInfo[] pipeline = getPipeline(log);
> assertTrue(pipeline.length == fs.getDefaultReplication());
> {noformat}
> and then kill the datanodes in the pipeline, we will have:
>  - most of the time: pipeline = 1 & 2, so after killing 1&2 we can start a 
> new datanode that will reuse

[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184377#comment-13184377
 ] 

Zhihong Yu commented on HBASE-5136:
---

Can someone review the patch?

Thanks

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5136.txt
>
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List logDirs) throws IOException {
>     MonitoredTask status = TaskMonitor.get().createStatus(
>       "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> Users may be confused by multiple distributed log splitting entries for the 
> same region server on the master UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5139) Compute (weighted) median using AggregateProtocol

2012-01-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184393#comment-13184393
 ] 

Hudson commented on HBASE-5139:
---

Integrated in HBase-TRUNK #2620 (See 
[https://builds.apache.org/job/HBase-TRUNK/2620/])
HBASE-5139 Compute (weighted) median using AggregateProtocol

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java


> Compute (weighted) median using AggregateProtocol
> -
>
> Key: HBASE-5139
> URL: https://issues.apache.org/jira/browse/HBASE-5139
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5139-v2.txt
>
>
> Suppose cf:cq1 stores numeric values and optionally cf:cq2 stores weights. 
> This task finds out the median value among the values of cf:cq1 (See 
> http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html)
> This can be done in two passes.
> The first pass utilizes AggregateProtocol where the following tuple is 
> returned from each region:
> (partial-sum-of-values, partial-sum-of-weights)
> The start rowkey (supplied by coprocessor framework) would be used to sort 
> the tuples. This way we can determine which region (called R) contains the 
> (weighted) median. partial-sum-of-weights can be 0 if an unweighted median is 
> sought.
> The second pass involves scanning the table, beginning with the start row of 
> region R, and computing the partial (weighted) sum until the threshold of S/2 
> is crossed, where S is the total sum obtained in the first pass. The value at 
> that point is returned as the (weighted) median.
> However, this approach wouldn't work if there is mutation in the underlying 
> table between pass one and pass two.
> In that case, sequential scanning seems to be the solution which is slower 
> than the above approach.
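
The two passes described above can be sketched in plain Java over in-memory data. This is only an illustration under stated assumptions, not the code in the attached patch: the class WeightedMedianSketch, its Row type, and the helper names are hypothetical, each inner list stands in for one region (already ordered by start row key), S is taken to be the total of the weights, and the threshold counts as crossed once the running sum reaches S/2. It follows the description literally and walks rows in row-key order.

```java
import java.util.List;

/** In-memory sketch of the two-pass (weighted) median; all names are hypothetical. */
class WeightedMedianSketch {

  /** One row: a value (cf:cq1) and its weight (cf:cq2). */
  static final class Row {
    final double value;
    final double weight;

    Row(double value, double weight) {
      this.value = value;
      this.weight = weight;
    }
  }

  /** Convenience factory for building test data. */
  static Row row(double value, double weight) {
    return new Row(value, weight);
  }

  /** Pass one: each region reports only the partial sum of its weights. */
  static double[] partialWeightSums(List<List<Row>> regions) {
    double[] sums = new double[regions.size()];
    for (int i = 0; i < regions.size(); i++) {
      for (Row r : regions.get(i)) {
        sums[i] += r.weight;
      }
    }
    return sums;
  }

  /** Runs both passes and returns the value at which S/2 is reached. */
  static double weightedMedian(List<List<Row>> regions) {
    double[] sums = partialWeightSums(regions);
    double total = 0;
    for (double s : sums) {
      total += s;
    }
    double half = total / 2;

    // Locate region R: the first region whose cumulative weight reaches S/2.
    double before = 0;
    int r = 0;
    while (r < sums.length - 1 && before + sums[r] < half) {
      before += sums[r];
      r++;
    }

    // Pass two: scan from the start of region R, continuing the running sum.
    double running = before;
    for (int i = r; i < regions.size(); i++) {
      for (Row row : regions.get(i)) {
        running += row.weight;
        if (running >= half) {
          return row.value;
        }
      }
    }
    throw new IllegalArgumentException("no rows to scan");
  }
}
```

With unit weights the sketch reduces to picking the middle row; as the description notes, the result is only meaningful if the data does not change between the two passes.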





[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-11 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184400#comment-13184400
 ] 

jirapos...@reviews.apache.org commented on HBASE-5128:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4317
---



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


Should be 'to end key'.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


Should insert some text between newRegion and region.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


This should be outside the for loop.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


Space between > and 0.


- Ted


On 2012-01-11 12:46:37, jmhsieh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  ---
bq.  
bq.  (Updated 2012-01-11 12:46:37)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and 
Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real 
clusters. The tests are flaky on the 0.90 branch (so there is something async 
that I didn't synchronize properly), and there are a few more TODOs I want to 
knock out before this is ready for full review to be considered for committing. 
It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and 
try to force the overlapping regions to be closed. For some of these, I cannot 
delete a table after it is repaired without causing subsequent tests to fail. I 
think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while 
delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go 
through the master and in turn doesn't modify in-memory state, so disable uses 
out-of-date in-memory region assignments. If I use the unassign method, it sends 
RIT transitions to the master, which ends up attempting to assign the region 
again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment 
data for particular regions or to force it to re-read from META (without 
modifying the 0.90 HBases it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and 
SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused 
with each other and basically something is still happening asynchronously. I 
think this is because the new region is being assigned and is still 
transitioning. Sound about right? To make the unit test deterministic, should 
hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.  https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases 
that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.



> [uber hbck] Enable hbck to automatically repair table integrity problems as 
> well as region consistency problems while online.
> -
>
> Key: HBASE-5128
> URL: https://

[jira] [Updated] (HBASE-5167) We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing that.

2012-01-11 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5167:
-

  Resolution: Fixed
    Assignee: Harsh J
Hadoop Flags: Reviewed
      Status: Resolved  (was: Patch Available)

Committed trunk.  Thanks Harsh.

> We shouldn't be injecting 'Killing [daemon]' into logs, when we aren't doing 
> that.
> --
>
> Key: HBASE-5167
> URL: https://issues.apache.org/jira/browse/HBASE-5167
> Project: HBase
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 0.92.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.94.0
>
> Attachments: HBASE-5167.patch
>
>
> HBASE-4209 changed the behavior of the scripts such that we no longer kill 
> the daemons right away. We should have also changed the message shown in the 
> logs.





[jira] [Commented] (HBASE-5168) Backport HBASE-5100 - Rollback of split could cause closed region to be opened again

2012-01-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184424#comment-13184424
 ] 

stack commented on HBASE-5168:
--

+1

> Backport HBASE-5100 - Rollback of split could cause closed region to be 
> opened again
> 
>
> Key: HBASE-5168
> URL: https://issues.apache.org/jira/browse/HBASE-5168
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
> Attachments: HBASE-5100_0.90.patch
>
>
> Considering the importance of the defect, merging it to 0.90.6.





[jira] [Created] (HBASE-5180) [book] book.xml - fixed scanner example

2012-01-11 Thread Doug Meil (Created) (JIRA)
[book] book.xml - fixed scanner example
---

 Key: HBASE-5180
 URL: https://issues.apache.org/jira/browse/HBASE-5180
 Project: HBase
  Issue Type: Bug
Reporter: Doug Meil
Assignee: Doug Meil
 Attachments: book_HBASE_5180.xml.patch

book.xml - the scanner example wasn't closing the scanner!  that's bad practice.





[jira] [Updated] (HBASE-5180) [book] book.xml - fixed scanner example

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5180:
-

Attachment: book_HBASE_5180.xml.patch

> [book] book.xml - fixed scanner example
> ---
>
> Key: HBASE-5180
> URL: https://issues.apache.org/jira/browse/HBASE-5180
> Project: HBase
>  Issue Type: Bug
>Reporter: Doug Meil
>Assignee: Doug Meil
> Attachments: book_HBASE_5180.xml.patch
>
>
> book.xml - the scanner example wasn't closing the scanner!  that's bad 
> practice.





[jira] [Updated] (HBASE-5180) [book] book.xml - fixed scanner example

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5180:
-

Status: Patch Available  (was: Open)

> [book] book.xml - fixed scanner example
> ---
>
> Key: HBASE-5180
> URL: https://issues.apache.org/jira/browse/HBASE-5180
> Project: HBase
>  Issue Type: Bug
>Reporter: Doug Meil
>Assignee: Doug Meil
> Attachments: book_HBASE_5180.xml.patch
>
>
> book.xml - the scanner example wasn't closing the scanner!  that's bad 
> practice.





[jira] [Commented] (HBASE-5180) [book] book.xml - fixed scanner example

2012-01-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184428#comment-13184428
 ] 

stack commented on HBASE-5180:
--

+1

> [book] book.xml - fixed scanner example
> ---
>
> Key: HBASE-5180
> URL: https://issues.apache.org/jira/browse/HBASE-5180
> Project: HBase
>  Issue Type: Bug
>Reporter: Doug Meil
>Assignee: Doug Meil
> Attachments: book_HBASE_5180.xml.patch
>
>
> book.xml - the scanner example wasn't closing the ResultScanner!  that's bad 
> practice.





[jira] [Updated] (HBASE-5180) [book] book.xml - fixed scanner example

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5180:
-

Description: book.xml - the scanner example wasn't closing the 
ResultScanner!  that's bad practice.  (was: book.xml - the scanner example 
wasn't closing the scanner!  that's bad practice.)

> [book] book.xml - fixed scanner example
> ---
>
> Key: HBASE-5180
> URL: https://issues.apache.org/jira/browse/HBASE-5180
> Project: HBase
>  Issue Type: Bug
>Reporter: Doug Meil
>Assignee: Doug Meil
> Attachments: book_HBASE_5180.xml.patch
>
>
> book.xml - the scanner example wasn't closing the ResultScanner!  that's bad 
> practice.





[jira] [Updated] (HBASE-5180) [book] book.xml - fixed scanner example

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5180:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> [book] book.xml - fixed scanner example
> ---
>
> Key: HBASE-5180
> URL: https://issues.apache.org/jira/browse/HBASE-5180
> Project: HBase
>  Issue Type: Bug
>Reporter: Doug Meil
>Assignee: Doug Meil
> Attachments: book_HBASE_5180.xml.patch
>
>
> book.xml - the scanner example wasn't closing the ResultScanner!  that's bad 
> practice.





[jira] [Updated] (HBASE-5129) book is inconsistent regarding disabling - major compaction

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5129:
-

Assignee: Doug Meil

> book is inconsistent regarding disabling - major compaction
> ---
>
> Key: HBASE-5129
> URL: https://issues.apache.org/jira/browse/HBASE-5129
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.90.1
>Reporter: Mikael Sitruk
>Assignee: Doug Meil
>Priority: Minor
>
> It seems that the book has some inconsistencies regarding the way to disable 
> major compactions
> According to the book in chapter 2.6.1.1. HBase Default Configuration
> hbase.hregion.majorcompaction - The time (in milliseconds) between 'major' 
> compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to 
> disable automated major compactions.
> Default: 86400000 
> (http://hbase.apache.org/book.html#hbase_default_configurations)
> According to the book at chapter 2.8.2.8. Managed Compactions
> "A common administrative technique is to manage major compactions manually, 
> rather than letting HBase do it. By default, 
> HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick 
> in when you least desire it - especially on a busy system. To "turn off" 
> automatic major compactions set the value to Long.MAX_VALUE."
> According to the code org.apache.hadoop.hbase.regionserver.Store.java, "0" is 
> the right answer. 
> (affects all documentation from 0.90.1)
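
To make the two readings concrete, here is a minimal sketch of a period check in which 0 means "disabled", as the report says the code treats it. It is a simplified stand-in rather than the logic in Store.java: the class MajorCompactionPeriodSketch and the method isMajorCompactionDue are hypothetical names, and the real check may take more into account than elapsed time. It also spells out the one-day default in milliseconds.

```java
/** Simplified, hypothetical period check; not the actual Store.java logic. */
class MajorCompactionPeriodSketch {

  /** One day in milliseconds: 24 h * 60 min * 60 s * 1000 ms = 86,400,000. */
  static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

  /**
   * Returns true when a time-based major compaction would be due.
   * A period of 0 is treated as "automated major compactions disabled".
   */
  static boolean isMajorCompactionDue(long periodMs, long lastCompactionMs, long nowMs) {
    if (periodMs == 0) {
      return false; // disabled: never due, however much time has passed
    }
    return nowMs - lastCompactionMs >= periodMs;
  }
}
```

Under this sketch, a period of 0 never triggers a compaction, while the default period triggers one once a full day has elapsed since the last one.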





[jira] [Updated] (HBASE-5129) book is inconsistent regarding disabling - major compaction

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5129:
-

Attachment: configuration_HBASE_5129.xml.patch

> book is inconsistent regarding disabling - major compaction
> ---
>
> Key: HBASE-5129
> URL: https://issues.apache.org/jira/browse/HBASE-5129
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.90.1
>Reporter: Mikael Sitruk
>Assignee: Doug Meil
>Priority: Minor
> Attachments: configuration_HBASE_5129.xml.patch
>
>
> It seems that the book has some inconsistencies regarding the way to disable 
> major compactions
> According to the book in chapter 2.6.1.1. HBase Default Configuration
> hbase.hregion.majorcompaction - The time (in milliseconds) between 'major' 
> compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to 
> disable automated major compactions.
> Default: 86400000 
> (http://hbase.apache.org/book.html#hbase_default_configurations)
> According to the book at chapter 2.8.2.8. Managed Compactions
> "A common administrative technique is to manage major compactions manually, 
> rather than letting HBase do it. By default, 
> HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick 
> in when you least desire it - especially on a busy system. To "turn off" 
> automatic major compactions set the value to Long.MAX_VALUE."
> According to the code org.apache.hadoop.hbase.regionserver.Store.java, "0" is 
> the right answer. 
> (affects all documentation from 0.90.1)





[jira] [Updated] (HBASE-5129) book is inconsistent regarding disabling - major compaction

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5129:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> book is inconsistent regarding disabling - major compaction
> ---
>
> Key: HBASE-5129
> URL: https://issues.apache.org/jira/browse/HBASE-5129
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.90.1
>Reporter: Mikael Sitruk
>Assignee: Doug Meil
>Priority: Minor
> Attachments: configuration_HBASE_5129.xml.patch
>
>
> It seems that the book has some inconsistencies regarding the way to disable 
> major compactions
> According to the book in chapter 2.6.1.1. HBase Default Configuration
> hbase.hregion.majorcompaction - The time (in milliseconds) between 'major' 
> compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to 
> disable automated major compactions.
> Default: 86400000 
> (http://hbase.apache.org/book.html#hbase_default_configurations)
> According to the book at chapter 2.8.2.8. Managed Compactions
> "A common administrative technique is to manage major compactions manually, 
> rather than letting HBase do it. By default, 
> HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick 
> in when you least desire it - especially on a busy system. To "turn off" 
> automatic major compactions set the value to Long.MAX_VALUE."
> According to the code org.apache.hadoop.hbase.regionserver.Store.java, "0" is 
> the right answer. 
> (affects all documentation from 0.90.1)





[jira] [Commented] (HBASE-5129) book is inconsistent regarding disabling - major compaction

2012-01-11 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184437#comment-13184437
 ] 

Doug Meil commented on HBASE-5129:
--

Thanks for the catch Mikael!

> book is inconsistent regarding disabling - major compaction
> ---
>
> Key: HBASE-5129
> URL: https://issues.apache.org/jira/browse/HBASE-5129
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.90.1
>Reporter: Mikael Sitruk
>Assignee: Doug Meil
>Priority: Minor
> Attachments: configuration_HBASE_5129.xml.patch
>
>
> It seems that the book has some inconsistencies regarding the way to disable 
> major compactions
> According to the book in chapter 2.6.1.1. HBase Default Configuration
> hbase.hregion.majorcompaction - The time (in milliseconds) between 'major' 
> compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to 
> disable automated major compactions.
> Default: 86400000 
> (http://hbase.apache.org/book.html#hbase_default_configurations)
> According to the book at chapter 2.8.2.8. Managed Compactions
> "A common administrative technique is to manage major compactions manually, 
> rather than letting HBase do it. By default, 
> HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick 
> in when you least desire it - especially on a busy system. To "turn off" 
> automatic major compactions set the value to Long.MAX_VALUE."
> According to the code org.apache.hadoop.hbase.regionserver.Store.java, "0" is 
> the right answer. 
> (affects all documentation from 0.90.1)





[jira] [Updated] (HBASE-5129) book is inconsistent regarding disabling - major compaction

2012-01-11 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-5129:
-

Status: Patch Available  (was: Open)

> book is inconsistent regarding disabling - major compaction
> ---
>
> Key: HBASE-5129
> URL: https://issues.apache.org/jira/browse/HBASE-5129
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.90.1
>Reporter: Mikael Sitruk
>Assignee: Doug Meil
>Priority: Minor
> Attachments: configuration_HBASE_5129.xml.patch
>
>
> It seems that the book has some inconsistencies regarding the way to disable 
> major compactions
> According to the book in chapter 2.6.1.1. HBase Default Configuration
> hbase.hregion.majorcompaction - The time (in milliseconds) between 'major' 
> compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to 
> disable automated major compactions.
> Default: 86400000 
> (http://hbase.apache.org/book.html#hbase_default_configurations)
> According to the book at chapter 2.8.2.8. Managed Compactions
> "A common administrative technique is to manage major compactions manually, 
> rather than letting HBase do it. By default, 
> HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick 
> in when you least desire it - especially on a busy system. To "turn off" 
> automatic major compactions set the value to Long.MAX_VALUE."
> According to the code org.apache.hadoop.hbase.regionserver.Store.java, "0" is 
> the right answer. 
> (affects all documentation from 0.90.1)





[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Attachment: (was: 5179-90.txt)

> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5179-v2.txt, hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler is processing.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts to assign the regions of RegionserverA because it was 
> marked as a dead server in step 3.
> However, during step 4 (assigning regions), ServerShutdownHandler may still be 
> splitting the log, so data loss may result.




[jira] [Created] (HBASE-5181) Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)

2012-01-11 Thread Mubarak Seyed (Created) (JIRA)
Improve error message when Master fail-over happens and ZK unassigned node 
contains stale znode(s)
--

 Key: HBASE-5181
 URL: https://issues.apache.org/jira/browse/HBASE-5181
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5, 0.92.0
Reporter: Mubarak Seyed
Priority: Minor


When master fail-over happens, if there are a number of RITs under 
/hbase/unassigned and there are stale znode(s) (encoded region names) under 
/hbase/unassigned, we get:

{code}
2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover 
2011-12-30 10:27:36,002 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 1717 regions in transition 
2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.lang.ArrayIndexOutOfBoundsException: -256 
    at org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148) 
    at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105) 
    at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) 
    at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) 
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743) 
    at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262) 
    at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223) 
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401) 
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
{code}

and there is no clue on how to clean up the stale znode(s) from unassigned 
using zkCli.sh (del /hbase/unassigned/). It would be good to include the bad 
region name in the IOException thrown from 
RegionTransitionData.readFields().

{code}

  @Override
  public void readFields(DataInput in) throws IOException {
    // the event type byte
    eventType = EventType.values()[in.readShort()];
    // the timestamp
    stamp = in.readLong();
    // the encoded name of the region being transitioned
    regionName = Bytes.readByteArray(in);
    // remaining fields are optional so prefixed with boolean
    // the name of the regionserver sending the data
    if (in.readBoolean()) {
      byte [] versionedBytes = Bytes.readByteArray(in);
      this.origin = ServerName.parseVersionedServerName(versionedBytes);
    }
    if (in.readBoolean()) {
      this.payload = Bytes.readByteArray(in);
    }
  }
{code}

If execution gets as far as reading regionName, we can include the 
regionName in the IOException, with an error message advising to clean up the 
stale znode(s) under /hbase/unassigned.







[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Attachment: 5179-90.txt

New patch for 0.90
Now TestRollingRestart passes.

> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5179-90.txt, 5179-v2.txt, hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts to assign the regions of RegionserverA because it was 
> marked as a dead server in step 3.
> However, while step 4 (assigning regions) is in progress, 
> ServerShutdownHandler may still be splitting the log, so data may be lost.
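
The ordering the description asks for (finish splitting the log before assigning the dead server's regions) can be sketched with a latch. This is a toy model under stated assumptions, not the patch attached to this issue: the class name, thread roles, and event strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical model: two threads stand in for ServerShutdownHandler and
// the master's failover processing.
class SplitBeforeAssignSketch {

  static List<String> run() {
    List<String> events = Collections.synchronizedList(new ArrayList<String>());
    CountDownLatch splitDone = new CountDownLatch(1);

    Thread shutdownHandler = new Thread(() -> {
      events.add("split log for RegionserverA");
      splitDone.countDown(); // signal that log splitting has finished
    });

    Thread failover = new Thread(() -> {
      try {
        splitDone.await(); // do not assign until the split has completed
        events.add("assign regions of RegionserverA");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    // Start the failover thread first to show that start order does not matter.
    failover.start();
    shutdownHandler.start();
    try {
      failover.join();
      shutdownHandler.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

Without the `await()` call the two events could be recorded in either order, which corresponds to the data-loss window described in step 4.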





[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Comment: was deleted

(was: TestRollingRestart fails in 0.90 with patch.)






[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184448#comment-13184448
 ] 

Zhihong Yu commented on HBASE-5179:
---

I think the reason Chunhui introduced a new Set for the dead servers being 
processed is that DeadServer is supposed to remember dead servers:
{code}
   * Set of known dead servers.  On znode expiration, servers are added here.
{code}
DeadServer.cleanPreviousInstance() is called by ServerManager.checkIsDead() 
when the server becomes live again.
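
To illustrate why two sets may be needed, here is a minimal sketch that keeps "known dead" and "still being processed" as separate pieces of state. The class and method names are hypothetical, and server names are plain strings for simplicity; this is not the actual DeadServer or ServerManager code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bookkeeping sketch, not the HBase DeadServer class.
class DeadServerTrackerSketch {
  // Servers known to have died; remembered until the server is seen live again.
  private final Set<String> knownDead = ConcurrentHashMap.newKeySet();
  // Servers whose shutdown handling (including log splitting) has not finished.
  private final Set<String> beingProcessed = ConcurrentHashMap.newKeySet();

  // Called when a server's znode expires.
  void serverExpired(String serverName) {
    knownDead.add(serverName);
    beingProcessed.add(serverName);
  }

  // Called when shutdown handling for the server has completed.
  void shutdownHandlingFinished(String serverName) {
    beingProcessed.remove(serverName);
  }

  // Called when the server becomes live again.
  void serverCameBack(String serverName) {
    knownDead.remove(serverName);
  }

  boolean isKnownDead(String serverName) {
    return knownDead.contains(serverName);
  }

  // Regions should only be assigned once processing has finished.
  boolean safeToAssignRegionsOf(String serverName) {
    return !beingProcessed.contains(serverName);
  }

  public static void main(String[] args) {
    DeadServerTrackerSketch tracker = new DeadServerTrackerSketch();
    tracker.serverExpired("rsA");
    System.out.println(tracker.safeToAssignRegionsOf("rsA"));
    tracker.shutdownHandlingFinished("rsA");
    System.out.println(tracker.safeToAssignRegionsOf("rsA"));
  }
}
```

A server stays in the first set after its processing finishes, so a single set cannot answer both "has this server died?" and "is its log still being split?".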






[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184450#comment-13184450
 ] 

Hadoop QA commented on HBASE-5179:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510261/5179-90.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/733//console

This message is automatically generated.






[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510261/5179-90.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/733//console

This message is automatically generated.)






[jira] [Created] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly

2012-01-11 Thread Scott Chen (Created) (JIRA)
TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
--

 Key: HBASE-5182
 URL: https://issues.apache.org/jira/browse/HBASE-5182
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Scott Chen
Priority: Minor


TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. 
It uses the default value instead.
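
As a generic illustration of this class of bug, the sketch below reads a keep-alive setting and passes it through to the executor. The configuration key, the default value, and the class name are assumptions made for the example and are not taken from TBoundedThreadPoolServer.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch using plain java.util.concurrent, not the Thrift server code.
class KeepAliveConfigSketch {
  static final String KEEP_ALIVE_KEY = "example.threadKeepAliveTimeSec"; // assumed key
  static final int DEFAULT_KEEP_ALIVE_SEC = 60; // assumed default

  static ThreadPoolExecutor buildPool(Properties conf) {
    int keepAliveSec = Integer.parseInt(
        conf.getProperty(KEEP_ALIVE_KEY, String.valueOf(DEFAULT_KEEP_ALIVE_SEC)));
    // Pass the configured value on; handing DEFAULT_KEEP_ALIVE_SEC to the
    // constructor here would silently ignore the setting.
    return new ThreadPoolExecutor(2, 4, keepAliveSec, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(KEEP_ALIVE_KEY, "5");
    ThreadPoolExecutor pool = buildPool(conf);
    System.out.println(pool.getKeepAliveTime(TimeUnit.SECONDS));
    pool.shutdown();
  }
}
```

Checking `getKeepAliveTime` on the constructed pool is a cheap way to confirm that a configured value actually reached the executor.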





[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-11 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5179:
--

Attachment: 5179-v3.txt

Patch v3 addresses Stack's comments

Some names are open to suggestion.

> Concurrent processing of processFaileOver and ServerShutdownHandler  may 
> cause region is assigned before completing split log, it would cause data loss
> ---
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Attachments: 5179-90.txt, 5179-v2.txt, 5179-v3.txt, hbase-5179.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. The master starts to assign the regions of RegionserverA because it was 
> marked as a dead server in step 3.
> However, while step 4 (assigning regions) is in progress, 
> ServerShutdownHandler may still be splitting the log, so data may be lost.





[jira] [Updated] (HBASE-5182) TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly

2012-01-11 Thread Scott Chen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated HBASE-5182:
--

Attachment: hbase-5182.txt

> TBoundedThreadPoolServer threadKeepAliveTimeSec is not configured properly
> --
>
> Key: HBASE-5182
> URL: https://issues.apache.org/jira/browse/HBASE-5182
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Scott Chen
> Priority: Minor
> Attachments: hbase-5182.txt
>
>
> TBoundedThreadPoolServer does not take the configured threadKeepAliveTimeSec. 
> It uses the default value instead.




