[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-12166:
    Attachment: 12166.txt

A bit of cleanup while in here.

TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

    Key: HBASE-12166
    URL: https://issues.apache.org/jira/browse/HBASE-12166
    Project: HBase
    Issue Type: Bug
    Components: test
    Reporter: stack
    Assignee: stack
    Fix For: 2.0.0, 0.99.1
    Attachments: 12166.txt, log.txt

See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/

The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception:
{code}
2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526
org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
        at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Thread.java:744)
{code}

See how we had finished log splitting a long time previous:
{code}
2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms
{code}

If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one:

2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery.
2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery.
2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
2014-10-03 01:57:48,128 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.

This would seem to indicate that we successfully wrote zk that we are recovering:
{code}
2014-10-03 01:57:47,672 DEBUG [MASTER_SERVER_OPERATIONS-asf900:37113-0]
[jira] [Commented] (HBASE-12151) Make dev scripts executable
[ https://issues.apache.org/jira/browse/HBASE-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157718#comment-14157718 ] Hudson commented on HBASE-12151: FAILURE: Integrated in HBase-TRUNK #5613 (See [https://builds.apache.org/job/HBase-TRUNK/5613/]) HBASE-12151 Set mode to 755 on executable scripts in dev-support directory (mstanleyjones: rev 7219471081ab5f65ad7ae3b2deeb3c1659922102) * dev-support/jdiffHBasePublicAPI_common.sh * dev-support/publish_hbase_website.sh * dev-support/jenkinsEnv.sh * dev-support/hbase_docker.sh Make dev scripts executable --- Key: HBASE-12151 URL: https://issues.apache.org/jira/browse/HBASE-12151 Project: HBase Issue Type: Bug Components: scripts Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Priority: Minor Fix For: 2.0.0, 0.99.1 Attachments: HBASE-12151.patch Is there any reason not to make dev-support/*.sh executable? It would make it possible to sym-link to them from a directory in the executable path for easier execution of the definitive scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11907: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to 0.98+ Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157717#comment-14157717 ]

Hudson commented on HBASE-12164:

FAILURE: Integrated in HBase-TRUNK #5613 (See [https://builds.apache.org/job/HBase-TRUNK/5613/])
HBASE-12164 Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate (tedyu: rev a17614d5b27936c64af47d90408df007b1112d89)
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java

Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate

    Key: HBASE-12164
    URL: https://issues.apache.org/jira/browse/HBASE-12164
    Project: HBase
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: Ted Yu
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 12164-v1.txt, 12164-v1.txt

Here is the code:
{code}
if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) {
{code}
In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This would make secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
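The failure mode above can be illustrated with a minimal, self-contained sketch. The stand-in `TokenProto` type and helper names below are hypothetical; the real check operates on the protobuf delegation-token fields carried in the request. The point is simply that when either field is unset the conjunction is false, the user token stays null, and the guarded bulk-load body never executes.

```java
// Minimal sketch of the guard pattern described above.
// TokenProto stands in for the protobuf token; names are hypothetical.
public class GuardSketch {
    static final class TokenProto {
        final byte[] identifier; // null models protobuf hasIdentifier() == false
        final byte[] password;
        TokenProto(byte[] identifier, byte[] password) {
            this.identifier = identifier;
            this.password = password;
        }
        boolean hasIdentifier() { return identifier != null; }
        boolean hasPassword()   { return password != null; }
    }

    // Mirrors the conjunction in the snippet: a token is only built when BOTH
    // fields are present; otherwise it stays null and the secured body is skipped.
    static byte[] buildUserToken(TokenProto t) {
        if (t.hasIdentifier() && t.hasPassword()) {
            return t.identifier; // placeholder for real token construction
        }
        return null;
    }

    public static void main(String[] args) {
        TokenProto noId = new TokenProto(null, "pw".getBytes());
        TokenProto full = new TokenProto("id".getBytes(), "pw".getBytes());
        System.out.println(buildUserToken(noId) == null);  // true: body skipped
        System.out.println(buildUserToken(full) != null);  // true
    }
}
```

This is why the test case observed a skipped secureBulkLoadHFiles() body: one unset field is enough to make the whole conjunction false.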
[jira] [Commented] (HBASE-10153) improve VerifyReplication to compute BADROWS more accurately
[ https://issues.apache.org/jira/browse/HBASE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157719#comment-14157719 ]

Hudson commented on HBASE-10153:

FAILURE: Integrated in HBase-TRUNK #5613 (See [https://builds.apache.org/job/HBase-TRUNK/5613/])
HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev 8dbf7b22381dab18f9af13318c16181c42824d46)
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java

improve VerifyReplication to compute BADROWS more accurately

    Key: HBASE-10153
    URL: https://issues.apache.org/jira/browse/HBASE-10153
    Project: HBase
    Issue Type: Improvement
    Components: Operability, Replication
    Affects Versions: 0.94.14
    Reporter: cuijianwei
    Assignee: cuijianwei
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 10153-0.98.txt, 10153-v2-trunk.txt, HBASE-10153-0.94-v1.patch, HBASE-10153-trunk.patch

VerifyReplication compares the source table with its peer table and computes BADROWS. However, the current BADROWS computation might not be accurate enough. For example, if the source table contains rows {r1, r2, r3, r4} and the peer table contains rows {r1, r3, r4}, BADROWS will be 3, because 'r2' in the source table makes all the later row comparisons fail. Would it be better if BADROWS were computed as 1 in this situation? Maybe we can compute BADROWS more accurately with a merge comparison?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
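The merge comparison suggested in the description can be sketched as a two-pointer walk over the sorted row keys of both tables. This is a hedged illustration only: plain string lists stand in for the scanner results VerifyReplication actually compares. A key present on only one side counts as a single bad row and only that side's pointer advances, so {r1, r2, r3, r4} versus {r1, r3, r4} yields BADROWS = 1 rather than 3.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of merge-style BADROWS counting over sorted row keys.
public class BadRowsSketch {
    static int countBadRows(List<String> source, List<String> peer) {
        int i = 0, j = 0, badRows = 0;
        while (i < source.size() && j < peer.size()) {
            int cmp = source.get(i).compareTo(peer.get(j));
            if (cmp == 0) {            // row present on both sides
                i++; j++;
            } else if (cmp < 0) {      // row only in source: one bad row
                badRows++; i++;
            } else {                   // row only in peer: one bad row
                badRows++; j++;
            }
        }
        // leftover rows exist on one side only
        badRows += (source.size() - i) + (peer.size() - j);
        return badRows;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("r1", "r2", "r3", "r4");
        List<String> peer = Arrays.asList("r1", "r3", "r4");
        System.out.println(countBadRows(source, peer)); // 1
    }
}
```

The key design choice is advancing only the pointer of the smaller key, so a single missing row cannot poison every subsequent comparison the way a naive pairwise walk does.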
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157726#comment-14157726 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #538 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/538/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev 579ce7a0d610352a7bcff5527ce24b04e8b2292a) * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java * hbase-client/pom.xml * pom.xml * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157752#comment-14157752 ]

Hudson commented on HBASE-12164:

SUCCESS: Integrated in HBase-0.98 #566 (See [https://builds.apache.org/job/HBase-0.98/566/])
HBASE-12164 Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate (tedyu: rev 0409d22a15d6656d0368b6343b7b3349d22bdd77)
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java

Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate

    Key: HBASE-12164
    URL: https://issues.apache.org/jira/browse/HBASE-12164
    Project: HBase
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: Ted Yu
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 12164-v1.txt, 12164-v1.txt

Here is the code:
{code}
if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) {
{code}
In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This would make secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12165) TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails
[ https://issues.apache.org/jira/browse/HBASE-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157754#comment-14157754 ] Hudson commented on HBASE-12165: FAILURE: Integrated in HBase-TRUNK #5614 (See [https://builds.apache.org/job/HBase-TRUNK/5614/]) HBASE-12165 TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails -- DEBUGGING STRINGS (stack: rev da9f2434b2ad9e85a7f726bb5334568ac772ec90) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails --- Key: HBASE-12165 URL: https://issues.apache.org/jira/browse/HBASE-12165 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12165.debug.txt Test fails but exhibited fail reason is complaining about an NPE. java.lang.NullPointerException: null at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionIsInMeta(TestEndToEndSplitTransaction.java:474) at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionSplit(TestEndToEndSplitTransaction.java:451) Looks like we are timing out waiting on split but NPE obscures actual reason for failure. Failed here https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157781#comment-14157781 ]

Hudson commented on HBASE-12164:

SUCCESS: Integrated in HBase-1.0 #267 (See [https://builds.apache.org/job/HBase-1.0/267/])
HBASE-12164 Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate (tedyu: rev 660f909a58986151f300ebf6c7fbbea963cb3cf3)
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java

Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate

    Key: HBASE-12164
    URL: https://issues.apache.org/jira/browse/HBASE-12164
    Project: HBase
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: Ted Yu
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: 12164-v1.txt, 12164-v1.txt

Here is the code:
{code}
if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) {
{code}
In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This would make secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12165) TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails
[ https://issues.apache.org/jira/browse/HBASE-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157780#comment-14157780 ] Hudson commented on HBASE-12165: SUCCESS: Integrated in HBase-1.0 #267 (See [https://builds.apache.org/job/HBase-1.0/267/]) HBASE-12165 TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails -- DEBUGGING STRINGS (stack: rev 1dd70307018f9c259b42289ca615ac2d50c30565) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java TestEndToEndSplitTransaction.testFromClientSideWhileSplitting fails --- Key: HBASE-12165 URL: https://issues.apache.org/jira/browse/HBASE-12165 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0 Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12165.debug.txt Test fails but exhibited fail reason is complaining about an NPE. java.lang.NullPointerException: null at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionIsInMeta(TestEndToEndSplitTransaction.java:474) at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.blockUntilRegionSplit(TestEndToEndSplitTransaction.java:451) Looks like we are timing out waiting on split but NPE obscures actual reason for failure. Failed here https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157784#comment-14157784 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-1.0 #268 (See [https://builds.apache.org/job/HBase-1.0/268/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev 5881eed36ebac0939daaa431000fd73fcf796c33) * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java * pom.xml * hbase-client/pom.xml * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157807#comment-14157807 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-0.98 #567 (See [https://builds.apache.org/job/HBase-0.98/567/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev 579ce7a0d610352a7bcff5527ce24b04e8b2292a) * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-client/pom.xml * pom.xml Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157820#comment-14157820 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-TRUNK #5615 (See [https://builds.apache.org/job/HBase-TRUNK/5615/]) HBASE-11907 Use the joni byte[] regex engine in place of j.u.regex (apurtell: rev d8a7b67d798ab5fec399d4a0b97a025d5bff531c) * pom.xml * hbase-client/pom.xml * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java * hbase-protocol/src/main/protobuf/Comparator.proto * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ComparatorProtos.java * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11216) PerformanceEvaluation should provide an option to modify the value length.
[ https://issues.apache.org/jira/browse/HBASE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158030#comment-14158030 ] Jean-Marc Spaggiari commented on HBASE-11216: - Hum. Sound like I need to rebase this :( PerformanceEvaluation should provide an option to modify the value length. -- Key: HBASE-11216 URL: https://issues.apache.org/jira/browse/HBASE-11216 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-11216-v0-trunk.patch, HBASE-11216-v1-trunk.patch, HBASE-11216-v2-trunk.patch All in the title. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11216) PerformanceEvaluation should provide an option to modify the value length.
[ https://issues.apache.org/jira/browse/HBASE-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-11216: Resolution: Duplicate Status: Resolved (was: Patch Available) Duplicate of HBASE-11350 PerformanceEvaluation should provide an option to modify the value length. -- Key: HBASE-11216 URL: https://issues.apache.org/jira/browse/HBASE-11216 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-11216-v0-trunk.patch, HBASE-11216-v1-trunk.patch, HBASE-11216-v2-trunk.patch All in the title. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11350) [PE] Allow random value size
[ https://issues.apache.org/jira/browse/HBASE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158038#comment-14158038 ] Jean-Marc Spaggiari commented on HBASE-11350: - [~lhofhansl] you might want this to be backported to 0.94. [PE] Allow random value size Key: HBASE-11350 URL: https://issues.apache.org/jira/browse/HBASE-11350 Project: HBase Issue Type: Improvement Components: Performance Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 11348.txt Allow PE to write random value sizes. Helpful mimic'ing 'real' sizings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158041#comment-14158041 ]

Ted Yu commented on HBASE-11907:

From https://builds.apache.org/job/HBase-1.0/268/console:
{code}
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[26,49] error: package org.apache.hadoop.hbase.testclassification does not exist
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[27,49] error: package org.apache.hadoop.hbase.testclassification does not exist
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[32,11] error: cannot find symbol
[ERROR] symbol: class FilterTests
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.0/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java:[32,30] error: cannot find symbol
[ERROR] symbol: class SmallTests
{code}
Addendum for 1.0 handles the above.

Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator

    Key: HBASE-11907
    URL: https://issues.apache.org/jira/browse/HBASE-11907
    Project: HBase
    Issue Type: Improvement
    Reporter: Andrew Purtell
    Assignee: Andrew Purtell
    Priority: Minor
    Fix For: 2.0.0, 0.98.7, 0.99.1
    Attachments: HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch

The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is:
- MIT licensed
- Designed to work with byte[] arguments instead of String
- Capable of handling UTF8 encoding
- Regex syntax compatible
- Interruptible
- *About twice as fast as j.u.regex*
- Has JRuby's jcodings library as a dependency, also MIT licensed

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-11907: --- Attachment: 11907-1.0.addendum Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 11907-1.0.addendum, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158077#comment-14158077 ]

Yuliang Jin commented on HBASE-11625:

Thanks for your reply. We are currently using
{noformat}
java version 1.6.0_37
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
{noformat}
and
{noformat}
Hadoop 2.0.0-cdh4.3.0
HBase 0.94.6-cdh4.3.0
{noformat}

Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum

    Key: HBASE-11625
    URL: https://issues.apache.org/jira/browse/HBASE-11625
    Project: HBase
    Issue Type: Bug
    Components: HFile
    Affects Versions: 0.94.21, 0.98.4, 0.98.5
    Reporter: qian wang
    Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz

When HBase checksums are in use, readBlockDataInternal() in HFileBlock.java can encounter file corruption, but it can only switch to the HDFS checksum input stream once validateBlockChecksum() runs. If the data block's header is corrupted when b = new HFileBlock() is constructed, it throws the Invalid HFile block magic exception first and the RPC call fails.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
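For context on the "Invalid HFile block magic" failure described above: each HFile block begins with an 8-byte magic identifying its type, and the exception fires when the header bytes match no known magic. Below is a rough, self-contained sketch of that kind of header validation, assuming the data-block magic DATABLK* and hypothetical helper names; the real logic in HFileBlock is considerably more involved and covers every block type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: validate the 8-byte block magic before trusting the header.
public class BlockMagicSketch {
    // "DATABLK*" is HBase's HFile data-block magic; other block types
    // carry their own 8-byte magics.
    static final byte[] DATA_MAGIC = "DATABLK*".getBytes(StandardCharsets.US_ASCII);

    static boolean hasValidDataMagic(byte[] block) {
        if (block.length < DATA_MAGIC.length) {
            return false; // too short to even hold a header magic
        }
        return Arrays.equals(Arrays.copyOf(block, DATA_MAGIC.length), DATA_MAGIC);
    }

    public static void main(String[] args) {
        byte[] good = "DATABLK*...payload...".getBytes(StandardCharsets.US_ASCII);
        byte[] corrupt = "XATABLK*...payload...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hasValidDataMagic(good));    // true
        System.out.println(hasValidDataMagic(corrupt)); // false
    }
}
```

The issue is that the magic check fails (by throwing) while constructing the block, before the checksum-verification step that would have triggered the fall-back to HDFS checksums, so the corruption surfaces as a failed RPC instead of a retried read.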
[jira] [Updated] (HBASE-12122) Try not to assign user regions to master all the time
[ https://issues.apache.org/jira/browse/HBASE-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12122: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Try not to assign user regions to master all the time - Key: HBASE-12122 URL: https://issues.apache.org/jira/browse/HBASE-12122 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 2.0.0, 0.99.1 Attachments: hbase-12122.patch, hbase-12122_v2.patch The load balancer does a good job of not assigning regions of tables not configured to be put on the active master. However, if there is no other region server, it still assigns user regions to the master. This happens when all normal region servers have crashed and are recovering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
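The behavior being fixed can be pictured as a candidate-filtering step in assignment. This is a hedged sketch with hypothetical names, not the actual load balancer code: the active master is removed from the candidate list for user regions, and an empty result means "wait for a region server to come back" rather than "fall back to the master".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: filter the active master out of the candidate servers for
// user-region assignment. An empty result means assignment should wait,
// not fall back to the master.
public class CandidateFilterSketch {
    static List<String> candidatesForUserRegions(List<String> liveServers, String master) {
        List<String> candidates = new ArrayList<>();
        for (String s : liveServers) {
            if (!s.equals(master)) {
                candidates.add(s);
            }
        }
        return candidates; // may be empty: caller retries later
    }

    public static void main(String[] args) {
        String master = "master,16000";
        System.out.println(candidatesForUserRegions(
            Arrays.asList("master,16000", "rs1,16020"), master)); // [rs1,16020]
        System.out.println(candidatesForUserRegions(
            Arrays.asList("master,16000"), master)); // []
    }
}
```

The scenario in the description, all normal region servers crashed and recovering, is exactly the case where the unfiltered list collapses to just the master, which is why the old behavior still landed user regions there.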
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158144#comment-14158144 ] Andrew Purtell commented on HBASE-11907: +1 Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 11907-1.0.addendum, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12167) NPE in AssignmentManager
Jimmy Xiang created HBASE-12167:

    Summary: NPE in AssignmentManager
    Key: HBASE-12167
    URL: https://issues.apache.org/jira/browse/HBASE-12167
    Project: HBase
    Issue Type: Bug
    Reporter: Jimmy Xiang
    Assignee: Jimmy Xiang

If we can't find a region plan, we should check for null.
{noformat}
2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
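The fix direction stated in the description amounts to a null guard around the region plan before it is dereferenced. A minimal sketch under assumed, hypothetical names follows; the real code lives in AssignmentManager and involves retries and region-state bookkeeping not shown here.

```java
import java.util.logging.Logger;

// Sketch of the null check the issue calls for: when no region plan can
// be computed (e.g. no destination server is available), skip the
// assignment instead of dereferencing a null plan and throwing an NPE.
public class RegionPlanGuardSketch {
    private static final Logger LOG = Logger.getLogger("assignment");

    static final class RegionPlan { // stand-in for the real RegionPlan
        final String destination;
        RegionPlan(String destination) { this.destination = destination; }
    }

    // Returns the destination when a plan exists, null otherwise -- no NPE.
    static String assign(String region, RegionPlan plan) {
        if (plan == null) {
            LOG.warning("Unable to determine a plan to assign " + region);
            return null; // caller can retry once servers are available
        }
        return plan.destination;
    }

    public static void main(String[] args) {
        System.out.println(assign("r1", null));                        // null
        System.out.println(assign("r1", new RegionPlan("rs1,16020"))); // rs1,16020
    }
}
```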
[jira] [Created] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
Jerry He created HBASE-12168: Summary: Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12167: Attachment: hbase-12167.patch NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12167: Fix Version/s: 0.99.1 2.0.0 Status: Patch Available (was: Open) NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
[ https://issues.apache.org/jira/browse/HBASE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158164#comment-14158164 ] Jerry He commented on HBASE-12168: -- The configuration steps are probably similar to or overlap with the set up for Rest security and impersonation support. But it is not clear. Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
[ https://issues.apache.org/jira/browse/HBASE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158175#comment-14158175 ] Jimmy Xiang commented on HBASE-12168: - Have you checked the refguide, section 8.1.6, 8.1.7? http://hbase.apache.org/book/security.html#hbase.secure.configuration How should we improve it? Thanks. Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158183#comment-14158183 ] stack commented on HBASE-12167: --- Go for it. When the IOE comes up, who catches it? What happens? NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
[ https://issues.apache.org/jira/browse/HBASE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158194#comment-14158194 ] Jerry He commented on HBASE-12168: -- Hi, [~jxiang] Yes. We can improve the sections, and/or add a section to mention SPNEGO-based authentication for client. I am trying to set it up. After that, I probably will have more input. Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 0.98.8, 0.99.1 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158199#comment-14158199 ] Jimmy Xiang commented on HBASE-12167: - SSH catches it and reprocess the dead server. In SSH, we tried to wait for an extra regionserver. But it is not reliable since the extra regionserver could die after SSH thinks the extra server is there. So it is possible for this NPE and it is rare. Reprocessing the dead server should help. NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
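The recovery path described in this comment, where the shutdown handler catches the failure and reprocesses the dead server, can be modeled as a simple resubmit loop. This is a toy model under the assumption that the handler just re-queues itself until a region plan exists; none of these names are the real HBase classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model of "catch the failure and reprocess the dead server":
// when no region plan exists, processing fails and the server is re-queued
// instead of the worker thread dying on an NPE.
public class ShutdownRetrySketch {
    static int attempts = 0;

    // Fails until a plan becomes available on the third attempt,
    // standing in for an extra regionserver eventually checking in.
    static boolean processDeadServer(String server) {
        attempts++;
        return attempts >= 3;
    }

    static int drain(String server) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add(server);
        int rounds = 0;
        while (!queue.isEmpty()) {
            String s = queue.poll();
            rounds++;
            if (!processDeadServer(s)) {
                queue.add(s); // resubmit the dead server for reprocessing
            }
        }
        return rounds;
    }

    public static void main(String[] args) {
        if (drain("rs-1") != 3) throw new AssertionError();
        System.out.println("recovered after " + attempts + " attempts");
    }
}
```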
[jira] [Commented] (HBASE-12152) TestLoadIncrementalHFiles shows up as zombie test
[ https://issues.apache.org/jira/browse/HBASE-12152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158217#comment-14158217 ] Elliott Clark commented on HBASE-12152: --- Please take other comments/more findings to a new jira. This one has been rendered un-usable. TestLoadIncrementalHFiles shows up as zombie test - Key: HBASE-12152 URL: https://issues.apache.org/jira/browse/HBASE-12152 Project: HBase Issue Type: Test Reporter: Ted Yu Attachments: TestSecureLoadIncrementalHFilesSplitRecovery-fix.txt TestLoadIncrementalHFiles and TestLoadIncrementalHFilesSplitRecovery frequently show up as zombie tests (from 0.98 to master branch). e.g. https://builds.apache.org/job/hbase-0.98/558/console Here is snippet of stack trace for TestLoadIncrementalHFilesSplitRecovery : {code} main prio=10 tid=0x7f8670008000 nid=0x1105 waiting on condition [0x7f8674b57000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x00078d4c3ba0 (a java.util.concurrent.FutureTask$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:248) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:382) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:324) at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery.testGroupOrSplitWhenRegionHoleExistsInMeta(TestLoadIncrementalHFilesSplitRecovery. 
java:470) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158238#comment-14158238 ] Hudson commented on HBASE-11907: FAILURE: Integrated in HBase-1.0 #269 (See [https://builds.apache.org/job/HBase-1.0/269/]) HBASE-11907 Addendum fixes test category import for TestRegexComparator (tedyu: rev 566686d9e97a79143d7661ce34587456eed235ff) * hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestRegexComparator.java Use the joni byte[] regex engine in place of j.u.regex in RegexStringComparator --- Key: HBASE-11907 URL: https://issues.apache.org/jira/browse/HBASE-11907 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 11907-1.0.addendum, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch, HBASE-11907.patch The joni regex engine (https://github.com/jruby/joni), a Java port of Oniguruma regexp library done by the JRuby project, is: - MIT licensed - Designed to work with byte[] arguments instead of String - Capable of handling UTF8 encoding - Regex syntax compatible - Interruptible - *About twice as fast as j.u.regex* - Has JRuby's jcodings library as a dependency, also MIT licensed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158261#comment-14158261 ] Hadoop QA commented on HBASE-12167: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672784/hbase-12167.patch against trunk revision . ATTACHMENT ID: 12672784 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster org.apache.hadoop.hbase.master.TestMasterFailover org.apache.hadoop.hbase.TestZooKeeper org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11206//console This message is automatically generated. 
NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158275#comment-14158275 ] stack commented on HBASE-12166: --- [~jxiang] Problem here is that master is carrying hbase:namespace but when recovering we act as though it only is hosting hbase:meta. We mark hbase:meta as recovering and do its log splitting. For hbase:namespace, we find it along w/ other regions and mark it as recovering only because it was on the master -- and master no longer has associated WALs because of above meta processing -- it just stays stuck in recovering mode. If you have suggestion I'm all ears else I'll hack something in. TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. 
is recovering 4943 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058) 4944 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086) 4945 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072) 4946 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014) 4947 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988) 4948 at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690) 4949 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418) 4950 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) 4951 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) 4952 at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) 4953 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) 4954 at java.lang.Thread.run(Thread.java:744) {code} See how we've finished log splitting long time previous: {code} 2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms {code} If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one: 2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery. 2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery. 
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery. 2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery. 2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery. 2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery. 2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66):
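The failure mode stack describes above can be captured in a few lines: a region whose recovering flag is set, but for which no WAL-splitting work item exists, never gets its flag cleared. This is a toy in-memory model; the real bookkeeping lives in SplitLogManager and the /hbase/recovering-regions znodes, and all names below are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the recovering-regions bookkeeping: a region's flag is only
// cleared when WAL-splitting work completes for it. A region marked
// recovering with no associated WAL work stays stuck -- the symptom this
// issue reports for hbase:namespace.
public class RecoveringRegionsModel {
    final Map<String, Boolean> recovering = new HashMap<>();
    final Set<String> walWork = new HashSet<>();

    void markRecovering(String region) { recovering.put(region, true); }
    void addWalWork(String region) { walWork.add(region); }

    void finishLogSplitting() {
        // Only regions with actual WAL work have their flag cleared
        // (modeling the znode delete that completes recovery).
        for (String region : walWork) {
            recovering.put(region, false);
        }
    }

    boolean isStuck(String region) {
        return Boolean.TRUE.equals(recovering.get(region));
    }

    public static void main(String[] args) {
        RecoveringRegionsModel m = new RecoveringRegionsModel();
        m.markRecovering("hbase:meta");
        m.addWalWork("hbase:meta");          // meta's WALs get split
        m.markRecovering("hbase:namespace"); // marked, but its WALs were
                                             // already consumed under meta
        m.finishLogSplitting();
        if (m.isStuck("hbase:meta")) throw new AssertionError();
        if (!m.isStuck("hbase:namespace")) throw new AssertionError();
        System.out.println("namespace region stays 'recovering'");
    }
}
```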
[jira] [Commented] (HBASE-12104) Some optimization and bugfix for HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-12104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158290#comment-14158290 ] Yi Deng commented on HBASE-12104: - [~eclark] I think the failing tests are not related to my code. Some optimization and bugfix for HTableMultiplexer -- Key: HBASE-12104 URL: https://issues.apache.org/jira/browse/HBASE-12104 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 2.0.0 Reporter: Yi Deng Assignee: Yi Deng Labels: multiplexer Fix For: 2.0.0 Attachments: 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public.patch Make HTableMultiplexerStatus public Delay before resubmit. Fix some missing counting on total failure. Use ScheduledExecutorService to simplify the code. Other refactoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158306#comment-14158306 ] stack commented on HBASE-12167: --- +1 The test failures are not yours. They are 'classics' NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12104) Some optimization and bugfix for HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-12104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158309#comment-14158309 ] stack commented on HBASE-12104: --- Those failures are not related. You committing [~eclark] or I can if you'd like. Some optimization and bugfix for HTableMultiplexer -- Key: HBASE-12104 URL: https://issues.apache.org/jira/browse/HBASE-12104 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 2.0.0 Reporter: Yi Deng Assignee: Yi Deng Labels: multiplexer Fix For: 2.0.0 Attachments: 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public.patch Make HTableMultiplexerStatus public Delay before resubmit. Fix some missing counting on total failure. Use ScheduledExecutorService to simplify the code. Other refactoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158319#comment-14158319 ] Jimmy Xiang commented on HBASE-12166: - Good investigation. Table namespace is handled just the same as any other user tables. Let me take look. TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. 
is recovering 4943 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058) 4944 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086) 4945 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072) 4946 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014) 4947 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988) 4948 at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690) 4949 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418) 4950 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020) 4951 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) 4952 at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) 4953 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) 4954 at java.lang.Thread.run(Thread.java:744) {code} See how we've finished log splitting long time previous: {code} 2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms {code} If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one: 2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery. 2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery. 
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery. 2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery. 2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery. 2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery. 2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. Region: a4337ad2874ee7e599ca2344fce21583 completes recovery. 2014-10-03 01:57:48,128 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery. This would seem to indicate that we successfully wrote zk that
[jira] [Updated] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-12075: --- Status: Open (was: Patch Available) Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.98.6.1, 0.99.0, 2.0.0 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi threaded clients, we use a feature developed on 0.89-fb branch called Preemptive Fast Fail. This allows the client threads which would potentially fail, fail fast. The idea behind this feature is that we allow, among the hundreds of client threads, one thread to try and establish connection with the regionserver and if that succeeds, we mark it as a live node again. Meanwhile, other threads which are trying to establish connection to the same server would ideally go into the timeouts which is effectively unfruitful. We can in those cases return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
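The scheme the issue describes, letting exactly one client thread probe the dead server while the rest fail immediately, can be sketched with an AtomicBoolean guard. This is a simplified single-server model, not the actual HBase client implementation; the exception name echoes the feature but is defined locally here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of preemptive fast fail: among many client threads
// hitting a server believed dead, only one wins the right to probe it;
// the others throw immediately instead of burning a full connect timeout.
public class FastFailSketch {
    static class PreemptiveFastFailException extends RuntimeException {}

    final AtomicBoolean probeInFlight = new AtomicBoolean(false);
    volatile boolean serverDead = true;

    String request() {
        if (serverDead) {
            if (probeInFlight.compareAndSet(false, true)) {
                try {
                    // Only this thread pays the (possibly long) probe cost.
                    if (probeServer()) {
                        serverDead = false; // mark the node live again
                    }
                } finally {
                    probeInFlight.set(false);
                }
            } else {
                // Everyone else fails fast rather than waiting on timeouts.
                throw new PreemptiveFastFailException();
            }
        }
        return serverDead ? "probe-failed" : "ok";
    }

    boolean probeServer() { return true; } // pretend the server came back

    public static void main(String[] args) {
        FastFailSketch s = new FastFailSketch();
        if (!"ok".equals(s.request())) throw new AssertionError();
        System.out.println("server marked live again");
    }
}
```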
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158335#comment-14158335 ] Manukranth Kolloju commented on HBASE-12075: Are there any more comments related to the patch?
[jira] [Updated] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-12075: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158340#comment-14158340 ] stack commented on HBASE-12166: --- [~jxiang] A simple 'fix' would be to not host meta:namespace on master? I don't mind finishing this one (I can make it fail reliably). Was just looking for input on how you'd like it solved.
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158375#comment-14158375 ] Jimmy Xiang commented on HBASE-12166: - You are right. Not hosting namespace on master can solve the issue. Your fix is fine with me. I'd like to look into it further to find out the root cause. Thanks.
[jira] [Updated] (HBASE-11394) Replication can have data loss if peer id contains hyphen -
[ https://issues.apache.org/jira/browse/HBASE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-11394: -- Labels: beginner (was: ) Replication can have data loss if peer id contains hyphen - - Key: HBASE-11394 URL: https://issues.apache.org/jira/browse/HBASE-11394 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Talat UYARER Labels: beginner Fix For: 2.0.0, 0.99.1 This is an extension to HBASE-8207. It seems that there is no check on the format of the peer id string (the short name for the replication peer). So if a peer id contains '-', it will silently cause data loss on server failure. I did not verify the claim via testing, though; this is purely from reading the code.
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158395#comment-14158395 ] Enis Soztutar commented on HBASE-11625: --- Nick pointed me to this issue. I have been trying to nail down a test failure on Windows (TestHFileBlock#testConcurrentReading) which fails with the same stack trace. I can repro the failure only on Windows, but with jdk6, jdk7u45 and jdk7u67 alike, and with Hadoop versions 2.2.0, 2.4.0, 2.5.0 and 2.6.0-SNAPSHOT. The test writes a file containing random HFileBlocks and does concurrent reads from multiple threads. Once in a while, what we seek() + read() does not match what is in the file (I've verified multiple times, from the offsets down to the actual file). I think there is a rare edge case that we are hitting. The other interesting bit is that the test only starts failing after 0.98.3. I was not able to get previous versions to fail, but I was also not able to bisect to the commit because reproducing the failure is not easy. Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz When using HBase checksum, readBlockDataInternal() in HFileBlock.java can encounter file corruption, but it can only switch to the HDFS checksum input stream at validateBlockChecksum(). If the data block's header is corrupted when b = new HFileBlock(), it throws "Invalid HFile block magic" and the RPC call fails.
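One classic source of "seek() + read() returns the wrong bytes" under concurrency is that seek-then-read on a shared stream is two steps with shared mutable position state, whereas a positional read takes the offset as an argument. This is only a generic illustration of that distinction, not the HBase code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: a positional FileChannel.read(buf, pos) never touches the channel's
// shared file position, so it is safe to call from many reader threads at once.
// seek()+read() on one shared stream, by contrast, can interleave between
// threads so that a read lands at another thread's offset.
public class PositionalReadDemo {
  public static String readAt(FileChannel ch, long pos, int len) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(len);
    // Loop until the buffer is full or EOF; pos is passed explicitly each time.
    while (buf.hasRemaining() && ch.read(buf, pos + buf.position()) > 0) { }
    return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path p = Files.createTempFile("blocks", ".bin");
    Files.write(p, "AAAABBBBCCCC".getBytes(StandardCharsets.UTF_8));
    try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
      System.out.println(readAt(ch, 4, 4)); // prints BBBB
    }
    Files.delete(p);
  }
}
```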
[jira] [Commented] (HBASE-12104) Some optimization and bugfix for HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-12104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158401#comment-14158401 ] Elliott Clark commented on HBASE-12104: --- If you want to commit that would be awesome. If not I can get it in a couple of hours if you haven't yet. Some optimization and bugfix for HTableMultiplexer -- Key: HBASE-12104 URL: https://issues.apache.org/jira/browse/HBASE-12104 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 2.0.0 Reporter: Yi Deng Assignee: Yi Deng Labels: multiplexer Fix For: 2.0.0 Attachments: 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public-Delay-before-res.patch, 0001-Make-HTableMultiplexerStatus-public.patch Make HTableMultiplexerStatus public Delay before resubmit. Fix some missing counting on total failure. Use ScheduledExecutorService to simplify the code. Other refactoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
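The "delay before resubmit" plus "use ScheduledExecutorService to simplify the code" items above can be sketched as follows. This is a minimal illustration of the pattern, with hypothetical names, not the actual HTableMultiplexer patch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: instead of hand-rolled sleep/retry loops, a failed batch is handed
// back to a ScheduledExecutorService with a delay, so resubmission timing and
// threading are managed in one place.
public class ResubmitSketch {
  private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

  /** Re-queue a failed batch to run again after delayMs milliseconds. */
  public ScheduledFuture<?> resubmitLater(Runnable batch, long delayMs) {
    return scheduler.schedule(batch, delayMs, TimeUnit.MILLISECONDS);
  }

  public void shutdown() {
    scheduler.shutdown();
  }
}
```

A caller would invoke resubmitLater() from the failure path, possibly growing delayMs on successive failures for backoff.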
[jira] [Commented] (HBASE-12039) Lower log level for TableNotFoundException log message when throwing
[ https://issues.apache.org/jira/browse/HBASE-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158403#comment-14158403 ] Lars Hofhansl commented on HBASE-12039: --- Nope... Let's just remove that stupid log. We'll get a message as soon as we try to connect in earnest (after we've preloaded the cache). Lower log level for TableNotFoundException log message when throwing Key: HBASE-12039 URL: https://issues.apache.org/jira/browse/HBASE-12039 Project: HBase Issue Type: Bug Reporter: James Taylor Assignee: stack Priority: Minor Fix For: 0.98.7, 0.94.25 Attachments: 12039-0.94.txt, 12039.txt Our HBase client tries to get the HTable descriptor for a table that may or may not exist. We catch and ignore the TableNotFoundException if it occurs, but the log message appears regardless, which confuses our users. Would it be possible to lower the log level of this message, since the exception is already being thrown (making it up to the caller how they want to handle it)? 14/09/20 20:01:54 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. 
for table: _IDX_TEST.TESTING, row=_IDX_TEST.TESTING,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:151) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1059) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1121) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:243) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12164) Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate
[ https://issues.apache.org/jira/browse/HBASE-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12164: --- Attachment: 12164.addendum There was a test failure in: https://builds.apache.org/job/PreCommit-HBASE-Build/11207/console Proposed addendum which logs the exception. Check for presence of user Id in SecureBulkLoadEndpoint#secureBulkLoadHFiles() is inaccurate Key: HBASE-12164 URL: https://issues.apache.org/jira/browse/HBASE-12164 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 12164-v1.txt, 12164-v1.txt, 12164.addendum Here is the code: {code} if (request.getFsToken().hasIdentifier() && request.getFsToken().hasPassword()) { {code} In the test case, request.getFsToken().hasIdentifier() returns false, leading to userToken being null. This makes secure bulk load unsuccessful because the body of secureBulkLoadHFiles() is skipped.
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158432#comment-14158432 ] Ted Yu commented on HBASE-12075: I manually triggered a QA run. See: https://builds.apache.org/job/PreCommit-HBASE-Build/11208/console Not sure why the QA bot didn't pick up the latest patch.
[jira] [Updated] (HBASE-12156) TableName cache isn't used for one of valueOf methods.
[ https://issues.apache.org/jira/browse/HBASE-12156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12156: --- Summary: TableName cache isn't used for one of valueOf methods. (was: TableName cache doesn't used for once of valueOf methods.) TableName cache isn't used for one of valueOf methods. -- Key: HBASE-12156 URL: https://issues.apache.org/jira/browse/HBASE-12156 Project: HBase Issue Type: Bug Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12156-addendum-0.98.patch, HBASE-12156.patch There is a wrong comparison: the copy-pasted code compares the namespace with the qualifier instead of with the namespace.
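The bug class described above (a copy-pasted field comparison that defeats a valueOf-style cache) can be illustrated with a minimal sketch. Names and structure here are hypothetical, not the actual TableName code:

```java
import java.util.concurrent.CopyOnWriteArraySet;

// Sketch: a valueOf cache must compare namespace to namespace AND qualifier to
// qualifier. If one field is compared twice (the copy-paste bug), cached
// entries never match and a fresh object is allocated on every call.
public class TableNameCacheSketch {
  static final class Name {
    final String namespace, qualifier;
    Name(String ns, String q) { namespace = ns; qualifier = q; }
  }

  private final CopyOnWriteArraySet<Name> cache = new CopyOnWriteArraySet<>();

  public Name valueOf(String ns, String q) {
    for (Name n : cache) {
      // correct: each field is compared against its own counterpart
      if (n.namespace.equals(ns) && n.qualifier.equals(q)) {
        return n; // cache hit: reuse the existing instance
      }
    }
    Name created = new Name(ns, q);
    cache.add(created);
    return created;
  }
}
```

With the buggy comparison (e.g. n.namespace.equals(q)), repeated calls with the same arguments would keep allocating new instances instead of hitting the cache.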
[jira] [Moved] (HBASE-12169) Document IPC binding options
[ https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey moved ACCUMULO-3195 to HBASE-12169: --- Component/s: (was: docs) Affects Version/s: (was: 0.99.0) (was: 0.94.7) (was: 0.98.0) 0.98.0 0.94.7 0.99.0 Workflow: no-reopen-closed, patch-avail (was: patch-available, re-open possible) Key: HBASE-12169 (was: ACCUMULO-3195) Project: HBase (was: Accumulo) Document IPC binding options Key: HBASE-12169 URL: https://issues.apache.org/jira/browse/HBASE-12169 Project: HBase Issue Type: Task Affects Versions: 0.99.0, 0.94.7, 0.98.0 Reporter: Sean Busbey Priority: Minor HBASE-8148 added options to change binding component services, but there aren't any docs for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158444#comment-14158444 ] Hudson commented on HBASE-12167: FAILURE: Integrated in HBase-1.0 #270 (See [https://builds.apache.org/job/HBase-1.0/270/]) HBASE-12167 NPE in AssignmentManager (jxiang: rev c452942f57daa0ac8075556ed5d03940a0a13571) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12073) Shell command user_permission fails on the table created by user if he is not global admin.
[ https://issues.apache.org/jira/browse/HBASE-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158448#comment-14158448 ] Hadoop QA commented on HBASE-12073: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12671988/HBASE-12073.patch against trunk revision . ATTACHMENT ID: 12671988 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + verifyDenied(listTablesRestrictedAction, USER_CREATE, USER_RW, USER_RO, USER_NONE, TABLE_ADMIN); + LOG.error(error during call of AccessControlClient.getUserPermissions. + e.getStackTrace()); {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestSecureLoadIncrementalHFilesSplitRecovery org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11207//console This message is automatically generated. Shell command user_permission fails on the table created by user if he is not global admin. 
-- Key: HBASE-12073 URL: https://issues.apache.org/jira/browse/HBASE-12073 Project: HBase Issue Type: Bug Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Priority: Minor Attachments: HBASE-12073.patch The command fails as the changes introduced by HBASE-10892 requires user (because of newly introduced call to getTableDescriptors) to have global admin permission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158455#comment-14158455 ] stack commented on HBASE-12166: --- The problem is that the meta region has no edits in it so when we list the filesystem to find crashed servers, we see this: 476 2014-10-03 10:21:42,470 INFO [ActiveMasterManager] master.ServerManager(918): Finished waiting for region servers count to settle; checked in 6, slept for 1012 ms, expecting minimum of 5, maximum of 6, master is running 477 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61565,1412356895892 belongs to an existing region server 478 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(249): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952 doesn't belong to a known region server, splitting 479 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61576,1412356896091 belongs to an existing region server 480 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61579,1412356896131 belongs to an existing region server 481 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61582,1412356896169 belongs to an existing region server 482 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61586,1412356896205 belongs to an existing region server 483 2014-10-03 10:21:42,471 INFO [ActiveMasterManager] master.MasterFileSystem(253): Log folder hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61591,1412356896245 belongs to an 
existing region server, i.e. all servers but the dead master, which in this case is localhost,61562,1412356895859. When we go to online hbase:meta, it complains that there is no WAL file:
501 2014-10-03 10:21:42,478 INFO [ActiveMasterManager] master.MasterFileSystem(325): Log dir for server localhost,61562,1412356895859 does not exist
502 2014-10-03 10:21:42,479 DEBUG [ActiveMasterManager] master.MasterFileSystem(323): Renamed region directory: hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952-splitting
503 2014-10-03 10:21:42,479 INFO [ActiveMasterManager] master.SplitLogManager(536): dead splitlog workers [localhost,61562,1412356895859, localhost,61572,1412356895952]
504 2014-10-03 10:21:42,480 INFO [ActiveMasterManager] master.SplitLogManager(172): hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952-splitting is empty dir, no logs to split
505 2014-10-03 10:21:42,480 DEBUG [ActiveMasterManager] master.SplitLogManager(235): Scheduling batch of logs to split
506 2014-10-03 10:21:42,480 INFO [ActiveMasterManager] master.SplitLogManager(237): started splitting 0 logs in [hdfs://localhost:58772/user/stack/hbase/WALs/localhost,61572,1412356895952-splitting]
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158461#comment-14158461 ] Hudson commented on HBASE-12167: FAILURE: Integrated in HBase-TRUNK #5616 (See [https://builds.apache.org/job/HBase-TRUNK/5616/]) HBASE-12167 NPE in AssignmentManager (jxiang: rev 5375ff07bcb6451e45c09f23f010a4d051968896) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
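The NPE above comes from using a region plan without checking whether one was found. The shape of the guard can be sketched as follows; the class and method names here are hypothetical illustrations, not the actual AssignmentManager fix:

```java
import java.util.function.Function;

// Hypothetical sketch of the failure mode: a balancer-style lookup that can
// return null must be guarded before the result is dereferenced. This mirrors
// the kind of check the fix adds; it is not the actual HBase code.
final class RegionPlanSketch {
    /** Returns the destination server for a region, or null when the
     *  balancer has no plan (e.g. no servers available). */
    static String destinationFor(String region, Function<String, String> balancer) {
        String plan = balancer.apply(region);
        if (plan == null) {
            // Without this guard, plan.trim() below would throw the
            // NullPointerException seen in the stack trace above.
            return null;
        }
        return plan.trim();
    }
}
```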
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158470#comment-14158470 ] Ted Yu commented on HBASE-12075: Jenkins machine got rebooted. Here is the new run: https://builds.apache.org/job/PreCommit-HBASE-Build/11209/console Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.99.0, 2.0.0, 0.98.6.1 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi-threaded clients, we use a feature developed on the 0.89-fb branch called Preemptive Fast Fail. This allows client threads that would potentially fail to fail fast. The idea behind this feature is that, among the hundreds of client threads, we allow one thread to try to establish a connection with the regionserver, and if that succeeds, we mark the server as a live node again. Meanwhile, other threads trying to establish a connection to the same server would otherwise sit in timeouts, which is effectively unfruitful. In those cases we can return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
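The single-prober idea in the description above can be sketched with a tiny registry: when a server is marked as failing, exactly one thread wins the right to retry it, and every other thread fails fast instead of waiting on a timeout. All names here are illustrative, not the HBase client API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of preemptive fast fail: one probe flag per failing server.
// The first thread to flip the flag probes the server; all others fail fast.
final class FastFailRegistry {
    private final ConcurrentHashMap<String, AtomicBoolean> probing =
            new ConcurrentHashMap<>();

    /** Returns true if the calling thread should attempt the connection,
     *  false if it should fail fast because another thread is probing. */
    boolean tryAcquireProbe(String server) {
        AtomicBoolean flag =
                probing.computeIfAbsent(server, s -> new AtomicBoolean(false));
        return flag.compareAndSet(false, true);
    }

    /** Called by the probing thread once the server responded: clears the
     *  failure state so all threads may use the server again. */
    void markAlive(String server) {
        probing.remove(server);
    }
}
```

A real implementation would also track when the server entered the failing state and expire stale entries; this sketch only shows the one-prober handoff.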
[jira] [Updated] (HBASE-11940) Add utility scripts for snapshotting / restoring all tables in cluster
[ https://issues.apache.org/jira/browse/HBASE-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-11940: --- Attachment: 11940-v1.txt Add utility scripts for snapshotting / restoring all tables in cluster -- Key: HBASE-11940 URL: https://issues.apache.org/jira/browse/HBASE-11940 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Attachments: 11940-v1.txt, snapshot-all.sh, snapshot_restore.sh This JIRA is to provide script that snapshot all the tables in a cluster. Another script is to restore all the tables in cluster. Use cases include table backup prior to upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12136: -- Attachment: HBASE-12136-0.98.patch Thanks [~tedyu] for review. Attached is the patch for 0.98 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between the client creating the tableCF znode and the server triggering TableCFsTracker. If the server wins, it won't be able to read the data set on the tableCF znode and replication will be misconfigured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
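The race in the description above follows from publishing a znode in two steps: the watcher can fire after the node is created but before its data is set. A toy in-memory model (not ZooKeeper itself, and not the actual patch; all names are illustrative) shows the window and why a single-step publish closes it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the tableCF race: a two-step publish leaves a window in which
// a reader observes the node with empty data; creating the node together
// with its data in one step makes that state unobservable.
final class TableCFRaceSketch {
    private final Map<String, byte[]> znodes = new ConcurrentHashMap<>();

    /** Racy publish: after step one the node exists but carries no data,
     *  which is exactly what the tracker reads if it wins the race. */
    void createThenSetData(String path, byte[] data) {
        znodes.put(path, new byte[0]); // step 1: node created, empty
        // <-- a watcher firing here reads zero-length data
        znodes.put(path, data);        // step 2: data arrives later
    }

    /** Atomic publish: the node is never observable without its data. */
    void createWithData(String path, byte[] data) {
        znodes.put(path, data);
    }

    byte[] read(String path) {
        return znodes.get(path);
    }
}
```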
[jira] [Assigned] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-12166: --- Assignee: Jimmy Xiang (was: stack)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158516#comment-14158516 ] Hadoop QA commented on HBASE-12136: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672836/HBASE-12136-0.98.patch against trunk revision . ATTACHMENT ID: 12672836 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11210//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158515#comment-14158515 ] Jimmy Xiang commented on HBASE-12166: - I think I found out the cause. In ZKSplitLogManagerCoordination#removeRecoveringRegions: {noformat} listSize = failedServers.size(); for (int j = 0; j < listSize; j++) { {noformat} The listSize is redefined. That's not a bug, it is a hidden bomb :)
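The "hidden bomb" called out in the comment above is one shared bound variable reused across loops over different lists. A sketch of the safer shape, with one final bound per loop, follows; the names are illustrative, not the actual ZKSplitLogManagerCoordination code:

```java
import java.util.List;

// Reusing a single listSize variable for two lists works only while the loop
// that follows still iterates the list the variable was last sized from.
// Giving each loop its own (final) bound removes the trap.
final class LoopBoundsSketch {
    static int countAll(List<String> regions, List<String> failedServers) {
        int total = 0;
        final int regionCount = regions.size();       // one bound per list,
        for (int i = 0; i < regionCount; i++) {
            total++;
        }
        final int serverCount = failedServers.size(); // never reassigned
        for (int j = 0; j < serverCount; j++) {
            total++;
        }
        return total;
    }
}
```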
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Attachment: hbase-12166.patch
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Status: Patch Available (was: Open) Attached a simple patch. The test is ok locally now. Let's see what the jenkins says. Hope this is the last DLR bug.
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158526#comment-14158526 ] Paul Fleetwood commented on HBASE-11625: java version 1.6.0_65 Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609) Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode) 0.98.5-hadoop1, rUnknown, Mon Aug 4 23:39:24 PDT 2014 Hadoop 1.2.1 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152 Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013 From source with checksum 6923c86528809c4e7e6f493b6b413a9a Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz When using HBase checksums, a call to readBlockDataInternal() in HFileBlock.java can encounter file corruption, but it can only switch to the HDFS checksum input stream once validateBlockChecksum() runs. If the data block's header is corrupted when b = new HFileBlock() is constructed, it throws the exception Invalid HFile block magic and the RPC call fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
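The ordering problem in the HBASE-11625 description is that header parsing can throw before the checksum-fallback path ever runs. A minimal sketch of putting both the magic check and the fallback re-read inside one retry step follows; it is illustrative only, not the actual HFileBlock code, and the byte-level format is invented for the example:

```java
import java.util.function.Supplier;

// If header parsing fails before checksum validation runs, the reader never
// gets a chance to retry via the alternative (HDFS) checksum path. Folding
// the header check into the same retry loop restores the fallback.
final class BlockReadSketch {
    static final byte[] MAGIC = {'D', 'A', 'T', 'A'};

    static boolean hasValidMagic(byte[] block) {
        if (block.length < MAGIC.length) return false;
        for (int i = 0; i < MAGIC.length; i++) {
            if (block[i] != MAGIC[i]) return false;
        }
        return true;
    }

    /** Returns the first attempt if its header is valid; otherwise retries
     *  once via the fallback supplier (e.g. a re-read with HDFS checksums). */
    static byte[] readWithFallback(byte[] firstAttempt, Supplier<byte[]> fallback) {
        if (hasValidMagic(firstAttempt)) return firstAttempt;
        byte[] retried = fallback.get();
        if (hasValidMagic(retried)) return retried;
        throw new IllegalStateException("Invalid HFile block magic on both attempts");
    }
}
```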
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Component/s: wal
[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158528#comment-14158528 ] Andrew Purtell commented on HBASE-11625: bq. The other interesting bit is that the test only starts failing after 0.98.3. 0.98.3 was released June 7. Here are commits touching o.a.h.h.io.hfile in hbase-server in 0.98 since that date, excluding changes to LRU given the above description: HBASE-8, HBASE-11437, HBASE 11586, HBASE-11331, HBASE-11845, HBASE-12059, HBASE-12076, HBASE-12123. I think we can suspect less later changes like HBASE-11331 and beyond if this is observable in releases like 0.98.4. Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz when using hbase checksum,call readBlockDataInternal() in hfileblock.java, it could happen file corruption but it only can switch to hdfs checksum inputstream till validateBlockChecksum(). If the datablock's header corrupted when b = new HFileBlock(),it throws the exception Invalid HFile block magic and the rpc call fail -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum
[ https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158528#comment-14158528 ] Andrew Purtell edited comment on HBASE-11625 at 10/3/14 9:12 PM: - bq. The other interesting bit is that the test only starts failing after 0.98.3. 0.98.3 was released June 7. Here are commits touching o.a.h.h.io.hfile in hbase-server in 0.98 since that date, excluding changes to block cache given the above description: HBASE-8, HBASE-11437, HBASE-11586, HBASE-11331, HBASE-11845, HBASE-12059, HBASE-12076, HBASE-12123. I think we can suspect less later changes like HBASE-11331 and beyond if this is observable in releases like 0.98.4. was (Author: apurtell): bq. The other interesting bit is that the test only starts failing after 0.98.3. 0.98.3 was released June 7. Here are commits touching o.a.h.h.io.hfile in hbase-server in 0.98 since that date, excluding changes to LRU given the above description: HBASE-8, HBASE-11437, HBASE 11586, HBASE-11331, HBASE-11845, HBASE-12059, HBASE-12076, HBASE-12123. I think we can suspect less later changes like HBASE-11331 and beyond if this is observable in releases like 0.98.4. Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum - Key: HBASE-11625 URL: https://issues.apache.org/jira/browse/HBASE-11625 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.21, 0.98.4, 0.98.5 Reporter: qian wang Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz when using hbase checksum,call readBlockDataInternal() in hfileblock.java, it could happen file corruption but it only can switch to hdfs checksum inputstream till validateBlockChecksum(). If the datablock's header corrupted when b = new HFileBlock(),it throws the exception Invalid HFile block magic and the rpc call fail -- This message was sent by Atlassian JIRA (v6.3.4#6332)
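The fallback the HBASE-11625 report asks for can be sketched abstractly: treat any failure to parse the hbase-checksummed read, a corrupt block magic included, as grounds to retry through the HDFS-checksum path, rather than only switching after validateBlockChecksum() fails. This is a toy model under assumed names, not the real readBlockDataInternal():

```java
// Hedged sketch: the first read skips HDFS checksum verification (HBase does
// its own inline checksums); if that read yields bytes that do not even parse
// as a block, re-read through the HDFS-checksum-verified path instead of
// surfacing "Invalid HFile block magic" to the RPC caller.
class BlockReader {
    static final String MAGIC = "DATABLK*";  // HFile data block magic, 8 bytes

    // Simulates constructing an HFileBlock: throws if the magic is wrong.
    static String parseBlock(String raw) {
        if (!raw.startsWith(MAGIC)) {
            throw new IllegalStateException("Invalid HFile block magic");
        }
        return raw.substring(MAGIC.length());
    }

    static String readWithFallback(String hbaseChecksumRead, String hdfsChecksumRead) {
        try {
            return parseBlock(hbaseChecksumRead);
        } catch (IllegalStateException corrupt) {
            // Fall back on ANY parse failure, not only when the checksum
            // comparison itself mismatches.
            return parseBlock(hdfsChecksumRead);
        }
    }
}
```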
[jira] [Updated] (HBASE-12137) Alter table add cf doesn't do compression test
[ https://issues.apache.org/jira/browse/HBASE-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12137: -- Attachment: HBASE-12137.patch HBASE-12137-0.98.patch Thanks [~jmspaggi] for review. Changed name to columnDescriptor. Also attached patch for 0.98 Alter table add cf doesn't do compression test -- Key: HBASE-12137 URL: https://issues.apache.org/jira/browse/HBASE-12137 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Attachments: HBASE-12137-0.98.patch, HBASE-12137.patch, HBASE-12137.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12137) Alter table add cf doesn't do compression test
[ https://issues.apache.org/jira/browse/HBASE-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12137: -- Fix Version/s: 0.99.1 0.98.7 2.0.0 Alter table add cf doesn't do compression test -- Key: HBASE-12137 URL: https://issues.apache.org/jira/browse/HBASE-12137 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12137-0.98.patch, HBASE-12137.patch, HBASE-12137.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12136: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
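The race in this issue is easiest to see in a toy model: a server-side watcher that fires as soon as the znode exists can read the node before the client's follow-up setData lands, whereas creating the znode already carrying its data closes the window. Names below are hypothetical, not the actual ReplicationPeersZKImpl or ZooKeeper API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Model of the HBASE-12136 race: watchers fire at node creation time, so the
// data they observe depends on whether creation and setData were one step.
class TableCfRace {
    private String data;  // null = znode absent
    private final List<Consumer<String>> watchers = new ArrayList<>();

    void watchCreation(Consumer<String> w) { watchers.add(w); }

    private void createNode(String initial) {
        data = initial;
        for (Consumer<String> w : watchers) w.accept(data);  // watcher fires now
    }

    void setData(String d) { data = d; }

    // Racy client: the node exists (and the watcher fires) before data is set,
    // so a fast server sees "" and misconfigures replication.
    void createThenSet(String tableCFs) { createNode(""); setData(tableCFs); }

    // Fixed client: the node is born carrying its data.
    void createWithData(String tableCFs) { createNode(tableCFs); }
}
```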
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158556#comment-14158556 ] Ted Yu commented on HBASE-12136: There was a conflict in hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java which I resolved. Integrated to 0.98, branch-1 and master Thanks for the contribution, Virag Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12126) Region server coprocessor endpoint
[ https://issues.apache.org/jira/browse/HBASE-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12126: -- Attachment: HBASE-12126-0.98_1.patch Updated the 98 patch with SingletonCoprocessorService interface Region server coprocessor endpoint -- Key: HBASE-12126 URL: https://issues.apache.org/jira/browse/HBASE-12126 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Attachments: HBASE-12126-0.98.patch, HBASE-12126-0.98_1.patch Utility to make endpoint calls against region server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158575#comment-14158575 ] Hudson commented on HBASE-12136: FAILURE: Integrated in HBase-1.0 #271 (See [https://builds.apache.org/job/HBase-1.0/271/]) HBASE-12136 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker (Virag Kothari) (tedyu: rev 6b95b4a8a4a49dc7877271118c36d5e916d336ab) * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeerZKImpl.java * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-12136: Virag: Any idea of the test failure in branch-1 ? I couldn't reproduce locally. Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158584#comment-14158584 ] Hadoop QA commented on HBASE-12075:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12671882/0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch against trunk revision .
ATTACHMENT ID: 12671882
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpointNoMaster
org.apache.hadoop.hbase.master.TestDistributedLogSplitting
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.master.TestMasterNoCluster.testNotPullingDeadRegionServerFromZK(TestMasterNoCluster.java:306)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11209//console
This message is automatically generated. 
Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.99.0, 2.0.0, 0.98.6.1 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi threaded clients, we use a feature developed on 0.89-fb branch called Preemptive Fast Fail. This allows the client threads which would potentially fail, fail fast. The idea behind this feature is that we allow, among the hundreds of client threads, one thread to try and establish connection with the regionserver and if that succeeds, we mark it as a live node again. Meanwhile, other threads which are trying to establish connection to the same server would ideally go into the timeouts which is effectively unfruitful. We can in those cases return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA
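The scheme the description outlines, one thread probes the suspect server while the rest fail fast instead of burning connect timeouts, reduces to a per-server compare-and-set. This is a hedged sketch under invented names, not the 0.89-fb or HBASE-12075 implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// For a server suspected dead, exactly one caller wins the right to probe it;
// every other caller gets 'false' and should throw a fast-fail exception to
// its client rather than wait out a timeout.
class FastFailRegistry {
    private final ConcurrentHashMap<String, AtomicBoolean> probing = new ConcurrentHashMap<>();

    // True: this caller should attempt the real connection.
    // False: fail fast without touching the network.
    boolean tryAcquireProbe(String server) {
        AtomicBoolean flag = probing.computeIfAbsent(server, s -> new AtomicBoolean(false));
        return flag.compareAndSet(false, true);
    }

    // Probe succeeded: the server is live again, clear its failing state.
    void markAlive(String server) {
        probing.remove(server);
    }

    // Probe failed: let the next caller become the prober.
    void probeFailed(String server) {
        AtomicBoolean flag = probing.get(server);
        if (flag != null) flag.set(false);
    }
}
```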
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158590#comment-14158590 ] Virag Kothari commented on HBASE-12136: --- In https://builds.apache.org/job/HBase-1.0/271, org.apache.hadoop.hbase.util.TestTableName.testValueOf fails. That might be related to HBASE-12156 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-12136. Resolution: Fixed Oops, TestTableName is not related to the change in this JIRA. Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12167: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into branch 1 and master. Thanks. NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
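The fix amounts to a null check before dereferencing the region plan, since no plan may exist when no live server can host the region. A minimal sketch with hypothetical names, not the real AssignmentManager code:

```java
import java.util.List;

class Assigner {
    static final class RegionPlan {
        final String destination;
        RegionPlan(String destination) { this.destination = destination; }
    }

    // Returns null when no live server is available -- the case the
    // ServerShutdownHandler stack trace above runs into.
    static RegionPlan getRegionPlan(List<String> liveServers) {
        return liveServers.isEmpty() ? null : new RegionPlan(liveServers.get(0));
    }

    static String assign(List<String> liveServers) {
        RegionPlan plan = getRegionPlan(liveServers);
        if (plan == null) {
            // The guard the patch adds: skip instead of dereferencing null.
            return "no plan available; skipping assignment";
        }
        return "assigning to " + plan.destination;
    }
}
```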
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158598#comment-14158598 ] Jimmy Xiang commented on HBASE-12167: - Checked in an addendum to fix TestMasterObserver. NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158610#comment-14158610 ] Hudson commented on HBASE-12136: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #539 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/539/]) HBASE-12136 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker (Virag Kothari) (tedyu: rev a9138d7f96910f09e52b226248ccb169c98d6bd4) * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12075) Preemptive Fast Fail
[ https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158613#comment-14158613 ] Manukranth Kolloju commented on HBASE-12075: Is 900 for surefire.timeout too low? Preemptive Fast Fail Key: HBASE-12075 URL: https://issues.apache.org/jira/browse/HBASE-12075 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 0.99.0, 2.0.0, 0.98.6.1 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch, 0001-Implement-Preemptive-Fast-Fail.patch In multi threaded clients, we use a feature developed on 0.89-fb branch called Preemptive Fast Fail. This allows the client threads which would potentially fail, fail fast. The idea behind this feature is that we allow, among the hundreds of client threads, one thread to try and establish connection with the regionserver and if that succeeds, we mark it as a live node again. Meanwhile, other threads which are trying to establish connection to the same server would ideally go into the timeouts which is effectively unfruitful. We can in those cases return appropriate exceptions to those clients instead of letting them retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158621#comment-14158621 ] Hadoop QA commented on HBASE-12166:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12672838/hbase-12166.patch against trunk revision .
ATTACHMENT ID: 12672838
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.coprocessor.TestMasterObserver
org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11211//console
This message is automatically generated. 
TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test, wal Reporter: stack Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, hbase-12166.patch, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering 4943 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058) 4944 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086) 4945 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072) 4946
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Attachment: HBASE-11764-0.98.patch Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Status: Patch Available (was: Open) Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Attachment: HBASE-11764.patch Updated patches for master and 0.98 that adjust implementation of cell TTLs to avoid changes to ColumnTrackers (HBASE-11763, moved out) Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158641#comment-14158641 ] Jimmy Xiang commented on HBASE-12166: - TestMasterObserver should be fixed by the addendum of HBASE-12167. TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork --- Key: HBASE-12166 URL: https://issues.apache.org/jira/browse/HBASE-12166 Project: HBase Issue Type: Bug Components: test, wal Reporter: stack Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: 12166.txt, hbase-12166.patch, log.txt See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/ The namespace region gets stuck. It is never 'recovered' even though we have finished log splitting. Here is the main exception: {code} 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get size: 99 connection: 67.195.81.144:44526 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. 
is recovering
	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
	at java.lang.Thread.run(Thread.java:744)
{code}
See how we've finished log splitting long time previous:
{code}
2014-10-03 01:57:48,129 INFO [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms
{code}
If I grep for the deleting of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one:
2014-10-03 01:57:47,330 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery.
2014-10-03 01:57:48,119 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
2014-10-03 01:57:48,121 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery.
2014-10-03 01:57:48,122 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
2014-10-03 01:57:48,124 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
2014-10-03 01:57:48,125 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
2014-10-03 01:57:48,126 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
2014-10-03 01:57:48,128 INFO [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
This would seem to indicate that we successfully wrote zk that we are
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158645#comment-14158645 ] Jimmy Xiang commented on HBASE-12166: - [~stack], [~jeffreyz], could you take a look at the patch? Thanks.
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158649#comment-14158649 ] Jimmy Xiang commented on HBASE-12166: - TestRegionReplicaReplicationEndpoint is ok locally. I can increase the timeout a little at checkin (from 1000 to 6000?).
[jira] [Commented] (HBASE-12137) Alter table add cf doesn't do compression test
[ https://issues.apache.org/jira/browse/HBASE-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158655#comment-14158655 ] Hadoop QA commented on HBASE-12137: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12672842/HBASE-12137.patch
against trunk revision .
ATTACHMENT ID: 12672842
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.util.TestHBaseFsck
org.apache.hadoop.hbase.master.TestDistributedLogSplitting
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11212//console
This message is automatically generated.
Alter table add cf doesn't do compression test -- Key: HBASE-12137 URL: https://issues.apache.org/jira/browse/HBASE-12137 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12137-0.98.patch, HBASE-12137.patch, HBASE-12137.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158515#comment-14158515 ] Jimmy Xiang edited comment on HBASE-12166 at 10/3/14 11:08 PM: ---
I think I found out the cause. In ZKSplitLogManagerCoordination#removeRecoveringRegions:
{noformat}
listSize = failedServers.size();
for (int j = 0; j < listSize; j++) {
{noformat}
The listSize is redefined.

was (Author: jxiang):
I think I found out the cause. In ZKSplitLogManagerCoordination#removeRecoveringRegions:
{noformat}
listSize = failedServers.size();
for (int j = 0; j < listSize; j++) {
{noformat}
The listSize is redefined. That's not a bug, it is a hidden bomb :)
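The reused `listSize` can be modeled with a short, self-contained Java sketch. This is purely illustrative, not the actual removeRecoveringRegions code: `regionsVisited`, the list contents, and the loop bodies are hypothetical stand-ins. It shows the class of hazard: when an inner loop reassigns the variable that bounds the outer loop, the outer loop over recovering regions can terminate early and silently skip regions, which would leave their znodes undeleted.

```java
import java.util.Arrays;
import java.util.List;

public class ListSizeClobber {

    // Toy model of the pitfall: the same 'listSize' variable bounds the
    // outer loop but is reassigned inside it, so the outer loop can stop
    // early when 'failedServers' is shorter than 'regions'.
    static int regionsVisited(List<String> regions, List<String> failedServers) {
        int visited = 0;
        int listSize = regions.size();       // intended bound for the outer loop
        for (int i = 0; i < listSize; i++) {
            visited++;                       // "process" regions.get(i)
            listSize = failedServers.size(); // BUG: clobbers the outer bound
            for (int j = 0; j < listSize; j++) {
                // "process" failedServers.get(j) for this region
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("ns-region", "r1", "r2");
        List<String> failedServers = Arrays.asList("server-a");
        // Only 1 of the 3 regions is visited: after the first iteration the
        // shared bound becomes failedServers.size() == 1.
        System.out.println(regionsVisited(regions, failedServers));
    }
}
```

Giving each loop its own bound variable (or calling size() directly in the loop condition) removes the hazard entirely, which is why renaming the second variable is suggested below.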
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Status: Open (was: Patch Available) Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12136) Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
[ https://issues.apache.org/jira/browse/HBASE-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158659#comment-14158659 ] Hudson commented on HBASE-12136: FAILURE: Integrated in HBase-TRUNK #5617 (See [https://builds.apache.org/job/HBase-TRUNK/5617/]) HBASE-12136 Race condition between client adding tableCF replication znode and server triggering TableCFsTracker (Virag Kothari) (tedyu: rev efe0787c87ca03e548bec13d8ae24200f582b438) * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeerZKImpl.java * hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java Race condition between client adding tableCF replication znode and server triggering TableCFsTracker - Key: HBASE-12136 URL: https://issues.apache.org/jira/browse/HBASE-12136 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.6 Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-12136-0.98.patch, HBASE-12136.patch In ReplicationPeersZKImpl.addPeer(), there is a race between client creating tableCf znode and the server triggering TableCFsTracker. If the server wins, it wont be able to read the data set on tableCF znode and replication will be misconfigured -- This message was sent by Atlassian JIRA (v6.3.4#6332)
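The race described in HBASE-12136 — the server-side TableCFsTracker firing on znode creation before the client has written the table-CF data — can be sketched with a toy in-memory model. This is not the real ZooKeeper API: the map stands in for the znode tree, and the immediate read stands in for the watcher callback winning the race.

```java
import java.util.HashMap;
import java.util.Map;

public class TableCFsRace {

    // Toy model: create the znode, let the "watcher" read it immediately,
    // then set the data. If the watcher wins the race, it sees null and the
    // peer's table-CF configuration is effectively empty.
    static String watcherSeesOnCreate(Map<String, String> znodes, String path) {
        znodes.put(path, null);          // 1) client creates the znode, no data yet
        String seen = znodes.get(path);  // 2) server's tracker fires and reads it
        znodes.put(path, "t1:cf1");      // 3) client sets the data -- too late
        return seen;
    }

    public static void main(String[] args) {
        String seen = watcherSeesOnCreate(new HashMap<>(),
                "/hbase/replication/peers/1/tableCFs");
        System.out.println(seen);        // null: replication would be misconfigured
    }
}
```

The usual cure is to make the write atomic with the create (create the znode with its data in one call) or have the tracker re-read on the subsequent data-changed event rather than trusting the creation-time snapshot.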
[jira] [Commented] (HBASE-12167) NPE in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158660#comment-14158660 ] Hudson commented on HBASE-12167: FAILURE: Integrated in HBase-TRUNK #5617 (See [https://builds.apache.org/job/HBase-TRUNK/5617/]) HBASE-12167 addendum; fix TestMasterObserver (jxiang: rev dbef2bdafe5500c0abc8fc61d3539d3b7a2132b9) * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java NPE in AssignmentManager Key: HBASE-12167 URL: https://issues.apache.org/jira/browse/HBASE-12167 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0, 0.99.1 Attachments: hbase-12167.patch If we can't find a region plan, we should check. {noformat} 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
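The NPE above comes from dereferencing a region plan that was never found ("If we can't find a region plan, we should check"). A minimal, hypothetical guard of the kind such a fix adds — the names and map-based lookup are illustrative only, not the actual AssignmentManager code:

```java
import java.util.HashMap;
import java.util.Map;

public class RegionPlanGuard {

    // Illustrative only: look up a destination for a region and check for
    // null before using it, instead of assuming a plan always exists.
    static String destinationFor(Map<String, String> plans, String region) {
        String plan = plans.get(region); // may be null when no plan was computed
        if (plan == null) {
            return null;                 // bail out (or retry) instead of an NPE
        }
        return plan.trim();
    }

    public static void main(String[] args) {
        Map<String, String> plans = new HashMap<>();
        plans.put("region-a", " server-1 ");
        System.out.println(destinationFor(plans, "region-a"));
        System.out.println(destinationFor(plans, "region-b")); // null, not an NPE
    }
}
```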
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158663#comment-14158663 ] stack commented on HBASE-12166: --- [~jxiang] Yeah, that's it. I just ran into it (Didn't believe it...). Test passed for me when I made the change, +1 and +1 to upping timeout (Am checking other uses of 'listSize' -- smile).
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158667#comment-14158667 ] Jeffrey Zhong commented on HBASE-12166: --- [~jxiang] Good catch! Looks good to me (+1). Better change the variable name listSize2 to tmpFailedServerSize though.
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158673#comment-14158673 ] stack commented on HBASE-12166: --- There is another bomb in that same class [~jxiang]
[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158679#comment-14158679 ] Jimmy Xiang commented on HBASE-12166: - [~stack], good catch! Unbelievable!
[jira] [Updated] (HBASE-11764) Support per cell TTLs
[ https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11764: --- Attachment: HBASE-11764-0.98.patch Support per cell TTLs - Key: HBASE-11764 URL: https://issues.apache.org/jira/browse/HBASE-11764 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764-0.98.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch, HBASE-11764.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
[ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-12166: Attachment: hbase-12166_v2.patch Attached v2 that fixed the issue Stack found.
[jira] [Commented] (HBASE-11350) [PE] Allow random value size
[ https://issues.apache.org/jira/browse/HBASE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158697#comment-14158697 ] Lars Hofhansl commented on HBASE-11350: --- I agree. This'd be useful in 0.94, 0.98, and 1.0 as well. [~apurtell], [~enis], I assume you have no objections. This is PE only. [PE] Allow random value size Key: HBASE-11350 URL: https://issues.apache.org/jira/browse/HBASE-11350 Project: HBase Issue Type: Improvement Components: Performance Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 11348.txt Allow PE to write random value sizes. Helpful for mimicking 'real' sizings.
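Writing values with randomized sizes, as HBASE-11350 proposes for PerformanceEvaluation, could look something like the sketch below. The distribution (uniform within roughly half to one-and-a-half times a target size) and the names here are assumptions for illustration, not what PE actually implements.

{code}
import java.util.Random;

// Hypothetical sketch of random value sizing for a load generator.
// Not the actual PerformanceEvaluation code.
public class RandomValueSizes {
    // Return a random-content value whose length is drawn uniformly
    // from [targetSize/2, targetSize/2 + targetSize).
    static byte[] randomValue(Random rnd, int targetSize) {
        int size = targetSize / 2 + rnd.nextInt(targetSize);
        byte[] value = new byte[size];
        rnd.nextBytes(value);
        return value;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for repeatable runs
        for (int i = 0; i < 3; i++) {
            System.out.println(randomValue(rnd, 1024).length);
        }
    }
}
{code}

Varying the value length per write avoids the unrealistically uniform row sizes a fixed-length generator produces, which is the point of the improvement discussed above.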