[jira] [Commented] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-11-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815743#comment-13815743
 ] 

Lars Hofhansl commented on HBASE-9778:
--

Some more numbers with other hardcoded improvements indicate that some Phoenix 
queries can run over 3x as fast (8.8s instead of 27s).
The challenge now is to keep the improvements from HBASE-4433 while also 
improving other scenarios. A new config option is probably unavoidable.


> Avoid seeking to next column in ExplicitColumnTracker when possible
> ---
>
> Key: HBASE-9778
> URL: https://issues.apache.org/jira/browse/HBASE-9778
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
> 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt
>
>
> The issue of slow seeking in ExplicitColumnTracker was brought up by 
> [~vrodionov] on the dev list.
> My idea here is to avoid the seeking if we know that there aren't many 
> versions to skip.
> How do we know? We'll use the column family's VERSIONS setting as a hint. If 
> VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call 
> SKIP repeatedly.
> HBASE-9769 has some initial numbers for this approach:
> Interestingly it depends on which column(s) is (are) selected.
> Some numbers: 4m rows, 5 cols each, 1 cf, 10 bytes values, VERSIONS=1, 
> everything filtered at the server with a ValueFilter. Everything measured in 
> seconds.
> Without patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.5|14.3|14.6|11.1|20.3|
> With patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.4|8.9|9.9|6.4|10.0|
> Variation here was +- 0.2s.
> So with this patch scanning is 2x faster than without in some cases, and 
> never slower. No special hint needed, beyond declaring VERSIONS correctly.
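
A minimal, self-contained sketch of the heuristic described above (the enum and 
the threshold value are illustrative assumptions, not the actual 
ExplicitColumnTracker code):

{code}
public class ColumnSeekHeuristic {
  // Illustrative stand-in for ScanQueryMatcher's match codes.
  enum MatchCode { SKIP, SEEK_NEXT_COL }

  // Assumed threshold: a seek only pays off when many versions must be skipped.
  static final int VERSIONS_SEEK_THRESHOLD = 10;

  // Decide how to move past the remaining versions of the current column,
  // using the column family's VERSIONS setting as a hint.
  static MatchCode nextAction(int maxVersions) {
    return maxVersions < VERSIONS_SEEK_THRESHOLD
        ? MatchCode.SKIP           // few versions: repeated SKIPs are cheaper
        : MatchCode.SEEK_NEXT_COL; // many versions: a seek skips them in one hop
  }

  public static void main(String[] args) {
    System.out.println(nextAction(1));   // SKIP
    System.out.println(nextAction(100)); // SEEK_NEXT_COL
  }
}
{code}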



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding

2013-11-06 Thread He Liangliang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Liangliang reassigned HBASE-9893:


Assignee: Nick Dimiduk  (was: He Liangliang)

> Incorrect assert condition in OrderedBytes decoding
> ---
>
> Key: HBASE-9893
> URL: https://issues.apache.org/jira/browse/HBASE-9893
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: He Liangliang
>Assignee: Nick Dimiduk
>Priority: Minor
> Attachments: HBASE-9893.patch
>
>
> The following assert condition is incorrect when decoding blob var byte array.
> {code}
> assert t == 0 : "Unexpected bits remaining after decoding blob.";
> {code}
> When the number of bytes to decode is a multiple of 8 (i.e. the original number 
> of bytes is a multiple of 7), this assert may fail.
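
A minimal sketch of the 7-bits-per-encoded-byte length arithmetic behind this 
boundary case (payload lengths only; any OrderedBytes header bytes are ignored 
here):

{code}
public class BlobVarLength {
  // ceil(8n / 7): each encoded byte carries 7 payload bits.
  static int encodedLength(int n) {
    return (8 * n + 6) / 7;
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 14; n++) {
      System.out.println(n + " payload bytes -> " + encodedLength(n) + " encoded bytes");
    }
    // Only n = 7 and n = 14 yield encoded lengths that are multiples of 8:
    // the encoded length is a multiple of 8 exactly when the original length
    // is a multiple of 7, which is the case the assert mishandles.
  }
}
{code}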



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7403) Online Merge

2013-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815714#comment-13815714
 ] 

stack commented on HBASE-7403:
--

[~asafm] yes

> Online Merge
> 
>
> Key: HBASE-7403
> URL: https://issues.apache.org/jira/browse/HBASE-7403
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.95.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.98.0, 0.95.0
>
> Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403-v5.txt, 
> 7403v5.diff, 7403v5.txt, hbase-7403-0.95.patch, hbase-7403-94v1.patch, 
> hbase-7403-trunkv1.patch, hbase-7403-trunkv10.patch, 
> hbase-7403-trunkv11.patch, hbase-7403-trunkv12.patch, 
> hbase-7403-trunkv13.patch, hbase-7403-trunkv14.patch, 
> hbase-7403-trunkv15.patch, hbase-7403-trunkv16.patch, 
> hbase-7403-trunkv19.patch, hbase-7403-trunkv20.patch, 
> hbase-7403-trunkv22.patch, hbase-7403-trunkv23.patch, 
> hbase-7403-trunkv24.patch, hbase-7403-trunkv26.patch, 
> hbase-7403-trunkv28.patch, hbase-7403-trunkv29.patch, 
> hbase-7403-trunkv30.patch, hbase-7403-trunkv31.patch, 
> hbase-7403-trunkv32.patch, hbase-7403-trunkv33.patch, 
> hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, 
> hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf
>
>
> Support executing a region merge transaction on the Regionserver, similar to 
> the split transaction.
> Process of merging two regions:
> a. client sends an RPC (dispatch merging regions) to the master
> b. master moves the regions together (onto the regionserver where the more 
> heavily loaded region resides)
> c. master sends an RPC (merge regions) to this regionserver
> d. Regionserver executes the region merge transaction in the thread pool
> e. the above b, c, d run asynchronously
> Process of the region merge transaction:
> a. Construct a new region merge transaction.
> b. Prepare for the merge transaction; the transaction will be canceled if it 
> is unavailable, 
> e.g. the two regions don't belong to the same table; the two regions are not 
> adjacent in a non-compulsory merge; a region is closed or has a reference
> c. Execute the transaction as follows:
> /**
>  * Set region as in transition, set it into MERGING state.
>  */
> SET_MERGING_IN_ZK,
> /**
>  * We created the temporary merge data directory.
>  */
> CREATED_MERGE_DIR,
> /**
>  * Closed the merging region A.
>  */
> CLOSED_REGION_A,
> /**
>  * The merging region A has been taken out of the server's online regions 
> list.
>  */
> OFFLINED_REGION_A,
> /**
>  * Closed the merging region B.
>  */
> CLOSED_REGION_B,
> /**
>  * The merging region B has been taken out of the server's online regions 
> list.
>  */
> OFFLINED_REGION_B,
> /**
>  * Started in on creation of the merged region.
>  */
> STARTED_MERGED_REGION_CREATION,
> /**
>  * Point of no return. If we got here, then transaction is not recoverable
>  * other than by crashing out the regionserver.
>  */
> PONR
> d. Roll back if step c throws an exception.
> Usage:
> HBaseAdmin#mergeRegions
> See more details in the patch.
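
A minimal usage sketch, assuming the HBaseAdmin#mergeRegions(encodedNameOfRegionA, 
encodedNameOfRegionB, forcible) signature described by this patch; the encoded 
region names below are placeholders:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class OnlineMergeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Encoded region names (the hash suffix of a full region name).
      byte[] encodedA = Bytes.toBytes("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
      byte[] encodedB = Bytes.toBytes("bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb");
      // forcible=false: only adjacent regions of the same table are merged.
      admin.mergeRegions(encodedA, encodedB, false);
    } finally {
      admin.close();
    }
  }
}
{code}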



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9912) Need to delete a row based on partial rowkey in hbase ... Pls provide query for that

2013-11-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815709#comment-13815709
 ] 

Lars Hofhansl commented on HBASE-9912:
--

Also, this is not really possible. I tried this a while ago and failed, and 
even blogged about that failure: 
http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html

(In a nutshell, we'd break seeking, since HBase would have no way of knowing 
how far before the seek key it would have to seek in order to determine 
whether KVs following the seek key are marked for deletion.)
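
The usual client-side alternative is to scan the prefix and batch deletes for 
the matching rows. A minimal sketch, assuming the 0.94-era client API; the 
table name and prefix are placeholders:

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixDelete {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    byte[] prefix = Bytes.toBytes("partialKey");
    Scan scan = new Scan(prefix);              // start scanning at the prefix...
    scan.setFilter(new PrefixFilter(prefix));  // ...and keep only matching rows
    List<Delete> deletes = new ArrayList<Delete>();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        deletes.add(new Delete(r.getRow()));   // one delete per matching row
      }
    } finally {
      scanner.close();
    }
    table.delete(deletes);                     // batched round trip
    table.close();
  }
}
{code}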

> Need to delete a row based on partial rowkey in hbase ... Pls provide query 
> for that 
> -
>
> Key: HBASE-9912
> URL: https://issues.apache.org/jira/browse/HBASE-9912
> Project: HBase
>  Issue Type: Bug
>Reporter: ranjini
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9902) Region Server is starting normally even if clock skew is more than default 30 seconds(or any configured). -> Regionserver node time is greater than master node time

2013-11-06 Thread Kashif J S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kashif J S updated HBASE-9902:
--

Attachment: HBASE-9902.patch

Patch making the clock skew detection use the absolute value of the skew, for 
the 0.98.0 and 0.96.0 versions.

> Region Server is starting normally even if clock skew is more than default 30 
> seconds(or any configured). -> Regionserver node time is greater than master 
> node time
> 
>
> Key: HBASE-9902
> URL: https://issues.apache.org/jira/browse/HBASE-9902
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.11
>Reporter: Kashif J S
> Fix For: 0.98.0, 0.96.0
>
> Attachments: HBASE-9902.patch
>
>
> When the Region server's time is ahead of the Master's time and the difference 
> is more than the hbase.master.maxclockskew value, region server startup does 
> not fail with a ClockOutOfSyncException.
> This causes some abnormal behavior, as detected by our tests.
> ServerManager.java#checkClockSkew:
>   long skew = System.currentTimeMillis() - serverCurrentTime;
>   if (skew > maxSkew) {
>     String message = "Server " + serverName + " has been " +
>       "rejected; Reported time is too far out of sync with master.  " +
>       "Time difference of " + skew + "ms > max allowed of " + maxSkew + "ms";
>     LOG.warn(message);
>     throw new ClockOutOfSyncException(message);
>   }
> The subtraction above yields a negative value when the Master's time is earlier 
> than the region server's time, and the "if (skew > maxSkew)" check then fails 
> to detect the skew.
> Please note: this was tested on hbase 0.94.11, and trunk currently has the 
> same logic.
> The fix would be to make the skew a positive value first, as below:
>   long skew = System.currentTimeMillis() - serverCurrentTime;
>   skew = (skew < 0 ? -skew : skew);
>   if (skew > maxSkew) { ...
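
Math.abs is equivalent to the ternary above; a minimal, self-contained 
illustration of the fixed check (maxSkew and serverCurrentTime are stand-ins):

{code}
public class ClockSkewCheck {
  public static void main(String[] args) {
    long maxSkew = 30000L; // hbase.master.maxclockskew default of 30s
    // Simulate a region server whose clock is 60s ahead of the master.
    long serverCurrentTime = System.currentTimeMillis() + 60000L;
    long skew = Math.abs(System.currentTimeMillis() - serverCurrentTime);
    // With the absolute value, a server ahead of the master is rejected too.
    System.out.println(skew > maxSkew ? "reject: ClockOutOfSyncException" : "accept");
  }
}
{code}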



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9902) Region Server is starting normally even if clock skew is more than default 30 seconds(or any configured). -> Regionserver node time is greater than master node time

2013-11-06 Thread Kashif J S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kashif J S updated HBASE-9902:
--

Fix Version/s: 0.96.0
   0.98.0

> Region Server is starting normally even if clock skew is more than default 30 
> seconds(or any configured). -> Regionserver node time is greater than master 
> node time
> 
>
> Key: HBASE-9902
> URL: https://issues.apache.org/jira/browse/HBASE-9902
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.11
>Reporter: Kashif J S
> Fix For: 0.98.0, 0.96.0
>
>
> When the Region server's time is ahead of the Master's time and the difference 
> is more than the hbase.master.maxclockskew value, region server startup does 
> not fail with a ClockOutOfSyncException.
> This causes some abnormal behavior, as detected by our tests.
> ServerManager.java#checkClockSkew:
>   long skew = System.currentTimeMillis() - serverCurrentTime;
>   if (skew > maxSkew) {
>     String message = "Server " + serverName + " has been " +
>       "rejected; Reported time is too far out of sync with master.  " +
>       "Time difference of " + skew + "ms > max allowed of " + maxSkew + "ms";
>     LOG.warn(message);
>     throw new ClockOutOfSyncException(message);
>   }
> The subtraction above yields a negative value when the Master's time is earlier 
> than the region server's time, and the "if (skew > maxSkew)" check then fails 
> to detect the skew.
> Please note: this was tested on hbase 0.94.11, and trunk currently has the 
> same logic.
> The fix would be to make the skew a positive value first, as below:
>   long skew = System.currentTimeMillis() - serverCurrentTime;
>   skew = (skew < 0 ? -skew : skew);
>   if (skew > maxSkew) { ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815707#comment-13815707
 ] 

Hadoop QA commented on HBASE-8741:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12612521/HBASE-8741-trunk-v6.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 36 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7772//console

This message is automatically generated.

> Scope sequenceid to the region rather than regionserver (WAS: Mutations on 
> Regions in recovery mode might have same sequenceIDs)
> 
>
> Key: HBASE-8741
> URL: https://issues.apache.org/jira/browse/HBASE-8741
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 0.95.1
>Reporter: Himanshu Vashishtha
>Assignee: Himanshu Vashishtha
> Fix For: 0.98.0
>
> Attachments: HBASE-8741-trunk-v6.1-rebased.patch, 
> HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, 
> HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, 
> HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, 
> HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, 
> HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, 
> HBASE-8741-v5.patch
>
>
> Currently, when opening a region, we find the maximum sequence ID from all 
> its HFiles and then set the LogSequenceId of the log (in case the latter is 
> at a smaller value). This works well in the recovered.edits case, as we are 
> not writing to the region until we have replayed all of its previous edits. 
> With distributed log replay, if we want to enable writes while a region is 
> under recovery, we need to make sure that the logSequenceId > the maximum 
> logSequenceId of the old regionserver. Otherwise, we might have a situation 
> where new edits have the same (or smaller) sequenceIds. 
> If we can store region-level information in the WALTrailer, then this 
> scenario could be avoided by:
> a) reading the trailer of the "last completed" file, i.e., last wal file 
> which has a trailer and,
> b) completely reading the last wal file (this file would not

[jira] [Commented] (HBASE-7403) Online Merge

2013-11-06 Thread Asaf Mesika (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815706#comment-13815706
 ] 

Asaf Mesika commented on HBASE-7403:


0.96.0 has this too?

> Online Merge
> 
>
> Key: HBASE-7403
> URL: https://issues.apache.org/jira/browse/HBASE-7403
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.95.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.98.0, 0.95.0
>
> Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403-v5.txt, 
> 7403v5.diff, 7403v5.txt, hbase-7403-0.95.patch, hbase-7403-94v1.patch, 
> hbase-7403-trunkv1.patch, hbase-7403-trunkv10.patch, 
> hbase-7403-trunkv11.patch, hbase-7403-trunkv12.patch, 
> hbase-7403-trunkv13.patch, hbase-7403-trunkv14.patch, 
> hbase-7403-trunkv15.patch, hbase-7403-trunkv16.patch, 
> hbase-7403-trunkv19.patch, hbase-7403-trunkv20.patch, 
> hbase-7403-trunkv22.patch, hbase-7403-trunkv23.patch, 
> hbase-7403-trunkv24.patch, hbase-7403-trunkv26.patch, 
> hbase-7403-trunkv28.patch, hbase-7403-trunkv29.patch, 
> hbase-7403-trunkv30.patch, hbase-7403-trunkv31.patch, 
> hbase-7403-trunkv32.patch, hbase-7403-trunkv33.patch, 
> hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, 
> hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf
>
>
> Support executing a region merge transaction on the Regionserver, similar to 
> the split transaction.
> Process of merging two regions:
> a. client sends an RPC (dispatch merging regions) to the master
> b. master moves the regions together (onto the regionserver where the more 
> heavily loaded region resides)
> c. master sends an RPC (merge regions) to this regionserver
> d. Regionserver executes the region merge transaction in the thread pool
> e. the above b, c, d run asynchronously
> Process of the region merge transaction:
> a. Construct a new region merge transaction.
> b. Prepare for the merge transaction; the transaction will be canceled if it 
> is unavailable, 
> e.g. the two regions don't belong to the same table; the two regions are not 
> adjacent in a non-compulsory merge; a region is closed or has a reference
> c. Execute the transaction as follows:
> /**
>  * Set region as in transition, set it into MERGING state.
>  */
> SET_MERGING_IN_ZK,
> /**
>  * We created the temporary merge data directory.
>  */
> CREATED_MERGE_DIR,
> /**
>  * Closed the merging region A.
>  */
> CLOSED_REGION_A,
> /**
>  * The merging region A has been taken out of the server's online regions 
> list.
>  */
> OFFLINED_REGION_A,
> /**
>  * Closed the merging region B.
>  */
> CLOSED_REGION_B,
> /**
>  * The merging region B has been taken out of the server's online regions 
> list.
>  */
> OFFLINED_REGION_B,
> /**
>  * Started in on creation of the merged region.
>  */
> STARTED_MERGED_REGION_CREATION,
> /**
>  * Point of no return. If we got here, then transaction is not recoverable
>  * other than by crashing out the regionserver.
>  */
> PONR
> d. Roll back if step c throws an exception.
> Usage:
> HBaseAdmin#mergeRegions
> See more details in the patch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815697#comment-13815697
 ] 

Hadoop QA commented on HBASE-9895:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612519/hbase-9895.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:488)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7771//console

This message is automatically generated.

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we PBed (converted to protobuf) org.apache.hadoop.hbase.client.Result, 
> so a 0.96 cluster cannot import 0.94 exported files. This issue is annoying 
> because a user can't import his old archive files after an upgrade, or archives 
> from others who are using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.
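
A sketch of that fallback idea, not the actual patch; parseWritableResult is a 
hypothetical helper standing in for the pre-0.96 Writable deserialization:

{code}
// Try the 0.96 protobuf format first; on parse failure, assume the 0.94 format.
static Result parseResult(byte[] bytes) throws IOException {
  try {
    return ProtobufUtil.toResult(ClientProtos.Result.parseFrom(bytes));
  } catch (InvalidProtocolBufferException e) {
    // Not valid protobuf: fall back to the old Writable-based serialization.
    return parseWritableResult(bytes); // hypothetical helper
  }
}
{code}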



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HBASE-9912) Need to delete a row based on partial rowkey in hbase ... Pls provide query for that

2013-11-06 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling resolved HBASE-9912.
--

Resolution: Invalid

This is a question, not a bug. Please email u...@hbase.apache.org with 
questions.  JIRA is for actual bug reports, improvements, etc.

See http://hbase.apache.org/mail-lists.html

> Need to delete a row based on partial rowkey in hbase ... Pls provide query 
> for that 
> -
>
> Key: HBASE-9912
> URL: https://issues.apache.org/jira/browse/HBASE-9912
> Project: HBase
>  Issue Type: Bug
>Reporter: ranjini
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9912) Need to delete a row based on partial rowkey in hbase ... Pls provide query for that

2013-11-06 Thread ranjini (JIRA)
ranjini created HBASE-9912:
--

 Summary: Need to delete a row based on partial rowkey in hbase ... 
Pls provide query for that 
 Key: HBASE-9912
 URL: https://issues.apache.org/jira/browse/HBASE-9912
 Project: HBase
  Issue Type: Bug
Reporter: ranjini
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9907) Rig to fake a cluster so can profile client behaviors

2013-11-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9907:
-

Attachment: 9907.txt

> Rig to fake a cluster so can profile client behaviors
> -
>
> Key: HBASE-9907
> URL: https://issues.apache.org/jira/browse/HBASE-9907
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9907.txt
>
>
> Patch carried over from the HBASE-9775 parent issue.  Adds to 
> TestClientNoCluster#main a rig that allows faking many clients against a few 
> servers, and the opposite.  Useful for studying client operation.
> Includes a few changes to how PBs are made, to try to save on a few creations.
> Also has an edit of the javadoc on how to create an HConnection and HTable, 
> trying to be more forceful about pointing you in the right direction 
> ([~lhofhansl] -- mind reviewing these javadoc changes?)
> I already have a +1 on this patch up in the parent issue.  Will run it by 
> hadoopqa to make sure all is good before commit.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9885) Avoid some Result creation in protobuf conversions

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815671#comment-13815671
 ] 

Hudson commented on HBASE-9885:
---

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #829 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/829/])
HBASE-9885 Avoid some Result creation in protobuf conversions - REVERT to check 
the cause of precommit flakiness (nkeywal: rev 1539492)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
HBASE-9885 Avoid some Result creation in protobuf conversions (nkeywal: rev 
1539429)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java


> Avoid some Result creation in protobuf conversions
> --
>
> Key: HBASE-9885
> URL: https://issues.apache.org/jira/browse/HBASE-9885
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Protobufs, regionserver
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9885.v1.patch, 9885.v2, 9885.v2.patch, 9885.v3.patch, 
> 9885.v3.patch
>
>
> We create a lot of Result objects that we could avoid, as they contain 
> nothing other than a boolean value. We sometimes create a protobuf builder 
> as well on this path; this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9001) TestThriftServerCmdLine.testRunThriftServer[0] failed

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815672#comment-13815672
 ] 

Hudson commented on HBASE-9001:
---

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #829 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/829/])
HBASE-9001 Add a toString in HTable, fix a log in AssignmentManager (nkeywal: 
rev 1539425)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> TestThriftServerCmdLine.testRunThriftServer[0] failed
> -
>
> Key: HBASE-9001
> URL: https://issues.apache.org/jira/browse/HBASE-9001
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 0.95.2
>
> Attachments: 9001.txt
>
>
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/624/testReport/junit/org.apache.hadoop.hbase.thrift/TestThriftServerCmdLine/testRunThriftServer_0_/
> It seems stuck here:
> {code}
> 2013-07-19 03:52:03,158 INFO  [Thread-131] 
> thrift.TestThriftServerCmdLine(132): Starting HBase Thrift server with 
> command line: -hsha -port 56708 start
> 2013-07-19 03:52:03,174 INFO  [ThriftServer-cmdline] 
> thrift.ThriftServerRunner$ImplType(208): Using thrift server type hsha
> 2013-07-19 03:52:03,205 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2013-07-19 03:52:03,206 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> mapreduce.job.counters.limit is deprecated. Instead, use 
> mapreduce.job.counters.max
> 2013-07-19 03:52:03,207 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
> 2013-07-19 03:54:03,156 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): 
> after: thrift.TestThriftServerCmdLine#testRunThriftServer[0] Thread=146 (was 
> 155), OpenFileDescriptor=295 (was 311), MaxFileDescriptor=4096 (was 4096), 
> SystemLoadAverage=293 (was 240) - SystemLoadAverage LEAK? -, ProcessCount=145 
> (was 143) - ProcessCount LEAK? -, AvailableMemoryMB=779 (was 1263), 
> ConnectionCount=4 (was 4)
> 2013-07-19 03:54:03,157 DEBUG [pool-1-thread-1] 
> thrift.TestThriftServerCmdLine(107): implType=-hsha, specifyFramed=false, 
> specifyBindIP=false, specifyCompact=true
> {code}
> My guess is that we didn't get scheduled because load was almost 300 on this 
> box at the time?
> Let me up the two-minute timeout.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9902) Region Server is starting normally even if clock skew is more than default 30 seconds(or any configured). -> Regionserver node time is greater than master node time

2013-11-06 Thread Jyothi Mandava (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815661#comment-13815661
 ] 

Jyothi Mandava commented on HBASE-9902:
---

Kashif will submit the patch soon for the 0.94, 0.96, and trunk versions.

> Region Server is starting normally even if clock skew is more than default 30 
> seconds(or any configured). -> Regionserver node time is greater than master 
> node time
> 
>
> Key: HBASE-9902
> URL: https://issues.apache.org/jira/browse/HBASE-9902
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.11
>Reporter: Kashif J S
>
> When the Region server's time is ahead of the Master's time and the difference 
> is more than the hbase.master.maxclockskew value, region server startup does 
> not fail with a ClockOutOfSyncException.
> This causes some abnormal behavior, as detected by our tests.
> ServerManager.java#checkClockSkew:
>   long skew = System.currentTimeMillis() - serverCurrentTime;
>   if (skew > maxSkew) {
>     String message = "Server " + serverName + " has been " +
>       "rejected; Reported time is too far out of sync with master.  " +
>       "Time difference of " + skew + "ms > max allowed of " + maxSkew + "ms";
>     LOG.warn(message);
>     throw new ClockOutOfSyncException(message);
>   }
> The subtraction above yields a negative value when the Master's time is earlier 
> than the region server's time, and the "if (skew > maxSkew)" check then fails 
> to detect the skew.
> Please note: this was tested on hbase 0.94.11, and trunk currently has the 
> same logic.
> The fix would be to make the skew a positive value first, as below:
>   long skew = System.currentTimeMillis() - serverCurrentTime;
>   skew = (skew < 0 ? -skew : skew);
>   if (skew > maxSkew) { ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9903) Remove the jamon generated classes from the findbugs analysis

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815655#comment-13815655
 ] 

Hadoop QA commented on HBASE-9903:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612442/9903.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7769//console

This message is automatically generated.

> Remove the jamon generated classes from the findbugs analysis
> -
>
> Key: HBASE-9903
> URL: https://issues.apache.org/jira/browse/HBASE-9903
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0
>
> Attachments: 9903.v1.patch, 9903.v2.patch, 9903.v2.patch
>
>
> The current filter does not work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9885) Avoid some Result creation in protobuf conversions

2013-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815645#comment-13815645
 ] 

Ted Yu commented on HBASE-9885:
---

{code}
 List values = proto.getCellList();
-if (cells == null) cells = new ArrayList(values.size());
-for (CellProtos.Cell c: values) {
-  cells.add(toCell(c));
+if (cells == null) {
+  if (values.isEmpty()) {
+    return EMPTY_RESULT;
+  } else {
+    cells = new ArrayList(values.size());
+    for (CellProtos.Cell c : values) {
+      cells.add(toCell(c));
+    }
+  }
{code}
Looks like the scope of the cells == null condition is too wide: the for loop 
should be outside the 'if (cells == null)' check.
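
A sketch of the restructuring suggested above, as a fragment of the conversion 
method (EMPTY_RESULT and toCell come from the patch under review):

{code}
List<CellProtos.Cell> values = proto.getCellList();
if (cells == null) {
  if (values.isEmpty()) {
    return EMPTY_RESULT;  // nothing to convert and no caller-supplied list
  }
  cells = new ArrayList<Cell>(values.size());
}
// The conversion loop now runs whether or not the caller supplied a list.
for (CellProtos.Cell c : values) {
  cells.add(toCell(c));
}
{code}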

> Avoid some Result creation in protobuf conversions
> --
>
> Key: HBASE-9885
> URL: https://issues.apache.org/jira/browse/HBASE-9885
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Protobufs, regionserver
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9885.v1.patch, 9885.v2, 9885.v2.patch, 9885.v3.patch, 
> 9885.v3.patch
>
>
> We create a lot of Result objects that we could avoid, as they contain 
> nothing other than a boolean value. We sometimes create a protobuf builder 
> as well on this path; this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815646#comment-13815646
 ] 

Hadoop QA commented on HBASE-9906:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12612509/hbase-9906-0.94_v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7770//console

This message is automatically generated.

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906-0.94_v1.patch, hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation on the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in the case of a single region, the table will be reported 
> missing). Subsequent attempts to create the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> masks the put at the same ts, even though the put was written later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 
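
A minimal repro sketch of the masking behavior, assuming the 0.94-era client 
API; the table, family, and qualifier names are placeholders:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteMasksPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");
    byte[] row = Bytes.toBytes("r");
    byte[] fam = Bytes.toBytes("f");
    byte[] qual = Bytes.toBytes("q");
    long ts = 1000L; // explicit timestamp shared by the delete and the put

    Delete d = new Delete(row);
    d.deleteColumns(fam, qual, ts); // delete marker at ts
    table.delete(d);

    Put p = new Put(row);
    p.add(fam, qual, ts, Bytes.toBytes("v")); // put at the very same ts
    table.put(p);

    Result r = table.get(new Get(row));
    // Prints true: the delete marker masks the put at the same ts, even
    // though the put was written after the delete.
    System.out.println("empty result? " + r.isEmpty());
    table.close();
  }
}
{code}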



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9001) TestThriftServerCmdLine.testRunThriftServer[0] failed

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815640#comment-13815640
 ] 

Hudson commented on HBASE-9001:
---

SUCCESS: Integrated in HBase-TRUNK #4671 (See 
[https://builds.apache.org/job/HBase-TRUNK/4671/])
HBASE-9001 Add a toString in HTable, fix a log in AssignmentManager (nkeywal: 
rev 1539425)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> TestThriftServerCmdLine.testRunThriftServer[0] failed
> -
>
> Key: HBASE-9001
> URL: https://issues.apache.org/jira/browse/HBASE-9001
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 0.95.2
>
> Attachments: 9001.txt
>
>
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/624/testReport/junit/org.apache.hadoop.hbase.thrift/TestThriftServerCmdLine/testRunThriftServer_0_/
> It seems stuck here:
> {code}
> 2013-07-19 03:52:03,158 INFO  [Thread-131] 
> thrift.TestThriftServerCmdLine(132): Starting HBase Thrift server with 
> command line: -hsha -port 56708 start
> 2013-07-19 03:52:03,174 INFO  [ThriftServer-cmdline] 
> thrift.ThriftServerRunner$ImplType(208): Using thrift server type hsha
> 2013-07-19 03:52:03,205 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2013-07-19 03:52:03,206 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> mapreduce.job.counters.limit is deprecated. Instead, use 
> mapreduce.job.counters.max
> 2013-07-19 03:52:03,207 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
> 2013-07-19 03:54:03,156 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): 
> after: thrift.TestThriftServerCmdLine#testRunThriftServer[0] Thread=146 (was 
> 155), OpenFileDescriptor=295 (was 311), MaxFileDescriptor=4096 (was 4096), 
> SystemLoadAverage=293 (was 240) - SystemLoadAverage LEAK? -, ProcessCount=145 
> (was 143) - ProcessCount LEAK? -, AvailableMemoryMB=779 (was 1263), 
> ConnectionCount=4 (was 4)
> 2013-07-19 03:54:03,157 DEBUG [pool-1-thread-1] 
> thrift.TestThriftServerCmdLine(107): implType=-hsha, specifyFramed=false, 
> specifyBindIP=false, specifyCompact=true
> {code}
> My guess is that we didn't get scheduled because load was almost 300 on this 
> box at the time?
> Let me up the two-minute timeout.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9885) Avoid some Result creation in protobuf conversions

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815639#comment-13815639
 ] 

Hudson commented on HBASE-9885:
---

SUCCESS: Integrated in HBase-TRUNK #4671 (See 
[https://builds.apache.org/job/HBase-TRUNK/4671/])
HBASE-9885 Avoid some Result creation in protobuf conversions - REVERT to check 
the cause of precommit flakiness (nkeywal: rev 1539492)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
HBASE-9885 Avoid some Result creation in protobuf conversions (nkeywal: rev 
1539429)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java


> Avoid some Result creation in protobuf conversions
> --
>
> Key: HBASE-9885
> URL: https://issues.apache.org/jira/browse/HBASE-9885
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Protobufs, regionserver
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9885.v1.patch, 9885.v2, 9885.v2.patch, 9885.v3.patch, 
> 9885.v3.patch
>
>
> We create a lot of Result objects that we could avoid, as they contain 
> nothing other than a boolean value. We sometimes create a protobuf builder 
> as well on this path; this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815634#comment-13815634
 ] 

Hadoop QA commented on HBASE-9909:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612502/hbase-9909_v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7768//console

This message is automatically generated.

> TestHFilePerformance should not be a unit test, but a tool
> --
>
> Key: HBASE-9909
> URL: https://issues.apache.org/jira/browse/HBASE-9909
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9909_v1.patch
>
>
> TestHFilePerformance is a very old test which does not test anything; it is 
> a perf evaluation tool. It is not clear to me whether there is any utility 
> in keeping it, but it should at least be converted to a tool. 
> Note that TestHFile already covers the unit test cases (writing hfiles with 
> none and gz compression). We do not need to test SequenceFile. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset

2013-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9818:
--

Attachment: 9818-trial.txt

I tried to use 9818-trial.txt to detect where the close() came from.

The first attempt ended with the device full.
Rerunning the two tests now.

> NPE in HFileBlock#AbstractFSReader#readAtOffset
> ---
>
> Key: HBASE-9818
> URL: https://issues.apache.org/jira/browse/HBASE-9818
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Ted Yu
> Attachments: 9818-trial.txt, 9818-v2.txt, 9818-v3.txt, 9818-v4.txt, 
> 9818-v5.txt
>
>
> HFileBlock#istream seems to be null.  I was wondering whether we should hide 
> FSDataInputStreamWrapper#useHBaseChecksum.
> By the way, this happened when online schema change is enabled (encoding).
> {noformat}
> 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] 
> regionserver.HRegionServer:
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:724)
> 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] 
> regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected 
> nextCallSeq: 53438 But the nextCallSeq got from client: 53437; 
> request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: 
> false next_call_seq: 53437
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcSched

[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset

2013-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9818:
--

Status: Open  (was: Patch Available)

> NPE in HFileBlock#AbstractFSReader#readAtOffset
> ---
>
> Key: HBASE-9818
> URL: https://issues.apache.org/jira/browse/HBASE-9818
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Ted Yu
> Attachments: 9818-v2.txt, 9818-v3.txt, 9818-v4.txt, 9818-v5.txt
>
>
> HFileBlock#istream seems to be null.  I was wondering whether we should hide 
> FSDataInputStreamWrapper#useHBaseChecksum.
> By the way, this happened when online schema change is enabled (encoding).
> {noformat}
> 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] 
> regionserver.HRegionServer:
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:724)
> 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] 
> regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected 
> nextCallSeq: 53438 But the nextCallSeq got from client: 53437; 
> request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: 
> false next_call_seq: 53437
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815632#comment-13815632
 ] 

Ted Yu commented on HBASE-9906:
---

Minor comment:
{code}
-  if (metaChanges.hasRegionsToRestore()) 
hrisToRemove.addAll(metaChanges.getRegionsToRestore());
   MetaEditor.deleteRegions(catalogTracker, hrisToRemove);
{code}
Can the 20 ms sleep start counting from the call to MetaEditor.deleteRegions()?
Would a 17 ms sleep be good enough?
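
For readers following along, here is a minimal standalone sketch of the 
collision described in this issue (plain 0.94 client API; {{table}}, {{row}}, 
{{family}}, {{qualifier}} and {{value}} are placeholders, not the patch):
{code}
// A Delete and a Put issued within the same millisecond share a timestamp,
// and the Delete masks the Put even though the Put was written afterwards.
long ts = System.currentTimeMillis();
Delete d = new Delete(row);
d.deleteColumns(family, qualifier, ts); // deletes all versions up to ts
table.delete(d);

Put p = new Put(row);
p.add(family, qualifier, ts, value);    // same ts: shadowed by the delete
table.put(p);

Result r = table.get(new Get(row));     // the column appears to be gone
{code}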

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906-0.94_v1.patch, hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation on the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in the case of a single region, the table will be reported 
> missing). Subsequent attempts to create the table will also fail because the 
> table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will mask the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9808) org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with org.apache.hadoop.hbase.PerformanceEvaluation

2013-11-06 Thread Gustavo Anatoly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo Anatoly updated HBASE-9808:
---

Attachment: HBASE-9808-v2.patch

Nick, could you please review again? 
Thanks.

> org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with 
> org.apache.hadoop.hbase.PerformanceEvaluation
> 
>
> Key: HBASE-9808
> URL: https://issues.apache.org/jira/browse/HBASE-9808
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Gustavo Anatoly
> Attachments: HBASE-9808-v1.patch, HBASE-9808-v2.patch, 
> HBASE-9808.patch
>
>
> Here is list of JIRAs whose fixes might have gone into 
> rest.PerformanceEvaluation :
> {code}
> 
> r1527817 | mbertozzi | 2013-09-30 15:57:44 -0700 (Mon, 30 Sep 2013) | 1 line
> HBASE-9663 PerformanceEvaluation does not properly honor specified table name 
> parameter
> 
> r1526452 | mbertozzi | 2013-09-26 04:58:50 -0700 (Thu, 26 Sep 2013) | 1 line
> HBASE-9662 PerformanceEvaluation input do not handle tags properties
> 
> r1525269 | ramkrishna | 2013-09-21 11:01:32 -0700 (Sat, 21 Sep 2013) | 3 lines
> HBASE-8496 - Implement tags and the internals of how a tag should look like 
> (Ram)
> 
> r1524985 | nkeywal | 2013-09-20 06:02:54 -0700 (Fri, 20 Sep 2013) | 1 line
> HBASE-9558  PerformanceEvaluation is in hbase-server, and creates a 
> dependency to MiniDFSCluster
> 
> r1523782 | nkeywal | 2013-09-16 13:07:13 -0700 (Mon, 16 Sep 2013) | 1 line
> HBASE-9521  clean clearBufferOnFail behavior and deprecate it
> 
> r1518341 | jdcryans | 2013-08-28 12:46:55 -0700 (Wed, 28 Aug 2013) | 2 lines
> HBASE-9330 Refactor PE to create HTable the correct way
> {code}
> Long term, we may consider consolidating the two PerformanceEvaluation 
> classes so that such maintenance work can be reduced.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-06 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815619#comment-13815619
 ] 

Liu Shaohui commented on HBASE-9892:


[~enis]
{quote}
What about backporting HBASE-7027 to 0.94 and fixing the issue in table.jsp?

It makes sense, but the challenge is that we cannot easily backport HBASE-7027 
without breaking BC. HServerLoad does not have extra fields we can use, I fear.
{quote}
Since HServerLoad has a version field, I think we can add an info port field 
and keep compatibility.

[~stack] [~enis] Could you give a suggestion on which method is acceptable: 
this patch, or backporting 7027 and fixing the small issues left?
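
To illustrate the version-field approach, here is a rough sketch (the 
VERSION_WITH_INFO_PORT constant and the infoPort field are hypothetical names, 
not the actual patch) of how a versioned Writable like HServerLoad can gain a 
new field without breaking compatibility:
{code}
// Bump the version byte and append the new field last, so new readers pick
// it up while data written by old servers still parses.
public void write(DataOutput out) throws IOException {
  out.writeByte(VERSION_WITH_INFO_PORT);
  // ... existing HServerLoad fields ...
  out.writeInt(infoPort);
}

public void readFields(DataInput in) throws IOException {
  byte version = in.readByte();
  // ... existing HServerLoad fields ...
  infoPort = (version >= VERSION_WITH_INFO_PORT) ? in.readInt() : -1;
}
{code}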


> Add info port to ServerName to support multi instances in a node
> 
>
> Key: HBASE-9892
> URL: https://issues.apache.org/jira/browse/HBASE-9892
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
> HBASE-9892-0.94-v3.diff
>
>
> The full GC time of a regionserver with a big heap (> 30G) usually cannot be 
> kept under 30s. Meanwhile, servers with 64G of memory are the norm. So we try 
> to deploy multiple RS instances (2-3) in a single node, with a heap of about 
> 20G ~ 24G each.
> Most things work fine, except the hbase web ui. The master gets the RS info 
> port from conf, which is not suitable for this situation of multiple RS 
> instances in a node. So we add the info port to ServerName:
> a. At startup, the RS reports its info port to HMaster.
> b. For the root region, the RS writes the servername with info port to the 
> zookeeper root-region-server node.
> c. For meta regions, the RS writes the servername with info port to the root 
> region.
> d. For user regions, the RS writes the servername with info port to meta 
> regions.
> So HMaster and clients can get the info port from the servername.
> To test this feature, I changed the rs num from 1 to 3 in standalone mode, so 
> we can test it in standalone mode.
> I think Hoya (hbase on yarn) will encounter the same problem. Does anyone 
> know how Hoya handles this problem?
> PS: There are different formats for the servername in the zk node and the 
> meta table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9885) Avoid some Result creation in protobuf conversions

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815611#comment-13815611
 ] 

Hudson commented on HBASE-9885:
---

FAILURE: Integrated in hbase-0.96 #181 (See 
[https://builds.apache.org/job/hbase-0.96/181/])
HBASE-9885 Avoid some Result creation in protobuf conversions - REVERT to check 
the cause of precommit flakiness (nkeywal: rev 1539493)
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
HBASE-9885 Avoid some Result creation in protobuf conversions (nkeywal: rev 
1539427)
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java


> Avoid some Result creation in protobuf conversions
> --
>
> Key: HBASE-9885
> URL: https://issues.apache.org/jira/browse/HBASE-9885
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Protobufs, regionserver
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9885.v1.patch, 9885.v2, 9885.v2.patch, 9885.v3.patch, 
> 9885.v3.patch
>
>
> We creates a lot of Result that we could avoid, as they contain nothing else 
> than a boolean value. We create sometimes a protobuf builder as well on this 
> path, this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9001) TestThriftServerCmdLine.testRunThriftServer[0] failed

2013-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815612#comment-13815612
 ] 

Hudson commented on HBASE-9001:
---

FAILURE: Integrated in hbase-0.96 #181 (See 
[https://builds.apache.org/job/hbase-0.96/181/])
HBASE-9001 Add a toString in HTable, fix a log in AssignmentManager (nkeywal: 
rev 1539426)
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


> TestThriftServerCmdLine.testRunThriftServer[0] failed
> -
>
> Key: HBASE-9001
> URL: https://issues.apache.org/jira/browse/HBASE-9001
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 0.95.2
>
> Attachments: 9001.txt
>
>
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/624/testReport/junit/org.apache.hadoop.hbase.thrift/TestThriftServerCmdLine/testRunThriftServer_0_/
> It seems stuck here:
> {code}
> 2013-07-19 03:52:03,158 INFO  [Thread-131] 
> thrift.TestThriftServerCmdLine(132): Starting HBase Thrift server with 
> command line: -hsha -port 56708 start
> 2013-07-19 03:52:03,174 INFO  [ThriftServer-cmdline] 
> thrift.ThriftServerRunner$ImplType(208): Using thrift server type hsha
> 2013-07-19 03:52:03,205 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2013-07-19 03:52:03,206 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> mapreduce.job.counters.limit is deprecated. Instead, use 
> mapreduce.job.counters.max
> 2013-07-19 03:52:03,207 WARN  [ThriftServer-cmdline] conf.Configuration(817): 
> io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
> 2013-07-19 03:54:03,156 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): 
> after: thrift.TestThriftServerCmdLine#testRunThriftServer[0] Thread=146 (was 
> 155), OpenFileDescriptor=295 (was 311), MaxFileDescriptor=4096 (was 4096), 
> SystemLoadAverage=293 (was 240) - SystemLoadAverage LEAK? -, ProcessCount=145 
> (was 143) - ProcessCount LEAK? -, AvailableMemoryMB=779 (was 1263), 
> ConnectionCount=4 (was 4)
> 2013-07-19 03:54:03,157 DEBUG [pool-1-thread-1] 
> thrift.TestThriftServerCmdLine(107): implType=-hsha, specifyFramed=false, 
> specifyBindIP=false, specifyCompact=true
> {code}
> My guess is that we didn't get scheduled because load was almost 300 on this 
> box at the time?
> Let me up the timeout of two minutes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815607#comment-13815607
 ] 

Hadoop QA commented on HBASE-9890:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612491/HBASE-9890-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7767//console

This message is automatically generated.

> MR jobs are not working if started by a delegated user
> --
>
> Key: HBASE-9890
> URL: https://issues.apache.org/jira/browse/HBASE-9890
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce, security
>Affects Versions: 0.98.0, 0.94.12, 0.96.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 0.98.0, 0.94.13, 0.96.1
>
> Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, 
> HBASE-9890-v0.patch, HBASE-9890-v1.patch, HBASE-9890-v2.patch
>
>
> If Map-Reduce jobs are started by a proxy user that already has the 
> delegation tokens, we get an exception on "obtain token" since the proxy user 
> doesn't have the kerberos auth.
> For example:
>  * If we use oozie to execute RowCounter - oozie will get the tokens required 
> (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter 
> tries to obtain the token, it will get an exception.
>  * If we use oozie to execute LoadIncrementalHFiles - oozie will get the 
> tokens required (HDFS_DELEGATION_TOKEN) and it will start the 
> LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the 
> token, it will get an exception.
> {code}
>  org.apache.hadoop.hbase.security.AccessDeniedException: Token generation 
> only allowed for Kerberos authenticated clients
> at 
> org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87)
> {code}
> {code}
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
> can be issued only with kerberos or web authentication
>   at 
> org.apach

[jira] [Updated] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)

2013-11-06 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-8741:
---

Attachment: HBASE-8741-trunk-v6.4.patch

Uploading the latest on rb here. Thanks.

> Scope sequenceid to the region rather than regionserver (WAS: Mutations on 
> Regions in recovery mode might have same sequenceIDs)
> 
>
> Key: HBASE-8741
> URL: https://issues.apache.org/jira/browse/HBASE-8741
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 0.95.1
>Reporter: Himanshu Vashishtha
>Assignee: Himanshu Vashishtha
> Fix For: 0.98.0
>
> Attachments: HBASE-8741-trunk-v6.1-rebased.patch, 
> HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, 
> HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, 
> HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, 
> HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, 
> HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, 
> HBASE-8741-v5.patch
>
>
> Currently, when opening a region, we find the maximum sequence ID from all 
> its HFiles and then set the LogSequenceId of the log (in case the latter is 
> at a smaller value). This works well in the recovered.edits case, as we are 
> not writing to the region until we have replayed all of its previous edits. 
> With distributed log replay, if we want to enable writes while a region is 
> under recovery, we need to make sure that the logSequenceId > maximum 
> logSequenceId of the old regionserver. Otherwise, we might have a situation 
> where new edits have the same (or smaller) sequenceIds. 
> If we store region level information in the WALTrailer, then this scenario 
> could be avoided by:
> a) reading the trailer of the "last completed" file, i.e., the last wal file 
> which has a trailer, and
> b) completely reading the last wal file (this file would not have the 
> trailer, so it needs to be read completely).
> In the future, if we switch to multiple wal files, we could read the trailer 
> of all completed WAL files and read the remaining incomplete files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Status: Patch Available  (was: Open)

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: hbase-9895.patch

There is no good way to dynamically determine whether an input file uses the 
0.94 format, so I am introducing a system property such as the following so 
that Import can load a file with the 0.94 deserializer.

{code}
./bin/hbase -Dhbase.input.version=0.94 org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
{code}
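
A sketch of how Import could branch on that property (assuming a Configuration 
handle named conf; the branch bodies are placeholders, not the attached patch):
{code}
// Pick the Result deserialization path based on the proposed property.
String inputVersion = conf.get("hbase.input.version");
if ("0.94".equals(inputVersion)) {
  // fall back to the Writable-based 0.94 deserializer
} else {
  // default: the protobuf-based 0.96 deserializer
}
{code}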

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong reassigned HBASE-9895:


Assignee: Jeffrey Zhong

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>
> Basically we PBed org.apache.hadoop.hbase.client.Result so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815594#comment-13815594
 ] 

stack commented on HBASE-9892:
--

HBASE-7027 is an admitted hack; would be good to do it better.

> Add info port to ServerName to support multi instances in a node
> 
>
> Key: HBASE-9892
> URL: https://issues.apache.org/jira/browse/HBASE-9892
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
> HBASE-9892-0.94-v3.diff
>
>
> The full GC time of a regionserver with a big heap (> 30G) usually cannot be 
> kept under 30s. Meanwhile, servers with 64G of memory are the norm. So we try 
> to deploy multiple RS instances (2-3) in a single node, with a heap of about 
> 20G ~ 24G each.
> Most things work fine, except the hbase web ui. The master gets the RS info 
> port from conf, which is not suitable for this situation of multiple RS 
> instances in a node. So we add the info port to ServerName:
> a. At startup, the RS reports its info port to HMaster.
> b. For the root region, the RS writes the servername with info port to the 
> zookeeper root-region-server node.
> c. For meta regions, the RS writes the servername with info port to the root 
> region.
> d. For user regions, the RS writes the servername with info port to meta 
> regions.
> So HMaster and clients can get the info port from the servername.
> To test this feature, I changed the rs num from 1 to 3 in standalone mode, so 
> we can test it in standalone mode.
> I think Hoya (hbase on yarn) will encounter the same problem. Does anyone 
> know how Hoya handles this problem?
> PS: There are different formats for the servername in the zk node and the 
> meta table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9873) Some improvements in hlog and hlog split

2013-11-06 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815593#comment-13815593
 ] 

Liu Shaohui commented on HBASE-9873:


[~nkeywal]
{quote}
Actually, we want to introduce a speculative scheduler for hlog tasks, like the 
speculative scheduler for map/reduce tasks in mapreduce.

Note that there is a new algo implemented in HBASE-7006 that allows writes 
during the recovery. This algo is not really suitable for speculative 
execution, because the writes are always executed on the same machines, so 
adding executions would likely slow down the process. Ok that's not for 0.94
{quote}

I will take a deep look at HBASE-7006 first. Thanks.

{quote}
Rely on the smallest of all the biggest hfile seqIds of previously served 
regions to ignore some entries. Facebook has implemented this in HBASE-6508 
and we backported it to hbase 0.94 in HBASE-9568.

Yep, this would be useful for sure (my understanding is that 0.96+ has it)
{quote}
Thanks. HBASE-8573 has done it in 0.96. Sorry for not noticing it. 
As many companies still use 0.94, I think backporting is needed.
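
For reference, a sketch of the skip rule from item 3 (assuming 0.94/0.96-era 
store APIs; {{region}} is the region being replayed and {{entryKey}} is a WAL 
entry's key):
{code}
// An edit can be skipped during replay if its sequence id is <= the smallest
// of the per-store maximum flushed sequence ids, i.e. every store has
// already persisted it.
long minOfMaxFlushedSeqIds = Long.MAX_VALUE;
for (Store store : region.getStores().values()) {
  long storeMax = 0;
  for (StoreFile sf : store.getStorefiles()) {
    storeMax = Math.max(storeMax, sf.getMaxSequenceId());
  }
  minOfMaxFlushedSeqIds = Math.min(minOfMaxFlushedSeqIds, storeMax);
}
boolean skip = entryKey.getLogSeqNum() <= minOfMaxFlushedSeqIds;
{code}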



> Some improvements in hlog and hlog split
> 
>
> Key: HBASE-9873
> URL: https://issues.apache.org/jira/browse/HBASE-9873
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, wal
>Reporter: Liu Shaohui
>Priority: Critical
>  Labels: failover, hlog
>
> Some improvements in hlog and hlog split:
> 1) Try to clean old hlogs after each memstore flush, to avoid unnecessary 
> hlog splits in failover. Today, hlog cleaning is only run when rolling the 
> hlog writer.
> 2) Add a background hlog compaction thread to compact hlogs: remove the hlog 
> entries whose data has been flushed to hfiles. The scenario is that in a 
> shared cluster, write requests for a table may be very few and periodical, so 
> lots of hlogs cannot be cleaned because of entries for this table in those 
> hlogs.
> 3) Rely on the smallest of all the biggest hfile seqIds of previously served 
> regions to ignore some entries. Facebook has implemented this in HBASE-6508 
> and we backported it to hbase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the master 
> (the latter can boost split efficiency for a tiny cluster).
> 5) Enable multiple splitters on a 'big' hlog file by (logically) splitting 
> the hlog into slices (of configurable size, e.g. the hdfs block size, 64M), 
> and support multiple concurrent split tasks on a single hlog file slice.
> 6) Do not cancel a timed-out split task until another task reports success 
> (this avoids the scenario where the split of a hlog file fails because no 
> single task can succeed within the timeout period), and reschedule an 
> identical split task to reduce split time (to avoid stragglers in hlog split).
> 7) Consider hlog data locality when scheduling hlog split tasks: schedule the 
> hlog to a splitter that is near the hlog data.
> 8) Support multiple hlog writers, and switching to another hlog writer when 
> write latency to the current hlog is long due to a possible temporary network 
> spike?
> This is a draft listing the hlog improvements we plan to implement in the 
> near future. Comments and discussions are welcome.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9775) Client write path perf issues

2013-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815591#comment-13815591
 ] 

stack commented on HBASE-9775:
--

Thanks [~jmspaggi].  Don't mind the patch in here.  I think Nicolas had a 
prescription above for you comparing 0.94 and 0.96?

> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: 9775.rig.txt, 9775.rig.v2.patch, 9775.rig.v3.patch, 
> Charts Search   Cloudera Manager - ITBLL.png, Charts Search   Cloudera 
> Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, ycsb.png, 
> ycsb_insert_94_vs_96.png
>
>
> Testing on larger clusters has not had the desired throughput increases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9889) Make sure we clean up scannerReadPoints upon any exceptions

2013-11-06 Thread Amitanand Aiyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815590#comment-13815590
 ] 

Amitanand Aiyer commented on HBASE-9889:


I think we can move it to the end of the constructor.

Just need to make sure that we grab the read points before we open the scanners 
(we have seen get latency go up if we move the entire synchronized () block 
to the end).

For the removal, I think it is okay: scannerReadPoints is supposed to be a 
ConcurrentHashMap.
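
A sketch of the cleanup being discussed (the names follow this thread, not the 
attached diff):
{code}
// Register the read point, then make sure any failure while opening the
// store scanners removes it again, so flushes and compactions can still
// garbage-collect older versions.
scannerReadPoints.put(this, readPt);
try {
  // ... open store file scanners; this may throw ...
} catch (IOException e) {
  scannerReadPoints.remove(this); // don't leave a stale entry behind
  throw e;
}
{code}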

> Make sure we clean up scannerReadPoints upon any exceptions
> ---
>
> Key: HBASE-9889
> URL: https://issues.apache.org/jira/browse/HBASE-9889
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.89-fb, 0.94.12, 0.96.0
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
>Priority: Minor
> Fix For: 0.96.1
>
> Attachments: hbase-9889.diff
>
>
> If there is an exception in the creation of the RegionScanner (for example, 
> an exception while opening store files), the scannerReadPoints entry is not 
> cleaned up.
> Having an unused old entry in the scannerReadPoints means that flushes and 
> compactions cannot garbage-collect older versions.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9908) [WINDOWS] Fix filesystem / classloader related unit tests

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815587#comment-13815587
 ] 

Hadoop QA commented on HBASE-9908:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612498/hbase-9908_v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7766//console

This message is automatically generated.

> [WINDOWS] Fix filesystem / classloader related unit tests
> -
>
> Key: HBASE-9908
> URL: https://issues.apache.org/jira/browse/HBASE-9908
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9908_v1.patch
>
>
> Some of the unit tests related to classloading and filesystem are failing on 
> Windows. 
> {code}
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testHBase3810
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLocalFS
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testPrivateClassLoader
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromRelativeLibDirInJar
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromHDFS
> org.apache.hadoop.hbase.backup.TestHFileArchiving.testCleaningRace
> org.apache.hadoop.hbase.regionserver.wal.TestDurability.testDurability
> org.apache.hadoop.hbase.regionserver.wal.TestHLog.testMaintainOrderWithConcurrentWrites
> org.apache.hadoop.hbase.security.access.TestAccessController.testBulkLoad
> org.apache.hadoop.hbase.regionserver.TestHRegion.testRecoveredEditsReplayCompaction
> org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait.testRecoveredEditsReplayCompaction
> org.apache.hadoop.hbase.util.TestFSUtils.testRenameAndSetModifyTime
> {code}
> The root causes are: 
>  - Using local file name for referring to hdfs paths (HBASE-6830)
>  - Classloader using the wrong file system 
>  - StoreFile readers not being closed (for unfinished compaction)
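
As a sketch of the fix pattern for the first root cause above (standard Hadoop 
FileSystem API; conf and path are placeholders):
{code}
// Qualify the path against the intended FileSystem instead of letting it be
// interpreted against the local default (which breaks on Windows paths).
FileSystem fs = FileSystem.get(conf);
Path qualified = path.makeQualified(fs.getUri(), fs.getWorkingDirectory());
{code}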



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815584#comment-13815584
 ] 

Jean-Marc Spaggiari commented on HBASE-9909:


bq. Looked at those, it seems these are slightly different. 
TestHFilePerformance is more focused on perf for seq writes and reads between 
hfile and seq file. 

Should they be merged then? I guess you will say yes, but on a separate JIRA? ;)

Opened HBASE-9910 and HBASE-9911.

> TestHFilePerformance should not be a unit test, but a tool
> --
>
> Key: HBASE-9909
> URL: https://issues.apache.org/jira/browse/HBASE-9909
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9909_v1.patch
>
>
> TestHFilePerformance is a very old test which does not test anything; it is 
> really a perf evaluation tool. It is not clear to me whether there is any 
> utility in keeping it, but it should at least be converted to a tool. 
> Note that TestHFile already covers the unit test cases (writing hfile with 
> none and gz compression). We do not need to test SequenceFile. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9911) PerformanceEvaluation should be used as a proxy class for TestHFilePerformance and HFilePerformanceEvaluation

2013-11-06 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-9911:
--

 Summary: PerformanceEvaluation should be used as a proxy class for 
TestHFilePerformance and HFilePerformanceEvaluation
 Key: HBASE-9911
 URL: https://issues.apache.org/jira/browse/HBASE-9911
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari


All the performance test classes should be called from PerformanceEvaluation 
as a proxy. This will give a clear view of all the performance tests 
available. TestHFilePerformance and HFilePerformanceEvaluation should do the 
same.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9910) TestHFilePerformance and HFilePerformanceEvaluation should be merged in a single HFile performance test class.

2013-11-06 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-9910:
--

 Summary: TestHFilePerformance and HFilePerformanceEvaluation 
should be merged in a single HFile performance test class.
 Key: HBASE-9910
 URL: https://issues.apache.org/jira/browse/HBASE-9910
 Project: HBase
  Issue Type: Bug
  Components: Performance
Reporter: Jean-Marc Spaggiari


Today TestHFilePerformance and HFilePerformanceEvaluation do slightly 
different kinds of performance tests, both for HFile. We should consider 
merging those two tests into a single class.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815583#comment-13815583
 ] 

Enis Soztutar commented on HBASE-9892:
--

bq. What about backporting HBASE-7027 to 0.94 and fixing the issue in table.jsp?
It makes sense, but the challenge is that we cannot easily backport HBASE-7027 
without breaking BC. HServerLoad does not have extra fields we can use, I fear. 

Your patch at RB is actually better than 7027; if we do this for trunk, we 
should undo 7027. 




> Add info port to ServerName to support multi instances in a node
> 
>
> Key: HBASE-9892
> URL: https://issues.apache.org/jira/browse/HBASE-9892
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
> HBASE-9892-0.94-v3.diff
>
>
> The full GC time of a regionserver with a big heap (> 30G) usually cannot be 
> kept under 30s. Meanwhile, servers with 64G of memory are the norm. So we try 
> to deploy multiple RS instances (2-3) in a single node, with a heap of about 
> 20G ~ 24G each.
> Most things work fine, except the hbase web ui. The master gets the RS info 
> port from conf, which is not suitable for this situation of multiple RS 
> instances in a node. So we add the info port to ServerName:
> a. At startup, the RS reports its info port to HMaster.
> b. For the root region, the RS writes the servername with info port to the 
> zookeeper root-region-server node.
> c. For meta regions, the RS writes the servername with info port to the root 
> region.
> d. For user regions, the RS writes the servername with info port to meta 
> regions.
> So HMaster and clients can get the info port from the servername.
> To test this feature, I changed the rs num from 1 to 3 in standalone mode, so 
> we can test it in standalone mode.
> I think Hoya (hbase on yarn) will encounter the same problem. Does anyone 
> know how Hoya handles this problem?
> PS: There are different formats for the servername in the zk node and the 
> meta table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815577#comment-13815577
 ] 

Enis Soztutar commented on HBASE-9909:
--

bq. I would love to see a "proxy" for all those performance testing files... 
Can we also modify PE to have an option to test the HFilePerf the same way we 
have randomWrite, etc.?
Sure, not in this issue though. 
bq. TestHFilePerformance and HFilePerformanceEvaluation ?
Looked at those, it seems these are slightly different. TestHFilePerformance is 
more focused on perf for seq writes and reads between hfile and seq file. 

> TestHFilePerformance should not be a unit test, but a tool
> --
>
> Key: HBASE-9909
> URL: https://issues.apache.org/jira/browse/HBASE-9909
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9909_v1.patch
>
>
> TestHFilePerformance is a very old test, which does not test anything, but a 
> perf evaluation tool. It is not clear to me whether there is any utility for 
> keeping it, but that should at least be converted to be a tool. 
> Note that TestHFile already covers the unit test cases (writing hfile with 
> none and gz compression). We do not need to test SequenceFile. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-06 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815575#comment-13815575
 ] 

Liu Shaohui commented on HBASE-9892:


Thanks [~ndimiduk] [~enis]
HBASE-7027 has partly fixed the problem by reporting the info port to HMaster 
via serverLoad instead. 

But the infoPort in table.jsp is still read from the config:
{code}
  // HARDCODED FOR NOW TODO: FIX GET FROM ZK
  // This port might be wrong if RS actually ended up using something else.
  int infoPort = conf.getInt("hbase.regionserver.info.port", 60030);
{code}

What about backporting HBASE-7027 to 0.94 and fixing the issue in table.jsp? 
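
A sketch of that direction for table.jsp (getInfoPort() is the hypothetical 
accessor the backport would add to HServerLoad, not an existing 0.94 method):
{code}
// Take the live info port from the server's reported load instead of conf.
HServerLoad load = master.getServerManager().getOnlineServers().get(serverName);
int infoPort = (load != null) ? load.getInfoPort()
    : conf.getInt("hbase.regionserver.info.port", 60030);
{code}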


> Add info port to ServerName to support multi instances in a node
> 
>
> Key: HBASE-9892
> URL: https://issues.apache.org/jira/browse/HBASE-9892
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
> HBASE-9892-0.94-v3.diff
>
>
> The full GC time of a regionserver with a big heap (> 30G) usually cannot be 
> kept under 30s. Meanwhile, servers with 64G of memory are the norm. So we try 
> to deploy multiple RS instances (2-3) in a single node, with a heap of about 
> 20G ~ 24G each.
> Most things work fine, except the hbase web ui. The master gets the RS info 
> port from conf, which is not suitable for this situation of multiple RS 
> instances in a node. So we add the info port to ServerName:
> a. At startup, the RS reports its info port to HMaster.
> b. For the root region, the RS writes the servername with info port to the 
> zookeeper root-region-server node.
> c. For meta regions, the RS writes the servername with info port to the root 
> region.
> d. For user regions, the RS writes the servername with info port to meta 
> regions.
> So HMaster and clients can get the info port from the servername.
> To test this feature, I changed the rs num from 1 to 3 in standalone mode, so 
> we can test it in standalone mode.
> I think Hoya (hbase on yarn) will encounter the same problem. Does anyone 
> know how Hoya handles this problem?
> PS: There are different formats for the servername in the zk node and the 
> meta table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815569#comment-13815569
 ] 

Jean-Marc Spaggiari commented on HBASE-9909:


Also, any duplication between TestHFilePerformance and 
HFilePerformanceEvaluation ?

> TestHFilePerformance should not be a unit test, but a tool
> --
>
> Key: HBASE-9909
> URL: https://issues.apache.org/jira/browse/HBASE-9909
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9909_v1.patch
>
>
> TestHFilePerformance is a very old test which does not test anything; it is 
> really a perf evaluation tool. It is not clear to me whether there is any 
> utility in keeping it, but it should at least be converted to a tool. 
> Note that TestHFile already covers the unit test cases (writing hfile with 
> none and gz compression). We do not need to test SequenceFile. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815568#comment-13815568
 ] 

Jean-Marc Spaggiari commented on HBASE-9909:


I would love to see a "proxy" for all those performance testing files... Can we 
also modify PE to have an option to test the HFilePerf the same way we have 
randomWrite, etc.?

> TestHFilePerformance should not be a unit test, but a tool
> --
>
> Key: HBASE-9909
> URL: https://issues.apache.org/jira/browse/HBASE-9909
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9909_v1.patch
>
>
> TestHFilePerformance is a very old test which does not test anything; it is 
> really a perf evaluation tool. It is not clear to me whether there is any 
> utility in keeping it, but it should at least be converted to a tool. 
> Note that TestHFile already covers the unit test cases (writing hfile with 
> none and gz compression). We do not need to test SequenceFile. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9906:
-

Attachment: hbase-9906-0.94_v1.patch

0.94 version of the patch. 

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906-0.94_v1.patch, hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation on the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in the case of a single region, the table will be reported 
> missing). Subsequent attempts to create the table will also fail because the 
> table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will mask the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815560#comment-13815560
 ] 

Enis Soztutar commented on HBASE-9906:
--

Thanks Matteo, test failure seems unrelated. 

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation on the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in the case of a single region, the table will be reported 
> missing). Subsequent attempts to create the table will also fail because the 
> table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will mask the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user

2013-11-06 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815556#comment-13815556
 ] 

Gary Helmling commented on HBASE-9890:
--

v2 looks good.  A couple comments:

* instead of ZKClusterId.getUUIDForCluster() and converting back to String, you 
can just use ZKClusterId.readClusterIdZNode().
* I think we need the same changes in mapred.TableMapReduceUtil.  Although that 
one doesn't have the 2 ZK quorum support, if more than one HBASE_AUTH_TOKEN is 
present for the UGI, you could still wind up returning the wrong one and adding 
it to the job.
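
A sketch of the selection logic for the second point (assuming a 
ZooKeeperWatcher named zkw and the job user's UserGroupInformation named ugi):
{code}
// Pick the HBASE_AUTH_TOKEN whose service matches this cluster's id, rather
// than the first token of that kind found on the UGI.
String clusterId = ZKClusterId.readClusterIdZNode(zkw);
Token<?> authToken = null;
for (Token<?> token : ugi.getTokens()) {
  if (AuthenticationTokenIdentifier.AUTH_TOKEN_TYPE.equals(token.getKind())
      && token.getService().toString().equals(clusterId)) {
    authToken = token;
  }
}
{code}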

> MR jobs are not working if started by a delegated user
> --
>
> Key: HBASE-9890
> URL: https://issues.apache.org/jira/browse/HBASE-9890
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce, security
>Affects Versions: 0.98.0, 0.94.12, 0.96.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 0.98.0, 0.94.13, 0.96.1
>
> Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, 
> HBASE-9890-v0.patch, HBASE-9890-v1.patch, HBASE-9890-v2.patch
>
>
> If Map-Reduce jobs are started by a proxy user that already has the 
> delegation tokens, we get an exception on "obtain token" since the proxy user 
> doesn't have the kerberos auth.
> For example:
>  * If we use oozie to execute RowCounter - oozie will get the tokens required 
> (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter 
> tries to obtain the token, it will get an exception.
>  * If we use oozie to execute LoadIncrementalHFiles - oozie will get the 
> tokens required (HDFS_DELEGATION_TOKEN) and it will start the 
> LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the 
> token, it will get an exception.
> {code}
>  org.apache.hadoop.hbase.security.AccessDeniedException: Token generation 
> only allowed for Kerberos authenticated clients
> at 
> org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87)
> {code}
> {code}
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
> can be issued only with kerberos or web authentication
>   at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868)
>   at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509)
>   at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85)
>   at 
> org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949)
>   at 
> org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854)
>   at 
> org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
>   at 
> org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9909:
-

Status: Patch Available  (was: Open)

> TestHFilePerformance should not be a unit test, but a tool
> --
>
> Key: HBASE-9909
> URL: https://issues.apache.org/jira/browse/HBASE-9909
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9909_v1.patch
>
>
> TestHFilePerformance is a very old test which does not test anything; it is a 
> perf evaluation tool. It is not clear to me whether there is any utility in 
> keeping it, but it should at least be converted to a tool. 
> Note that TestHFile already covers the unit test cases (writing hfile with 
> none and gz compression). We do not need to test SequenceFile. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9909:
-

Attachment: hbase-9909_v1.patch

Attaching patch. 

> TestHFilePerformance should not be a unit test, but a tool
> --
>
> Key: HBASE-9909
> URL: https://issues.apache.org/jira/browse/HBASE-9909
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9909_v1.patch
>
>
> TestHFilePerformance is a very old test which does not test anything; it is a 
> perf evaluation tool. It is not clear to me whether there is any utility in 
> keeping it, but it should at least be converted to a tool. 
> Note that TestHFile already covers the unit test cases (writing hfile with 
> none and gz compression). We do not need to test SequenceFile. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9775) Client write path perf issues

2013-11-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815545#comment-13815545
 ] 

Jean-Marc Spaggiari commented on HBASE-9775:


I will try to start the tests this evening; otherwise it will be tomorrow 
morning. They might take about 24h. I will run the latest 0.96 and the latest 
0.96+9775 and compare. I will run in standalone mode, but I can also run in 
pseudo-dist if you want (6 disks).

> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: 9775.rig.txt, 9775.rig.v2.patch, 9775.rig.v3.patch, 
> Charts Search   Cloudera Manager - ITBLL.png, Charts Search   Cloudera 
> Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, ycsb.png, 
> ycsb_insert_94_vs_96.png
>
>
> Testing on larger clusters has not had the desired throughput increases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9909) TestHFilePerformance should not be a unit test, but a tool

2013-11-06 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-9909:


 Summary: TestHFilePerformance should not be a unit test, but a tool
 Key: HBASE-9909
 URL: https://issues.apache.org/jira/browse/HBASE-9909
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.1


TestHFilePerformance is a very old test which does not test anything; it is a 
perf evaluation tool. It is not clear to me whether there is any utility in 
keeping it, but it should at least be converted to a tool. 

Note that TestHFile already covers the unit test cases (writing hfile with none 
and gz compression). We do not need to test SequenceFile. 
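
For reference, such a conversion usually takes this shape -- a hedged sketch,
not the attached patch; the class name and method body are illustrative:
{code}
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HFilePerformanceEvaluation extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // former test body goes here: write/read HFiles with the requested
    // codec and report timings, instead of asserting anything
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new HFilePerformanceEvaluation(), args));
  }
}
{code}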



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815544#comment-13815544
 ] 

Hadoop QA commented on HBASE-9906:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612481/hbase-9906_v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//console

This message is automatically generated.

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-0

[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815541#comment-13815541
 ] 

Enis Soztutar commented on HBASE-9892:
--

If we do not have the problem in 0.96, I would be -0 for fixing it in 0.94. 

> Add info port to ServerName to support multi instances in a node
> 
>
> Key: HBASE-9892
> URL: https://issues.apache.org/jira/browse/HBASE-9892
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
> HBASE-9892-0.94-v3.diff
>
>
> The full GC time of a regionserver with a big heap (> 30G) usually cannot be 
> kept under 30s, while servers with 64G memory are otherwise fine. So we try 
> to deploy multiple RS instances (2-3) on a single node, with the heap of each 
> RS at about 20G ~ 24G.
> Most things work fine, except the hbase web ui. The master gets the RS info 
> port from conf, which is not suitable for this situation of multiple RS 
> instances on a node. So we add the info port to ServerName:
> a. At startup, the RS reports its info port to HMaster.
> b. For the root region, the RS writes the servername with info port to the 
> zookeeper root-region-server node.
> c. For meta regions, the RS writes the servername with info port to the root 
> region.
> d. For user regions, the RS writes the servername with info port to the meta 
> regions.
> So the HMaster and clients can get the info port from the servername.
> To test this feature, I changed the RS num from 1 to 3 in standalone mode, so 
> it can be tested in standalone mode.
> I think Hoya (hbase on yarn) will encounter the same problem. Does anyone 
> know how Hoya handles this?
> PS: There are different formats for the servername in the zk node and the 
> meta table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9908) [WINDOWS] Fix filesystem / classloader related unit tests

2013-11-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9908:
-

Attachment: hbase-9908_v1.patch

Attaching simple patch. 

> [WINDOWS] Fix filesystem / classloader related unit tests
> -
>
> Key: HBASE-9908
> URL: https://issues.apache.org/jira/browse/HBASE-9908
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9908_v1.patch
>
>
> Some of the unit tests related to classloading and filesystem are failing on 
> windows. 
> {code}
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testHBase3810
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLocalFS
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testPrivateClassLoader
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromRelativeLibDirInJar
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromHDFS
> org.apache.hadoop.hbase.backup.TestHFileArchiving.testCleaningRace
> org.apache.hadoop.hbase.regionserver.wal.TestDurability.testDurability
> org.apache.hadoop.hbase.regionserver.wal.TestHLog.testMaintainOrderWithConcurrentWrites
> org.apache.hadoop.hbase.security.access.TestAccessController.testBulkLoad
> org.apache.hadoop.hbase.regionserver.TestHRegion.testRecoveredEditsReplayCompaction
> org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait.testRecoveredEditsReplayCompaction
> org.apache.hadoop.hbase.util.TestFSUtils.testRenameAndSetModifyTime
> {code}
> The root causes are: 
>  - Using local file name for referring to hdfs paths (HBASE-6830)
>  - Classloader using the wrong file system 
>  - StoreFile readers not being closed (for unfinished compaction)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9908) [WINDOWS] Fix filesystem / classloader related unit tests

2013-11-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9908:
-

Status: Patch Available  (was: Open)

> [WINDOWS] Fix filesystem / classloader related unit tests
> -
>
> Key: HBASE-9908
> URL: https://issues.apache.org/jira/browse/HBASE-9908
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9908_v1.patch
>
>
> Some of the unit tests related to classloading and filesystem are failing on 
> windows. 
> {code}
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testHBase3810
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLocalFS
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testPrivateClassLoader
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromRelativeLibDirInJar
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar
> org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromHDFS
> org.apache.hadoop.hbase.backup.TestHFileArchiving.testCleaningRace
> org.apache.hadoop.hbase.regionserver.wal.TestDurability.testDurability
> org.apache.hadoop.hbase.regionserver.wal.TestHLog.testMaintainOrderWithConcurrentWrites
> org.apache.hadoop.hbase.security.access.TestAccessController.testBulkLoad
> org.apache.hadoop.hbase.regionserver.TestHRegion.testRecoveredEditsReplayCompaction
> org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait.testRecoveredEditsReplayCompaction
> org.apache.hadoop.hbase.util.TestFSUtils.testRenameAndSetModifyTime
> {code}
> The root causes are: 
>  - Using local file name for referring to hdfs paths (HBASE-6830)
>  - Classloader using the wrong file system 
>  - StoreFile readers not being closed (for unfinished compaction)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9908) [WINDOWS] Fix filesystem / classloader related unit tests

2013-11-06 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-9908:


 Summary: [WINDOWS] Fix filesystem / classloader related unit tests
 Key: HBASE-9908
 URL: https://issues.apache.org/jira/browse/HBASE-9908
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.1


Some of the unit tests related to classloading and filesystem are failing on 
windows. 

{code}
org.apache.hadoop.hbase.coprocessor.TestClassLoading.testHBase3810
org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLocalFS
org.apache.hadoop.hbase.coprocessor.TestClassLoading.testPrivateClassLoader
org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromRelativeLibDirInJar
org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar
org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromHDFS
org.apache.hadoop.hbase.backup.TestHFileArchiving.testCleaningRace
org.apache.hadoop.hbase.regionserver.wal.TestDurability.testDurability
org.apache.hadoop.hbase.regionserver.wal.TestHLog.testMaintainOrderWithConcurrentWrites
org.apache.hadoop.hbase.security.access.TestAccessController.testBulkLoad
org.apache.hadoop.hbase.regionserver.TestHRegion.testRecoveredEditsReplayCompaction
org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait.testRecoveredEditsReplayCompaction
org.apache.hadoop.hbase.util.TestFSUtils.testRenameAndSetModifyTime
{code}

The root causes are: 
 - Using a local file name for referring to hdfs paths (HBASE-6830; see the sketch below)
 - Classloader using the wrong file system 
 - StoreFile readers not being closed (for unfinished compaction)
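
For the first root cause, the usual fix is to qualify paths against the
intended filesystem instead of building them from bare local names. A hedged
sketch of that pattern (illustrative only, not the patch):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifiedPaths {
  // Qualify a name against the cluster filesystem; a bare "C:\..." or
  // "/tmp/..." string would otherwise resolve against the local filesystem,
  // which is what breaks these tests on Windows.
  public static Path hdfsPath(Configuration conf, String name) throws IOException {
    FileSystem fs = FileSystem.get(conf);  // the test cluster fs, not file://
    return fs.makeQualified(new Path(name));
  }
}
{code}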



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815520#comment-13815520
 ] 

Hadoop QA commented on HBASE-9818:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612473/9818-v5.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:488)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7764//console

This message is automatically generated.

> NPE in HFileBlock#AbstractFSReader#readAtOffset
> ---
>
> Key: HBASE-9818
> URL: https://issues.apache.org/jira/browse/HBASE-9818
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Ted Yu
> Attachments: 9818-v2.txt, 9818-v3.txt, 9818-v4.txt, 9818-v5.txt
>
>
> HFileBlock#istream seems to be null.  I was wondering should we hide 
> FSDataInputStreamWrapper#useHBaseChecksum.
> By the way, this happened when online schema change is enabled (encoding)
> {noformat}
> 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] 
> regionserver.HRegionServer:
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:55

[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815518#comment-13815518
 ] 

Matteo Bertozzi commented on HBASE-9906:


if we don't have the ts fix, the sleep sounds ok to me

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will override the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815510#comment-13815510
 ] 

Enis Soztutar commented on HBASE-9906:
--

Agreed that sleep is stupid, but without major surgery (uniqueTs, etc), and 
fixes to HBASE-9770, this seems to be the best option. [~mbertozzi], [~jmhsieh] 
mind taking a look? Thanks. 

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will override the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9890) MR jobs are not working if started by a delegated user

2013-11-06 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-9890:
---

Attachment: HBASE-9890-v2.patch

What about something like v2:
* fetch the ClusterId
* use AuthenticationTokenSelector to select the token based on the clusterId
* if the token is not present, ask for a new one

Is there a way to get the ClusterId without connecting to zookeeper? Should I 
try to get the token without connecting to zookeeper if we have only one token 
and there is no quorum address specified?
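
Roughly, that flow would look like the following hedged sketch (assuming the
existing AuthenticationTokenSelector; error handling and the zookeeper read
are left to the caller):
{code}
import org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier;
import org.apache.hadoop.hbase.security.token.AuthenticationTokenSelector;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class SelectOrObtain {
  public static Token<AuthenticationTokenIdentifier> selectFor(
      UserGroupInformation ugi, String clusterId) {
    // 1. the clusterId has been fetched (e.g. from zookeeper)
    // 2. select the token whose service matches it
    Token<AuthenticationTokenIdentifier> token =
        new AuthenticationTokenSelector().selectToken(
            new Text(clusterId), ugi.getTokens());
    // 3. if token == null, the caller asks the cluster for a new one
    return token;
  }
}
{code}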

> MR jobs are not working if started by a delegated user
> --
>
> Key: HBASE-9890
> URL: https://issues.apache.org/jira/browse/HBASE-9890
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce, security
>Affects Versions: 0.98.0, 0.94.12, 0.96.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 0.98.0, 0.94.13, 0.96.1
>
> Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, 
> HBASE-9890-v0.patch, HBASE-9890-v1.patch, HBASE-9890-v2.patch
>
>
> If Map-Reduce jobs are started by a proxy user that already has the 
> delegation tokens, we get an exception on "obtain token" since the proxy user 
> doesn't have the kerberos auth.
> For example:
>  * If we use oozie to execute RowCounter - oozie will get the tokens required 
> (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter 
> tries to obtain the token, it will get an exception.
>  * If we use oozie to execute LoadIncrementalHFiles - oozie will get the 
> tokens required (HDFS_DELEGATION_TOKEN) and it will start the 
> LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the 
> token, it will get an exception.
> {code}
>  org.apache.hadoop.hbase.security.AccessDeniedException: Token generation 
> only allowed for Kerberos authenticated clients
> at 
> org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87)
> {code}
> {code}
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
> can be issued only with kerberos or web authentication
>   at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868)
>   at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509)
>   at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85)
>   at 
> org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949)
>   at 
> org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854)
>   at 
> org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
>   at 
> org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815502#comment-13815502
 ] 

Nick Dimiduk commented on HBASE-9892:
-

Hi [~liushaohui]

I cannot reproduce the problem you describe on 0.96 or trunk. There, the info 
port appears to come from the ServerName or ServerLoad objects for the live 
server links and the dead server list, respectively. Could we not backport an 
existing fix rather than add all this new plumbing?

For context, I have multiple RS processes deployed on each RS host. Their RPC 
and info ports are set explicitly for each process. Perhaps your configuration 
is different?

> Add info port to ServerName to support multi instances in a node
> 
>
> Key: HBASE-9892
> URL: https://issues.apache.org/jira/browse/HBASE-9892
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
> HBASE-9892-0.94-v3.diff
>
>
> The full GC time of a regionserver with a big heap (> 30G) usually cannot be 
> kept under 30s, while servers with 64G memory are otherwise fine. So we try 
> to deploy multiple RS instances (2-3) on a single node, with the heap of each 
> RS at about 20G ~ 24G.
> Most things work fine, except the hbase web ui. The master gets the RS info 
> port from conf, which is not suitable for this situation of multiple RS 
> instances on a node. So we add the info port to ServerName:
> a. At startup, the RS reports its info port to HMaster.
> b. For the root region, the RS writes the servername with info port to the 
> zookeeper root-region-server node.
> c. For meta regions, the RS writes the servername with info port to the root 
> region.
> d. For user regions, the RS writes the servername with info port to the meta 
> regions.
> So the HMaster and clients can get the info port from the servername.
> To test this feature, I changed the RS num from 1 to 3 in standalone mode, so 
> it can be tested in standalone mode.
> I think Hoya (hbase on yarn) will encounter the same problem. Does anyone 
> know how Hoya handles this?
> PS: There are different formats for the servername in the zk node and the 
> meta table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815492#comment-13815492
 ] 

Sergey Shelukhin commented on HBASE-9906:
-

Btw another option is uniqueTs. I am -0 on sleep...

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will override the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9775) Client write path perf issues

2013-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815493#comment-13815493
 ] 

stack commented on HBASE-9775:
--

Back to the root discussion on this issue:

bq. with a max.total.tasks of 100 and max.perserver.tasks of 5, the client 
might not use all the server. May be a default of 2 for max.perserver.tasks 
would be better

That'll work if there are many servers, but it will be a constraint if there 
are only a few servers and a few clients. In that case we will schedule at most 
two tasks to each server when it could take much more.

Ideally we want something like what you had before -- 5 or 1/2 the CPUs on the 
local server as a guesstimate of how many CPUs the server has, whichever is 
greater -- and then as soon as we get indications that a server is struggling, 
go down from this max per server and slowly ramp back up as we have successful 
ops against said server (how drastic the drop in tasks-per-server should be 
would depend on the exception we got from the server).

bq. the server reject the client when it's busy (HBASE-9467). That increases 
the number of retries to do, and, on an heavy load, can lead us to fail on 
something that would have worked before.

We only reject as 'busy' when we can't obtain the lock after an amount of time 
and we are trying to flush because we are up against the global mem limit. 
Regarding retries, if we get one of these RegionTooBusyExceptions, rather than 
back off for 100ms or so, should we back off more (an Elliott suggestion)? And 
drop the number of tasks to throw at this server at any one time. It'd be hard 
to do as things are now, given backoff is calculated based off the retry count 
only.

Given the two items above, should we keep more stats per server than just a 
count of tasks? Should we keep a history of success/error and do backoffs -- 
both the amount of time and how many tasks to send the server -- based on this?
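
To make the idea concrete, a toy sketch of such a per-server budget
(illustrative only, not the actual client code): halve the limit on a busy
exception, creep back up on successes.
{code}
import java.util.concurrent.atomic.AtomicInteger;

public class ServerTaskBudget {
  private final int max;                      // e.g. max(5, cpus / 2)
  private final AtomicInteger limit;
  private final AtomicInteger successes = new AtomicInteger();

  public ServerTaskBudget(int max) {
    this.max = max;
    this.limit = new AtomicInteger(max);
  }

  public int currentLimit() {
    return limit.get();
  }

  public void onBusy() {                      // e.g. RegionTooBusyException
    limit.set(Math.max(1, limit.get() / 2)); // drop fast
    successes.set(0);
  }

  public void onSuccess() {                   // ramp back up slowly
    if (successes.incrementAndGet() % 10 == 0) {
      limit.set(Math.min(max, limit.get() + 1));
    }
  }
}
{code}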

bq. For example, the new settings will make the client to send 4 queries in 
1 second

Yeah, that is not going to help anyone.

bq. If we want to compare 0.94 and 0.96, may be we should use the same 
settings, i.e. pause: 1000ms backoff: { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64 } 
hbase.client.max.perserver.tasks: 1

Seems like good idea.

[~nkeywal] What you think of the [~jeffreyz] patch?

[~jmspaggi] Any luck run perf test?

We got our big cluster back so we'll start in on this one again.

With a single client and many regions, I see the client threads blocked waiting 
to do locateRegionInMeta (I don't understand this regionLockObject... it locks 
everyone out while a lookup is going on, rather than only the threads 
contending for the same region location). If there are few regions, we are 
doing softvaluemap operations all the time.








> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: 9775.rig.txt, 9775.rig.v2.patch, 9775.rig.v3.patch, 
> Charts Search   Cloudera Manager - ITBLL.png, Charts Search   Cloudera 
> Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, ycsb.png, 
> ycsb_insert_94_vs_96.png
>
>
> Testing on larger clusters has not had the desired throughput increases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815491#comment-13815491
 ] 

Hadoop QA commented on HBASE-9047:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12612459/HBASE-9047-trunk-v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:488)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7762//console

This message is automatically generated.

> Tool to handle finishing replication when the cluster is offline
> 
>
> Key: HBASE-9047
> URL: https://issues.apache.org/jira/browse/HBASE-9047
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.96.0
>Reporter: Jean-Daniel Cryans
>Assignee: Demai Ni
> Fix For: 0.98.0
>
> Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch, 
> HBASE-9047-trunk-v1.patch, HBASE-9047-trunk-v2.patch, 
> HBASE-9047-trunk-v3.patch, HBASE-9047-trunk-v4.patch
>
>
> We're having a discussion on the mailing list about replicating the data on a 
> cluster that was shut down in an offline fashion. The motivation could be 
> that you don't want to bring HBase back up but still need that data on the 
> slave.
> So I have this idea of a tool that would be running on the master cluster 
> while it is down, although it could also run at any time. Basically it would 
> be able to read the replication state of each master region server, finish 
> replicating what's missing to all the slave, and then clear that state in 
> zookeeper.
> The code that handles replication does most of that already, see 
> ReplicationSourceManager and ReplicationSource. Basically when 
> ReplicationSourceManager.init() is called, it will check all the queues in ZK 
> and try to grab those that aren't attached to a region server. If the whole 
> cluster is down, it will grab all of them.
> The beautiful thing here is that you could start that tool on all your 
> machines and the load will be spread out, but that might not be a big concern 
> i

[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815475#comment-13815475
 ] 

Sergey Shelukhin commented on HBASE-9906:
-

You can use the power of out-of-order ts by doing the puts first, getting the 
ts, and then doing the deletes at that ts minus 1 :) Although iirc meta might 
break because of that: the key-before code optimizes by assuming no 
out-of-order ts across files.
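
A hedged sketch of that trick (the names are illustrative, and it assumes a
single version per cell): put first, read back the server-assigned ts, then
delete strictly below it.
{code}
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;

public class PutThenDeleteBelow {
  static void restoreRow(HTable meta, byte[] newRow, byte[] oldRow,
      byte[] fam, byte[] qual, byte[] value) throws Exception {
    Put p = new Put(newRow);
    p.add(fam, qual, value);
    meta.put(p);                          // server assigns the ts

    Result r = meta.get(new Get(newRow));
    long ts = r.raw()[0].getTimestamp();  // read the assigned ts back

    Delete d = new Delete(oldRow);
    d.deleteFamily(fam, ts - 1);          // everything at ts-1 and older
    meta.delete(d);                       // cannot mask the newer put
  }
}
{code}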

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will override the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9906:
-

Attachment: hbase-9906_v1.patch

Patch for option (2). 

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will override the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9906:
-

Status: Patch Available  (was: Open)

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: hbase-9906_v1.patch
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will override the put at the same ts, even though the put was applied later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815463#comment-13815463
 ] 

Enis Soztutar commented on HBASE-9906:
--

Out of the above options, (1) will take some time to fix. (3) has another 
problem: we would be intermixing client-supplied timestamps and server-supplied 
tss, which might cause further problems in meta if clocks are out of sync. (4) 
is not ideal either, since we want to delete the whole row except for the 
column info:regioninfo; for that we would have to do a get to obtain the 
columns for each row, and then send deletes per row. So that leaves us with 
option (2), which is embarrassing, but given that restore is very infrequent, 
we can justify sleeping an extra 20ms.
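
In code, option (2) amounts to something like this hedged sketch (not the
attached patch; it assumes the delete and the put are stamped from the same
clock):
{code}
import java.util.List;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;

public class DeleteThenPut {
  // Delete, then wait for the ms clock to tick over before putting, so the
  // put cannot share a timestamp with the delete.
  static void deleteThenPut(HTable meta, List<Delete> deletes, List<Put> puts)
      throws Exception {
    long now = EnvironmentEdgeManager.currentTimeMillis();
    meta.delete(deletes);
    while (EnvironmentEdgeManager.currentTimeMillis() <= now) {
      Thread.sleep(1);
    }
    meta.put(puts);
  }
}
{code}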

> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation for the table (like enable table, or 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in case of single region, the table will be reported 
> missing). Subsequent attempts for creating the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will mask the put at the same ts, even though the put has a larger seqNum.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9885) Avoid some Result creation in protobuf conversions

2013-11-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815446#comment-13815446
 ] 

Nicolas Liochon commented on HBASE-9885:


Since the commit, the precommit env became flaky and it seems that surefire 
cannot parse the test results. Let's see if there is a relation by reverting.

> Avoid some Result creation in protobuf conversions
> --
>
> Key: HBASE-9885
> URL: https://issues.apache.org/jira/browse/HBASE-9885
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Protobufs, regionserver
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9885.v1.patch, 9885.v2, 9885.v2.patch, 9885.v3.patch, 
> 9885.v3.patch
>
>
> We create a lot of Result objects that we could avoid, as they contain nothing 
> other than a boolean value. We sometimes create a protobuf builder on this 
> path as well; this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user

2013-11-06 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815447#comment-13815447
 ] 

Gary Helmling commented on HBASE-9890:
--

In the case that Francis points out, whether using CopyTable or something 
custom, you would actually have more than one token of type HBASE_AUTH_TOKEN.  
Does Oozie support running CopyTable between two clusters?  If so, it needs to 
fetch the delegation token for each, but this patch wouldn't pass along both, 
only the first that it sees.  Obtaining the token from UGI by type alone does 
not guarantee it is associated with the given cluster.  You need to match the 
token service against the cluster ID.

In fact, I think the change as it is will cause CopyTable between 2 secure 
HBase clusters to fail.  The change in this section of 
o.a.h.h.mapreduce.TableMapReduce.initCredentials() is the problem:
{code}
  try {
    // init credentials for remote cluster
    String quorumAddress = job.getConfiguration().get(TableOutputFormat.QUORUM_ADDRESS);
    if (quorumAddress != null) {
      Configuration peerConf = HBaseConfiguration.create(job.getConfiguration());
      ZKUtil.applyClusterKeyToConf(peerConf, quorumAddress);
-     userProvider.getCurrent().obtainAuthTokenForJob(peerConf, job);
+     user.obtainAuthTokenForJob(peerConf, job);
    }
-   userProvider.getCurrent().obtainAuthTokenForJob(job.getConfiguration(), job);
+
+   Token authToken = user.getToken(AuthenticationTokenIdentifier.AUTH_TOKEN_TYPE.toString());
+   if (authToken == null) {
{code}

When running between 2 secure clusters, we'll obtain a token against one 
cluster (using the config value of TableOutputFormat.QUORUM_ADDRESS), then the 
following call to user.getToken("HBASE_AUTH_TOKEN") will return the token just 
obtained, so we never fetch the second token.

You can use AuthenticationTokenSelector.selectToken() to pull out the correct 
token for a given cluster.  But first you will need the cluster ID for the 
cluster you're connecting to.
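
A hedged sketch of that selection (not the committed fix); it assumes the peer 
cluster id has already been looked up (e.g. from the peer's zookeeper), which 
is not shown here:
{code}
import java.io.IOException;
import java.util.Collection;

import org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier;
import org.apache.hadoop.hbase.security.token.AuthenticationTokenSelector;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public class ClusterTokenLookup {
  // Returns the HBASE_AUTH_TOKEN whose service matches the given cluster id,
  // or null if the current user holds no token for that cluster yet.
  public static Token<AuthenticationTokenIdentifier> tokenForCluster(String clusterId)
      throws IOException {
    Collection<Token<? extends TokenIdentifier>> tokens =
        UserGroupInformation.getCurrentUser().getTokens();
    // The token service is the cluster id, so the selector matches on
    // service rather than on token kind alone.
    return new AuthenticationTokenSelector().selectToken(new Text(clusterId), tokens);
  }
}
{code}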

> MR jobs are not working if started by a delegated user
> --
>
> Key: HBASE-9890
> URL: https://issues.apache.org/jira/browse/HBASE-9890
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce, security
>Affects Versions: 0.98.0, 0.94.12, 0.96.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 0.98.0, 0.94.13, 0.96.1
>
> Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, 
> HBASE-9890-v0.patch, HBASE-9890-v1.patch
>
>
> If Map-Reduce jobs are started by a proxy user that already has the 
> delegation tokens, we get an exception on "obtain token" since the proxy user 
> doesn't have the kerberos auth.
> For example:
>  * If we use oozie to execute RowCounter - oozie will get the tokens required 
> (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter 
> tries to obtain the token, it will get an exception.
>  * If we use oozie to execute LoadIncrementalHFiles - oozie will get the 
> tokens required (HDFS_DELEGATION_TOKEN) and it will start the 
> LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the 
> token, it will get an exception.
> {code}
>  org.apache.hadoop.hbase.security.AccessDeniedException: Token generation 
> only allowed for Kerberos authenticated clients
> at 
> org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87)
> {code}
> {code}
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
> can be issued only with kerberos or web authentication
>   at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868)
>   at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509)
>   at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85)
>   at 
> org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949)
>   at 
> org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854)
>   at 
> org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
>   at org.apache.hadoop.mapreduce.Job

[jira] [Reopened] (HBASE-9885) Avoid some Result creation in protobuf conversions

2013-11-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon reopened HBASE-9885:



> Avoid some Result creation in protobuf conversions
> --
>
> Key: HBASE-9885
> URL: https://issues.apache.org/jira/browse/HBASE-9885
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Protobufs, regionserver
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9885.v1.patch, 9885.v2, 9885.v2.patch, 9885.v3.patch, 
> 9885.v3.patch
>
>
> We create a lot of Result objects that we could avoid, as they contain nothing 
> other than a boolean value. We sometimes create a protobuf builder on this 
> path as well; this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset

2013-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9818:
--

Attachment: 9818-v5.txt

> NPE in HFileBlock#AbstractFSReader#readAtOffset
> ---
>
> Key: HBASE-9818
> URL: https://issues.apache.org/jira/browse/HBASE-9818
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Ted Yu
> Attachments: 9818-v2.txt, 9818-v3.txt, 9818-v4.txt, 9818-v5.txt
>
>
> HFileBlock#istream seems to be null.  I was wondering should we hide 
> FSDataInputStreamWrapper#useHBaseChecksum.
> By the way, this happened when online schema change is enabled (encoding)
> {noformat}
> 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] 
> regionserver.HRegionServer:
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:724)
> 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] 
> regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected 
> nextCallSeq: 53438 But the nextCallSeq got from client: 53437; 
> request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: 
> false next_call_seq: 53437
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset

2013-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9818:
--

Attachment: (was: 9818-v5.txt)

> NPE in HFileBlock#AbstractFSReader#readAtOffset
> ---
>
> Key: HBASE-9818
> URL: https://issues.apache.org/jira/browse/HBASE-9818
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Ted Yu
> Attachments: 9818-v2.txt, 9818-v3.txt, 9818-v4.txt, 9818-v5.txt
>
>
> HFileBlock#istream seems to be null.  I was wondering should we hide 
> FSDataInputStreamWrapper#useHBaseChecksum.
> By the way, this happened when online schema change is enabled (encoding)
> {noformat}
> 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] 
> regionserver.HRegionServer:
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:724)
> 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] 
> regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected 
> nextCallSeq: 53438 But the nextCallSeq got from client: 53437; 
> request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: 
> false next_call_seq: 53437
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-9879) Can't undelete a KeyValue

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815432#comment-13815432
 ] 

Enis Soztutar commented on HBASE-9879:
--

bq. There was support in the recent PMC meeting for deprecating client set 
timestamps. Existing tables would grandfather a setting that allows user set 
timestamps but new tables would not allow them. Allowing clients to (ab)use 
cell timestamps leads to several problems not just this known issue.
Opened HBASE-9905 to discuss that. 

> Can't undelete a KeyValue
> -
>
> Key: HBASE-9879
> URL: https://issues.apache.org/jira/browse/HBASE-9879
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: Benoit Sigoure
>
> Test scenario:
> put(KV, timestamp=100)
> put(KV, timestamp=200)
> delete(KV, timestamp=200, with MutationProto.DeleteType.DELETE_ONE_VERSION)
> get(KV) => returns value at timestamp=100 (OK)
> put(KV, timestamp=200)
> get(KV) => returns value at timestamp=100 (but not the one at timestamp=200 
> that was "reborn" by the previous put)
> Is that normal?
> I ran into this bug while running the integration tests at 
> https://github.com/OpenTSDB/asynchbase/pull/60 – the first time you run it, 
> it passes, but after that, it keeps failing.  Sorry I don't have the 
> corresponding HTable-based code but that should be fairly easy to write.
> I only tested this with 0.96.0, dunno yet how this behaved in prior releases.
> My hunch is that the tombstone added by the DELETE_ONE_VERSION keeps 
> shadowing the value even after it's reborn.
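
An HTable-based sketch of the scenario above, hedged: table, family, and 
qualifier names are illustrative, and the calls are the 0.96-era client API.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UndeleteRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    byte[] row = Bytes.toBytes("r"), f = Bytes.toBytes("f"), q = Bytes.toBytes("q");
    HTable table = new HTable(conf, "t");
    try {
      table.put(new Put(row).add(f, q, 100L, Bytes.toBytes("v100")));
      table.put(new Put(row).add(f, q, 200L, Bytes.toBytes("v200")));
      Delete d = new Delete(row);
      d.deleteColumn(f, q, 200L);                  // DELETE_ONE_VERSION at ts=200
      table.delete(d);
      System.out.println(table.get(new Get(row))); // returns the ts=100 value (OK)
      // "Undelete": write the same cell again at ts=200.
      table.put(new Put(row).add(f, q, 200L, Bytes.toBytes("v200")));
      System.out.println(table.get(new Get(row))); // still ts=100: the reborn
                                                   // ts=200 put stays masked
    } finally {
      table.close();
    }
  }
}
{code}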



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-5583) Master restart on create table with splitkeys does not recreate table with all the splitkey regions

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815420#comment-13815420
 ] 

Hadoop QA commented on HBASE-5583:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576723/HBASE-5583_new_1_review.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7763//console

This message is automatically generated.

> Master restart on create table with splitkeys does not recreate table with 
> all the splitkey regions
> ---
>
> Key: HBASE-5583
> URL: https://issues.apache.org/jira/browse/HBASE-5583
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.96.1
>
> Attachments: HBASE-5583_new_1.patch, HBASE-5583_new_1_review.patch, 
> HBASE-5583_new_2.patch, HBASE-5583_new_4_WIP.patch, 
> HBASE-5583_new_5_WIP_using_tableznode.patch
>
>
> -> Create table using splitkeys
> -> MAster goes down before all regions are added to meta
> -> On master restart the table is again enabled but with less number of 
> regions than specified in splitkeys
> Anyway client will get an exception if i had called sync create table.  But 
> table exists or not check will say table exists. 
> Is this scenario to be handled by client only or can we have some mechanism 
> on the master side for this? Pls suggest.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815418#comment-13815418
 ] 

Enis Soztutar commented on HBASE-9906:
--

We can fix this issue in one of these ways: 
  (1) Fix either HBASE-9905 or HBASE-8770 or HBASE-9879
  (2) Add a sleep(20) between the meta delete and the update
  (3) Obtain a ts from the client, do the delete with that ts, and the puts with 
ts+1 (see the sketch below)
  (4) Change the meta delete to only delete the columns that are not needed; the 
subsequent put will override the column values anyway. 
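
A hedged sketch of option (3): pin explicit timestamps so the put always sorts 
after the delete, even within the same millisecond. 'regionRow' stands in for 
the meta row key of the region being restored.
{code}
static void restoreMetaRow(byte[] regionRow) {
  long now = System.currentTimeMillis();  // client-supplied ts; clock-skew caveat applies
  Delete d = new Delete(regionRow, now);  // deletes cells with ts <= now
  Put p = new Put(regionRow, now + 1);    // restored entry sorts strictly newer
  // ... apply d and then p to the meta table (elided)
}
{code}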


> Restore snapshot fails to restore the meta edits sporadically  
> ---
>
> Key: HBASE-9906
> URL: https://issues.apache.org/jira/browse/HBASE-9906
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
>
> After snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries to do an operation on the table (like enable table, or a 
> scan in the test outputs) which fails because the meta entry for the region 
> seems to be gone (in the case of a single region, the table will be reported 
> missing). Subsequent attempts to create the table will also fail because 
> the table directories will be there, but not the meta entries.
> For restoring meta entries, we are doing a delete then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put will have the same timestamp if they execute in the same ms. The delete 
> will mask the put at the same ts, even though the put has a larger seqNum.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9905) Enable using seqId as timestamp

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815414#comment-13815414
 ] 

Enis Soztutar commented on HBASE-9905:
--

After some offline discussion with Sergey: we probably need a couple of "modes" 
per table for the "timestamp mode": 
  - mode_seqid : The server supplies the seqId to the cells. If a ts is set 
for puts, the server will throw IllegalArgumentException. 
  - mode_server_ts : The server supplies the ts to the cells from the wall 
clock. If a ts is set for puts, the server will throw IllegalArgumentException. 
  - mode_client_ts : The client always supplies the timestamps (from a clock 
or from a ts oracle). The server throws an exception if a cell does not have a 
timestamp. 
  - mode_mixed : Will operate similarly to the current semantics. Will be 
deprecated. 

mode_server_ts is a special case of mode_mixed, and may not be needed. 
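
A purely hypothetical sketch of how the per-table "timestamp mode" could 
surface in the API; no such property exists yet, and the key/value strings 
below are placeholders:
{code}
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("t"));
htd.setValue("TIMESTAMP_MODE", "mode_seqid");  // hypothetical key; values as listed above
{code}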

> Enable using seqId as timestamp 
> 
>
> Key: HBASE-9905
> URL: https://issues.apache.org/jira/browse/HBASE-9905
> Project: HBase
>  Issue Type: New Feature
>Reporter: Enis Soztutar
> Fix For: 0.98.0
>
>
> This has been discussed previously, and Lars H. was mentioning an idea of 
> letting the client declare explicitly whether timestamps are used or not. 
> The problem is that, for data models not using timestamps, we are still 
> relying on clocks to order the updates. Clock skew, same-millisecond puts 
> after deletes, etc. can cause unexpected behavior and data not being visible.  
> We should have a table descriptor / family property, which would declare that 
> the data model does not use timestamps. Then we can populate this dimension 
> with the seqId, so that the global ordering of edits is not affected by the 
> wall clock. 
> For example, META will use this. 
> Once we have something like this, we can think of making it the default for 
> new tables, so that the unknowing user will not shoot herself in the foot. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9907) Rig to fake a cluster so can profile client behaviors

2013-11-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9907:
-

Affects Version/s: 0.96.0
   Status: Patch Available  (was: Open)

> Rig to fake a cluster so can profile client behaviors
> -
>
> Key: HBASE-9907
> URL: https://issues.apache.org/jira/browse/HBASE-9907
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0, 0.96.1
>
>
> Patch carried over from the HBASE-9775 parent issue.  Adds to 
> TestClientNoCluster#main a rig that allows faking many clients against a few 
> servers, and the opposite.  Useful for studying client operation.
> Includes a few changes to pb message construction to try and save on a few 
> object creations.
> Also has an edit of the javadoc on how to create an HConnection and HTable, 
> trying to be more forceful about pointing you in the right direction 
> ([~lhofhansl] -- mind reviewing these javadoc changes?)
> I already have a +1 on this patch up in the parent issue.  Will run it by 
> hadoopqa to make sure all is good before commit.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9907) Rig to fake a cluster so can profile client behaviors

2013-11-06 Thread stack (JIRA)
stack created HBASE-9907:


 Summary: Rig to fake a cluster so can profile client behaviors
 Key: HBASE-9907
 URL: https://issues.apache.org/jira/browse/HBASE-9907
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.98.0, 0.96.1


Patch carried over from the HBASE-9775 parent issue.  Adds to 
TestClientNoCluster#main a rig that allows faking many clients against a few 
servers, and the opposite.  Useful for studying client operation.

Includes a few changes to pb message construction to try and save on a few 
object creations.

Also has an edit of the javadoc on how to create an HConnection and HTable, 
trying to be more forceful about pointing you in the right direction 
([~lhofhansl] -- mind reviewing these javadoc changes?)

I already have a +1 on this patch up in the parent issue.  Will run it by 
hadoopqa to make sure all is good before commit.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9903) Remove the jamon generated classes from the findbugs analysis

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815406#comment-13815406
 ] 

Hadoop QA commented on HBASE-9903:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612442/9903.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7761//console

This message is automatically generated.

> Remove the jamon generated classes from the findbugs analysis
> -
>
> Key: HBASE-9903
> URL: https://issues.apache.org/jira/browse/HBASE-9903
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0
>
> Attachments: 9903.v1.patch, 9903.v2.patch, 9903.v2.patch
>
>
> The current filter does not work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815407#comment-13815407
 ] 

Hadoop QA commented on HBASE-9818:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612439/9818-v5.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:488)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7760//console

This message is automatically generated.

> NPE in HFileBlock#AbstractFSReader#readAtOffset
> ---
>
> Key: HBASE-9818
> URL: https://issues.apache.org/jira/browse/HBASE-9818
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Ted Yu
> Attachments: 9818-v2.txt, 9818-v3.txt, 9818-v4.txt, 9818-v5.txt
>
>
> HFileBlock#istream seems to be null.  I was wondering should we hide 
> FSDataInputStreamWrapper#useHBaseChecksum.
> By the way, this happened when online schema change is enabled (encoding)
> {noformat}
> 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] 
> regionserver.HRegionServer:
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:55

[jira] [Updated] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

2013-11-06 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-9047:


Attachment: HBASE-9047-trunk-v4.patch

The new patch removes the 30 sec timeout at the end, because the 
ReplicationSourceManager oldsources check is good enough to indicate no edits are left in the queue.

> Tool to handle finishing replication when the cluster is offline
> 
>
> Key: HBASE-9047
> URL: https://issues.apache.org/jira/browse/HBASE-9047
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.96.0
>Reporter: Jean-Daniel Cryans
>Assignee: Demai Ni
> Fix For: 0.98.0
>
> Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch, 
> HBASE-9047-trunk-v1.patch, HBASE-9047-trunk-v2.patch, 
> HBASE-9047-trunk-v3.patch, HBASE-9047-trunk-v4.patch
>
>
> We're having a discussion on the mailing list about replicating the data on a 
> cluster that was shut down in an offline fashion. The motivation could be 
> that you don't want to bring HBase back up but still need that data on the 
> slave.
> So I have this idea of a tool that would be running on the master cluster 
> while it is down, although it could also run at any time. Basically it would 
> be able to read the replication state of each master region server, finish 
> replicating what's missing to all the slaves, and then clear that state in 
> zookeeper.
> The code that handles replication does most of that already, see 
> ReplicationSourceManager and ReplicationSource. Basically when 
> ReplicationSourceManager.init() is called, it will check all the queues in ZK 
> and try to grab those that aren't attached to a region server. If the whole 
> cluster is down, it will grab all of them.
> The beautiful thing here is that you could start that tool on all your 
> machines and the load will be spread out, but that might not be a big concern 
> if replication wasn't lagging since it would take a few seconds to finish 
> replicating the missing data for each region server.
> I'm guessing when starting ReplicationSourceManager you'd give it a fake 
> region server ID, and you'd tell it not to start its own source.
> FWIW the main difference in how replication is handled between Apache's HBase 
> and Facebook's is that the latter is always done separately of HBase itself. 
> This jira isn't about doing that.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-9906:


 Summary: Restore snapshot fails to restore the meta edits 
sporadically  
 Key: HBASE-9906
 URL: https://issues.apache.org/jira/browse/HBASE-9906
 Project: HBase
  Issue Type: New Feature
  Components: snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.1, 0.94.14


After snapshot restore, we see failures to find the table in meta:
{code}
> disable 'tablefour'
> restore_snapshot 'snapshot_tablefour'
> enable 'tablefour'
ERROR: Table tablefour does not exist.'
{code}

This is quite subtle. From the looks of it, we successfully restore the 
snapshot, do the meta updates, and return the status to the client. The 
client then tries to do an operation on the table (like enable table, or a scan 
in the test outputs) which fails because the meta entry for the region seems to 
be gone (in the case of a single region, the table will be reported missing). 
Subsequent attempts to create the table will also fail because the table 
directories will be there, but not the meta entries.
For restoring meta entries, we are doing a delete then a put to the same region:
{code}
2013-11-04 10:39:51,582 INFO 
org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
76d0e2b7ec3291afcaa82e18a56ccc30
2013-11-04 10:39:51,582 INFO 
org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
fa41edf43fe3ee131db4a34b848ff432
...
2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY => 
'', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added 1
{code}
The root cause of this sporadic failure is that the delete and the subsequent put 
will have the same timestamp if they execute in the same ms. The delete will 
mask the put at the same ts, even though the put has a larger seqNum.

See: HBASE-9905, HBASE-8770
Credit goes to [~huned] for reporting this bug. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8770) deletes and puts with the same ts should be resolved according to mvcc/seqNum

2013-11-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815380#comment-13815380
 ] 

Sergey Shelukhin commented on HBASE-8770:
-

There was another issue today where a user does use equal TSs (put ts 100, put 
ts 200, del-version ts 200, then later put ts 200). 
This would solve both problems... I think we can also do HBASE-9905.

> deletes and puts with the same ts should be resolved according to mvcc/seqNum
> -
>
> Key: HBASE-8770
> URL: https://issues.apache.org/jira/browse/HBASE-8770
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Sergey Shelukhin
>
> This came up during HBASE-8721. Puts with the same ts are resolved by seqNum. 
> It's not clear why deletes with the same ts as a put should always mask the 
> put, rather than also being resolved by seqNum.
> What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8770) deletes and puts with the same ts should be resolved according to mvcc/seqNum

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815369#comment-13815369
 ] 

Enis Soztutar commented on HBASE-8770:
--

Linking HBASE-9905. We might as well do that instead of this. 

> deletes and puts with the same ts should be resolved according to mvcc/seqNum
> -
>
> Key: HBASE-8770
> URL: https://issues.apache.org/jira/browse/HBASE-8770
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Sergey Shelukhin
>
> This came up during HBASE-8721. Puts with the same ts are resolved by seqNum. 
> It's not clear why deletes with the same ts as a put should always mask the 
> put, rather than also being resolved by seqNum.
> What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9905) Enable using seqId as timestamp

2013-11-06 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-9905:


 Summary: Enable using seqId as timestamp 
 Key: HBASE-9905
 URL: https://issues.apache.org/jira/browse/HBASE-9905
 Project: HBase
  Issue Type: New Feature
Reporter: Enis Soztutar
 Fix For: 0.98.0


This has been discussed previously, and Lars H. was mentioning an idea of 
letting the client declare explicitly whether timestamps are used or not. 

The problem is that, for data models not using timestamps, we are still relying 
on clocks to order the updates. Clock skew, same-millisecond puts after deletes, 
etc. can cause unexpected behavior and data not being visible.  

We should have a table descriptor / family property, which would declare that 
the data model does not use timestamps. Then we can populate this dimension 
with the seqId, so that the global ordering of edits is not affected by the 
wall clock. 

For example, META will use this. 

Once we have something like this, we can think of making it the default for new 
tables, so that the unknowing user will not shoot herself in the foot. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9775) Client write path perf issues

2013-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815341#comment-13815341
 ] 

stack commented on HBASE-9775:
--

bq. I've tried it myself (exactly the same approach), but I didn't see a real 
difference. Do you see one in your tests?

Minor (certain allocation hotspots went from 3% to 2.4% in my extreme 
allocation test, which probably means close to zero diff).  I left it in since 
on the face of it there are fewer allocations.

I'll commit this since the rig can be useful.  Want to do some comment/javadoc 
first though. 

> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: 9775.rig.txt, 9775.rig.v2.patch, 9775.rig.v3.patch, 
> Charts Search   Cloudera Manager - ITBLL.png, Charts Search   Cloudera 
> Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, ycsb.png, 
> ycsb_insert_94_vs_96.png
>
>
> Testing on larger clusters has not shown the desired throughput increases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HBASE-4876) TestDistributedLogSplitting#testWorkerAbort occasionally fails

2013-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-4876.
---

Resolution: Cannot Reproduce

> TestDistributedLogSplitting#testWorkerAbort occasionally fails
> --
>
> Key: HBASE-4876
> URL: https://issues.apache.org/jira/browse/HBASE-4876
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>
> From 
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2486/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testWorkerAbort/:
> {code}
> 2011-11-26 18:10:25,075 DEBUG 
> [SplitLogWorker-janus.apache.org,42484,1322330994864] wal.HLogSplitter(460): 
> Closed 
> hdfs://localhost:47236/user/jenkins/splitlog/janus.apache.org,42484,1322330994864_hdfs%3A%2F%2Flocalhost%3A47236%2Fuser%2Fjenkins%2F.logs%2Fjanus.apache.org%2C42484%2C1322330994864%2Fjanus.apache.org%252C42484%252C1322330994864.1322330997838/table/be67e8c1df1e77e93181ff7300e77639/recovered.edits/152
> 2011-11-26 18:10:25,075 DEBUG 
> [SplitLogWorker-janus.apache.org,42484,1322330994864] wal.HLogSplitter(460): 
> Closed 
> hdfs://localhost:47236/user/jenkins/splitlog/janus.apache.org,42484,1322330994864_hdfs%3A%2F%2Flocalhost%3A47236%2Fuser%2Fjenkins%2F.logs%2Fjanus.apache.org%2C42484%2C1322330994864%2Fjanus.apache.org%252C42484%252C1322330994864.1322330997838/table/bf112e57fbaa65c12accfafaaa4dc2b0/recovered.edits/167
> 2011-11-26 18:10:25,075 DEBUG 
> [SplitLogWorker-janus.apache.org,42484,1322330994864] wal.HLogSplitter(460): 
> Closed 
> hdfs://localhost:47236/user/jenkins/splitlog/janus.apache.org,42484,1322330994864_hdfs%3A%2F%2Flocalhost%3A47236%2Fuser%2Fjenkins%2F.logs%2Fjanus.apache.org%2C42484%2C1322330994864%2Fjanus.apache.org%252C42484%252C1322330994864.1322330997838/table/bfb6983046589215ed8e6cb0e60dd803/recovered.edits/146
> 2011-11-26 18:10:25,488 INFO  
> [SplitLogWorker-janus.apache.org,42484,1322330994864] 
> regionserver.SplitLogWorker(308): worker janus.apache.org,42484,1322330994864 
> done with task 
> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%3A47236%2Fuser%2Fjenkins%2F.logs%2Fjanus.apache.org%2C42484%2C1322330994864%2Fjanus.apache.org%252C42484%252C1322330994864.1322330997838
>  in 13379ms
> 2011-11-26 18:10:25,488 ERROR 
> [SplitLogWorker-janus.apache.org,42484,1322330994864] 
> regionserver.SplitLogWorker(169): unexpected error 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3648)
>   at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3691)
>   at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>   at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:459)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:352)
>   at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
>   at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
>   at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
>   at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
>   at java.lang.Thread.run(Thread.java:662)
> 2011-11-26 18:10:25,488 INFO  
> [SplitLogWorker-janus.apache.org,42484,1322330994864] 
> regionserver.SplitLogWorker(171): SplitLogWorker 
> janus.apache.org,42484,1322330994864 exiting
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9888) HBase replicates edits written before the replication peer is created

2013-11-06 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815337#comment-13815337
 ] 

Dave Latham commented on HBASE-9888:


{quote}
> Would it work to just do it in each RS when the ReplicationSource on that RS 
> is created (in the mode for add_peer)?
That's what I was proposing, sorry if not clear.
{quote}
+1 for the proposal.

> HBase replicates edits written before the replication peer is created
> -
>
> Key: HBASE-9888
> URL: https://issues.apache.org/jira/browse/HBASE-9888
> Project: HBase
>  Issue Type: Bug
>Reporter: Dave Latham
>
> When creating a new replication peer the ReplicationSourceManager enqueues 
> the currently open HLog to the ReplicationSource to ship to the destination 
> cluster.  The ReplicationSource starts at the beginning of the HLog and ships 
> over any pre-existing writes.
> A workaround is to roll all the HLogs before enabling replication.
> A little background on how it affected us: we were migrating one cluster in 
> a master-master pair, i.e. transitioning from A <-> B to B <-> C.  After 
> shutting down writes from A -> B we enabled writes from C -> B.  However, 
> this replicated some earlier writes that were in C's HLogs that had 
> originated in A.  Since we were running a version of HBase before HBASE-7709, 
> those writes then got caught in an infinite replication cycle, bringing 
> down region servers with OOMs because of HBASE-9865.
> However, in general, if one wants to manage what data gets replicated, one 
> wouldn't expect that potentially very old writes would be included when 
> setting up a new replication link.
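
A hedged sketch of the roll-the-HLogs workaround mentioned in the description, 
using the 0.94/0.96 admin API; run it before add_peer:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RollAllHLogs {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Force a fresh WAL on every region server so no pre-existing edits
      // remain in the live HLogs when the peer is added.
      for (ServerName sn : admin.getClusterStatus().getServers()) {
        admin.rollHLogWriter(sn.getServerName());
      }
    } finally {
      admin.close();
    }
  }
}
{code}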



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HBASE-6731) Port HBASE-6537 'Race between balancer and disable table can lead to inconsistent cluster' to 0.92

2013-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-6731.
---

   Resolution: Later
Fix Version/s: (was: 0.92.3)

0.92 is not active.

> Port HBASE-6537 'Race between balancer and disable table can lead to 
> inconsistent cluster' to 0.92
> --
>
> Key: HBASE-6731
> URL: https://issues.apache.org/jira/browse/HBASE-6731
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: rajeshbabu
> Attachments: HBASE-6731.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HBASE-4839) Re-enable TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover

2013-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-4839.
---

Resolution: Won't Fix

The test no longer exists.

> Re-enable 
> TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover
> --
>
> Key: HBASE-4839
> URL: https://issues.apache.org/jira/browse/HBASE-4839
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Subbu M Iyer
>
> TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover
>  was disabled for instant schema change (HBASE-4213) after it failed on 
> Jenkins.
> We should enable it and make it pass on Jenkins and dev enviroments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9888) HBase replicates edits written before the replication peer is created

2013-11-06 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815322#comment-13815322
 ] 

Jean-Daniel Cryans commented on HBASE-9888:
---

bq. So,are you suggesting implementing a custom ReplicationSource to seek into 
the current WAL to find the edit with writeTime > sourceCreationTimestamp?

It wouldn't be custom, it'd be the default behavior.
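
A hedged sketch of that default behavior (not committed code): when a peer is 
added, skip WAL entries whose writeTime predates the source's creation. 
HLog.Reader/Entry are the 0.94-era WAL types; shipEdit is a hypothetical 
stand-in for the existing shipping path.
{code}
import java.io.IOException;

import org.apache.hadoop.hbase.regionserver.wal.HLog;

public class SkipPreexistingEdits {
  static void replayForNewPeer(HLog.Reader reader, long sourceCreationTime)
      throws IOException {
    HLog.Entry entry;
    while ((entry = reader.next()) != null) {
      if (entry.getKey().getWriteTime() <= sourceCreationTime) {
        continue;  // written before add_peer: do not replicate
      }
      shipEdit(entry);
    }
  }

  static void shipEdit(HLog.Entry entry) {
    // hand off to the normal replication shipping path (elided)
  }
}
{code}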

> HBase replicates edits written before the replication peer is created
> -
>
> Key: HBASE-9888
> URL: https://issues.apache.org/jira/browse/HBASE-9888
> Project: HBase
>  Issue Type: Bug
>Reporter: Dave Latham
>
> When creating a new replication peer the ReplicationSourceManager enqueues 
> the currently open HLog to the ReplicationSource to ship to the destination 
> cluster.  The ReplicationSource starts at the beginning of the HLog and ships 
> over any pre-existing writes.
> A workaround is to roll all the HLogs before enabling replication.
> A little background on how it affected us: we were migrating one cluster in 
> a master-master pair, i.e. transitioning from A <-> B to B <-> C.  After 
> shutting down writes from A -> B we enabled writes from C -> B.  However, 
> this replicated some earlier writes that were in C's HLogs that had 
> originated in A.  Since we were running a version of HBase before HBASE-7709, 
> those writes then got caught in an infinite replication cycle, bringing 
> down region servers with OOMs because of HBASE-9865.
> However, in general, if one wants to manage what data gets replicated, one 
> wouldn't expect that potentially very old writes would be included when 
> setting up a new replication link.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9888) HBase replicates edits written before the replication peer is created

2013-11-06 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815314#comment-13815314
 ] 

Jean-Daniel Cryans commented on HBASE-9888:
---

bq. That sounds great. Is that 0.94 only or do the newer versions also have it?

It's in trunk too.

bq. Do you have an idea where the minimum timestamp would be generated?

Once we get the zk event? Not sure.

bq. Would it work to just do it in each RS when the ReplicationSource on that 
RS is created (in the mode for add_peer)?

That's what I was proposing, sorry if not clear.

bq. Alternatively, should each RS roll its HLog when creating a new peer?

That could work but I'd rather not roll logs for this.

> HBase replicates edits written before the replication peer is created
> -
>
> Key: HBASE-9888
> URL: https://issues.apache.org/jira/browse/HBASE-9888
> Project: HBase
>  Issue Type: Bug
>Reporter: Dave Latham
>
> When creating a new replication peer the ReplicationSourceManager enqueues 
> the currently open HLog to the ReplicationSource to ship to the destination 
> cluster.  The ReplicationSource starts at the beginning of the HLog and ships 
> over any pre-existing writes.
> A workaround is to roll all the HLogs before enabling replication.
> A little background for how it affected us - we were migrating one cluster in 
> a master-master pair, i.e. transitioning from A <-> B to B <-> C.  After 
> shutting down writes from A -> B we enabled writes from C -> B.  However, 
> this replicated some earlier writes that were in C's HLogs that had 
> originated in A.  Since we were running a version of HBase before HBASE-7709, 
> those writes then got caught in an infinite replication cycle, bringing 
> down region servers with OOMs because of HBASE-9865.
> However, in general, if one wants to manage what data gets replicated, one 
> wouldn't expect that potentially very old writes would be included when 
> setting up a new replication link.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9888) HBase replicates edits written before the replication peer is created

2013-11-06 Thread santosh banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815313#comment-13815313
 ] 

santosh banerjee commented on HBASE-9888:
-

{quote} In 0.94, HLogKey has a writeTime and we could seek in the current WAL 
until we find an edit that's been written after the source was created.{quote}

This sounds interesting. So, are you suggesting implementing a custom 
ReplicationSource to seek into the current WAL to find the first edit with 
writeTime > sourceCreationTimestamp?


> HBase replicates edits written before the replication peer is created
> -
>
> Key: HBASE-9888
> URL: https://issues.apache.org/jira/browse/HBASE-9888
> Project: HBase
>  Issue Type: Bug
>Reporter: Dave Latham
>
> When creating a new replication peer the ReplicationSourceManager enqueues 
> the currently open HLog to the ReplicationSource to ship to the destination 
> cluster.  The ReplicationSource starts at the beginning of the HLog and ships 
> over any pre-existing writes.
> A workaround is to roll all the HLogs before enabling replication.
> A little background for how it affected us - we were migrating one cluster in 
> a master-master pair, i.e. transitioning from A <-> B to B <-> C.  After 
> shutting down writes from A -> B we enabled writes from C -> B.  However, 
> this replicated some earlier writes that were in C's HLogs that had 
> originated in A.  Since we were running a version of HBase before HBASE-7709, 
> those writes then got caught in an infinite replication cycle, bringing 
> down region servers with OOMs because of HBASE-9865.
> However, in general, if one wants to manage what data gets replicated, one 
> wouldn't expect that potentially very old writes would be included when 
> setting up a new replication link.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9903) Remove the jamon generated classes from the findbugs analysis

2013-11-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-9903:
---

Status: Patch Available  (was: Open)

> Remove the jamon generated classes from the findbugs analysis
> -
>
> Key: HBASE-9903
> URL: https://issues.apache.org/jira/browse/HBASE-9903
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0, 0.98.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0
>
> Attachments: 9903.v1.patch, 9903.v2.patch, 9903.v2.patch
>
>
> The current filter does not work.
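
For illustration, a hedged sketch of an exclude filter that would drop the 
jamon output from the analysis; the package regex is an assumption based on 
the templates generating into org.apache.hadoop.hbase.tmpl:

{code}
<!-- FindBugs exclude filter sketch: skip every class in the
     jamon-generated template package and its subpackages. -->
<FindBugsFilter>
  <Match>
    <Class name="~org\.apache\.hadoop\.hbase\.tmpl\..*"/>
  </Match>
</FindBugsFilter>
{code}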



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7025) Metric for how many WAL files a regionserver is carrying

2013-11-06 Thread Asaf Mesika (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815299#comment-13815299
 ] 

Asaf Mesika commented on HBASE-7025:


I'm running into "too many hlogs" warnings, which eventually cause the 
region server to crash. I'm in the middle of analyzing it through the debug log 
files and Graphite, and could really use this metric to understand, over time, 
when we started having a lot of WAL files in the queue.
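
Until such a metric exists, a minimal sketch of watching the count from outside 
HBase, assuming the 0.94 default layout of one directory per region server 
under /hbase/.logs (adjust for your hbase.rootdir):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: print the per-region-server WAL file count, e.g. to feed
// into Graphite from a cron job. The WAL root path is an assumption.
public class WalFileCounter {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // One subdirectory per region server under the WAL root.
    for (FileStatus serverDir : fs.listStatus(new Path("/hbase/.logs"))) {
      int count = fs.listStatus(serverDir.getPath()).length;
      System.out.println(serverDir.getPath().getName() + " " + count);
    }
  }
}
{code}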


> Metric for how many WAL files a regionserver is carrying
> 
>
> Key: HBASE-7025
> URL: https://issues.apache.org/jira/browse/HBASE-7025
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: stack
>
> A metric that shows how many WAL files a regionserver is carrying at any one 
> time would be useful for identifying those servers that are always over the 
> upper bounds and in need of attention.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9903) Remove the jamon generated classes from the findbugs analysis

2013-11-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-9903:
---

Status: Open  (was: Patch Available)

> Remove the jamon generated classes from the findbugs analysis
> -
>
> Key: HBASE-9903
> URL: https://issues.apache.org/jira/browse/HBASE-9903
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0, 0.98.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0
>
> Attachments: 9903.v1.patch, 9903.v2.patch, 9903.v2.patch
>
>
> The current filter does not work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9903) Remove the jamon generated classes from the findbugs analysis

2013-11-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-9903:
---

Attachment: 9903.v2.patch

> Remove the jamon generated classes from the findbugs analysis
> -
>
> Key: HBASE-9903
> URL: https://issues.apache.org/jira/browse/HBASE-9903
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.98.0
>
> Attachments: 9903.v1.patch, 9903.v2.patch, 9903.v2.patch
>
>
> The current filter does not work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


  1   2   3   >