[jira] [Commented] (HBASE-10788) Add 99th percentile of latency in PE

2014-03-19 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940227#comment-13940227
 ] 

Liu Shaohui commented on HBASE-10788:
-

[~ndimiduk]
Yes, I would like to add percentiles in the base class Test, and add all 
percentiles for all tests.
Sorry for not noticing the percentiles code in the randomRead test. I will 
redo the patch based on HBASE-10007.
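For reference, here is a minimal sketch (not PE's actual code; the class and method 
names below are made up) of reporting a 99th percentile from raw latency samples with 
the nearest-rank method:
{code}
import java.util.Arrays;

public class LatencyPercentiles {
  /** Returns the p-th percentile (0 < p <= 100) of the given latencies, nearest-rank method. */
  static double percentile(double[] latenciesMs, double p) {
    double[] sorted = latenciesMs.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(0, rank - 1)];
  }

  public static void main(String[] args) {
    double[] samples = {1.2, 0.9, 1.1, 35.0, 1.0, 0.8, 1.3, 250.0, 1.0, 1.1};
    System.out.printf("avg=%.2f ms%n", Arrays.stream(samples).average().orElse(0));
    // p99 is dominated by the 250 ms outlier, which the average largely hides.
    System.out.printf("p99=%.2f ms%n", percentile(samples, 99));
  }
}
{code}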

 Add 99th percentile of latency in PE
 

 Key: HBASE-10788
 URL: https://issues.apache.org/jira/browse/HBASE-10788
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10788-trunk-v1.diff


 In a production environment, the 99th percentile of latency is more important than the average. 
 The 99th percentile helps measure the influence of GC pauses and slow 
 reads/writes in HDFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-9740) A corrupt HFile could cause endless attempts to assign the region without a chance of success

2014-03-19 Thread Ping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ping updated HBASE-9740:


Attachment: HBase-9740_0.94_v4.patch

Hi Hofhansl, thanks for your suggestions. I changed the HashMap to a 
ConcurrentHashMap and also replaced Integer with AtomicInteger for its values.
Please review it.
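For context, a minimal sketch of that data-structure change (the class and field names 
below are hypothetical, not the actual patch): with ConcurrentHashMap plus AtomicInteger 
values, concurrent threads can bump a per-region counter without external locking.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FailedOpenCounter {
  // One counter per region name; safe for concurrent access without synchronized blocks.
  private final ConcurrentMap<String, AtomicInteger> failedOpens =
      new ConcurrentHashMap<String, AtomicInteger>();

  /** Increments and returns the failure count for the given region. */
  int recordFailure(String regionName) {
    AtomicInteger counter = failedOpens.get(regionName);
    if (counter == null) {
      AtomicInteger fresh = new AtomicInteger(0);
      counter = failedOpens.putIfAbsent(regionName, fresh);
      if (counter == null) {
        counter = fresh;   // our fresh counter won the race
      }
    }
    return counter.incrementAndGet();
  }
}
{code}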

 A corrupt HFile could cause endless attempts to assign the region without a 
 chance of success
 -

 Key: HBASE-9740
 URL: https://issues.apache.org/jira/browse/HBASE-9740
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.16
Reporter: Aditya Kishore
Assignee: Ping
 Fix For: 0.94.19

 Attachments: HBase-9740_0.94_v4.patch, HBase-9749_0.94_v2.patch, 
 HBase-9749_0.94_v3.patch, patch-9740_0.94.txt


 As described in HBASE-9737, a corrupt HFile in a region could lead to an 
 assignment storm in the cluster, since the Master will keep trying to assign 
 the region to each region server one after another and obviously none will 
 succeed.
 The region server, upon detecting such a scenario, should mark the region as 
 RS_ZK_REGION_FAILED_ERROR (or something to that effect) in ZooKeeper, 
 which should indicate to the Master to stop assigning the region until the error 
 has been resolved (via an HBase shell command, probably assign?).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long

2014-03-19 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940247#comment-13940247
 ] 

Anoop Sam John commented on HBASE-10787:


Looks good.

 TestHCM#testConnection* take too long
 -

 Key: HBASE-10787
 URL: https://issues.apache.org/jira/browse/HBASE-10787
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10787-v1.txt


 TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins.
 The test can be shortened when retry count is lowered.
 On my Mac, for TestHCM#testConnection* (two tests)
 without patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec
 {code}
 with patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10781:
--

Attachment: 10690v2.txt

Send folks to the 0.96 and 0.98 docs if building those releases (a note in the doc 
under build does that).  Make this doc change about building 0.99 and 1.0.  
Changed the make-rc.sh script to do 1.0.  Let me test a bit more, then will 
commit if it works.

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940264#comment-13940264
 ] 

Hadoop QA commented on HBASE-10531:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635491/HBASE-10531_7.patch
  against trunk revision .
  ATTACHMENT ID: 12635491

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9041//console

This message is automatically generated.

 Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
 

 Key: HBASE-10531
 URL: https://issues.apache.org/jira/browse/HBASE-10531
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0

 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, 
 HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, 
 HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch


 Currently the byte[] key passed to HFileScanner.seekTo and 
 HFileScanner.reseekTo is a combination of row, cf, qual, type and ts.  And 
 the caller forms this by using kv.getBuffer, which is actually deprecated.  
 So we need to see how this can be achieved once kv.getBuffer is removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-9740) A corrupt HFile could cause endless attempts to assign the region without a chance of success

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940273#comment-13940273
 ] 

Hadoop QA commented on HBASE-9740:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12635500/HBase-9740_0.94_v4.patch
  against trunk revision .
  ATTACHMENT ID: 12635500

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9042//console

This message is automatically generated.

 A corrupt HFile could cause endless attempts to assign the region without a 
 chance of success
 -

 Key: HBASE-9740
 URL: https://issues.apache.org/jira/browse/HBASE-9740
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.16
Reporter: Aditya Kishore
Assignee: Ping
 Fix For: 0.94.19

 Attachments: HBase-9740_0.94_v4.patch, HBase-9749_0.94_v2.patch, 
 HBase-9749_0.94_v3.patch, patch-9740_0.94.txt


 As described in HBASE-9737, a corrupt HFile in a region could lead to an 
 assignment storm in the cluster, since the Master will keep trying to assign 
 the region to each region server one after another and obviously none will 
 succeed.
 The region server, upon detecting such a scenario, should mark the region as 
 RS_ZK_REGION_FAILED_ERROR (or something to that effect) in ZooKeeper, 
 which should indicate to the Master to stop assigning the region until the error 
 has been resolved (via an HBase shell command, probably assign?).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10648) Pluggable Memstore

2014-03-19 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-10648:
--

Attachment: HBASE-10648-0.94_v3.patch

Thanks [~lhofhansl] for the review. Have replaced Cell-related comments with 
KeyValue in the new patch.

 Pluggable Memstore
 --

 Key: HBASE-10648
 URL: https://issues.apache.org/jira/browse/HBASE-10648
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.99.0

 Attachments: HBASE-10648-0.94_v1.patch, HBASE-10648-0.94_v2.patch, 
 HBASE-10648-0.94_v3.patch, HBASE-10648.patch, HBASE-10648_V2.patch, 
 HBASE-10648_V3.patch, HBASE-10648_V4.patch, HBASE-10648_V5.patch, 
 HBASE-10648_V6.patch


 Make Memstore into an interface plus implementation.  Also make it pluggable by 
 configuring the FQCN of the impl.
 This will allow us to have different impls and optimizations in the Memstore 
 data structure while leaving the upper layers untouched.
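A minimal sketch of the FQCN-based pluggability idea (the interface, config key, and 
class names below are made up for illustration; this is not the actual patch):
{code}
import java.util.Properties;

interface MemStoreLike {               // stand-in for the extracted Memstore interface
  void add(byte[] key, byte[] value);
}

class DefaultMemStoreImpl implements MemStoreLike {
  public void add(byte[] key, byte[] value) { /* default implementation elided */ }
}

public class MemStoreFactory {
  /** Instantiates whatever implementation the configuration names, falling back to the default. */
  static MemStoreLike create(Properties conf) throws Exception {
    String fqcn = conf.getProperty("hbase.hregion.memstore.class",   // hypothetical key
        DefaultMemStoreImpl.class.getName());
    return Class.forName(fqcn).asSubclass(MemStoreLike.class)
        .getDeclaredConstructor().newInstance();
  }
}
{code}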



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10648) Pluggable Memstore

2014-03-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940317#comment-13940317
 ] 

haosdent commented on HBASE-10648:
--

well done! [~carp84]

 Pluggable Memstore
 --

 Key: HBASE-10648
 URL: https://issues.apache.org/jira/browse/HBASE-10648
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.99.0

 Attachments: HBASE-10648-0.94_v1.patch, HBASE-10648-0.94_v2.patch, 
 HBASE-10648-0.94_v3.patch, HBASE-10648.patch, HBASE-10648_V2.patch, 
 HBASE-10648_V3.patch, HBASE-10648_V4.patch, HBASE-10648_V5.patch, 
 HBASE-10648_V6.patch


 Make Memstore into an interface plus implementation.  Also make it pluggable by 
 configuring the FQCN of the impl.
 This will allow us to have different impls and optimizations in the Memstore 
 data structure while leaving the upper layers untouched.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940359#comment-13940359
 ] 

Hadoop QA commented on HBASE-10781:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635507/10690v2.txt
  against trunk revision .
  ATTACHMENT ID: 12635507

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 24 new 
or modified tests.

{color:red}-1 hadoop1.0{color}.  The patch failed to compile against the 
hadoop 1.0 profile.
Here is snippet of errors:
{code}{code}

{color:red}-1 hadoop1.1{color}.  The patch failed to compile against the 
hadoop 1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+These instructions are for building HBase 1.0.x.  For building earlier 
versions, the process is different.  See this section
+<para>Now, build the src tarball.  This tarball is hadoop version 
independent.  It is just the pure src code and documentation without a 
particular hadoop taint, etc.
+Add the <varname>-Prelease</varname> profile when building; it 
checks files for licenses and will fail the build if unlicensed files present.
+<note><title>Point Release Only</title><para>The following step that creates a 
new tag can be skipped since you've already created the point release 
tag</para></note>
+The last command above copies all artifacts up to a temporary staging apache 
mvn repo in an 'open' state.
+ <para>The script <filename>dev-support/make_rc.sh</filename> 
automates alot of the above listed release steps.
+ staging repository up in apache maven (human intervention is 
needed here), the checking of
+ the produced artifacts to ensure they are 'good' -- e.g.  undoing 
the produced tarballs, eyeballing them to make
+ sure they look right then starting and checking all is running 
properly --  and then the signing and pushing of
+<para>Now lets get back to what is up in maven. Our artifacts should 
be up in maven repository in the staging area 

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9043//console

This message is automatically generated.

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt


 Clean out hadoop1 references.

[jira] [Created] (HBASE-10789) Add NumberComparator

2014-03-19 Thread haosdent (JIRA)
haosdent created HBASE-10789:


 Summary: Add NumberComparator
 Key: HBASE-10789
 URL: https://issues.apache.org/jira/browse/HBASE-10789
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Reporter: haosdent
Assignee: haosdent


Sometimes a user may want to filter out values less than a positive number, 
but the result they finally get still contains negative numbers.
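For illustration, a self-contained sketch (not the proposed comparator) of why this happens: 
with the usual big-endian two's-complement encoding of an int, the raw unsigned 
byte-by-byte comparison a binary comparator performs orders negative numbers after 
positive ones.
{code}
import java.nio.ByteBuffer;

public class SignBitDemo {
  static byte[] enc(int v) {
    return ByteBuffer.allocate(4).putInt(v).array();   // big-endian two's complement
  }

  /** Unsigned lexicographic comparison, like a raw binary comparator. */
  static int compareBytes(byte[] a, byte[] b) {
    for (int i = 0; i < a.length && i < b.length; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    // -1 encodes as 0xFFFFFFFF, which compares GREATER than 10 (0x0000000A) byte-wise,
    // even though -1 < 10 numerically.
    System.out.println(compareBytes(enc(-1), enc(10)) > 0);  // prints true
  }
}
{code}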



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-8963) Add configuration option to skip HFile archiving

2014-03-19 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940431#comment-13940431
 ] 

Jean-Marc Spaggiari commented on HBASE-8963:


I agree with Lars. It makes more sense to me too to have a parameter on drop table 
to allow skipping archiving, instead of setting that at every required level.

 Add configuration option to skip HFile archiving
 

 Key: HBASE-8963
 URL: https://issues.apache.org/jira/browse/HBASE-8963
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: bharath v
 Fix For: 0.99.0

 Attachments: HBASE-8963.trunk.v1.patch, HBASE-8963.trunk.v2.patch, 
 HBASE-8963.trunk.v3.patch, HBASE-8963.trunk.v4.patch, 
 HBASE-8963.trunk.v5.patch, HBASE-8963.trunk.v6.patch, 
 HBASE-8963.trunk.v7.patch


 Currently HFileArchiver is always called when a table is dropped.
 A configuration option (either global or per table) should be provided so 
 that archiving can be skipped when a table is deleted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-8963) Add configuration option to skip HFile archiving

2014-03-19 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940449#comment-13940449
 ] 

Matteo Bertozzi commented on HBASE-8963:


I don't think this is a drop-table argument, since the compaction will archive stuff 
anyway.
I like the global-level property for a general setting, but we should also 
have a table-level option,
something like create 'testtb', {SKIP_ARCHIVE => true}, that will work on both 
compaction + delete.

 Add configuration option to skip HFile archiving
 

 Key: HBASE-8963
 URL: https://issues.apache.org/jira/browse/HBASE-8963
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: bharath v
 Fix For: 0.99.0

 Attachments: HBASE-8963.trunk.v1.patch, HBASE-8963.trunk.v2.patch, 
 HBASE-8963.trunk.v3.patch, HBASE-8963.trunk.v4.patch, 
 HBASE-8963.trunk.v5.patch, HBASE-8963.trunk.v6.patch, 
 HBASE-8963.trunk.v7.patch


 Currently HFileArchiver is always called when a table is dropped.
 A configuration option (either global or per table) should be provided so 
 that archiving can be skipped when a table is deleted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10790) make assembly:single as default in pom.xml

2014-03-19 Thread Liu Shaohui (JIRA)
Liu Shaohui created HBASE-10790:
---

 Summary: make assembly:single as default in pom.xml
 Key: HBASE-10790
 URL: https://issues.apache.org/jira/browse/HBASE-10790
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Priority: Minor


Currently, to build an HBase tar release package, we have to use the command:
{code}
 mvn clean package assembly:single
{code}
which is not convenient. We can make assembly:single the default by binding the 
assembly plugin to the maven package phase. Then we can just use the command 
{code} mvn clean package {code} to get a release package.

Other suggestions are welcome.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10790) make assembly:single as default in pom.xml

2014-03-19 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated HBASE-10790:


Attachment: HBASE-10790-trunk-v1.diff

 make assembly:single as default in pom.xml
 --

 Key: HBASE-10790
 URL: https://issues.apache.org/jira/browse/HBASE-10790
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10790-trunk-v1.diff


 Currently, to build an HBase tar release package, we have to use the command:
 {code}
  mvn clean package assembly:single
 {code}
 which is not convenient. We can make assembly:single the default by binding the 
 assembly plugin to the maven package phase. Then we can just use the command 
 {code} mvn clean package {code} to get a release package.
 Other suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10791) Add integration test to demonstrate performance improvement

2014-03-19 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HBASE-10791:


 Summary: Add integration test to demonstrate performance 
improvement
 Key: HBASE-10791
 URL: https://issues.apache.org/jira/browse/HBASE-10791
 Project: HBase
  Issue Type: Sub-task
  Components: Performance, test
Affects Versions: hbase-10070
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


It would be good to demonstrate that use of region replicas reduces read 
latency. PerformanceEvaluation can be used manually for this purpose, but it's 
not able to use ChaosMonkey. An integration test can set up the monkey actions 
and automate execution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10791) Add integration test to demonstrate performance improvement

2014-03-19 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-10791:
-

Attachment: HBASE-10791.00.patch

Here's a sketch of a patch; it requires HBASE-10548, HBASE-10419, and HBASE-10592 
to be brought over to the branch. Assuming this direction looks good, I'll bring 
those tickets onto the feature branch.

Testing this has revealed some issues. I spoke with [~stack] yesterday about one 
that exists on trunk, and will catch up with [~enis] and [~devaraj] about the other 
today. New tickets to follow.

 Add integration test to demonstrate performance improvement
 ---

 Key: HBASE-10791
 URL: https://issues.apache.org/jira/browse/HBASE-10791
 Project: HBase
  Issue Type: Sub-task
  Components: Performance, test
Affects Versions: hbase-10070
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HBASE-10791.00.patch


 It would be good to demonstrate that use of region replicas reduces read 
 latency. PerformanceEvaluation can be used manually for this purpose, but 
 it's not able to use ChaosMonkey. An integration test can set up the monkey 
 actions and automate execution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10789) Add NumberComparator

2014-03-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940742#comment-13940742
 ] 

Lars Hofhansl commented on HBASE-10789:
---

I think this is a bit more tricky. You may want to sort things correctly too, 
and in that case you'd need to change the encoding.
We can add a one-off comparator now.

[~ndimiduk], FYI.


 Add NumberComparator
 

 Key: HBASE-10789
 URL: https://issues.apache.org/jira/browse/HBASE-10789
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Reporter: haosdent
Assignee: haosdent

 Sometimes a user may want to filter out values less than a positive 
 number, but the result they finally get still contains negative numbers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10788) Add 99th percentile of latency in PE

2014-03-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940794#comment-13940794
 ] 

Nick Dimiduk commented on HBASE-10788:
--

Could be your use of a real metrics library is the right way to go. My version 
allocates arrays of doubles, which can become expensive. I'd also like to add a 
mixed-workload test, in which case it'll be good to isolate read from write 
metrics, etc. Maybe your use of the yammer metrics library will support this, 
and also help minimize memory footprint while maintaining statistical significance 
of the results. If you're adding a new dependency, be sure to include the jar 
in the mapreduce job.
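
As a sketch of what that could look like (assuming the Dropwizard/Codahale metrics 3.x 
API, the successor of the yammer library; this is not part of the attached patch), a 
reservoir-backed histogram keeps a bounded sample while still reporting percentiles:
{code}
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Snapshot;

public class ReadLatencyStats {
  private final MetricRegistry registry = new MetricRegistry();
  // The default reservoir is exponentially decaying, so memory stays bounded
  // regardless of how many samples are recorded.
  private final Histogram readLatencyMs = registry.histogram("read-latency-ms");

  void record(long latencyMs) {
    readLatencyMs.update(latencyMs);
  }

  String summary() {
    Snapshot s = readLatencyMs.getSnapshot();
    return String.format("min=%d avg=%.1f p95=%.1f p99=%.1f p999=%.1f max=%d",
        s.getMin(), s.getMean(), s.get95thPercentile(),
        s.get99thPercentile(), s.get999thPercentile(), s.getMax());
  }
}
{code}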

Good on you [~liushaohui].

 Add 99th percentile of latency in PE
 

 Key: HBASE-10788
 URL: https://issues.apache.org/jira/browse/HBASE-10788
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10788-trunk-v1.diff


 In a production environment, the 99th percentile of latency is more important than the average. 
 The 99th percentile helps measure the influence of GC pauses and slow 
 reads/writes in HDFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10789) Add NumberComparator

2014-03-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940797#comment-13940797
 ] 

Nick Dimiduk commented on HBASE-10789:
--

DataType-aware filters are also on my todo list, though I wanted to get a 
little further along down that road before speculating about them. The 
advantage of using an order-preserving encoding (like {{OrderedBytes}}) is that 
the data is ordered this way by HBase and these filters can efficiently skip 
over swaths of data (depending on the use-case). There's definitely more to 
explore here.

For the time being, it makes sense to have filters that support the different 
encoding formats produced by {{Bytes}}, but I think this is the wrong level of 
abstraction for the long run.
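
To make the order-preserving point concrete, a generic sign-flip sketch (plain JDK, not 
the {{OrderedBytes}} API itself): flipping the sign bit before writing big-endian bytes 
makes unsigned byte order agree with signed integer order, which is the property such 
filters can rely on.
{code}
import java.nio.ByteBuffer;

public class OrderPreservingInt {
  /** Encodes an int so that unsigned lexicographic byte order matches numeric order. */
  static byte[] encode(int v) {
    return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();  // flip sign bit
  }

  static int decode(byte[] b) {
    return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
  }

  static int compareUnsigned(byte[] a, byte[] b) {
    for (int i = 0; i < 4; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return 0;
  }

  public static void main(String[] args) {
    // Now -1 sorts before 10, matching numeric order, and the value round-trips.
    System.out.println(compareUnsigned(encode(-1), encode(10)) < 0);  // true
    System.out.println(decode(encode(-1)));                           // -1
  }
}
{code}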

 Add NumberComparator
 

 Key: HBASE-10789
 URL: https://issues.apache.org/jira/browse/HBASE-10789
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Reporter: haosdent
Assignee: haosdent

 Sometimes a user may want to filter out values less than a positive 
 number, but the result they finally get still contains negative numbers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940808#comment-13940808
 ] 

Hadoop QA commented on HBASE-10786:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635570/10786-v2.txt
  against trunk revision .
  ATTACHMENT ID: 12635570

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9044//console

This message is automatically generated.

 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10786-v1.txt, 10786-v2.txt


 I was trying to find cause for test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at 

[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long

2014-03-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940823#comment-13940823
 ] 

Andrew Purtell commented on HBASE-10787:


Scratch that, just trunk. Earlier branches don't have this test.

 TestHCM#testConnection* take too long
 -

 Key: HBASE-10787
 URL: https://issues.apache.org/jira/browse/HBASE-10787
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10787-v1.txt


 TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins.
 The test can be shortened when retry count is lowered.
 On my Mac, for TestHCM#testConnection* (two tests)
 without patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec
 {code}
 with patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long

2014-03-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940814#comment-13940814
 ] 

Andrew Purtell commented on HBASE-10787:


Wow, the previous idea of retrying a lot was excessive. Going to commit this to 
0.96+ in a few minutes unless there's an objection.

 TestHCM#testConnection* take too long
 -

 Key: HBASE-10787
 URL: https://issues.apache.org/jira/browse/HBASE-10787
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10787-v1.txt


 TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins.
 The test can be shortened when retry count is lowered.
 On my Mac, for TestHCM#testConnection* (two tests)
 without patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec
 {code}
 with patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940842#comment-13940842
 ] 

Elliott Clark commented on HBASE-10781:
---

Are we expecting that Hadoop will stop requiring us to do this kind of shimming 
(Hadoop 3.0, whenever it becomes a thing)?  Or should we consider keeping the 
hadoop-targeted build scripts?

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo

2014-03-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940845#comment-13940845
 ] 

Andrew Purtell commented on HBASE-10531:


+1

I reviewed the patch as a transitional change and the refactoring looks good to 
me. Followed long up on reviewboard where Stack gave this a good look. What are 
the follow on JIRAs? Maybe put a comment here leading to them. 

 Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
 

 Key: HBASE-10531
 URL: https://issues.apache.org/jira/browse/HBASE-10531
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0

 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, 
 HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, 
 HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch


 Currently the byte[] key passed to HFileScanner.seekTo and 
 HFileScanner.reseekTo is a combination of row, cf, qual, type and ts.  And 
 the caller forms this by using kv.getBuffer, which is actually deprecated.  
 So we need to see how this can be achieved once kv.getBuffer is removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Demai Ni (JIRA)
Demai Ni created HBASE-10793:


 Summary: AuthFailed as a valid zookeeper state 
 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.98.2


In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed message 
indicates the client could not be authenticated, but it should proceed anyway, 
because only access to znodes that require SASL authentication will be denied 
and this client may never need to access them. Furthermore, AuthFailed is a 
valid event supported by ZooKeeper, and the following are the valid ZooKeeper events:

case 0: return KeeperState.Disconnected;
case 3: return KeeperState.SyncConnected;
case 4: return KeeperState.AuthFailed;
case 5: return KeeperState.ConnectedReadOnly;
case 6: return KeeperState.SaslAuthenticated;
case -112: return KeeperState.Expired;

Based on the above, ZooKeeperWatcher should not throw an exception for the AuthFailed 
event as an invalid event. For this kind of event, ZooKeeper already logs it as 
a warning and proceeds with a non-SASL connection.
{code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid}
hbase(main):006:0> list
TABLE
14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
java.lang.IllegalStateException: Received event is not valid: AuthFailed
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
BIMonitoring
BIMonitoringSummary
BIMonitoringSummary180
BIMonitoringSummary900
LogMetadata
LogRecords
Mtable
t1
t2
9 row(s) in 0.4040 seconds

=> [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, 
BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]

{code}

The patch will be similar to HBASE-8757.
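
A minimal sketch of the kind of change described (simplified and hypothetical, not the 
attached patch): treat AuthFailed like the other known states instead of falling through 
to the invalid-event exception.
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher.Event.KeeperState;

public class ConnectionEventSketch {
  void connectionEvent(WatchedEvent event) {
    switch (event.getState()) {
      case SyncConnected:
      case Disconnected:
      case Expired:
        // existing handling elided
        break;
      case AuthFailed:
        // SASL auth failed: log and proceed; only SASL-protected znodes become inaccessible.
        System.err.println("ZooKeeper SASL authentication failed, continuing without it");
        break;
      default:
        throw new IllegalStateException("Received event is not valid: " + event.getState());
    }
  }
}
{code}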



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HBASE-10792:


 Summary: RingBufferTruck does not release its payload
 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk


Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
and watch as HBase eventually dies with an OOM: heap space. Examining the heap 
dump shows an extremely large retained size of KeyValue and RingBufferTruck 
instances. By my eye, the default value of 
{{hbase.regionserver.wal.disruptor.event.count}} is too large for such a small 
default heap size, or the RBT instances need to release their payloads after 
consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo

2014-03-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940845#comment-13940845
 ] 

Andrew Purtell edited comment on HBASE-10531 at 3/19/14 7:00 PM:
-

+1

I reviewed the patch as a transitional change and the refactoring looks good to 
me. Also followed along up on reviewboard where Stack gave this a good look. 

What are the follow up JIRAs? Maybe put a comment here leading to them. 


was (Author: apurtell):
+1

I reviewed the patch as a transitional change and the refactoring looks good to 
me. Followed long up on reviewboard where Stack gave this a good look. What are 
the follow on JIRAs? Maybe put a comment here leading to them. 

 Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
 

 Key: HBASE-10531
 URL: https://issues.apache.org/jira/browse/HBASE-10531
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0

 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, 
 HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, 
 HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch


 Currently the byte[] key passed to HFileScanner.seekTo and 
 HFileScanner.reseekTo is a combination of row, cf, qual, type and ts.  And 
 the caller forms this by using kv.getBuffer, which is actually deprecated.  
 So we need to see how this can be achieved once kv.getBuffer is removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10788) Add 99th percentile of latency in PE

2014-03-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940826#comment-13940826
 ] 

Andrew Purtell commented on HBASE-10788:


bq. Usually I find min, avg, 95th, 99th, and 99.9th percentiles, and max useful.

Certainly average, max, and 95th are useful information in addition to higher 
percentiles, +1

 Add 99th percentile of latency in PE
 

 Key: HBASE-10788
 URL: https://issues.apache.org/jira/browse/HBASE-10788
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10788-trunk-v1.diff


 In a production environment, the 99th percentile of latency is more important than the average. 
 The 99th percentile helps measure the influence of GC pauses and slow 
 reads/writes in HDFS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10774) Restore TestMultiTableInputFormat

2014-03-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940830#comment-13940830
 ] 

Andrew Purtell commented on HBASE-10774:


286 seconds is borderline, 557 was too long. Based on the above comment, it looks like 
we can go with patch v2.

We still have to care about test running time on Jenkins because if the suite 
runs while the underlying system is particularly loaded, we will get a spurious 
test timeout and build failure. 

 Restore TestMultiTableInputFormat
 -

 Key: HBASE-10774
 URL: https://issues.apache.org/jira/browse/HBASE-10774
 Project: HBase
  Issue Type: Test
Affects Versions: 0.99.0
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10774-trunk-v2.diff, HBASE-10774-v1.diff


 TestMultiTableInputFormat was removed in HBASE-9009 because the test made the CI 
 fail. But in HBASE-10692 we need to add a new test, 
 TestSecureMultiTableInputFormat, which depends on it. So we try to restore 
 it in this issue.
 I reran the test several times and it passed.
 {code}
 Running org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat
 Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 314.163 sec
 {code}
 [~stack]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10786:
---

Attachment: 10786-v2.txt

Patch v2 logs the regions whose snapshot directory cannot be found.
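
For illustration, a hypothetical helper along those lines (names are made up; this is not 
the attached patch): compute the set difference of expected vs. snapshotted regions and 
put it in the message.
{code}
import java.util.HashSet;
import java.util.Set;

public class MissingRegionMessage {
  /** Builds an error message that names the regions with no snapshot directory. */
  static String describeMissing(Set<String> expectedRegions, Set<String> snapshottedRegions) {
    Set<String> missing = new HashSet<String>(expectedRegions);
    missing.removeAll(snapshottedRegions);
    return "Regions moved during the snapshot. expected=" + expectedRegions.size()
        + " snapshotted=" + snapshottedRegions.size() + " missing=" + missing;
  }
}
{code}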

[~mbertozzi]:
What do you think ?

 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10786-v1.txt, 10786-v2.txt


 I was trying to find cause for test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for a log from the balancer but found none.
 The exception message should include the region name which caused the 
 verification to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-10792:
-

Status: Patch Available  (was: Open)

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
 Attachments: HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-10792:
-

Attachment: HBASE-10792.00.patch

Here's a patch that changes RBT a little. Payload content can now be inspected, 
and references are removed at unload time. I don't know how this impacts 
failure cases; I need to read up on the disruptor a bit more.
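
Roughly the shape of the change as described (a simplified, hypothetical stand-in, not 
the actual RingBufferTruck class): the disruptor reuses event objects, so a truck that 
never clears its reference pins the payload until the slot is overwritten; dropping the 
reference at unload lets it be collected once the consumer is done.
{code}
/** Simplified stand-in for a disruptor ring-buffer event carrying one payload. */
public class TruckSketch {
  private Object payload;

  void load(Object p) {
    this.payload = p;
  }

  boolean hasPayload() {
    return payload != null;      // content can be inspected without taking it
  }

  /** Hands the payload to the consumer and drops the reference so GC can reclaim it. */
  Object unload() {
    Object p = payload;
    payload = null;              // event objects are reused, so don't pin the old payload
    return p;
  }
}
{code}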

(cc [~fenghh], [~stack])

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
 Attachments: HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10787) TestHCM#testConnection* take too long

2014-03-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10787:
---

   Resolution: Fixed
Fix Version/s: 0.99.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 TestHCM#testConnection* take too long
 -

 Key: HBASE-10787
 URL: https://issues.apache.org/jira/browse/HBASE-10787
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0

 Attachments: 10787-v1.txt


 TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins.
 The test can be shortened when retry count is lowered.
 On my Mac, for TestHCM#testConnection* (two tests)
 without patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec
 {code}
 with patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-10793:
-

Attachment: HBASE-10793-trunk-v0.patch

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed 
 message indicates the client could not be authenticated, but it should 
 proceed anyway, because only access to znodes that require SASL 
 authentication will be denied and this client may never need to access them. 
 Furthermore, AuthFailed is a valid event supported by ZooKeeper, and 
 the following are the valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception for the AuthFailed 
 event as an invalid event. For this kind of event, ZooKeeper already logs it 
 as a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid}
 hbase(main):006:0> list
 TABLE
 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.IllegalStateException: Received event is not valid: AuthFailed
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 BIMonitoring
 BIMonitoringSummary
 BIMonitoringSummary180
 BIMonitoringSummary900
 LogMetadata
 LogRecords
 Mtable
 t1
 t2
 9 row(s) in 0.4040 seconds
 => [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, 
 BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
 {code}
 The patch will be similar to HBASE-8757.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Demai Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Demai Ni updated HBASE-10793:
-

Fix Version/s: 0.99.0
   Status: Patch Available  (was: Open)

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed 
 message indicates the client could not be authenticated, but it should 
 proceed anyway, because only access to znodes that require SASL 
 authentication will be denied and this client may never need to access them. 
 Furthermore, AuthFailed is a valid event supported by ZooKeeper, and 
 the following are the valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception for the AuthFailed 
 event as an invalid event. For this kind of event, ZooKeeper already logs it 
 as a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid}
 hbase(main):006:0> list
 TABLE
 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.IllegalStateException: Received event is not valid: AuthFailed
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 BIMonitoring
 BIMonitoringSummary
 BIMonitoringSummary180
 BIMonitoringSummary900
 LogMetadata
 LogRecords
 Mtable
 t1
 t2
 9 row(s) in 0.4040 seconds
 => [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, 
 BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
 {code}
 the patch will be similar as HBase-8757



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940898#comment-13940898
 ] 

Matteo Bertozzi commented on HBASE-10786:
-

+1, maybe just rename that msg to errorMsg or something, to make it clear what 
we are checking when throwing the exception.

 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10786-v1.txt, 10786-v2.txt


 I was trying to find cause for test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the 
 verification to fail.
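A minimal sketch of the kind of message the verification could build (the helper name and types are assumptions for illustration, not the committed patch):
{code:title=Sketch: name the regions missing from the snapshot|borderStyle=solid}
import java.util.HashSet;
import java.util.Set;

public class SnapshotVerifyMessage {
  /** Builds an error message that names the regions missing from the snapshot. */
  public static String regionsMovedMessage(Set<String> expected, Set<String> snapshotted) {
    Set<String> missing = new HashSet<String>(expected);
    missing.removeAll(snapshotted); // regions we expected but did not snapshot
    return "Regions moved during the snapshot. expected=" + expected.size()
        + " snapshotted=" + snapshotted.size() + " missing=" + missing;
  }
}
{code}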



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10786:
---

Attachment: 10786-v3.txt

Patch v3 addresses Matteo's comments.

 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt


 I was trying to find cause for test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the 
 verification to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10786:
---

Status: Open  (was: Patch Available)

 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt


 I was trying to find cause for test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the 
 verification to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10786:
---

Fix Version/s: 0.98.2
   0.99.0
 Hadoop Flags: Reviewed

 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0, 0.98.2

 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt


 I was trying to find cause for test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the 
 verification to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940910#comment-13940910
 ] 

stack commented on HBASE-10792:
---

+1

It's great.  Thanks for keeping on w/ the strained metaphor!

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
 Attachments: HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.
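A minimal sketch of the second option, releasing the payload once the consumer takes it (the class shape is an assumption for illustration, not the actual RingBufferTruck code):
{code:title=Sketch: drop the slot's reference on unload|borderStyle=solid}
// Disruptor ring entries are pre-allocated and reused, never collected, so any
// payload a slot still references stays live until the slot is overwritten.
// Clearing the reference when the consumer takes the payload lets the edit
// (and its KeyValues) be garbage collected while the slot waits for reuse.
public final class Truck<T> {
  private T payload;

  public void load(T payload) {
    this.payload = payload;
  }

  /** Hands the payload to the consumer and releases the truck's own reference. */
  public T unload() {
    T t = this.payload;
    this.payload = null; // stop pinning the entry from the ring slot
    return t;
  }
}
{code}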



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-10786.


Resolution: Fixed

Thanks for the review, Matteo.

 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0, 0.98.2

 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt


 I was trying to find cause for test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the 
 verification to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-7847) Use zookeeper multi to clear znodes

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7847:
--

Attachment: (was: 7847_v6.patch)

 Use zookeeper multi to clear znodes
 ---

 Key: HBASE-7847
 URL: https://issues.apache.org/jira/browse/HBASE-7847
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
 Attachments: 7847-v1.txt, HBASE-7847.patch, HBASE-7847.patch, 
 HBASE-7847.patch, HBASE-7847_v4.patch, HBASE-7847_v5.patch, 
 HBASE-7847_v6.patch


 In ZKProcedureUtil, clearChildZNodes() and clearZNodes(String procedureName) 
 should utilize zookeeper multi so that they're atomic
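A minimal sketch of the idea using the ZooKeeper multi API (the class and method names are illustrative, not the ZKProcedureUtil patch itself):
{code:title=Sketch: atomic znode cleanup with multi|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class AtomicZNodeCleanup {
  /**
   * Deletes the children of basePath and then basePath itself in one multi()
   * call, so either every delete succeeds or none does.
   */
  public static void clearZNode(ZooKeeper zk, String basePath)
      throws KeeperException, InterruptedException {
    List<Op> ops = new ArrayList<Op>();
    for (String child : zk.getChildren(basePath, false)) {
      ops.add(Op.delete(basePath + "/" + child, -1)); // -1 matches any version
    }
    ops.add(Op.delete(basePath, -1)); // parent last; znodes must be empty to delete
    zk.multi(ops); // all-or-nothing
  }
}
{code}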



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-7847) Use zookeeper multi to clear znodes

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7847:
--

Attachment: 7847_v6.patch

 Use zookeeper multi to clear znodes
 ---

 Key: HBASE-7847
 URL: https://issues.apache.org/jira/browse/HBASE-7847
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
 Attachments: 7847-v1.txt, 7847_v6.patch, HBASE-7847.patch, 
 HBASE-7847.patch, HBASE-7847.patch, HBASE-7847_v4.patch, HBASE-7847_v5.patch, 
 HBASE-7847_v6.patch


 In ZKProcedureUtil, clearChildZNodes() and clearZNodes(String procedureName) 
 should utilize zookeeper multi so that they're atomic



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10794) multi-get should handle missing replica location from cache

2014-03-19 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HBASE-10794:


 Summary: multi-get should handle missing replica location from 
cache
 Key: HBASE-10794
 URL: https://issues.apache.org/jira/browse/HBASE-10794
 Project: HBase
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-10070


Currently the cache stores the meta row together for all replicas of a region, 
so if some replicas are in recovery, getting locations for the region will 
still go only to the cache and return null locations for those replicas. 
Multi-get currently ignores such replicas. It should instead fetch the 
locations from meta again if any replica location is null.
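A minimal sketch of the fallback behaviour (the Locator interface and method names below are hypothetical stand-ins, not HBase client API):
{code:title=Sketch: refresh from meta when a replica location is missing|borderStyle=solid}
import java.io.IOException;

public class ReplicaLocationLookup {
  /** Hypothetical stand-in for the location cache and the meta lookup. */
  interface Locator {
    String[] getFromCache(byte[] row);                    // one entry per replica, may contain nulls
    String[] getFromMeta(byte[] row) throws IOException;  // authoritative lookup
  }

  /** Uses the cache, but goes back to meta if any replica location is missing. */
  static String[] locateAllReplicas(Locator locator, byte[] row) throws IOException {
    String[] cached = locator.getFromCache(row);
    for (String location : cached) {
      if (location == null) {
        return locator.getFromMeta(row); // some replica is in recovery; refresh
      }
    }
    return cached;
  }
}
{code}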



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10794) multi-get should handle missing replica location from cache

2014-03-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940966#comment-13940966
 ] 

Sergey Shelukhin commented on HBASE-10794:
--

[~enis] [~devaraj] fyi

 multi-get should handle missing replica location from cache
 ---

 Key: HBASE-10794
 URL: https://issues.apache.org/jira/browse/HBASE-10794
 Project: HBase
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-10070


 Currently the cache stores the meta row together for all replicas of a region, 
 so if some replicas are in recovery, getting locations for the region will 
 still go only to the cache and return null locations for those replicas. 
 Multi-get currently ignores such replicas. It should instead fetch the 
 locations from meta again if any replica location is null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10634) Multiget doesn't fully work

2014-03-19 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-10634:


Attachment: 10634-1.1.txt

Patch that has been tested. It includes Sergey's last patch and some fixes on 
top for getting the locations of regions when there is a server crash. It also 
assumes HBASE-10701's last patch.

 Multiget doesn't fully work
 ---

 Key: HBASE-10634
 URL: https://issues.apache.org/jira/browse/HBASE-10634
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Sergey Shelukhin
 Fix For: hbase-10070

 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, 
 HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10794) multi-get should handle missing replica location from cache

2014-03-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-10794:
-

Issue Type: Sub-task  (was: Improvement)
Parent: HBASE-10070

 multi-get should handle missing replica location from cache
 ---

 Key: HBASE-10794
 URL: https://issues.apache.org/jira/browse/HBASE-10794
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-10070


 Currently the cache stores the meta row together for all replicas of a region, 
 so if some replicas are in recovery, getting locations for the region will 
 still go only to the cache and return null locations for those replicas. 
 Multi-get currently ignores such replicas. It should instead fetch the 
 locations from meta again if any replica location is null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940974#comment-13940974
 ] 

Hadoop QA commented on HBASE-10792:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635618/HBASE-10792.00.patch
  against trunk revision .
  ATTACHMENT ID: 12635618

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.mapreduce.TestImportExport.testImport94Table(TestImportExport.java:230)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9045//console

This message is automatically generated.

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
 Attachments: HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940987#comment-13940987
 ] 

Hadoop QA commented on HBASE-10793:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12635623/HBASE-10793-trunk-v0.patch
  against trunk revision .
  ATTACHMENT ID: 12635623

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9046//console

This message is automatically generated.

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed 
 message indicates the client could not be authenticated, but it should 
 proceed anyway, because only access to znodes that require SASL 
 authentication will be denied and this client may never need to access them. 
 Furthermore, AuthFailed is a valid event supported by ZooKeeper; the 
 following are valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception treating the 
 AuthFailed event as invalid. For this kind of event, ZooKeeper already logs it 
 as a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZooKeeperWatcher|borderStyle=solid}
 hbase(main):006:0> list
 TABLE

[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940995#comment-13940995
 ] 

Ted Yu commented on HBASE-10793:


+1

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed 
 message indicates the client could not be authenticated, but it should 
 proceed anyway, because only access to znodes that require SASL 
 authentication will be denied and this client may never need to access them. 
 Furthermore, AuthFailed is a valid event supported by ZooKeeper; the 
 following are valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception treating the 
 AuthFailed event as invalid. For this kind of event, ZooKeeper already logs it 
 as a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZooKeeperWatcher|borderStyle=solid}
 hbase(main):006:0> list
 TABLE
 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.IllegalStateException: Received event is not valid: AuthFailed
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 BIMonitoring
 BIMonitoringSummary
 BIMonitoringSummary180
 BIMonitoringSummary900
 LogMetadata
 LogRecords
 Mtable
 t1
 t2
 9 row(s) in 0.4040 seconds
 => [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
 {code}
 The patch will be similar to HBASE-8757.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10794) multi-get should handle missing replica location from cache

2014-03-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-10794:
-

Attachment: HBASE-10794.patch

This patch is on top of two blocking patches

 multi-get should handle missing replica location from cache
 ---

 Key: HBASE-10794
 URL: https://issues.apache.org/jira/browse/HBASE-10794
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-10070

 Attachments: HBASE-10794.patch


 Currently the cache stores the meta row together for all replicas of a region, 
 so if some replicas are in recovery, getting locations for the region will 
 still go only to the cache and return null locations for those replicas. 
 Multi-get currently ignores such replicas. It should instead fetch the 
 locations from meta again if any replica location is null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10634) Multiget doesn't fully work

2014-03-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941005#comment-13941005
 ] 

Sergey Shelukhin commented on HBASE-10634:
--

+1 for combined patch. Also improves some confusing logging...

 Multiget doesn't fully work
 ---

 Key: HBASE-10634
 URL: https://issues.apache.org/jira/browse/HBASE-10634
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Sergey Shelukhin
 Fix For: hbase-10070

 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, 
 HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reassigned HBASE-10792:


Assignee: Nick Dimiduk

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10634) Multiget doesn't fully work

2014-03-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941014#comment-13941014
 ] 

Sergey Shelukhin commented on HBASE-10634:
--

Wait, this is not the combined patch

 Multiget doesn't fully work
 ---

 Key: HBASE-10634
 URL: https://issues.apache.org/jira/browse/HBASE-10634
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Sergey Shelukhin
 Fix For: hbase-10070

 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, 
 HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10792:
--

Attachment: HBASE-10792.00.patch

Retry

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10531) Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo

2014-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941036#comment-13941036
 ] 

stack commented on HBASE-10531:
---

+1

Add followup jiras here as per Andrew.

 Revisit how the key byte[] is passed to HFileScanner.seekTo and reseekTo
 

 Key: HBASE-10531
 URL: https://issues.apache.org/jira/browse/HBASE-10531
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0

 Attachments: HBASE-10531.patch, HBASE-10531_1.patch, 
 HBASE-10531_2.patch, HBASE-10531_3.patch, HBASE-10531_4.patch, 
 HBASE-10531_5.patch, HBASE-10531_6.patch, HBASE-10531_7.patch


 Currently the byte[] key passed to HFileScanner.seekTo and 
 HFileScanner.reseekTo is a combination of row, cf, qual, type and ts, and 
 the caller forms this by using kv.getBuffer, which is deprecated. 
 This issue is about how the same can be achieved once kv.getBuffer is removed.
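One possible direction, sketched with hypothetical names (this is not the committed interface change):
{code:title=Sketch: seek by Cell instead of a flat byte[] key|borderStyle=solid}
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;

// Expressing the seek key as a Cell lets the scanner compare row/cf/qual/ts/type
// directly, so callers no longer need to serialize a flat key via kv.getBuffer().
interface CellSeekableScanner {
  /** Seeks to the first cell greater than or equal to the given key cell. */
  int seekTo(Cell key) throws IOException;

  /** Reseeks forward to the first cell greater than or equal to the given key cell. */
  int reseekTo(Cell key) throws IOException;
}
{code}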



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941051#comment-13941051
 ] 

Himanshu Vashishtha commented on HBASE-10792:
-

+1

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941052#comment-13941052
 ] 

stack commented on HBASE-10781:
---

[~eclark] The hadoop3 patch doesn't include a hadoop3-compat module, 
HBASE-6581, as yet.  We discussed adding one though, IIRC, because it's all about 
changes in the sync API.  So I don't see us giving up your little compat 'system' 
yet.

I got rid of the little build script because, while it served a purpose, it is 
ugly. We can revive it if we need such a beast going forward (maven will be 
'fixed' the next time we need this kind of facility -- smile).

I'm testing out my little make-rc.sh changes to run against trunk. The built 
tarball has some CLASSPATH issues.  Trying to fix before commit.

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10634) Multiget doesn't fully work

2014-03-19 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941072#comment-13941072
 ] 

Devaraj Das commented on HBASE-10634:
-

[~sershe], you got confused :-) The combined patch will be the one with 
HBASE-10794.

 Multiget doesn't fully work
 ---

 Key: HBASE-10634
 URL: https://issues.apache.org/jira/browse/HBASE-10634
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Sergey Shelukhin
 Fix For: hbase-10070

 Attachments: 10634-1.1.txt, 10634-1.txt, HBASE-10634.02.patch, 
 HBASE-10634.patch, HBASE-10634.patch, multi.out, no-multi.out






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10776) Separate HConnectionManager into several parts

2014-03-19 Thread Yi Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941075#comment-13941075
 ] 

Yi Deng commented on HBASE-10776:
-

Cool. Another refactoring job I'm thinking of is to remove unnecessary 
inheritance wherever possible, using composition instead. Hopefully the class 
tree can become much flatter.
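A toy sketch of the composition idea (all class names here are illustrative, not the actual refactoring):
{code:title=Sketch: composition instead of inheritance|borderStyle=solid}
// The connection holds its collaborators instead of inheriting their behaviour,
// so region-locating logic and server bookkeeping can live in their own classes.
class RegionLocatorPart {
  String locate(byte[] row) {
    return "region-for-" + new String(row);
  }
}

class TableServersPart {
  // server/connection bookkeeping would live here
}

public class ComposedConnection {
  private final RegionLocatorPart locator = new RegionLocatorPart();
  private final TableServersPart servers = new TableServersPart();

  public String locateRegion(byte[] row) {
    return locator.locate(row); // delegate instead of override
  }
}
{code}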

 Separate HConnectionManager into several parts
 --

 Key: HBASE-10776
 URL: https://issues.apache.org/jira/browse/HBASE-10776
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.89-fb
Reporter: Yi Deng
Priority: Minor
 Fix For: 0.89-fb


 HConnectionManager is too large to maintain effectively. This Jira records 
 some refactoring jobs:
 1. Move TableServers out as a standalone class
 2. Move the region-locating code into its own class



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941078#comment-13941078
 ] 

Andrew Purtell commented on HBASE-10793:


lgtm

Going to commit to 0.96+ shortly

Ping [~stack]

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed 
 message indicates the client could not be authenticated, but it should 
 proceed anyway, because only access to znodes that require SASL 
 authentication will be denied and this client may never need to access them. 
 Furthermore, AuthFailed is a valid event supported by ZooKeeper; the 
 following are valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception treating the 
 AuthFailed event as invalid. For this kind of event, ZooKeeper already logs it 
 as a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZooKeeperWatcher|borderStyle=solid}
 hbase(main):006:0> list
 TABLE
 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.IllegalStateException: Received event is not valid: AuthFailed
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 BIMonitoring
 BIMonitoringSummary
 BIMonitoringSummary180
 BIMonitoringSummary900
 LogMetadata
 LogRecords
 Mtable
 t1
 t2
 9 row(s) in 0.4040 seconds
 => [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
 {code}
 The patch will be similar to HBASE-8757.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10794) multi-get should handle missing replica location from cache

2014-03-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-10794:
-

Attachment: HBASE-10794.patch

Includes other changes that are not part of HBASE-10634.

 multi-get should handle missing replica location from cache
 ---

 Key: HBASE-10794
 URL: https://issues.apache.org/jira/browse/HBASE-10794
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-10070

 Attachments: HBASE-10794.patch, HBASE-10794.patch


 Currently the cache stores the meta row together for all replicas of a region, 
 so if some replicas are in recovery, getting locations for the region will 
 still go only to the cache and return null locations for those replicas. 
 Multi-get currently ignores such replicas. It should instead fetch the 
 locations from meta again if any replica location is null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941099#comment-13941099
 ] 

Enis Soztutar commented on HBASE-10781:
---

Yeah, we can keep the hadoop-compat modules, since I think some hadoop-2.x 
might also require a shim of its own even before 3.0. 

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941101#comment-13941101
 ] 

stack commented on HBASE-10793:
---

ok

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed 
 message indicates the client could not be authenticated, but it should 
 proceed anyway, because only access to znodes that require SASL 
 authentication will be denied and this client may never need to access them. 
 Furthermore, AuthFailed is a valid event supported by ZooKeeper; the 
 following are valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception treating the 
 AuthFailed event as invalid. For this kind of event, ZooKeeper already logs it 
 as a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZooKeeperWatcher|borderStyle=solid}
 hbase(main):006:0> list
 TABLE
 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.IllegalStateException: Received event is not valid: AuthFailed
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 BIMonitoring
 BIMonitoringSummary
 BIMonitoringSummary180
 BIMonitoringSummary900
 LogMetadata
 LogRecords
 Mtable
 t1
 t2
 9 row(s) in 0.4040 seconds
 => [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
 {code}
 The patch will be similar to HBASE-8757.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7847) Use zookeeper multi to clear znodes

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941106#comment-13941106
 ] 

Hadoop QA commented on HBASE-7847:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635627/7847_v6.patch
  against trunk revision .
  ATTACHMENT ID: 12635627

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.mapreduce.TestTableMapReduceBase.testMultiRegionTable(TestTableMapReduceBase.java:96)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9047//console

This message is automatically generated.

 Use zookeeper multi to clear znodes
 ---

 Key: HBASE-7847
 URL: https://issues.apache.org/jira/browse/HBASE-7847
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
 Attachments: 7847-v1.txt, 7847_v6.patch, HBASE-7847.patch, 
 HBASE-7847.patch, HBASE-7847.patch, HBASE-7847_v4.patch, HBASE-7847_v5.patch, 
 HBASE-7847_v6.patch


 In ZKProcedureUtil, clearChildZNodes() and clearZNodes(String procedureName) 
 should utilize zookeeper multi so that they're atomic
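 A minimal sketch, not the actual patch, of how the child znodes could be removed
 atomically with ZooKeeper's multi API; the class name and parent-path handling
 below are illustrative assumptions (it also assumes the children have no
 children of their own):
 {code}
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.Op;
 import org.apache.zookeeper.ZooKeeper;

 // Sketch only: collect a delete Op for every child of the given parent znode
 // and submit them in a single multi() call so they all succeed or all fail.
 // Version -1 means "delete regardless of the node's current version".
 public final class ZKMultiDeleteSketch {
   public static void clearChildZNodes(ZooKeeper zk, String parent)
       throws KeeperException, InterruptedException {
     List<Op> ops = new ArrayList<Op>();
     for (String child : zk.getChildren(parent, false)) {
       ops.add(Op.delete(parent + "/" + child, -1));
     }
     if (!ops.isEmpty()) {
       zk.multi(ops);  // atomic: either all deletes apply, or none do
     }
   }
 }
 {code}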



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10781:
--

Attachment: 10781v3.txt

This works for me. This is what I'll commit unless there's an objection.

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt, 10781v3.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10782) Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in job conf

2014-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941119#comment-13941119
 ] 

stack commented on HBASE-10782:
---

+1

 Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is
 not set in job conf
 

 Key: HBASE-10782
 URL: https://issues.apache.org/jira/browse/HBASE-10782
 Project: HBase
  Issue Type: Test
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10782-trunk-v1.diff


 Hadoop2 MR tests fail occasionally with output like this:
 {code}
 ---
 Test set: org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1
 ---
 Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 347.57 sec 
  FAILURE!
 testScanEmptyToAPP(org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1)
   Time elapsed: 50.047 sec   ERROR!
 java.io.IOException: java.net.ConnectException: Call From 
 liushaohui-OptiPlex-990/127.0.0.1 to 0.0.0.0:10020 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334)
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419)
   at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:524)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
   at java.security.AccessController.doPrivileged(Native Method)
  ...
 {code}
 The reason is that while the MR job is running, the job client pulls the job
 status from the AppMaster. When the job completes, the AppMaster exits. At
 that point, if the job client has not yet received the job-completed event from
 the AppMaster, it switches to fetching the job report from the history server.
 But in HBaseTestingUtility#startMiniMapReduceCluster, the config
 mapreduce.jobhistory.address is not copied to the TestUtil's config.
  
 CRUNCH-249 reported the same problem.
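 A minimal sketch of the kind of fix implied here, assuming it simply propagates
 the history server address from the mini MR cluster's configuration into the
 test configuration; the class and parameter names are hypothetical:
 {code}
 import org.apache.hadoop.conf.Configuration;

 // Sketch only (hypothetical names): copy mapreduce.jobhistory.address from the
 // mini MR cluster's conf into the test conf so the job client can fall back to
 // the history server after the AppMaster exits.
 public final class JobHistoryConfSketch {
   public static void copyJobHistoryAddress(Configuration mrClusterConf,
       Configuration testConf) {
     String address = mrClusterConf.get("mapreduce.jobhistory.address");
     if (address != null) {
       testConf.set("mapreduce.jobhistory.address", address);
     }
   }
 }
 {code}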



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10790) make assembly:single as default in pom.xml

2014-03-19 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941130#comment-13941130
 ] 

Enis Soztutar commented on HBASE-10790:
---

I am not in favor of this: since install requires package, it would mean that
every mvn install would build the tarball. Even on my SSD MBP, it takes 40
seconds to build the tarball.

Building the tarball is a much less frequent operation than mvn install (at 
least in my daily development). 

 make assembly:single as default in pom.xml
 --

 Key: HBASE-10790
 URL: https://issues.apache.org/jira/browse/HBASE-10790
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10790-trunk-v1.diff


 Currently, to build an HBase tar release package, we have to use
 the command:
 {code}
  mvn clean package assembly:single
 {code}
 which is not convenient. We could make assembly:single the default and run the
 assembly plugin in the Maven package phase. Then we could just
 use {code} mvn clean package {code} to get a release package.
 Other suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941131#comment-13941131
 ] 

Enis Soztutar commented on HBASE-10781:
---

lgtm. 

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt, 10781v3.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10781:
--

  Resolution: Fixed
Hadoop Flags: Incompatible change,Reviewed  (was: Incompatible change)
  Status: Resolved  (was: Patch Available)

Committed since got +1 from the RM.

 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt, 10781v3.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10782) Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in job conf

2014-03-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941139#comment-13941139
 ] 

Nick Dimiduk commented on HBASE-10782:
--

+1

 Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is
 not set in job conf
 

 Key: HBASE-10782
 URL: https://issues.apache.org/jira/browse/HBASE-10782
 Project: HBase
  Issue Type: Test
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-10782-trunk-v1.diff


 Hadoop2 MR tests fail occasionally with output like this:
 {code}
 ---
 Test set: org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1
 ---
 Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 347.57 sec 
  FAILURE!
 testScanEmptyToAPP(org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1)
   Time elapsed: 50.047 sec   ERROR!
 java.io.IOException: java.net.ConnectException: Call From 
 liushaohui-OptiPlex-990/127.0.0.1 to 0.0.0.0:10020 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334)
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419)
   at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:524)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
   at java.security.AccessController.doPrivileged(Native Method)
  ...
 {code}
 The reason is that while the MR job is running, the job client pulls the job
 status from the AppMaster. When the job completes, the AppMaster exits. At
 that point, if the job client has not yet received the job-completed event from
 the AppMaster, it switches to fetching the job report from the history server.
 But in HBaseTestingUtility#startMiniMapReduceCluster, the config
 mapreduce.jobhistory.address is not copied to the TestUtil's config.
  
 CRUNCH-249 reported the same problem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-8963) Add configuration option to skip HFile archiving

2014-03-19 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941142#comment-13941142
 ] 

Enis Soztutar commented on HBASE-8963:
--

I don't imagine anyone wanting to run with skip-archive as a global config in
production. I think we should not do the global config at all, but instead allow drop
table to take an option to skip archiving. Don't snapshots refer to files in the
archive? If we make SKIP_ARCHIVE a table property, previous snapshots would be
broken by compactions, I guess.

I think we should do an rm -rf kind of thing in drop table: if the files are
not referenced, they are not moved to the archive but deleted instead.
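A rough sketch of that idea (the class, the parameter names, and the way a file is
known to be referenced are all hypothetical, not an actual HBase API):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: on table drop, delete a store file outright when nothing (e.g. a
// snapshot) references it; otherwise fall back to moving it into the archive.
public final class DropTableCleanupSketch {
  public static void removeOrArchive(FileSystem fs, Path storeFile,
      boolean referencedBySnapshot, Path archiveDir) throws IOException {
    if (referencedBySnapshot) {
      // keep the file for the snapshot: move it under the archive directory
      fs.rename(storeFile, new Path(archiveDir, storeFile.getName()));
    } else {
      fs.delete(storeFile, false);  // "rm -rf" style: no archive copy
    }
  }
}
{code}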

 Add configuration option to skip HFile archiving
 

 Key: HBASE-8963
 URL: https://issues.apache.org/jira/browse/HBASE-8963
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: bharath v
 Fix For: 0.99.0

 Attachments: HBASE-8963.trunk.v1.patch, HBASE-8963.trunk.v2.patch, 
 HBASE-8963.trunk.v3.patch, HBASE-8963.trunk.v4.patch, 
 HBASE-8963.trunk.v5.patch, HBASE-8963.trunk.v6.patch, 
 HBASE-8963.trunk.v7.patch


 Currently HFileArchiver is always called when a table is dropped.
 A configuration option (either global or per table) should be provided so 
 that archiving can be skipped when a table is deleted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10690) Drop Hadoop-1 support

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-10690.
---

  Resolution: Fixed
Assignee: stack
Release Note: Trunk no longer has support for hadoop1.  You cannot build 
against hadoop1.  We now generate one artifact only and our artifact naming no 
longer includes the hadoop version we were built against since we only build 
against one version, hadoop2: e.g. the hbase 1.0.0 release will be named 
hbase-1.0.0, not hbase-1.0.0-hadoop1 (or hbase-1.0.0-hadoop2).
Hadoop Flags: Incompatible change

Resolving this umbrella issue. All sub-tasks are done. Documentation is in
the refguide, the assembly does not include hadoop1, and the build is
straightforward with no need to make a hadoop1 or hadoop2 artifact: it is all
hadoop2 all the time from here on out, and it is documented in the refguide.

 Drop Hadoop-1 support
 -

 Key: HBASE-10690
 URL: https://issues.apache.org/jira/browse/HBASE-10690
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: stack
Priority: Critical
 Fix For: 0.99.0


 As per thread:
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201403.mbox/%3ccamuu0w93mgp7zbbxgccov+be3etmkvn5atzowvzqd_gegdk...@mail.gmail.com%3E
 It seems that the consensus is that supporting Hadoop-1 in HBase-1.x will be 
 costly, so we should drop the support. 
 In this issue: 
  - We'll document that Hadoop-1 support is deprecated in HBase-0.98. And 
 users should switch to hadoop-2.2+ anyway. 
  - Document that upcoming HBase-0.99 and HBase-1.0 releases will not have 
 Hadoop-1 support. 
  - Document that there is no rolling upgrade support for going between 
 Hadoop-1 and Hadoop-2 (using HBase-0.96 or 0.98).
  - Release artifacts won't contain HBase build with Hadoop-1. 
  - We may keep the profile, jenkins job etc if we want.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread Ted Yu (JIRA)
Ted Yu created HBASE-10795:
--

 Summary: TestHBaseFsck#testHBaseFsck() should drop the table it 
creates
 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 10795-v1.txt

When investigating TestHBaseFsck test failures, I often saw the following 
(https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
{code}
Number of Tables: 3
  Table: tableBadMetaAssign rw   families: 1
  Table: hbase:namespacerw   families: 1
  Table: testSplitdaughtersNotInMetarw   families: 1
{code}
TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10795:
---

Attachment: 10795-v1.txt

Patch v1 adds a finally clause to drop the table.

 TestHBaseFsck#testHBaseFsck() should drop the table it creates
 --

 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 10795-v1.txt


 When investigating TestHBaseFsck test failures, I often saw the following 
 (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
 {code}
 Number of Tables: 3
   Table: tableBadMetaAssign   rw   families: 1
   Table: hbase:namespace  rw   families: 1
   Table: testSplitdaughtersNotInMeta  rw   families: 1
 {code}
 TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10795:
---

Status: Patch Available  (was: Open)

 TestHBaseFsck#testHBaseFsck() should drop the table it creates
 --

 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 10795-v1.txt


 When investigating TestHBaseFsck test failures, I often saw the following 
 (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
 {code}
 Number of Tables: 3
   Table: tableBadMetaAssign   rw   families: 1
   Table: hbase:namespace  rw   families: 1
   Table: testSplitdaughtersNotInMeta  rw   families: 1
 {code}
 TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10791) Add integration test to demonstrate performance improvement

2014-03-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941160#comment-13941160
 ] 

Nick Dimiduk commented on HBASE-10791:
--

Looks like I posted an outdated stash from yesterday, instead of the current 
patch, so some of these details don't make sense (like the PerfEvalCallable 
constructor args). Updating patch momentarily.

 Add integration test to demonstrate performance improvement
 ---

 Key: HBASE-10791
 URL: https://issues.apache.org/jira/browse/HBASE-10791
 Project: HBase
  Issue Type: Sub-task
  Components: Performance, test
Affects Versions: hbase-10070
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HBASE-10791.00.patch


 It would be good to demonstrate that use of region replicas reduces read 
 latency. PerformanceEvaluation can be used manually for this purpose, but 
 it's not able to use ChaosMonkey. An integration test can set up the monkey 
 actions and automate execution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941162#comment-13941162
 ] 

Hadoop QA commented on HBASE-10792:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635650/HBASE-10792.00.patch
  against trunk revision .
  ATTACHMENT ID: 12635650

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9048//console

This message is automatically generated.

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.
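 A minimal sketch of the second option (the class and method names are
 illustrative, not the actual HBase classes): have the ring buffer slot drop its
 reference once the consumer takes the payload, so the pre-allocated entries no
 longer pin KeyValues in the heap.
 {code}
 // Sketch only: a generic "truck" whose unload() nulls its reference so a reused
 // ring buffer slot does not keep the payload alive until the next lap overwrites it.
 final class TruckSketch<T> {
   private T payload;

   void load(T payload) {
     this.payload = payload;
   }

   T unload() {
     T p = payload;
     payload = null;  // release: the slot no longer retains the object
     return p;
   }
 }
 {code}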



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941172#comment-13941172
 ] 

stack commented on HBASE-10795:
---

Why? Doesn't the cluster get shut down regardless, and the dirs removed afterwards?
 Why do more work?

 TestHBaseFsck#testHBaseFsck() should drop the table it creates
 --

 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 10795-v1.txt


 When investigating TestHBaseFsck test failures, I often saw the following 
 (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
 {code}
 Number of Tables: 3
   Table: tableBadMetaAssign   rw   families: 1
   Table: hbase:namespace  rw   families: 1
   Table: testSplitdaughtersNotInMeta  rw   families: 1
 {code}
 TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10795:
--

Priority: Trivial  (was: Major)

 TestHBaseFsck#testHBaseFsck() should drop the table it creates
 --

 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Trivial
 Attachments: 10795-v1.txt


 When investigating TestHBaseFsck test failures, I often saw the following 
 (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
 {code}
 Number of Tables: 3
   Table: tableBadMetaAssign   rw   families: 1
   Table: hbase:namespace  rw   families: 1
   Table: testSplitdaughtersNotInMeta  rw   families: 1
 {code}
 TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941174#comment-13941174
 ] 

Hudson commented on HBASE-10786:


ABORTED: Integrated in HBase-0.98-on-Hadoop-1.1 #226 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/226/])
HBASE-10786 If snapshot verification fails with 'Regions moved', the message 
should contain the name of region causing the failure (tedyu: rev 1579373)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/MasterSnapshotVerifier.java


 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0, 0.98.2

 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt


 I was trying to find the cause of the test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the
 verification to fail.
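 One possible way to surface the offending regions (a sketch, not the committed
 patch; the class, method, and parameter names are assumptions): compute the set
 difference between the expected and snapshotted region names and include it in
 the message.
 {code}
 import java.util.HashSet;
 import java.util.Set;

 // Sketch only (hypothetical names): name the regions missing from the snapshot.
 public final class MissingRegionsSketch {
   public static String describeMissingRegions(Set<String> expected,
       Set<String> snapshotted) {
     Set<String> missing = new HashSet<String>(expected);
     missing.removeAll(snapshotted);
     return "Regions moved during the snapshot. expected=" + expected.size()
         + " snapshotted=" + snapshotted.size() + " missing=" + missing;
   }
 }
 {code}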



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941176#comment-13941176
 ] 

Hudson commented on HBASE-10786:


ABORTED: Integrated in HBase-TRUNK #5024 (See 
[https://builds.apache.org/job/HBase-TRUNK/5024/])
HBASE-10786 If snapshot verification fails with 'Regions moved', the message 
should contain the name of region causing the failure (tedyu: rev 1579374)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/MasterSnapshotVerifier.java


 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0, 0.98.2

 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt


 I was trying to find the cause of the test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the
 verification to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941175#comment-13941175
 ] 

Hudson commented on HBASE-10787:


ABORTED: Integrated in HBase-TRUNK #5024 (See 
[https://builds.apache.org/job/HBase-TRUNK/5024/])
HBASE-10787 TestHCM#testConnection* takes too long (Ted Yu) (apurtell: rev 
1579358)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java


 TestHCM#testConnection* take too long
 -

 Key: HBASE-10787
 URL: https://issues.apache.org/jira/browse/HBASE-10787
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0

 Attachments: 10787-v1.txt


 TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins.
 The test can be shortened when the retry count is lowered.
 On my Mac, for TestHCM#testConnection* (two tests)
 without patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec
 {code}
 with patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec
 {code}
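 A minimal sketch of the kind of change involved, assuming the patch lowers the
 client retry count in the test configuration; the value used here is
 illustrative, not necessarily what the committed patch sets:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;

 // Sketch only: a lower retry count makes connection-failure paths give up sooner.
 public final class LowRetryConfSketch {
   public static Configuration create() {
     Configuration conf = HBaseConfiguration.create();
     conf.setInt("hbase.client.retries.number", 3);  // 3 is illustrative
     return conf;
   }
 }
 {code}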



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10786) If snapshot verification fails with 'Regions moved', the message should contain the name of region causing the failure

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941173#comment-13941173
 ] 

Hudson commented on HBASE-10786:


ABORTED: Integrated in HBase-0.98 #242 (See 
[https://builds.apache.org/job/HBase-0.98/242/])
HBASE-10786 If snapshot verification fails with 'Regions moved', the message 
should contain the name of region causing the failure (tedyu: rev 1579373)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/MasterSnapshotVerifier.java


 If snapshot verification fails with 'Regions moved', the message should 
 contain the name of region causing the failure
 --

 Key: HBASE-10786
 URL: https://issues.apache.org/jira/browse/HBASE-10786
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0, 0.98.2

 Attachments: 10786-v1.txt, 10786-v2.txt, 10786-v3.txt


 I was trying to find the cause of the test failure in 
 https://builds.apache.org/job/PreCommit-HBASE-Build/9036//testReport/org.apache.hadoop.hbase.snapshot/TestSecureExportSnapshot/testExportRetry/
  :
 {code}
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: 
 org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
 ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 type=FLUSH } had an 
 error.  Procedure emptySnaptb0-1395177346656 { waiting=[] done=[] }
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342)
   at 
 org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3007)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
 Failed taking snapshot { ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH } due to exception:Regions moved during 
 the snapshot '{ ss=emptySnaptb0-1395177346656 table=testtb-1395177346656 
 type=FLUSH }'. expected=9 
 snapshotted=8:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
 Regions moved during the snapshot '{ ss=emptySnaptb0-1395177346656 
 table=testtb-1395177346656 type=FLUSH }'. expected=9 snapshotted=8
   at 
 org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
   at 
 org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320)
   at 
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332)
   ... 11 more
 {code}
 However, it is not clear which region caused the verification to fail.
 I searched for logs from the balancer but found none.
 The exception message should include the name of the region that caused the
 verification to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941178#comment-13941178
 ] 

Ted Yu commented on HBASE-10795:


Looking at other tests in TestHBaseFsck, such as testHBaseFsckClean(), the 
pattern is:
{code}
try {
  HBaseFsck hbck = doFsck(conf, false);
  assertNoErrors(hbck);

  setupTable(table);
...
} finally {
  deleteTable(table);
}
{code}
TestHBaseFsck#testHBaseFsck() should be consistent with the other tests.

 TestHBaseFsck#testHBaseFsck() should drop the table it creates
 --

 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Trivial
 Attachments: 10795-v1.txt


 When investigating TestHBaseFsck test failures, I often saw the following 
 (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
 {code}
 Number of Tables: 3
   Table: tableBadMetaAssign   rw   families: 1
   Table: hbase:namespace  rw   families: 1
   Table: testSplitdaughtersNotInMeta  rw   families: 1
 {code}
 TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941181#comment-13941181
 ] 

Demai Ni commented on HBASE-10793:
--

[~yuzhih...@gmail.com], [~andrew.purt...@gmail.com], [~stack], thanks a lot for
the review. Demai

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed
 message indicates the client could not be authenticated, but it should
 proceed anyway, because only access to znodes that require SASL
 authentication will be denied and this client may never need to access them.
 Furthermore, AuthFailed is a valid event supported by ZooKeeper; the
 following are valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception for the
 AuthFailed event as an invalid event. For this kind of event, ZooKeeper already
 logs a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid}
 hbase(main):006:0 list
 TABLE 
   
 
 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.IllegalStateException: Received event is not valid: AuthFailed
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 BIMonitoring  
   
 
 BIMonitoringSummary   
   
 
 BIMonitoringSummary180
   
 
 BIMonitoringSummary900
   
 
 LogMetadata   
   
 
 LogRecords
   
 
 Mtable
   
 
 t1
   
 
 t2
   
 
 9 row(s) in 0.4040 seconds
 = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, 
 BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
 {code}
 the patch will be similar to HBASE-8757
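 A minimal sketch, in the spirit of HBASE-8757, of treating AuthFailed as a
 known, non-fatal state in the watcher's connection-event handling; the class
 and the exact set of states handled are illustrative, not the committed patch:
 {code}
 import org.apache.zookeeper.WatchedEvent;
 import org.apache.zookeeper.Watcher.Event.KeeperState;

 // Sketch only: accept AuthFailed (and the other documented KeeperStates)
 // instead of throwing IllegalStateException for them.
 class ConnectionEventSketch {
   void connectionEvent(WatchedEvent event) {
     KeeperState state = event.getState();
     switch (state) {
       case SyncConnected:
       case Disconnected:
       case Expired:
         // existing handling of the usual states ...
         break;
       case AuthFailed:
       case ConnectedReadOnly:
       case SaslAuthenticated:
         // Log and continue; for AuthFailed, ZooKeeper itself already warns
         // and proceeds with a non-SASL connection.
         break;
       default:
         throw new IllegalStateException("Received event is not valid: " + state);
     }
   }
 }
 {code}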



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10793) AuthFailed as a valid zookeeper state

2014-03-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10793:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to 0.96 through trunk.

 AuthFailed as a valid zookeeper state 
 --

 Key: HBASE-10793
 URL: https://issues.apache.org/jira/browse/HBASE-10793
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.96.0
Reporter: Demai Ni
Assignee: Demai Ni
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: HBASE-10793-trunk-v0.patch


 In Kerberos mode, ZooKeeper accepts SASL authentication. The AuthFailed
 message indicates the client could not be authenticated, but it should
 proceed anyway, because only access to znodes that require SASL
 authentication will be denied and this client may never need to access them.
 Furthermore, AuthFailed is a valid event supported by ZooKeeper; the
 following are valid ZooKeeper events:
 case 0: return KeeperState.Disconnected;
 case 3: return KeeperState.SyncConnected;
 case 4: return KeeperState.AuthFailed;
 case 5: return KeeperState.ConnectedReadOnly;
 case 6: return KeeperState.SaslAuthenticated;
 case -112: return KeeperState.Expired;
 Based on the above, ZooKeeperWatcher should not throw an exception for the
 AuthFailed event as an invalid event. For this kind of event, ZooKeeper already
 logs a warning and proceeds with a non-SASL connection.
 {code:title=IllegalStateException from ZookeeperWatcher|borderStyle=solid}
 hbase(main):006:0 list
 TABLE 
   
 
 14/01/23 17:26:11 ERROR zookeeper.ClientCnxn: Error while calling watcher
 java.lang.IllegalStateException: Received event is not valid: AuthFailed
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:410)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 BIMonitoring  
   
 
 BIMonitoringSummary   
   
 
 BIMonitoringSummary180
   
 
 BIMonitoringSummary900
   
 
 LogMetadata   
   
 
 LogRecords
   
 
 Mtable
   
 
 t1
   
 
 t2
   
 
 9 row(s) in 0.4040 seconds
 = [BIMonitoring, BIMonitoringSummary, BIMonitoringSummary180, 
 BIMonitoringSummary900, LogMetadata, LogRecords, Mtable, t1, t2]
 {code}
 the patch will be similar to HBASE-8757



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10796) Set default log level as INFO

2014-03-19 Thread stack (JIRA)
stack created HBASE-10796:
-

 Summary: Set default log level as INFO
 Key: HBASE-10796
 URL: https://issues.apache.org/jira/browse/HBASE-10796
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack


When we roll out 1.0, the log level should be INFO-level by default, not DEBUG. 

Proposed on the mailing list here
http://search-hadoop.com/m/33P7E1GL08b/hbase+1.0subj=DISCUSSION+1+0+0 with at
least one other +1 and no objection.





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-3014) Change UnknownScannerException log level to WARN

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3014.
--

Resolution: Cannot Reproduce

Marking as 'cannot reproduce'. After doing a survey, I think this issue is
actually fixed. Nowhere do we log this exception explicitly at the ERROR level
(not anymore, at least). It is all INFO-level as far as I can see.

 Change UnknownScannerException log level to WARN
 

 Key: HBASE-3014
 URL: https://issues.apache.org/jira/browse/HBASE-3014
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.20.6
Reporter: Ken Weiner
Priority: Trivial
 Attachments: hbase-3014.patch


 I see a lot of UnknownScannerException messages in the log at ERROR level 
 when I'm running a MapReduce job that scans an HBase table.  These messages 
 are logged under normal conditions, and according to [~jdcryans], should 
 probably be logged at a less severe log level like WARN.  
 Example error message:
 {code}
 2010-09-16 09:20:52,398 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 org.apache.hadoop.hbase.UnknownScannerException: Name: -8711007779313115048
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
   at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 {code}
 Reference to the HBase users mailing list thread where this was originally 
 discussed:
 http://markmail.org/thread/ttzbi6c7et6mrq6o
 This is a simple change, so I didn't include a formal patch. If one is
 required, I will gladly create and attach one.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941198#comment-13941198
 ] 

stack commented on HBASE-10795:
---

Is this responsible for the test failure?

 TestHBaseFsck#testHBaseFsck() should drop the table it creates
 --

 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Trivial
 Attachments: 10795-v1.txt


 When investigating TestHBaseFsck test failures, I often saw the following 
 (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
 {code}
 Number of Tables: 3
   Table: tableBadMetaAssign   rw   families: 1
   Table: hbase:namespace  rw   families: 1
   Table: testSplitdaughtersNotInMeta  rw   families: 1
 {code}
 TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10795) TestHBaseFsck#testHBaseFsck() should drop the table it creates

2014-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941203#comment-13941203
 ] 

Ted Yu commented on HBASE-10795:


I mentioned the test failure since the Standard Output led me to 
TestHBaseFsck#testHBaseFsck().

This JIRA aligns TestHBaseFsck#testHBaseFsck() with the rest of the tests.

Investigation of test failure of TestHBaseFsck#testSplitDaughtersNotInMeta is 
on-going.

 TestHBaseFsck#testHBaseFsck() should drop the table it creates
 --

 Key: HBASE-10795
 URL: https://issues.apache.org/jira/browse/HBASE-10795
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Trivial
 Attachments: 10795-v1.txt


 When investigating TestHBaseFsck test failures, I often saw the following 
 (https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/223/testReport/junit/org.apache.hadoop.hbase.util/TestHBaseFsck/testSplitDaughtersNotInMeta/):
 {code}
 Number of Tables: 3
   Table: tableBadMetaAssign   rw   families: 1
   Table: hbase:namespace  rw   families: 1
   Table: testSplitdaughtersNotInMeta  rw   families: 1
 {code}
 TestHBaseFsck#testHBaseFsck() should drop the table it creates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-10792:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the reviews.

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-10792:
-

Fix Version/s: 0.99.0

 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.99.0

 Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10797:
--

Attachment: 10797.txt

Small patch

 Add support for -h and --help to rolling_restart.sh and fix the usage string 
 output
 ---

 Key: HBASE-10797
 URL: https://issues.apache.org/jira/browse/HBASE-10797
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
Priority: Trivial
 Attachments: 10797.txt


 Messing with rolling restart, when you pass -h or --help you get a mess of
 output with an odd 'bad argument' complaint.
 The usage string printed was also incomplete, with curly braces left in it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output

2014-03-19 Thread stack (JIRA)
stack created HBASE-10797:
-

 Summary: Add support for -h and --help to rolling_restart.sh and 
fix the usage string output
 Key: HBASE-10797
 URL: https://issues.apache.org/jira/browse/HBASE-10797
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
Priority: Trivial
 Attachments: 10797.txt

Messing with rolling restart, when you pass -h or --help you get a mess of
output with an odd 'bad argument' complaint.

The usage string printed was also incomplete, with curly braces left in it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output

2014-03-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-10797.
---

   Resolution: Fixed
Fix Version/s: 0.96.3
   0.98.2
   0.99.0

Committed trivial patch to 0.96-0.99

 Add support for -h and --help to rolling_restart.sh and fix the usage string 
 output
 ---

 Key: HBASE-10797
 URL: https://issues.apache.org/jira/browse/HBASE-10797
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
Priority: Trivial
 Fix For: 0.99.0, 0.98.2, 0.96.3

 Attachments: 10797.txt


 Messing with rolling restart, when you pass -h or --help you get a mess of
 output with an odd 'bad argument' complaint.
 The usage string printed was also incomplete, with curly braces left in it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10797) Add support for -h and --help to rolling_restart.sh and fix the usage string output

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941215#comment-13941215
 ] 

Hudson commented on HBASE-10797:


FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10797 Add support for -h and --help to rolling_restart.sh and fix the 
usage string output (stack: rev 1579477)
* /hbase/trunk/bin/rolling-restart.sh


 Add support for -h and --help to rolling_restart.sh and fix the usage string 
 output
 ---

 Key: HBASE-10797
 URL: https://issues.apache.org/jira/browse/HBASE-10797
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
Priority: Trivial
 Fix For: 0.99.0, 0.98.2, 0.96.3

 Attachments: 10797.txt


 Messing with rolling restart, when you pass -h or --help you get a mess of
 output with an odd 'bad argument' complaint.
 The usage string printed was also incomplete, with curly braces left in it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10787) TestHCM#testConnection* take too long

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941213#comment-13941213
 ] 

Hudson commented on HBASE-10787:


FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10787 TestHCM#testConnection* takes too long (Ted Yu) (apurtell: rev 
1579358)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java


 TestHCM#testConnection* take too long
 -

 Key: HBASE-10787
 URL: https://issues.apache.org/jira/browse/HBASE-10787
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.99.0

 Attachments: 10787-v1.txt


 TestHCM#testConnectionClose takes more than 5 minutes on Apache Jenkins.
 The test can be shortened when the retry count is lowered.
 On my Mac, for TestHCM#testConnection* (two tests)
 without patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:46:57.695 java[71368:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 242.2 sec
 {code}
 with patch:
 {code}
 Running org.apache.hadoop.hbase.client.TestHCM
 2014-03-18 15:40:44.013 java[71184:1203] Unable to load realm info from 
 SCDynamicStore
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 100.465 sec
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10781) Remove hadoop-one-compat module and all references to hadoop1

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941212#comment-13941212
 ] 

Hudson commented on HBASE-10781:


FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10781 Remove hadoop-one-compat module and all references to hadoop1 
(stack: rev 1579449)
* /hbase/trunk/dev-support/generate-hadoopX-poms.sh
* /hbase/trunk/dev-support/make_rc.sh
* /hbase/trunk/hbase-assembly/src/main/assembly/components.xml
* /hbase/trunk/hbase-assembly/src/main/assembly/hadoop-one-compat.xml
* /hbase/trunk/hbase-hadoop1-compat
* /hbase/trunk/pom.xml
* /hbase/trunk/src/main/docbkx/developer.xml


 Remove hadoop-one-compat module and all references to hadoop1
 -

 Key: HBASE-10781
 URL: https://issues.apache.org/jira/browse/HBASE-10781
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0

 Attachments: 10690.txt, 10690v2.txt, 10781v3.txt


 Clean out hadoop1 references.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10792) RingBufferTruck does not release its payload

2014-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941211#comment-13941211
 ] 

Hudson commented on HBASE-10792:


FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #123 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/123/])
HBASE-10792 RingBufferTruck does not release its payload (ndimiduk: rev 1579475)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/RingBufferTruck.java


 RingBufferTruck does not release its payload
 

 Key: HBASE-10792
 URL: https://issues.apache.org/jira/browse/HBASE-10792
 Project: HBase
  Issue Type: Bug
  Components: Performance, wal
Affects Versions: 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.99.0

 Attachments: HBASE-10792.00.patch, HBASE-10792.00.patch


 Run a write-heavy workload (PerfEval sequentialWrite) out of a trunk sandbox 
 and watch as HBase eventually dies with an OOM: heap space. Examining the 
 heap dump shows an extremely large retained size of KeyValue and 
 RingBufferTruck instances. By my eye, the default value of 
 {{hbase.regionserver.wal.disruptor.event.count}} is too large for such a 
 small default heap size, or the RBT instances need to release their payloads 
 after consumers retrieve them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

