[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver

2013-09-14 Thread takeshi.miao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767713#comment-13767713
 ] 

takeshi.miao commented on HBASE-7525:
-

Dear [~stack]

I uploaded new patches for both 0.95 and trunk with the following changes.

1. Added a check for whether the user passes a tableName together with the 
-regionserver option
{code}
# user passes tableNames 't1' and 't2' with the '-regionserver' option
bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver t1 t2
...
# the following error message is printed to stderr
Cannot pass a tablename when using the -regionserver option, tablenames:[t1, t2]
{code}
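
A check like the one in item 1 could be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class name CanaryArgCheck and the method check are hypothetical, and it assumes the arguments have already been classified as table names. Only the error message text is taken from the output above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not taken from the HBASE-7525 patch.
final class CanaryArgCheck {

    /**
     * Returns an error message when table names are combined with the
     * -regionserver option, or null when the arguments are acceptable.
     */
    static String check(boolean regionServerMode, List<String> tableNames) {
        if (regionServerMode && !tableNames.isEmpty()) {
            // List.toString() renders the names as [t1, t2]
            return "Cannot pass a tablename when using the -regionserver option, tablenames:"
                    + tableNames;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = check(true, Arrays.asList("t1", "t2"));
        if (error != null) {
            System.err.println(error);
        }
    }
}
```

Returning the message instead of exiting inside the check keeps the validation easy to test; the caller decides whether to print usage and exit.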

2. Changed the usage output.
{code}
bin/hbase org.apache.hadoop.hbase.tool.Canary -help
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 
[table2]...] | [regionserver1 [regionserver2]..]
...
{code}

3. Removed 'DEBUG [main] tool.Canary: runCount=...' from the log messages

Please let me know if you have any questions. Thanks.

> A canary monitoring program specifically for regionserver
> -
>
> Key: HBASE-7525
> URL: https://issues.apache.org/jira/browse/HBASE-7525
> Project: HBase
>  Issue Type: New Feature
>  Components: monitoring
>Affects Versions: 0.94.0
>Reporter: takeshi.miao
>Priority: Critical
> Fix For: 0.98.0
>
> Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, 
> HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, 
> HBASE-7525-0.95-v7.patch, HBASE-7525-trunk-v2.patch, 
> HBASE-7525-trunk-v3.patch, HBASE-7525-v0.patch, RegionServerCanary.java
>
>
> *Motivation*
> This ticket provides a canary monitoring tool specifically for 
> HRegionserver. Details as follows:
> 1. Our operations team requires this tool because they found that running a 
> canary against every region of an HBase cluster is too much for them, so I 
> implemented this coarse-grained one based on the original o.a.h.h.tool.Canary.
> 2. The tool is multi-threaded, which means each Get request is sent by its 
> own thread. I chose this approach because we have suffered from hung region 
> servers and the root cause is still not clear, so this tool can help the 
> operations team detect a hung region server, if any.
> *example*
> 1. the tool docs
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
> Usage: [opts] [regionServerName 1 [regionServrName 2...]]
>  regionServerName - FQDN serverName, can use linux command:hostname -f to 
> check your serverName
>  where [-opts] are:
>    -help       Show this help and exit.
>    -e          Use regionServerName as regular expression
>                which means the regionServerName is regular expression pattern
>    -f          stop whole program if first error occurs, default is true
>    -t          timeout for a check, default is 60 (milisecs)
>    -daemon     Continuous check at defined intervals.
>    -interval   Interval between checks (sec)
> 2. Will send a request to each regionserver in a HBase cluster
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary
> 3. Will send a request to a regionserver by given name
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname
> 4. Will send a request to regionserver(s) by given regular-expression
> /opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e 
> rs1.domainname.pattern
> // another example
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e 
> tw-poc-tm-puppet-hdn[0-9]\{1,2\}.client.tw.trendnet.org
> 5. Will send a request to a regionserver and also set a timeout limit for 
> this test
> // query regionserver:rs1.domainname with timeout limit 10sec
> // -f false means the program will not exit even if the test fails
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 1 
> rs1.domainname
> // prints "1" on timeout
> echo "$?"
> 6. Will run as daemon mode, which means it will send request to each 
> regionserver periodically
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon
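
The thread-per-request idea with a timeout, as described in the motivation above, might look roughly like this. It is a sketch under assumptions, not the attached RegionServerCanary.java: the class TimedProbe and the method respondsWithin are hypothetical names, and a plain Callable stands in for the Get request sent to a region server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; the Callable stands in for a Get against one region server.
final class TimedProbe {

    /** Runs the probe on its own thread; true only if it completes within timeoutMs. */
    static boolean respondsWithin(Callable<?> probe, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> result = pool.submit(probe);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The probe did not answer in time: treat the server as hung.
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The probe itself failed (for example, the request threw).
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt the worker so a stuck probe cannot block shutdown.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulated hung server: the probe sleeps far longer than the timeout.
        boolean ok = respondsWithin(() -> { Thread.sleep(5_000); return null; }, 100);
        System.out.println(ok ? "responded" : "timed out or failed");
    }
}
```

Running the request on a separate thread is what lets the caller give up after the timeout even when the request itself never returns, which is the hung-server case the tool is meant to catch.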

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767711#comment-13767711
 ] 

Jonathan Hsieh edited comment on HBASE-9523 at 9/15/13 6:46 AM:


Committed v2 to 96 and trunk.  Thanks for taking a look nick and stack.

  was (Author: jmhsieh):
Committed v2 to 96 and trunk.  Thanks for taking a look nik and stack.
  
> Audit of hbase-common @InterfaceAudience.Public apis.
> -
>
> Key: HBASE-9523
> URL: https://issues.apache.org/jira/browse/HBASE-9523
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.95.2
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9523.patch, hbase-9523.v2.patch
>
>
> Do an audit of all public classes to make sure we are only publicly exposing 
> what must be exposed.   
> This was done by comparing the Public only version of the javadoc generated 
> by HBASE-9517 to a local javadoc for the hbase-common module (cd 
> hbase-common; mvn javadoc:javadoc).  



[jira] [Commented] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767711#comment-13767711
 ] 

Jonathan Hsieh commented on HBASE-9523:
---

Committed v2 to 96 and trunk.  Thanks for taking a look nik and stack.

> Audit of hbase-common @InterfaceAudience.Public apis.
> -
>
> Key: HBASE-9523
> URL: https://issues.apache.org/jira/browse/HBASE-9523
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.95.2
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9523.patch, hbase-9523.v2.patch
>
>
> Do an audit of all public classes to make sure we are only publicly exposing 
> what must be exposed.   
> This was done by comparing the Public only version of the javadoc generated 
> by HBASE-9517 to a local javadoc for the hbase-common module (cd 
> hbase-common; mvn javadoc:javadoc).  



[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver

2013-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767709#comment-13767709
 ] 

Hadoop QA commented on HBASE-7525:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12603227/HBASE-7525-0.95-v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 hadoop1.0{color}.  The patch failed to compile against the 
hadoop 1.0 profile.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7232//console

This message is automatically generated.

> A canary monitoring program specifically for regionserver
> -
>
> Key: HBASE-7525
> URL: https://issues.apache.org/jira/browse/HBASE-7525
> Project: HBase
>  Issue Type: New Feature
>  Components: monitoring
>Affects Versions: 0.94.0
>Reporter: takeshi.miao
>Priority: Critical
> Fix For: 0.98.0
>
> Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, 
> HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, 
> HBASE-7525-0.95-v7.patch, HBASE-7525-trunk-v2.patch, 
> HBASE-7525-trunk-v3.patch, HBASE-7525-v0.patch, RegionServerCanary.java
>
>



[jira] [Updated] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9523:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Audit of hbase-common @InterfaceAudience.Public apis.
> -
>
> Key: HBASE-9523
> URL: https://issues.apache.org/jira/browse/HBASE-9523
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.95.2
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9523.patch, hbase-9523.v2.patch
>
>
> Do an audit of all public classes to make sure we are only publicly exposing 
> what must be exposed.   
> This was done by comparing the Public only version of the javadoc generated 
> by HBASE-9517 to a local javadoc for the hbase-common module (cd 
> hbase-common; mvn javadoc:javadoc).  



[jira] [Assigned] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-9529:
-

Assignee: Jonathan Hsieh

> Audit of hbase-client @InterfaceAudience.Public apis
> 
>
> Key: HBASE-9529
> URL: https://issues.apache.org/jira/browse/HBASE-9529
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9529.patch
>
>
> Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
> This is easier to do now that we can publish only the public api javadoc 
> http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Updated] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9529:
--

Status: Patch Available  (was: Open)

> Audit of hbase-client @InterfaceAudience.Public apis
> 
>
> Key: HBASE-9529
> URL: https://issues.apache.org/jira/browse/HBASE-9529
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9529.patch
>
>
> Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
> This is easier to do now that we can publish only the public api javadoc 
> http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Updated] (HBASE-7525) A canary monitoring program specifically for regionserver

2013-09-14 Thread takeshi.miao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

takeshi.miao updated HBASE-7525:


Attachment: HBASE-7525-0.95-v7.patch
HBASE-7525-trunk-v3.patch

> A canary monitoring program specifically for regionserver
> -
>
> Key: HBASE-7525
> URL: https://issues.apache.org/jira/browse/HBASE-7525
> Project: HBase
>  Issue Type: New Feature
>  Components: monitoring
>Affects Versions: 0.94.0
>Reporter: takeshi.miao
>Priority: Critical
> Fix For: 0.98.0
>
> Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, 
> HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, 
> HBASE-7525-0.95-v7.patch, HBASE-7525-trunk-v2.patch, 
> HBASE-7525-trunk-v3.patch, HBASE-7525-v0.patch, RegionServerCanary.java
>
>



[jira] [Updated] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9529:
--

Attachment: hbase-9529.patch

Should have the changes mentioned.  Built the javadoc locally and it looks right.

> Audit of hbase-client @InterfaceAudience.Public apis
> 
>
> Key: HBASE-9529
> URL: https://issues.apache.org/jira/browse/HBASE-9529
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9529.patch
>
>
> Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
> This is easier to do now that we can publish only the public api javadoc 
> http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Updated] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9523:
--

Attachment: hbase-9523.v2.patch

Missed one -- Classes.java

> Audit of hbase-common @InterfaceAudience.Public apis.
> -
>
> Key: HBASE-9523
> URL: https://issues.apache.org/jira/browse/HBASE-9523
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.95.2
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9523.patch, hbase-9523.v2.patch
>
>
> Do an audit of all public classes to make sure we are only publicly exposing 
> what must be exposed.   
> This was done by comparing the Public only version of the javadoc generated 
> by HBASE-9517 to a local javadoc for the hbase-common module (cd 
> hbase-common; mvn javadoc:javadoc).  



[jira] [Resolved] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-9536.
--

Resolution: Fixed

Committed to trunk and 0.96.  Resolving.

> Fix minor javadoc warnings
> --
>
> Key: HBASE-9536
> URL: https://issues.apache.org/jira/browse/HBASE-9536
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0
>
> Attachments: 9536.096.txt, 9536.txt
>
>
> I applied the trunk patch.  Let me check 0.96 for warnings too.



[jira] [Updated] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9536:
-

Attachment: 9536.096.txt

What I committed to 0.96.  Two fixes.

> Fix minor javadoc warnings
> --
>
> Key: HBASE-9536
> URL: https://issues.apache.org/jira/browse/HBASE-9536
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0
>
> Attachments: 9536.096.txt, 9536.txt
>
>
> I applied the trunk patch.  Let me check 0.96 for warnings too.



[jira] [Updated] (HBASE-9512) Regions can't get out InRecovery state sometimes when turn off distributeLogReplay and restart a cluster

2013-09-14 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9512:
-

Fix Version/s: 0.96.0
   0.98.0

> Regions can't get out InRecovery state sometimes when turn off 
> distributeLogReplay and restart a cluster
> 
>
> Key: HBASE-9512
> URL: https://issues.apache.org/jira/browse/HBASE-9512
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9512.patch
>
>
> We have a test suite where we turn on distributed log replay for one test 
> case and turn it off for others. Sometimes, when a cluster isn't shut down 
> cleanly (some RS recovery work is left) and then restarts, regions remain 
> in the recovery state left over from the previous shutdown. 



[jira] [Commented] (HBASE-9512) Regions can't get out InRecovery state sometimes when turn off distributeLogReplay and restart a cluster

2013-09-14 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767703#comment-13767703
 ] 

Jeffrey Zhong commented on HBASE-9512:
--

Thanks Ted for reviews! I've integrated the change into 0.96 and trunk branch.

> Regions can't get out InRecovery state sometimes when turn off 
> distributeLogReplay and restart a cluster
> 
>
> Key: HBASE-9512
> URL: https://issues.apache.org/jira/browse/HBASE-9512
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9512.patch
>
>
> We have a test suite where we turn on distributed log replay for one test 
> case and turn it off for others. Sometimes, when a cluster isn't shut down 
> cleanly (some RS recovery work is left) and then restarts, regions remain 
> in the recovery state left over from the previous shutdown. 



[jira] [Updated] (HBASE-9512) Regions can't get out InRecovery state sometimes when turn off distributeLogReplay and restart a cluster

2013-09-14 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9512:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Regions can't get out InRecovery state sometimes when turn off 
> distributeLogReplay and restart a cluster
> 
>
> Key: HBASE-9512
> URL: https://issues.apache.org/jira/browse/HBASE-9512
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9512.patch
>
>
> We have a test suite where we turn on distributed log replay for one test 
> case and turn it off for others. Sometimes, when a cluster isn't shut down 
> cleanly (some RS recovery work is left) and then restarts, regions remain 
> in the recovery state left over from the previous shutdown. 



[jira] [Commented] (HBASE-9461) Some doc and cleanup in RPCServer

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767697#comment-13767697
 ] 

Hudson commented on HBASE-9461:
---

SUCCESS: Integrated in HBase-TRUNK #4508 (See 
[https://builds.apache.org/job/HBase-TRUNK/4508/])
HBASE-9461 Some doc and cleanup in RPCServer (stack: rev 1523386)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RequestContext.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcScheduler.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcSchedulerContext.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestIPC.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java


> Some doc and cleanup in RPCServer
> -
>
> Key: HBASE-9461
> URL: https://issues.apache.org/jira/browse/HBASE-9461
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0
>
> Attachments: 9461.txt, 9461v2.txt, ipc2.ucls
>
>
> RPC is a dog to follow.  I want to do buffer pooling for reading requests but 
> it's tough drawing the diagram of who is doing what when.  HBASE-8884 seems to 
> have made it more involved still.  This issue is about doing a bit of 
> untangling.



[jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767699#comment-13767699
 ] 

stack commented on HBASE-8143:
--

Just saying we will have to balance this sizing amongst the different needs.  
4k or 8k might work for the local block reader but might not be appropriate for 
something like HBASE-9535 (or any other feature we'd want to do off-heap).
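
To make the sizing trade-off concrete, here is a rough back-of-the-envelope sketch. The 200 regions and 3-4 open store files per region come from the issue description below; everything else is an assumption for illustration only: one open reader per store file, two direct buffers per reader, and the 1 MiB comparison size. The class and method names are hypothetical.

```java
// Hypothetical estimate; buffersPerReader and the 1 MiB size are assumptions,
// not values confirmed in this thread.
final class DirectBufferEstimate {

    /** Direct memory held if every open store file keeps its own set of buffers. */
    static long estimateBytes(int regions, int storeFilesPerRegion,
                              int buffersPerReader, long bufferSizeBytes) {
        return (long) regions * storeFilesPerRegion * buffersPerReader * bufferSizeBytes;
    }

    public static void main(String[] args) {
        // 4k and 8k are the sizes mentioned above; 1 MiB is an assumed larger size.
        long[] sizes = {4L * 1024, 8L * 1024, 1024L * 1024};
        for (long size : sizes) {
            long bytes = estimateBytes(200, 4, 2, size);
            System.out.printf("buffer=%d B -> about %.1f MiB of direct memory%n",
                    size, bytes / (1024.0 * 1024.0));
        }
    }
}
```

Under these assumptions the total scales linearly with the buffer size, which is why a size that suits the local block reader may not suit another off-heap feature.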

> HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM 
> --
>
> Key: HBASE-8143
> URL: https://issues.apache.org/jira/browse/HBASE-8143
> Project: HBase
>  Issue Type: Bug
>  Components: hadoop2
>Affects Versions: 0.98.0, 0.94.7, 0.95.0
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.0, 0.94.13
>
> Attachments: OpenFileTest.java
>
>
> We've run into an issue with HBase 0.94 on Hadoop 2 with SSR turned on: the 
> memory usage of the HBase process grows to 7g on an -Xmx3g heap, and after 
> some time this causes OOM for the RSs. 
> Upon further investigation, I've found out that we end up with 200 regions, 
> each having 3-4 store files open. Under Hadoop 2 SSR, BlockReaderLocal 
> allocates DirectBuffers, unlike HDFS 1, where there is no direct 
> buffer allocation. 
> It seems that there are no guards against the memory used by local buffers in 
> HDFS 2, and having a large number of open files causes multiple GB of memory 
> to be consumed by the RS process. 
> This issue is to further investigate what is going on, whether we can limit 
> the memory usage in HDFS or HBase, and/or document the setup. 
> Possible mitigation scenarios are: 
>  - Turn off SSR for Hadoop 2
>  - Ensure that there is enough unallocated memory for the RS based on 
> expected # of store files
>  - Ensure that there is a lower number of regions per region server (hence 
> number of open files)
> Stack trace:
> {code}
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> IntegrationTestLoadAndVerify,yC^P\xD7\x945\xD4,1363388517630.24655343d8d356ef708732f34cfe8946.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1560)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1439)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1380)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:449)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:632)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> at 
> org.apache.hadoop.hdfs.util.DirectBufferPool.getBuffer(DirectBufferPool.java:70)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:315)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:208)
> at 
> org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:790)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:888)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at 
> org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:312)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:512)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
> at 
> org.apache.hadoop.hbase.regionserver.Store.validateStoreFile(Store.java:1568)
> at 
> org.apache.hadoop.hbase.regionserver.Store.commitFile(Store.java:845)
> at 
> org.apache.hadoop.hbase.regionserver.Store.access$500(Store.java:109)
> at 
> org.apache.hadoop.hbase.regi

[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767696#comment-13767696
 ] 

Lars Hofhansl commented on HBASE-9480:
--

[~jxiang], is this an issue in 0.94 as well?

> Regions are unexpectedly made offline in certain failure conditions
> ---
>
> Key: HBASE-9480
> URL: https://issues.apache.org/jira/browse/HBASE-9480
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.0
>
> Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
> trunk-9480_v1.2.patch, trunk-9480_v2.patch
>
>
> Came across this issue (HBASE-9338 test):
> 1. Client issues a request to move a region from ServerA to ServerB
> 2. ServerA is compacting that region and doesn't close region immediately. In 
> fact, it takes a while to complete the request.
> 3. The master in the meantime, sends another close request.
> 4. ServerA sends it a NotServingRegionException
> 5. Master handles the exception, deletes the znode, and invokes regionOffline 
> for the said region.
> 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is 
> deleted.
> The region is permanently offline.
> There are potentially other situations where when a RegionServer is offline 
> and the client asks for a region move off from that server, the master makes 
> the region offline.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9536:
-

  Component/s: documentation
  Description: I applied the trunk patch.  Let me check 0.96 for warnings 
too.
Fix Version/s: 0.98.0
 Assignee: stack

> Fix minor javadoc warnings
> --
>
> Key: HBASE-9536
> URL: https://issues.apache.org/jira/browse/HBASE-9536
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0
>
> Attachments: 9536.txt
>
>
> I applied the trunk patch.  Let me check 0.96 for warnings too.



[jira] [Updated] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9536:
-

Attachment: 9536.txt

A few warnings on trunk.

> Fix minor javadoc warnings
> --
>
> Key: HBASE-9536
> URL: https://issues.apache.org/jira/browse/HBASE-9536
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
> Attachments: 9536.txt
>
>




[jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

2013-09-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767693#comment-13767693
 ] 

Lars Hofhansl commented on HBASE-8143:
--

With a reasonable buffer size it should be OK. 1mb is clearly
counterproductive.
It's on my (long) list of things to test with a much smaller buffer size
(like 4 or 8k) and see the impact of that.

At work we have this set to 128k and that has been working well.
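
A back-of-the-envelope sketch of why the buffer size matters, using the figures from the issue description (200 regions, 3-4 open store files each). It assumes, purely for illustration, one direct buffer of the configured size per open store file; the real BlockReaderLocal accounting may well differ, and the class and method names here are made up.

```java
final class SsrBufferEstimate {
    /**
     * Rough direct-memory estimate in bytes.
     * Assumption (not from the source): one buffer per open store file.
     */
    static long estimateBytes(int regions, int storeFilesPerRegion, long bufferSizeBytes) {
        return (long) regions * storeFilesPerRegion * bufferSizeBytes;
    }

    public static void main(String[] args) {
        int regions = 200;  // from the issue description
        int storeFiles = 4; // upper end of "3-4 store files"
        long[] sizes = {1024L * 1024, 128L * 1024, 8L * 1024, 4L * 1024};
        for (long size : sizes) {
            long mb = estimateBytes(regions, storeFiles, size) / (1024 * 1024);
            System.out.println((size / 1024) + "k buffers -> ~" + mb + " MB of direct memory");
        }
    }
}
```

Under that assumption the estimate scales linearly with the buffer size, so going from 1mb to 128k cuts it by a factor of eight.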

> HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM 
> --
>
> Key: HBASE-8143
> URL: https://issues.apache.org/jira/browse/HBASE-8143
> Project: HBase
>  Issue Type: Bug
>  Components: hadoop2
>Affects Versions: 0.98.0, 0.94.7, 0.95.0
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.0, 0.94.13
>
> Attachments: OpenFileTest.java
>
>
> We've run into an issue with HBase 0.94 on Hadoop2, with SSR turned on that 
> the memory usage of the HBase process grows to 7g, on an -Xmx3g, after some 
> time, this causes OOM for the RSs. 
> Upon further investigation, I've found out that we end up with 200 regions, 
> each having 3-4 store files open. Under hadoop2 SSR, BlockReaderLocal 
> allocates DirectBuffers, which is unlike HDFS 1 where there is no direct 
> buffer allocation. 
> It seems that there is no guards against the memory used by local buffers in 
> hdfs 2, and having a large number of open files causes multiple GB of memory 
> to be consumed from the RS process. 
> This issue is to further investigate what is going on. Whether we can limit 
> the memory usage in HDFS, or HBase, and/or document the setup. 
> Possible mitigation scenarios are: 
>  - Turn off SSR for Hadoop 2
>  - Ensure that there is enough unallocated memory for the RS based on 
> expected # of store files
>  - Ensure that there is lower number of regions per region server (hence 
> number of open files)
> Stack trace:
> {code}
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> IntegrationTestLoadAndVerify,yC^P\xD7\x945\xD4,1363388517630.24655343d8d356ef708732f34cfe8946.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1560)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1439)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1380)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:449)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:632)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> at 
> org.apache.hadoop.hdfs.util.DirectBufferPool.getBuffer(DirectBufferPool.java:70)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:315)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:208)
> at 
> org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:790)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:888)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at 
> org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:312)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:512)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
> at 
> org.apache.hadoop.hbase.regionserver.Store.validateStoreFile(Store.java:1568)
> at 
> org.apache.hadoop.hbase.regionserver.Store.commitFile(Store.java:845)
> at 
> org.apache.hadoop.hbase.regionserver.Store.access$500(Store.java:

[jira] [Created] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)
stack created HBASE-9536:


 Summary: Fix minor javadoc warnings
 Key: HBASE-9536
 URL: https://issues.apache.org/jira/browse/HBASE-9536
 Project: HBase
  Issue Type: Bug
Reporter: stack






[jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767692#comment-13767692
 ] 

stack commented on HBASE-8143:
--

There is no way of getting a direct byte buffer w/o it being counted against 
the commit charge for the process?  It's a pity given we are just doing 
read-only.

All of this off-heap allocation will impinge on our being able to use off heap 
for other purposes.

> HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM 
> --
>
> Key: HBASE-8143
> URL: https://issues.apache.org/jira/browse/HBASE-8143
> Project: HBase
>  Issue Type: Bug
>  Components: hadoop2
>Affects Versions: 0.98.0, 0.94.7, 0.95.0
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.0, 0.94.13
>
> Attachments: OpenFileTest.java
>
>
> We've run into an issue with HBase 0.94 on Hadoop2, with SSR turned on that 
> the memory usage of the HBase process grows to 7g, on an -Xmx3g, after some 
> time, this causes OOM for the RSs. 
> Upon further investigation, I've found out that we end up with 200 regions, 
> each having 3-4 store files open. Under hadoop2 SSR, BlockReaderLocal 
> allocates DirectBuffers, which is unlike HDFS 1 where there is no direct 
> buffer allocation. 
> It seems that there is no guards against the memory used by local buffers in 
> hdfs 2, and having a large number of open files causes multiple GB of memory 
> to be consumed from the RS process. 
> This issue is to further investigate what is going on. Whether we can limit 
> the memory usage in HDFS, or HBase, and/or document the setup. 
> Possible mitigation scenarios are: 
>  - Turn off SSR for Hadoop 2
>  - Ensure that there is enough unallocated memory for the RS based on 
> expected # of store files
>  - Ensure that there is lower number of regions per region server (hence 
> number of open files)
> Stack trace:
> {code}
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> IntegrationTestLoadAndVerify,yC^P\xD7\x945\xD4,1363388517630.24655343d8d356ef708732f34cfe8946.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1560)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1439)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1380)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:449)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:632)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> at 
> org.apache.hadoop.hdfs.util.DirectBufferPool.getBuffer(DirectBufferPool.java:70)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:315)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:208)
> at 
> org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:790)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:888)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at 
> org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:312)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:512)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
> at 
> org.apache.hadoop.hbase.regionserver.Store.validateStoreFile(Store.java:1568)
> at 
> org.apache.hadoop.hbase.regionserver.Store.commitFile(Store.java:845)
> at 
> org.apache.hadoop.hbase.regionserver.Store.access$500(Store.java:109)
> at 
> o

[jira] [Comment Edited] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767687#comment-13767687
 ] 

Jonathan Hsieh edited comment on HBASE-9529 at 9/15/13 5:23 AM:


A few updates
- JD says ReplicationAdmin is the only thing that should be public in the 
replication client packages.
- Make ClientScanner#getScannerCallable private (it returns a ScannerCallable 
which should be private)



No markings mean currently public and should remain public. 


Will make all of these private:
Action, ConnectionUtils, HConnectable, HTableUtil, MultiAction, MultiResponse, 
ScanMetrics, ColumnInterpreter

These are currently public and will remain public:
Mutation, Operation, OperationWithAttributes

RegionState is exposed by ClusterStatus in {{public 
Map<String, RegionState> 
getRegionsInTransition()}}.  We could open RegionState or hide just the 
ClusterStatus#getRegionsInTransition method.  I lean towards keeping it 
exposed. 
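
The kind of leak described above (a public API method handing out a type that is meant to be private) can be hunted for mechanically. The following is only a sketch: the `Public` and `Private` annotations and the `*Like` classes are hypothetical stand-ins defined inline, not the real InterfaceAudience annotations or HBase classes, and it checks only the direct return type, not generic type arguments such as the values of a Map.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class AudienceAuditSketch {
    // Stand-ins for the audience annotations (hypothetical, for illustration only).
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface Private {}

    @Private static class RegionStateLike {}

    @Public static class ClusterStatusLike {
        public RegionStateLike getRegionInTransition() { return new RegionStateLike(); }
        public int getServersSize() { return 0; }
    }

    /** Names of public methods on a @Public class whose return type is marked @Private. */
    static List<String> leaks(Class<?> api) {
        List<String> found = new ArrayList<>();
        if (!api.isAnnotationPresent(Public.class)) {
            return found; // only audit classes that claim to be public API
        }
        for (Method m : api.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())
                    && m.getReturnType().isAnnotationPresent(Private.class)) {
                found.add(m.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("leaking methods: " + leaks(ClusterStatusLike.class));
    }
}
```

Each reported method then needs the decision discussed above: open up the returned type or hide the method.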


  was (Author: jmhsieh):

A few updates
- JD says ReplicationAdmin is the only thing that should be public
- Make ClientScanner#getScannerCallable private (it returns a ScannerCallable 
which should be private)



No markings mean currently public and should remain public. 


Will make all of these private:
Action, ConnectionUtils, HConnectable, HTableUtil, MultiAction, MultiResponse, 
ScanMetrics, ColumnInterpreter

These are currently public and will remain public:
Mutation, Operation, OperationWithAttributes

RegionState is exposed by ClusterStatus in {{public 
Map<String, RegionState> 
getRegionsInTransition()}}.  We could open RegionState or hide just the 
ClusterStatus#getRegionsInTransition method.  I lean towards keeping it 
exposed. 

  
> Audit of hbase-client @InterfaceAudience.Public apis
> 
>
> Key: HBASE-9529
> URL: https://issues.apache.org/jira/browse/HBASE-9529
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
>
> Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
> This is easier to do now that the we can publish only the public api javadoc 
> http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Commented] (HBASE-8810) Bring in code constants in line with default xml's

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767690#comment-13767690
 ] 

stack commented on HBASE-8810:
--

Purging the unused would be coolio and fixing up the mismatches would help too.

> Bring in code constants in line with default xml's
> --
>
> Key: HBASE-8810
> URL: https://issues.apache.org/jira/browse/HBASE-8810
> Project: HBase
>  Issue Type: Bug
>  Components: Usability
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: 8810.txt, 8810v2.txt, 
> hbase-default_to_java_constants.xsl, HBaseDefaultXMLConstants.java
>
>
> After the defaults were changed in the xml some constants were left the same.
> DEFAULT_HBASE_CLIENT_PAUSE for example.



[jira] [Updated] (HBASE-6581) Build with hadoop.profile=3.0

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6581:
-

 Priority: Critical  (was: Major)
Fix Version/s: 0.98.0

Making critical 0.98.  Could come into 0.96 too.  Just needs someone taking it 
for a run on cluster making sure it basically works.

> Build with hadoop.profile=3.0
> -
>
> Key: HBASE-6581
> URL: https://issues.apache.org/jira/browse/HBASE-6581
> Project: HBase
>  Issue Type: Bug
>Reporter: Eric Charles
>Assignee: Eric Charles
>Priority: Critical
> Fix For: 0.98.0
>
> Attachments: HBASE-6581-1.patch, HBASE-6581-20130821.patch, 
> HBASE-6581-2.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
> HBASE-6581-5.patch, HBASE-6581.diff, HBASE-6581.diff
>
>
> Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to 
> change in the hadoop maven modules naming (and also usage of 3.0-SNAPSHOT 
> instead of 3.0.0-SNAPSHOT in hbase-common).
> I can provide a patch that would move most of hadoop dependencies in their 
> respective profiles and will define the correct hadoop deps in the 3.0 
> profile.
> Please tell me if that's ok to go this way.
> Thx, Eric
> [1]
> $ mvn clean install -Dhadoop.profile=3.0
> [INFO] Scanning for projects...
> [ERROR] The build could not read 3 projects -> [Help 1]
> [ERROR]   
> [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
> (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
> [ERROR]   
> [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
> (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
> [ERROR]   
> [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
> (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
> [ERROR] 



[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767688#comment-13767688
 ] 

stack commented on HBASE-7525:
--

bq. Yes, it's default behavior is just align with the old one, does the all 
regions monitoring

Ok.  The original behavior is a little 'surprising' but if it has been this way 
up to this, it is fair-enough changing it.

bq. It is the internal DEBUG msg, for counting how many loop of this monitor 
instance did; It can help user to observe the monitor instance's behavior 
whether as expected

I did not understand this log message.  I did not seem to ask for more than one 
loop so seeing more than one w/o asking for it is unexpected.

bq. The option '-regionserver' (regionserver mode) is exclusive with the 
default mode (region mode), which means user can only choose to use default 
mode or regionserver mode either

Understood.  We should fix the usage to make it plainer that it is exclusive w/ 
table ops:

Usage: ./bin/hbase Canary [opts] [table1 [table2]...] | [regionserver1 
[regionserver2]..]

... or something like that.  As is, it would seem to mix the exclusive args.

Your suggestion would allow:

Canary table1 regionserver2, etc.

Suggest that in the usage you are more clear that it is table OR regionserver 
ops.
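
Besides clarifying the usage text, the exclusivity could also be enforced when arguments are parsed. A minimal sketch of such a check follows. Everything in it is hypothetical (class name, method name, error text, and the idea of comparing targets against a set of known table names); it is not the Canary tool's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CanaryModeCheckSketch {
    /**
     * Returns an error message when regionserver mode is given table names,
     * or null when the arguments are acceptable. Hypothetical helper.
     */
    static String checkTargets(boolean regionServerMode, List<String> targets,
                               Set<String> knownTables) {
        if (!regionServerMode) {
            return null; // default (region) mode: targets are table names
        }
        List<String> tables = new ArrayList<>();
        for (String target : targets) {
            if (knownTables.contains(target)) {
                tables.add(target);
            }
        }
        return tables.isEmpty()
                ? null
                : "Cannot mix table names with -regionserver: " + tables;
    }

    public static void main(String[] args) {
        Set<String> knownTables = new HashSet<>(Arrays.asList("t1", "t2"));
        System.out.println(checkTargets(true, Arrays.asList("t1", "rs1.example.org"), knownTables));
        System.out.println(checkTargets(false, Arrays.asList("t1", "t2"), knownTables));
    }
}
```

A check like this rejects "Canary -regionserver table1" outright instead of relying on the user to read the usage line correctly.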

> A canary monitoring program specifically for regionserver
> -
>
> Key: HBASE-7525
> URL: https://issues.apache.org/jira/browse/HBASE-7525
> Project: HBase
>  Issue Type: New Feature
>  Components: monitoring
>Affects Versions: 0.94.0
>Reporter: takeshi.miao
>Priority: Critical
> Fix For: 0.98.0
>
> Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, 
> HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, 
> HBASE-7525-trunk-v2.patch, HBASE-7525-v0.patch, RegionServerCanary.java
>
>
> *Motivation*
> This ticket is to provide a canary monitoring tool specifically for 
> HRegionserver, details as follows
> 1. This tool is required by operation team due to they thought that the 
> canary for each region of a HBase is too many for them, so I implemented this 
> coarse-granular one based on the original o.a.h.h.tool.Canary for them
> 2. And this tool is implemented by multi-threading, which means the each Get 
> request sent by a thread. the reason I use this way is due to we suffered the 
> region server hung issue by now the root cause is still not clear. so this 
> tool can help operation team to detect hung region server if any.
> *example*
> 1. the tool docs
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
> Usage: [opts] [regionServerName 1 [regionServrName 2...]]
>  regionServerName - FQDN serverName, can use linux command:hostname -f to 
> check your serverName
>  where [-opts] are:
>-help Show this help and exit.
>-eUse regionServerName as regular expression
>   which means the regionServerName is regular expression pattern
>-f  stop whole program if first error occurs, default is true
>-t  timeout for a check, default is 60 (milisecs)
>-daemonContinuous check at defined intervals.
>-interval   Interval between checks (sec)
> 2. Will send a request to each regionserver in a HBase cluster
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary
> 3. Will send a request to a regionserver by given name
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname
> 4. Will send a request to regionserver(s) by given regular-expression
> /opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e 
> rs1.domainname.pattern
> // another example
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e 
> tw-poc-tm-puppet-hdn[0-9]\{1,2\}.client.tw.trendnet.org
> 5. Will send a request to a regionserver and also set a timeout limit for 
> this test
> // query regionserver:rs1.domainname with timeout limit 10sec
> // -f false, means that will not exit this program even test failed
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 1 
> rs1.domainname
> // echo "1" if timeout
> echo "$?"
> 6. Will run as daemon mode, which means it will send request to each 
> regionserver periodically
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon



[jira] [Commented] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767687#comment-13767687
 ] 

Jonathan Hsieh commented on HBASE-9529:
---


A few updates
- JD says ReplicationAdmin is the only thing that should be public
- Make ClientScanner#getScannerCallable private (it returns a ScannerCallable 
which should be private)



No markings mean currently public and should remain public. 


Will make all of these private:
Action, ConnectionUtils, HConnectable, HTableUtil, MultiAction, MultiResponse, 
ScanMetrics, ColumnInterpreter

These are currently public and will remain public:
Mutation, Operation, OperationWithAttributes

RegionState is exposed by ClusterStatus in {{public 
Map<String, RegionState> 
getRegionsInTransition()}}.  We could open RegionState or hide just the 
ClusterStatus#getRegionsInTransition method.  I lean towards keeping it 
exposed. 


> Audit of hbase-client @InterfaceAudience.Public apis
> 
>
> Key: HBASE-9529
> URL: https://issues.apache.org/jira/browse/HBASE-9529
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
>
> Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
> This is easier to do now that the we can publish only the public api javadoc 
> http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Created] (HBASE-9535) Try a pool of direct byte buffers handling incoming ipc requests

2013-09-14 Thread stack (JIRA)
stack created HBASE-9535:


 Summary: Try a pool of direct byte buffers handling incoming ipc 
requests
 Key: HBASE-9535
 URL: https://issues.apache.org/jira/browse/HBASE-9535
 Project: HBase
  Issue Type: Brainstorming
Reporter: stack
Assignee: stack


ipc takes in a query by allocating a ByteBuffer of the size of the request and 
then reading off the socket into this on-heap BB.

Experiment with keeping a pool of BBs so we have some buffer reuse to cut down 
on garbage generated.  Could check out from the pool in RpcServer#Reader.  Could 
check back into the pool when the Handler is done, just before it queues the 
response on the Responder's queue.  We should be good since, at least for now, 
kvs get copied up into MSLAB (not referenced) when data gets stuffed into 
MemStore; this should mean no references are left over when we check the BB 
back into the pool for use next time around.

If on-heap BBs work, we could then try direct BBs (Allocation of DBBs takes 
time so if already allocated, should be good.  GC of DBBs is a pain but if in a 
pool, we shouldn't be wanting this to happen).  The copy from socket to the DBB 
will be off-heap (should be fast).

Could start w/ the HDFS DirectBufferPool.  It is unbounded and keeps items by 
size (we might want to bypass the pool if an object is > size N).

DBBs for this task would contend w/ offheap BBs used in BlockReaderLocal when 
short-circuit reading.  It'd be a bummer if we had to allocate big objects 
on-heap.  Would still be an improvement.
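
To make the idea concrete, here is a minimal sketch of such a pool. It is not RpcServer code and not the HDFS DirectBufferPool: the class name, the fixed buffer size, the cap on pooled buffers and the bypass rule for oversized requests are all assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class ByteBufferPoolSketch {
    private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pooled = new AtomicInteger();
    private final int bufferSize;
    private final int maxPooled;
    private final boolean direct;

    ByteBufferPoolSketch(int bufferSize, int maxPooled, boolean direct) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
        this.direct = direct;
    }

    /** Check out a buffer for a request; requests larger than bufferSize bypass the pool. */
    ByteBuffer checkOut(int requestSize) {
        if (requestSize > bufferSize) {
            return ByteBuffer.allocate(requestSize); // one-off on-heap buffer, never pooled
        }
        ByteBuffer bb = pool.poll();
        if (bb == null) {
            bb = direct ? ByteBuffer.allocateDirect(bufferSize) : ByteBuffer.allocate(bufferSize);
        } else {
            pooled.decrementAndGet();
        }
        bb.clear();
        bb.limit(requestSize);
        return bb;
    }

    /** Check a buffer back in; foreign buffers and buffers beyond the cap are left to GC. */
    void checkIn(ByteBuffer bb) {
        if (bb.capacity() != bufferSize || bb.isDirect() != direct) {
            return;
        }
        if (pooled.incrementAndGet() > maxPooled) {
            pooled.decrementAndGet();
            return;
        }
        pool.offer(bb);
    }

    int pooledCount() {
        return pooled.get();
    }

    public static void main(String[] args) {
        ByteBufferPoolSketch pool = new ByteBufferPoolSketch(64 * 1024, 16, false);
        ByteBuffer request = pool.checkOut(512); // Reader side: read the request into this
        pool.checkIn(request);                   // Handler side: done with the request bytes
        System.out.println("pooled buffers: " + pool.pooledCount());
    }
}
```

Switching the third constructor argument to true would exercise the direct-buffer variant; the check-in is only safe if nothing still references the buffer contents, which is the MSLAB point made above.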





[jira] [Commented] (HBASE-9425) Starting a LocalHBaseCluster when 2181 is occupied results in "Too many open files"

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767665#comment-13767665
 ] 

stack commented on HBASE-9425:
--

[~jdcryans] Now we just fail if something is on 2181?  I suppose that is better 
than the current situation.

> Starting a LocalHBaseCluster when 2181 is occupied results in "Too many open 
> files"
> ---
>
> Key: HBASE-9425
> URL: https://issues.apache.org/jira/browse/HBASE-9425
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.98.0, 0.96.0
>
> Attachments: HBASE-9425.patch
>
>
> This bug was introduced via HBASE-6677 "Random ZooKeeper port in test can 
> overrun max port".
> If 2181 is occupied but you start a LocalHBaseCluster (let's say you untar 
> hbase and start it right away) you'll get this:
> {noformat}
> 13/09/03 10:38:13 INFO server.NIOServerCnxnFactory: binding to port 
> 0.0.0.0/0.0.0.0:2181
> 13/09/03 10:38:13 INFO server.NIOServerCnxnFactory: binding to port 
> 0.0.0.0/0.0.0.0:2181
> 13/09/03 10:38:13 INFO server.NIOServerCnxnFactory: binding to port 
> 0.0.0.0/0.0.0.0:2181
> ...
> 13/09/03 10:38:44 INFO server.NIOServerCnxnFactory: binding to port 
> 0.0.0.0/0.0.0.0:2181
> 13/09/03 10:38:44 INFO server.NIOServerCnxnFactory: binding to port 
> 0.0.0.0/0.0.0.0:2181
> 13/09/03 10:38:44 ERROR master.HMasterCommandLine: Master exiting
> java.io.IOException: Too many open files
> at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
> at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:87)
> at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:68)
> at 
> sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
> at java.nio.channels.Selector.open(Selector.java:227)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.<init>(NIOServerCnxnFactory.java:61)
> at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:165)
> at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:131)
> at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:164)
> at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:134)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
> at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2812)
> {noformat}
> The reason is that MiniZookeeperCluster.selectClientPort returns 2181 if 
> defaultClientPort is greater than 0, which it always is when starting a 
> LocalHBaseCluster.
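
The retry behaviour described above can be modelled in a few lines. This is a simplified sketch, not the MiniZooKeeperCluster source: the selector methods, the placeholder base port 49152 and the free-port predicate are hypothetical, and the "stepping" selector is just one possible alternative (the comment above suggests the actual patch simply fails when 2181 is taken).

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntPredicate;

final class ClientPortSketch {
    /** Models the reported behaviour: a positive default port is returned on every attempt. */
    static int selectFixed(int defaultClientPort, int attempt) {
        return defaultClientPort > 0 ? defaultClientPort : 49152 + attempt; // placeholder base
    }

    /** One possible alternative: step past the default on each retry. */
    static int selectStepping(int defaultClientPort, int attempt) {
        return defaultClientPort + attempt;
    }

    /** Tries up to maxAttempts ports; returns the first free one, or -1 if none was found. */
    static int firstFree(IntBinaryOperator selector, int defaultPort,
                         IntPredicate isFree, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int port = selector.applyAsInt(defaultPort, attempt);
            if (isFree.test(port)) {
                return port;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        IntPredicate isFree = port -> port != 2181; // pretend 2181 is occupied
        System.out.println("fixed:    " + firstFree(ClientPortSketch::selectFixed, 2181, isFree, 5));
        System.out.println("stepping: " + firstFree(ClientPortSketch::selectStepping, 2181, isFree, 5));
    }
}
```

With the fixed selector every retry hits the same occupied port, which matches the repeated "binding to port 0.0.0.0/0.0.0.0:2181" lines in the log before the process runs out of file descriptors.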



[jira] [Commented] (HBASE-9473) Change UI to list 'system tables' rather than 'catalog tables'.

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767664#comment-13767664
 ] 

stack commented on HBASE-9473:
--

[~jdcryans] Are you going to let me commit this narrow-scoped UI-only patch, or 
do you want me to fix catalog|system table throughout the code base as part of 
this issue? (You are a tough taskmaster.)

> Change UI to list 'system tables' rather than 'catalog tables'.
> ---
>
> Key: HBASE-9473
> URL: https://issues.apache.org/jira/browse/HBASE-9473
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
> Attachments: 9473.txt
>
>
> Minor, one-line, bit of polishing.



[jira] [Commented] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767663#comment-13767663
 ] 

stack commented on HBASE-9529:
--

Hmm... in '@Public org.apache.hadoop.hbase', if no marking of what follows -- 
e.g. HCD -- means 'public', then +1.  If not, I think HCD, exceptions, etc. 
should be public.

Action is internal.
ConnectionUtils is internal.
Ditto HConnectable
HTableUtil should be private
MultiAction should be private

Below should be private too?  Internal.

MultiResponse

Below are superclasses of Put, etc., so probably public?

Mutation
Operation
OperationWithAttributes


ScanMetrics seems internal
Ditto ColumnInterpreter

Does RegionState come out in the API?  If not, it should be private, I'd say.

> Audit of hbase-client @InterfaceAudience.Public apis
> 
>
> Key: HBASE-9529
> URL: https://issues.apache.org/jira/browse/HBASE-9529
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
>
> Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
> This is easier to do now that we can publish only the public api javadoc 
> http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767657#comment-13767657
 ] 

stack commented on HBASE-9519:
--

Javadoc warning is from elsewhere.  I don't know this code very well.  Patch 
lgtm w/ this caveat.

> fix NPE in EncodedScannerV2.getFirstKeyInBlock()
> 
>
> Key: HBASE-9519
> URL: https://issues.apache.org/jira/browse/HBASE-9519
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.98.0, 0.96.1
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HBASE-9519.txt, HBASE-9519-v2.txt
>
>
> We observed a reproducible NPE while scanning a special table under a special 
> condition in our IntegratedTesting scenario; it was fixed by applying the 
> attached patch.
> org.apache.hadoop.hbase.client.ScannerCallable@67ee75a5, java.io.IOException: 
> java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1186)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1175)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2391)
> at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:456)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.getFirstKeyInBlock(HFileReaderV2.java:1071)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:547)
> at 
> org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:159)
> at 
> org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:142)
> at 
> org.apache.hadoop.hbase.io.HalfStoreFileReader.getLastKey(HalfStoreFileReader.java:267)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesKeyRangeFilter(StoreFile.java:1543)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:375)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:298)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:262)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:149)
> at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2122)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3460)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1645)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1635)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1610)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2377)
> ... 5 more



[jira] [Commented] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767655#comment-13767655
 ] 

stack commented on HBASE-9523:
--

lgtm +1

> Audit of hbase-common @InterfaceAudience.Public apis.
> -
>
> Key: HBASE-9523
> URL: https://issues.apache.org/jira/browse/HBASE-9523
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.95.2
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9523.patch
>
>
> Do an audit of all public classes to make sure we are only publicly exposing 
> what must be exposed.   
> This was done by comparing the Public only version of the javadoc generated 
> by HBASE-9517 to a local javadoc for the hbase-common module (cd 
> hbase-common; mvn javadoc:javadoc).  



[jira] [Updated] (HBASE-9461) Some doc and cleanup in RPCServer

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9461:
-

   Resolution: Fixed
Fix Version/s: 0.98.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

> Some doc and cleanup in RPCServer
> -
>
> Key: HBASE-9461
> URL: https://issues.apache.org/jira/browse/HBASE-9461
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0
>
> Attachments: 9461.txt, 9461v2.txt, ipc2.ucls
>
>
> RPC is a dog to follow.  I want to do buffer pooling for reading requests but 
> it's tough drawing the diagram of who is doing what when.  HBASE-8884 seems to 
> have made it more involved still.  This issue is about doing a bit of 
> untangling.



[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767650#comment-13767650
 ] 

Hudson commented on HBASE-9480:
---

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #729 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/729/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523303)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java


> Regions are unexpectedly made offline in certain failure conditions
> ---
>
> Key: HBASE-9480
> URL: https://issues.apache.org/jira/browse/HBASE-9480
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.0
>
> Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
> trunk-9480_v1.2.patch, trunk-9480_v2.patch
>
>
> Came across this issue (HBASE-9338 test):
> 1. Client issues a request to move a region from ServerA to ServerB
> 2. ServerA is compacting that region and doesn't close region immediately. In 
> fact, it takes a while to complete the request.
> 3. The master, in the meantime, sends another close request.
> 4. ServerA sends it a NotServingRegionException
> 5. Master handles the exception, deletes the znode, and invokes regionOffline 
> for the said region.
> 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is 
> deleted.
> The region is permanently offline.
> There are potentially other situations where, when a RegionServer is offline 
> and the client asks for a region to be moved off that server, the master makes 
> the region offline.



[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767614#comment-13767614
 ] 

Hudson commented on HBASE-9480:
---

FAILURE: Integrated in hbase-0.96-hadoop2 #27 (See 
[https://builds.apache.org/job/hbase-0.96-hadoop2/27/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523308)
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java





[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767606#comment-13767606
 ] 

Hudson commented on HBASE-9480:
---

SUCCESS: Integrated in HBase-TRUNK #4507 (See 
[https://builds.apache.org/job/HBase-TRUNK/4507/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523303)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java





[jira] [Commented] (HBASE-9338) Test Big Linked List fails on Hadoop 2.1.0

2013-09-14 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767600#comment-13767600
 ] 

Elliott Clark commented on HBASE-9338:
--

[~enis] did that run have this patch in it?

{code}
13/09/13 03:39:54 INFO actions.Action: Killing region 
server:hor8n10,60020,1379043329928
13/09/13 03:39:56 INFO actions.Action: Killed region 
server:hor8n10,60020,1379043329928. Reported num of rs:8
13/09/13 03:39:56 INFO actions.Action: Sleeping for:6
13/09/13 03:40:05 INFO actions.Action: Performing action: Move random region of 
table IntegrationTestBigLinkedList
{code}

So there are only 11 seconds between the kill and the move, even though the 
chaos monkey thread should be sleeping for 60 seconds.  After this issue the 
move should always sleep 20 seconds before moving a region.  And it shouldn't 
happen in parallel with a kill.
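
The 11-second figure can be checked from the log timestamps. A minimal sketch, assuming the log prefix uses the `yy/MM/dd HH:mm:ss` layout; `LogGap` and `secondsBetween` are names made up for this example:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative helper; the class and method names are hypothetical. */
public class LogGap {
    // Assumed layout of the timestamp prefix on each log line.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    /** Whole seconds elapsed from the first timestamp to the second. */
    public static long secondsBetween(String first, String second) {
        LocalDateTime a = LocalDateTime.parse(first, STAMP);
        LocalDateTime b = LocalDateTime.parse(second, STAMP);
        return Duration.between(a, b).getSeconds();
    }

    public static void main(String[] args) {
        // "Killing region server" line vs. "Performing action: Move random region" line
        System.out.println(secondsBetween("13/09/13 03:39:54", "13/09/13 03:40:05"));
    }
}
```

With the two timestamps quoted from the log, this gives 11 seconds, well short of either the 60-second or the 20-second sleep mentioned above.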

> Test Big Linked List fails on Hadoop 2.1.0
> --
>
> Key: HBASE-9338
> URL: https://issues.apache.org/jira/browse/HBASE-9338
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Blocker
> Fix For: 0.98.0, 0.96.0
>
> Attachments: HBASE-9338-0.patch, HBASE-9338-1.patch, 
> HBASE-9338-TESTING-2.patch, HBASE-9338-TESTING-3.patch
>
>




[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767588#comment-13767588
 ] 

Hudson commented on HBASE-9480:
---

FAILURE: Integrated in hbase-0.96 #48 (See 
[https://builds.apache.org/job/hbase-0.96/48/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523308)
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java





[jira] [Commented] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767582#comment-13767582
 ] 

Hadoop QA commented on HBASE-9523:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12603201/hbase-9523.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//console

This message is automatically generated.




[jira] [Updated] (HBASE-9457) Master could fail start if region server with system table is down

2013-09-14 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9457:
---

Attachment: trunk-9457_v2.2.patch

Attached v2.2, rebased to trunk latest.

> Master could fail start if region server with system table is down
> --
>
> Key: HBASE-9457
> URL: https://issues.apache.org/jira/browse/HBASE-9457
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Critical
> Attachments: trunk-9457.patch, trunk-9457_v2.1.patch, 
> trunk-9457_v2.2.patch, trunk-9457_v2.patch
>
>
> If the region server holding the system table is killed while the master is 
> starting, the master will hang there waiting for the system table to be 
> assigned, which won't happen.



[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

   Resolution: Fixed
Fix Version/s: 0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into 0.96 and trunk. Thanks.




[jira] [Updated] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9523:
--

Status: Patch Available  (was: In Progress)




[jira] [Commented] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767551#comment-13767551
 ] 

Jonathan Hsieh commented on HBASE-9523:
---

The patch attached should have taken into account nick and stack's comments, 
the findings from the hbase-client work, and the previous patch that made 
unmarked elements private.




[jira] [Updated] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9523:
--

Attachment: hbase-9523.patch




[jira] [Commented] (HBASE-9514) Prevent region from assigning before log splitting is done

2013-09-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767546#comment-13767546
 ] 

Jimmy Xiang commented on HBASE-9514:


What if an RS is dead but the master doesn't know about it yet? So I was thinking 
of controlling it at the root, in the AM#assign() method, the final place where an 
openRegion request is sent out to another RS.

> Prevent region from assigning before log splitting is done
> --
>
> Key: HBASE-9514
> URL: https://issues.apache.org/jira/browse/HBASE-9514
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> If a region is assigned before log splitting is done by the server shutdown 
> handler, the edits belonging to this region in the hlogs of the dead server 
> will be lost.
> Generally this is not an issue if users don't assign/unassign a region from 
> hbase shell or via hbase admin. These commands are marked for experts only in 
> the hbase shell help too.  However, chaos monkey doesn't care.
> If we can prevent such regions from being assigned at a bad time, it would 
> make things a little safer.



[jira] [Resolved] (HBASE-9335) Zombie test detection should filter out non-HBase tests

2013-09-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-9335.
---

Resolution: Later

> Zombie test detection should filter out non-HBase tests
> ---
>
> Key: HBASE-9335
> URL: https://issues.apache.org/jira/browse/HBASE-9335
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>
> Zombie test detection in test-patch.sh sometimes picks up tests from other 
> TLP.
> e.g. from https://builds.apache.org/job/PreCommit-HBASE-Build/6869/console:
> {code}
> "main" prio=10 tid=0x091b4800 nid=0x7634 waiting on condition [0xf69b1000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled.TestFailoverAfterAccessKeyUpdate(TestFailoverWithBlockTokensEnabled.java:159)
> {code}
> When the zombie test doesn't belong to org.apache.hadoop.hbase namespace, it 
> shouldn't be listed.
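
The filtering rule described above can be sketched as follows. This is only an illustration in Java: test-patch.sh itself is a shell script, and the names used here (`ZombieFilter`, `isHBaseZombie`) are hypothetical. The sketch assumes a thread dump is available as a list of lines and reports a zombie only if some stack frame lies in the org.apache.hadoop.hbase namespace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of test-patch.sh.
class ZombieFilter {
    private static final String HBASE_PREFIX = "org.apache.hadoop.hbase.";

    /** Returns true if any stack frame in the dump belongs to the HBase namespace. */
    static boolean isHBaseZombie(List<String> threadDumpLines) {
        for (String line : threadDumpLines) {
            String frame = line.trim();
            // Stack frames are printed as "at <class>.<method>(<file>:<line>)".
            if (frame.startsWith("at ")) {
                frame = frame.substring(3).trim();
            }
            if (frame.startsWith(HBASE_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The HDFS frames quoted in the issue description.
        List<String> hdfsDump = Arrays.asList(
            "java.lang.Thread.State: TIMED_WAITING (sleeping)",
            "at java.lang.Thread.sleep(Native Method)",
            "at org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled"
                + ".TestFailoverAfterAccessKeyUpdate(TestFailoverWithBlockTokensEnabled.java:159)");
        System.out.println(isHBaseZombie(hdfsDump)); // prints false
    }
}
```

With this rule the HDFS test from build 6869 would no longer be reported, because none of its frames start with the HBase package prefix.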



[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767537#comment-13767537
 ] 

Hadoop QA commented on HBASE-9519:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12602951/HBASE-9519-v2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//console

This message is automatically generated.

> fix NPE in EncodedScannerV2.getFirstKeyInBlock()
> 
>
> Key: HBASE-9519
> URL: https://issues.apache.org/jira/browse/HBASE-9519
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.98.0, 0.96.1
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HBASE-9519.txt, HBASE-9519-v2.txt
>
>
> We observed a reproducible NPE while scanning a special table under a special 
> condition in our IntegratedTesting scenario; it was fixed by applying the 
> attached patch.
> org.apache.hadoop.hbase.client.ScannerCallable@67ee75a5, java.io.IOException: java.io.IOException: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1186)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1175)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2391)
>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:456)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.getFirstKeyInBlock(HFileReaderV2.java:1071)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:547)
>   at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:159)
>   at org.apache.hadoop.hbase.io.HalfStoreFi

[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767522#comment-13767522
 ] 

stack commented on HBASE-9519:
--

[~xieliang007] I started it here 
https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/7227/ 
but hadoopqa is having some issues; one of its disks is full. Maybe this 
build will work.

> fix NPE in EncodedScannerV2.getFirstKeyInBlock()
> 
>
> Key: HBASE-9519
> URL: https://issues.apache.org/jira/browse/HBASE-9519
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.98.0, 0.96.1
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HBASE-9519.txt, HBASE-9519-v2.txt
>
>
> We observed a reproducible NPE while scanning a special table under a special 
> condition in our IntegratedTesting scenario; it was fixed by applying the 
> attached patch.
> org.apache.hadoop.hbase.client.ScannerCallable@67ee75a5, java.io.IOException: java.io.IOException: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1186)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1175)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2391)
>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:456)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.getFirstKeyInBlock(HFileReaderV2.java:1071)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:547)
>   at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:159)
>   at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:142)
>   at org.apache.hadoop.hbase.io.HalfStoreFileReader.getLastKey(HalfStoreFileReader.java:267)
>   at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesKeyRangeFilter(StoreFile.java:1543)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:375)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:298)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:262)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:149)
>   at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2122)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3460)
>   at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1645)
>   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1635)
>   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1610)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2377)
>   ... 5 more



[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

2013-09-14 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767521#comment-13767521
 ] 

Demai Ni commented on HBASE-9047:
-

[~jdcryans] 

Thank you so much for the review comments and suggestions. I will remove the 
'System.out.println', fix the typo, and remove the 'copyright' line. I will 
also remove 'conf.setBoolean(HConstants.REPLICATION_ENABLE_KEY, true)' and 
change the test case per your suggestion.

About the Thread.sleep(3): many thanks for pointing it out. It would be a 
bug in the making. I will add some ZooKeeper checking in a timeout loop instead.
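
A minimal sketch of such a timeout loop, for illustration only. The helper name `WaitUtil.waitFor` and its parameters are hypothetical and not taken from the patch; the condition passed in would wrap whatever ZooKeeper check the test needs. Here a trivial time-based condition stands in for it.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not part of the HBASE-9047 patch.
class WaitUtil {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs has elapsed.
     * Returns true if the condition became true in time, false on timeout or interrupt.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline, and sleep at least 1 ms between polls.
            long sleepMs = Math.min(intervalMs, Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true after about 50 ms.
        boolean ok = waitFor(1000, 10, () -> System.currentTimeMillis() - start >= 50);
        System.out.println(ok);
    }
}
```

Compared with a fixed sleep, the loop returns as soon as the condition holds and fails deterministically when it never does.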

Let me look into the 'tool' class. I assume the key here is to make it a 
Runnable class and use the run() method as the main body. Thanks for the suggestion.

As for the code style, I am using Eclipse and imported 
hbase_eclipse_formatter.xml (http://hbase.apache.org/book/ides.html), but from 
this and past patch submissions I realize that I must be missing something. Is 
this the right approach? Is there a style-checking script that I can run 
before submitting? Thanks

Have a nice weekend

Demai 

> Tool to handle finishing replication when the cluster is offline
> 
>
> Key: HBASE-9047
> URL: https://issues.apache.org/jira/browse/HBASE-9047
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jean-Daniel Cryans
>Assignee: Demai Ni
> Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch
>
>
> We're having a discussion on the mailing list about replicating the data on a 
> cluster that was shut down in an offline fashion. The motivation could be 
> that you don't want to bring HBase back up but still need that data on the 
> slave.
> So I have this idea of a tool that would be running on the master cluster 
> while it is down, although it could also run at any time. Basically it would 
> be able to read the replication state of each master region server, finish 
> replicating what's missing to all the slaves, and then clear that state in 
> ZooKeeper.
> The code that handles replication does most of that already, see 
> ReplicationSourceManager and ReplicationSource. Basically when 
> ReplicationSourceManager.init() is called, it will check all the queues in ZK 
> and try to grab those that aren't attached to a region server. If the whole 
> cluster is down, it will grab all of them.
> The beautiful thing here is that you could start that tool on all your 
> machines and the load will be spread out, but that might not be a big concern 
> if replication wasn't lagging since it would take a few seconds to finish 
> replicating the missing data for each region server.
> I'm guessing when starting ReplicationSourceManager you'd give it a fake 
> region server ID, and you'd tell it not to start its own source.
> FWIW the main difference in how replication is handled between Apache's HBase 
> and Facebook's is that the latter is always done separately from HBase itself. 
> This jira isn't about doing that.



[jira] [Commented] (HBASE-9528) Adaptive compaction

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767504#comment-13767504
 ] 

Liang Xie commented on HBASE-9528:
--

Yeah, both "central planning compaction" and "compaction scheduler" seem more 
suitable :)


> Adaptive compaction
> ---
>
> Key: HBASE-9528
> URL: https://issues.apache.org/jira/browse/HBASE-9528
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 0.98.0
>Reporter: Liang Xie
>
> Currently, compaction policy decisions are made at the granularity of a 
> single machine. We had the thought of introducing a new cluster-level 
> decision, so that we could improve the following cases based on the cluster's 
> running status:
> 1) Many nodes are compacting aggressively (we call it a cluster compaction 
> storm); we should throttle it.
> 2) Do more compaction when traffic in the current cluster is low (similar to 
> the off-peak feature), not limited by a configured time range (like the 
> off-peak time range), but triggered by load, QPS, or other metrics.
> Comments? Thanks
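
The two cases above could be expressed as a single cluster-level decision, roughly as sketched below. Everything in this sketch is an assumption made for illustration: the class name, the `Advice` values, and the two thresholds (`stormRatio`, `lowTrafficQps`) are hypothetical and are not HBase configuration or API.

```java
// Hypothetical sketch of a cluster-level compaction decision; not HBase code.
class ClusterCompactionAdvisor {
    enum Advice { THROTTLE, NORMAL, COMPACT_MORE }

    /**
     * @param compactingNodes number of nodes currently compacting
     * @param totalNodes      number of live nodes in the cluster (must be positive)
     * @param clusterQps      current cluster-wide request rate
     * @param stormRatio      fraction of compacting nodes treated as a "compaction storm"
     * @param lowTrafficQps   request rate at or below which traffic counts as low
     */
    static Advice advise(int compactingNodes, int totalNodes, double clusterQps,
                         double stormRatio, double lowTrafficQps) {
        if (totalNodes <= 0) {
            throw new IllegalArgumentException("totalNodes must be positive");
        }
        double compactingRatio = (double) compactingNodes / totalNodes;
        if (compactingRatio >= stormRatio) {
            return Advice.THROTTLE;      // case 1: cluster compaction storm
        }
        if (clusterQps <= lowTrafficQps) {
            return Advice.COMPACT_MORE;  // case 2: low traffic, use the spare capacity
        }
        return Advice.NORMAL;
    }

    public static void main(String[] args) {
        System.out.println(advise(8, 10, 5000.0, 0.5, 100.0)); // prints THROTTLE
        System.out.println(advise(1, 10, 50.0, 0.5, 100.0));   // prints COMPACT_MORE
    }
}
```

The storm check comes first in this sketch so that a quiet but heavily compacting cluster is still throttled; whether that ordering is desirable is an open design question for the issue.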



[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767503#comment-13767503
 ] 

Liang Xie commented on HBASE-9519:
--

Take it easy, nobody is pissed off :)

Could we kick off another QA run manually on the build server?

> fix NPE in EncodedScannerV2.getFirstKeyInBlock()
> 
>
> Key: HBASE-9519
> URL: https://issues.apache.org/jira/browse/HBASE-9519
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.98.0, 0.96.1
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HBASE-9519.txt, HBASE-9519-v2.txt



[jira] [Commented] (HBASE-9502) HStore.seekToScanner should handle magic value

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767483#comment-13767483
 ] 

Liang Xie commented on HBASE-9502:
--

Hi [~stack], I think you need to apply the HBASE-9518 patch first and then run 
this test; I expect it will fail without the patch.

> HStore.seekToScanner should handle magic value
> --
>
> Key: HBASE-9502
> URL: https://issues.apache.org/jira/browse/HBASE-9502
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Scanners
>Affects Versions: 0.98.0, 0.95.2, 0.96.1
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HBASE-9502.txt
>
>
> Due to the faked key, seekTo may return -2, and HStore.seekToScanner 
> should handle this corner case.



[jira] [Commented] (HBASE-9518) getFakedKey() improvement

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767482#comment-13767482
 ] 

Liang Xie commented on HBASE-9518:
--

Hi [~stack], you can see the new TestKeyValue case:
if the last kv of the previous block and the first kv of the current block have 
the same suffix and differ by just 1 at one offset, e.g. 100abcdefg and 
101abcdefg, then before 9518 getShortMidpointKey() falls back to the default 
right kv, i.e. 101abcdefg.
After 9518, it returns "101", a shorter faked value, still reasonable, right? 
:)
And I found that this corner case exists in the current HBase test cases as 
well, so I'd like to get it into the community codebase too.
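
To make the 100abcdefg / 101abcdefg example concrete, here is a simplified sketch. It is not the HBase implementation: it works on plain byte strings rather than KeyValue keys, and the class and method names are hypothetical. It assumes left < right and returns the prefix of the right key up to and including the first differing byte, which is greater than left and not greater than right.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: operates on plain byte strings, not on HBase KeyValue keys.
class ShortMidpointSketch {
    /**
     * Assuming left < right in lexicographic byte order, returns a short key k with
     * left < k <= right: the prefix of right through the first byte where it differs
     * from left.
     */
    static byte[] shortMidpoint(byte[] left, byte[] right) {
        int min = Math.min(left.length, right.length);
        int i = 0;
        while (i < min && left[i] == right[i]) {
            i++;
        }
        if (i >= right.length) {
            // Precondition violated (right equals left or is a prefix of it); fall back to right.
            return right;
        }
        return Arrays.copyOf(right, i + 1);
    }

    public static void main(String[] args) {
        byte[] left = "100abcdefg".getBytes(StandardCharsets.UTF_8);
        byte[] right = "101abcdefg".getBytes(StandardCharsets.UTF_8);
        byte[] mid = shortMidpoint(left, right);
        System.out.println(new String(mid, StandardCharsets.UTF_8)); // prints 101
    }
}
```

The shorter key still separates the two blocks, so an index that stores "101" instead of the full right key stays correct while using less space.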

> getFakedKey() improvement
> -
>
> Key: HBASE-9518
> URL: https://issues.apache.org/jira/browse/HBASE-9518
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.98.0, 0.96.1
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HBASE-9518.txt, HBASE-9518-v2.txt
>
>
> Make the faked-key generation algorithm more aggressive.



[jira] [Commented] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767479#comment-13767479
 ] 

Hudson commented on HBASE-9390:
---

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #728 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/728/])
hbase-9390: coprocessors observers are not called during a recovery with the 
new log replay algorithm - 1 (jeffreyz: rev 1523172)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java


> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



[jira] [Commented] (HBASE-9461) Some doc and cleanup in RPCServer

2013-09-14 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767422#comment-13767422
 ] 

Nicolas Liochon commented on HBASE-9461:


bq. was going to commit this since it some progress
Sure. I was mainly hijacking the jira :-)

bq. The Delay stuff is unused I think. It was an experiment. Maybe I'll look at 
that next and purge it if I can.
It's my impression as well (the code is from HBASE-3899). The idea seems very 
good, but if it's not used, the complexity-to-usefulness ratio can't be good.

> Some doc and cleanup in RPCServer
> -
>
> Key: HBASE-9461
> URL: https://issues.apache.org/jira/browse/HBASE-9461
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Attachments: 9461.txt, 9461v2.txt, ipc2.ucls
>
>
> RPC is a dog to follow.  I want to do buffer pooling for reading requests but 
> it's tough drawing the diagram of who is doing what when.  HBASE-8884 seems to 
> have made it more involved still.  This issue is about doing a bit of 
> untangling.
