[jira] [Commented] (HBASE-6012) AssignmentManager#asyncSetOfflineInZooKeeper wouldn't force node offline

2012-05-16 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276528#comment-13276528
 ] 

chunhui shen commented on HBASE-6012:
-

@ram
I think we should take care of disabled and disabling tables before calling 
bulk assign.
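
For illustration, a minimal self-contained sketch of this suggestion, i.e. filtering out regions of disabled/disabling tables before handing the remainder to bulk assign (names such as TableState and filterForBulkAssign are hypothetical, not the HBase API):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: pre-filter regions of disabled/disabling tables
// so bulk assign never has to force their znodes offline.
public class BulkAssignFilterDemo {
  enum TableState { ENABLED, DISABLING, DISABLED }

  static List<String> filterForBulkAssign(List<String> regions,
      Map<String, TableState> tableStateByRegion) {
    List<String> toAssign = new ArrayList<>();
    for (String region : regions) {
      TableState state = tableStateByRegion.get(region);
      if (state == TableState.DISABLED || state == TableState.DISABLING) {
        continue; // skip: any assignment would have to be undone anyway
      }
      toAssign.add(region);
    }
    return toAssign;
  }

  public static void main(String[] args) {
    System.out.println(filterForBulkAssign(
        List.of("r1", "r2"),
        Map.of("r1", TableState.ENABLED, "r2", TableState.DISABLING))); // [r1]
  }
}
{code}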

> AssignmentManager#asyncSetOfflineInZooKeeper wouldn't force node offline
> 
>
> Key: HBASE-6012
> URL: https://issues.apache.org/jira/browse/HBASE-6012
> Project: HBase
>  Issue Type: Bug
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: HBASE-6012.patch
>
>
> As shown in the method's javadoc and log message:
> {code}
> /**
>  * Set region as OFFLINED up in zookeeper asynchronously.
>  */
> boolean asyncSetOfflineInZooKeeper(
> ...
> master.abort("Unexpected ZK exception creating/setting node OFFLINE", e);
> ...
> }
> {code}
> I think AssignmentManager#asyncSetOfflineInZooKeeper should also force the 
> node offline, just like AssignmentManager#setOfflineInZooKeeper does. 
> Otherwise, it may cause the bulk assign that called this method to fail.
> Error log on the master caused by the issue:
> 2012-05-12 01:40:09,437 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=writetest,1YTQDPGLXBTICHOPQ6IL,1336590857771.674da422fc7cb9a7d42c74499ace1d93.
>  state=PENDING_CLOSE, ts=1336757876856 
> 2012-05-12 01:40:09,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x23736bf74780082 Async create of unassigned node for 
> 674da422fc7cb9a7d42c74499ace1d93 with OFFLINE state 
> 2012-05-12 01:40:09,446 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
>  rc != 0 for /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93 -- 
> retryable connectionloss -- FIX see 
> http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2 
> 2012-05-12 01:40:09,447 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Connectionloss writing unassigned at 
> /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93, rc=-110 
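
As an aside, ZooKeeper's NODEEXISTS error code is -110, so the rc=-110 above is consistent with the OFFLINE create failing because an unassigned znode already exists for the PENDING_CLOSE region. Below is a self-contained toy model of the proposed fix, with a map standing in for ZooKeeper; forceSetOffline and asyncCreateOffline are illustrative names, not the ZKAssign API:

{code}
import java.util.HashMap;
import java.util.Map;

// Toy model: a map stands in for ZooKeeper's /unassigned znodes.
public class ForceOfflineDemo {
  static final Map<String, String> znodes = new HashMap<>();

  // Async path before the fix: create-only, fails if the node exists.
  static boolean asyncCreateOffline(String region) {
    return znodes.putIfAbsent("/unassigned/" + region, "OFFLINE") == null;
  }

  // The proposed behavior, mirroring the sync setOfflineInZooKeeper:
  // create the node OR overwrite whatever state is already there.
  static void forceSetOffline(String region) {
    znodes.put("/unassigned/" + region, "OFFLINE");
  }

  public static void main(String[] args) {
    znodes.put("/unassigned/674da422", "RS_ZK_REGION_CLOSING"); // leftover node
    System.out.println(asyncCreateOffline("674da422")); // false -> bulk assign fails
    forceSetOffline("674da422");                        // the fix: force it
    System.out.println(znodes.get("/unassigned/674da422")); // OFFLINE
  }
}
{code}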

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-6012) AssignmentManager#asyncSetOfflineInZooKeeper wouldn't force node offline

2012-05-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276470#comment-13276470
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-6012 at 5/16/12 7:20 AM:


@Chunhui
We observed similar problems when we tried to work on HBASE-5917.  Hence in 
HBASE-5917 we tried to handle one such issue.  We are trying to see whether any 
other such problems can happen, because the normal assign flow has many 
conditions added for various scenarios that don't apply in bulk assign.  Will 
keep you updated on that.

Mentioned the issue id wrongly; it is HBASE-5927.
{edit}
We observed similar problems when we tried to work on HBASE-5927.  Hence in 
HBASE-5927 we tried to handle one such issue.  We are trying to see whether any 
other such problems can happen, because the normal assign flow has many 
conditions added for various scenarios that don't apply in bulk assign.  Will 
keep you updated on that.
{edit}

  was (Author: ram_krish):
@Chunhui
We observed similar problems when we tried to work on HBASE-5917.  Hence in 
HBASE-5917 we tried to handle one such issue.  We are trying to see whether any 
other such problems can happen, because the normal assign flow has many 
conditions added for various scenarios that don't apply in bulk assign.  Will 
keep you updated on that.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Status: Open  (was: Patch Available)

> New Compactions Logic can silently prevent user-initiated compactions from 
> occurring
> 
>
> Key: HBASE-5920
> URL: https://issues.apache.org/jira/browse/HBASE-5920
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.92.1
>Reporter: Derek Wollenstein
>Priority: Minor
>  Labels: compaction
> Attachments: HBASE-5920-0.92.1.patch
>
>
> There seem to be some tuning settings in which manually triggered major 
> compactions will do nothing, including the logic in Store.java in the function
>   List<StoreFile> compactSelection(List<StoreFile> candidates)
> When a user manually triggers a compaction, this follows the same logic as a 
> normal compaction check.  When a user manually triggers a major compaction, 
> something similar happens.  Putting this all together:
> 1. If a user triggers a major compaction, this is checked against a max-files 
> threshold (hbase.hstore.compaction.max). If the number of storefiles to 
> compact is > max files, then we downgrade to a minor compaction.
> 2. If we are in a minor compaction, we do the following checks:
>    a. If the file is less than a minimum size 
> (hbase.hstore.compaction.min.size), we automatically include it.
>    b. Otherwise, we check how the size compares to the next largest size, 
> based on hbase.hstore.compaction.ratio.
>    c. If the number of files included is less than a minimum count 
> (hbase.hstore.compaction.min), then we don't compact.
> At many of these exit points, we don't log an error message.
> The net of this is that if we have a mix of very large and very small files, 
> we may end up having too many files to do a major compaction, but too few 
> files to do a minor compaction.
> I'm trying to go through and see if I'm understanding things correctly, but 
> this seems like the bug.
> To put it another way:
> 2012-05-02 20:09:36,389 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction 
> requested: 
> regionName=str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.,
>  storeName=c, fileCount=15, fileSize=1.5g (20.2k, 362.5m, 155.3k, 3.0m, 30.7k, 
> 361.2m, 6.9m, 4.7m, 14.7k, 363.4m, 30.9m, 3.2m, 7.3k, 362.9m, 23.5m), 
> priority=-9, time=3175046817624398; Because: Recursive enqueue; 
> compaction_queue=(59:0), split_queue=0
> When we had a minimum compaction size of 128M, and default settings for 
> hbase.hstore.compaction.min, hbase.hstore.compaction.max, and 
> hbase.hstore.compaction.ratio, we were not getting a compaction to run even 
> if we ran
> major_compact 
> 'str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.' from 
> the ruby shell.  Note that we had many tiny store files (20k, 155k, 3m, 
> 30k, ...) and several large ones (362.5m, 361.2m, 363.4m, 362.9m).  I think 
> the bimodal nature of the sizes prevented us from doing a compaction.
> I'm not 100% sure where this errored out, because when I manually triggered a 
> compaction, I did not see
> {code}
>   // if we don't have enough files to compact, just wait
>   if (filesToCompact.size() < this.minFilesToCompact) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipped compaction of " + this.storeNameStr
>         + ".  Only " + (end - start) + " file(s) of size "
>         + StringUtils.humanReadableInt(totalSize)
>         + " have met compaction criteria.");
>     }
>   }
> {code}
> being printed in the logs (and I know DEBUG logging was enabled, because I 
> saw this elsewhere).
> I'd be happy with better error messages when we decide not to compact for 
> user-initiated compactions.
> I'd also like to see some override that says "user-triggered major compaction 
> always occurs", but maybe that's a bad idea for other reasons.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Attachment: (was: HBASE-5920.patch)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Attachment: HBASE-5920-0.92.1.patch

Creating a new version of the patch with svn diff.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Release Note: 
This patch makes the following changes:
1) Trace-level logging whenever a compaction is requested
2) Debug-level logging whenever a requested compaction is changed
3) If a user requests a major compaction, this will stay a major compaction 
even if it violates max files (easy to take this part out)
3a) If a user initiates a major compaction that requires too many files to be 
compacted, this will log an error. 
4) Migrates utility functions from HBaseTestCase (deprecated?) to 
HBaseTestingUtility to ease testing compaction behavior in TestCompaction
  Status: Patch Available  (was: Open)

Trying one more time -- I was just using "diff" rather than svn diff last time. 
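
For orientation, a rough self-contained sketch of what items 3 and 3a of the release note might look like (hypothetical names and structure; the actual change is in the attached patch):

{code}
// Hypothetical sketch of release-note items 3/3a, not the attached patch.
public class KeepUserMajorCompactionDemo {
  static final int MAX_FILES_TO_COMPACT = 10; // hbase.hstore.compaction.max

  static boolean decideMajor(int candidateCount, boolean userRequestedMajor) {
    if (userRequestedMajor && candidateCount > MAX_FILES_TO_COMPACT) {
      // Item 3a: surface the problem instead of silently demoting.
      System.err.println("ERROR: user-requested major compaction of "
          + candidateCount + " files exceeds hbase.hstore.compaction.max="
          + MAX_FILES_TO_COMPACT + "; compacting anyway");
    }
    return userRequestedMajor; // Item 3: the request stays a major compaction
  }

  public static void main(String[] args) {
    System.out.println("stays major? " + decideMajor(15, true)); // true
  }
}
{code}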


[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276552#comment-13276552
 ] 

Hadoop QA commented on HBASE-5920:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12527577/HBASE-5920-0.92.1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 23 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1885//console

This message is automatically generated.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276567#comment-13276567
 ] 

Hadoop QA commented on HBASE-5920:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12527581/HBASE-5920-0.92.1-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 23 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1886//console

This message is automatically generated.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Attachment: HBASE-5920-0.92.1-1.patch

Changing location again.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276570#comment-13276570
 ] 

Derek Wollenstein commented on HBASE-5920:
--

I just noticed that this seems to be a Jenkins error (patches seem to have been 
failing for several days?).  I'll reach out via IRC etc. to see what's wrong 
here:
"At revision 1339047.
HBASE-5920 patch is being downloaded at Wed May 16 08:01:44 UTC 2012 from
http://issues.apache.org/jira/secure/attachment/12527581/HBASE-5920-0.92.1-1.patch
cp: cannot stat `/home/jenkins/buildSupport/lib/*': No such file or directory
"


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276579#comment-13276579
 ] 

chunhui shen commented on HBASE-5927:
-

@Rajesh

bq.HBASE-5914 causing assignment of disabling/ disabled table regions. Avoided 
by removing regioninfo from toAssignRegions list if disable is happening to a 
table.

I don't think that's right:

1. ServerShutdownHandler#processDeadRegion will return false if the table is 
disabled, so we wouldn't add the region to toAssignRegions in the process of 
ServerShutdownHandler.

2. For the disabling table regions, ServerShutdownHandler will assign them with 
HBASE-5914; however, OpenedRegionHandler will unassign them again after they 
are opened.


However, we found that SSH and DisableTableHandler happening together may cause 
HBASE-6012.

Could we add a new method that cancels the close if the region belongs to a 
disabling table?

Then we could fix the issue as follows:

{code}
if (!regions.containsKey(region)) {
  LOG.debug("Attempted to unassign region " +
    region.getRegionNameAsString() + " but it is not " +
    "currently assigned anywhere");
  cancelClosingRegionIfDisabling(region);
  return;
}
...

if (t instanceof NotServingRegionException) {
  cancelClosingRegionIfDisabling(region);
  ...
}
...

private void cancelClosingRegionIfDisabling(HRegionInfo region) {
  if (checkIfRegionBelongsToDisabling(region)) {
    // Remove from the regionsInTransition map
    LOG.info("While trying to recover the table "
        + region.getTableNameAsString()
        + " to DISABLED state the region " + region
        + " was offlined but the table was in DISABLING state");
    synchronized (this.regionsInTransition) {
      this.regionsInTransition.remove(region.getEncodedName());
    }
    // Remove from the regions map
    synchronized (this.regions) {
      this.regions.remove(region);
      Set<HRegionInfo> serverRegions = this.servers.get(server);
      if (!serverRegions.remove(region)) {
        LOG.warn("No " + region + " on " + server);
      }
    }
    deleteClosingOrClosedNode(region);
  }
}
{code}

Correct me if I'm wrong, thanks!





> SSH and DisableTableHandler happening together does not clear the znode of 
> the region and RIT map.
> --
>
> Key: HBASE-5927
> URL: https://issues.apache.org/jira/browse/HBASE-5927
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.96.0, 0.94.1
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-5927_94.patch, HBASE-5927_94_v2.patch, 
> HBASE-5927_trunk.patch, HBASE-5927_trunk_2.patch, TestCaseForReProduce.txt
>
>
> A possible exception: If the related regionserver was just killed (but 
> HMaster has not perceived that yet), then we will get a local exception 
> "Connection reset by peer". If this region belongs to a disabling table, what 
> will happen?
> ServerShutdownHandler will remove this region from AM#regions, but this 
> region still exists in RIT. TimeoutMonitor will take care of it after it 
> times out and then invoke unassign again. Since this region has been removed 
> from AM#regions, unassign will return directly because of the code below:
> {code}
> synchronized (this.regions) {
>   // Check if this region is currently assigned
>   if (!regions.containsKey(region)) {
>     LOG.debug("Attempted to unassign region " +
>       region.getRegionNameAsString() + " but it is not " +
>       "currently assigned anywhere");
>     return;
>   }
> }
> {code}
> This leads to an endless loop.
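
A self-contained toy model of the endless loop described above (a set and a map stand in for AM#regions and the RIT map; illustrative only, not the AssignmentManager code):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the endless loop: once SSH has removed the region from
// AM#regions, every timeout-triggered unassign returns early, so the RIT
// entry is never cleared.
public class RitEndlessLoopDemo {
  static final Set<String> regions = new HashSet<>();                     // AM#regions
  static final Map<String, String> regionsInTransition = new HashMap<>(); // RIT

  static void unassign(String region) {
    if (!regions.contains(region)) {
      System.out.println("Attempted to unassign region " + region
          + " but it is not currently assigned anywhere");
      return; // early return: the RIT entry survives, so the timeout fires again
    }
    regionsInTransition.remove(region);
  }

  public static void main(String[] args) {
    regionsInTransition.put("region-A", "PENDING_CLOSE");
    // ServerShutdownHandler already removed region-A from AM#regions.
    for (int tick = 1; tick <= 3; tick++) { // TimeoutMonitor keeps firing
      unassign("region-A");
    }
    System.out.println("still in RIT: " + regionsInTransition.keySet());
  }
}
{code}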

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276586#comment-13276586
 ] 

ramkrishna.s.vasudevan commented on HBASE-5927:
---

bq. 2. For the disabling table regions, ServerShutdownHandler will assign them 
with HBASE-5914; however, OpenedRegionHandler will unassign them again after 
they are opened.

But why should an assign happen, followed by an unassign, when we already know 
it is a disabling table? Anyway, we will try to come up with a wholesome 
solution, taking your suggestion into account as well.
Good on you, Chunhui.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276592#comment-13276592
 ] 

chunhui shen commented on HBASE-5927:
-

bq. But why should an assign happen, followed by an unassign, when we already 
know it is a disabling table

Yes, we needn't do the reassign. But I think it's a problem in 
ServerShutdownHandler#processDeadRegion: why not return false for the disabling 
table regions?
So I think we could modify ServerShutdownHandler#processDeadRegion a little:
{code}
public static boolean processDeadRegion(HRegionInfo hri, Result result,
    AssignmentManager assignmentManager, CatalogTracker catalogTracker)
    throws IOException {
  ...
  if (hri.isOffline() && hri.isSplit()) {
    LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
      "; checking daughter presence");
    if (MetaReader.getRegion(catalogTracker, hri.getRegionName()) == null) {
      return false;
    }
    fixupDaughters(result, assignmentManager, catalogTracker);
    return false;
  }
  // If the table is disabling, do not assign the region.
  boolean disabling = assignmentManager.getZKTable().isDisablingTable(
      hri.getTableNameAsString());
  if (disabling) {
    LOG.info("The table " + hri.getTableNameAsString()
        + " is disabling.  Hence not assigning it.");
    return false;
  }
  return true;
}
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276605#comment-13276605
 ] 

rajeshbabu commented on HBASE-5927:
---


bq. 2. For the disabling table regions, ServerShutdownHandler will assign them 
with HBASE-5914; however, OpenedRegionHandler will unassign them again after 
they are opened.

Unassign happens only when the znode deletion failed in OpenedRegionHandler:
{code}
openedNodeDeleted = deleteOpenedNode(expectedVersion);
{code}
If the znode is deleted successfully, we end up with an assigned disabling 
table region, because we cannot call unassign in this case.
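
A self-contained model of that control flow (illustrative names, not the actual OpenedRegionHandler code):

{code}
// Toy model of the OpenedRegionHandler flow described above.
public class OpenedRegionHandlerFlowDemo {
  static boolean deleteOpenedNode(boolean zkDeleteSucceeds) {
    return zkDeleteSucceeds; // stands in for the real znode delete
  }

  static String process(boolean zkDeleteSucceeds, boolean tableDisabling) {
    boolean openedNodeDeleted = deleteOpenedNode(zkDeleteSucceeds);
    if (!openedNodeDeleted) {
      return "unassign";       // only the failure path re-unassigns
    }
    if (tableDisabling) {
      return "stays assigned"; // the gap: disabling-table region remains open
    }
    return "opened";
  }

  public static void main(String[] args) {
    System.out.println(process(true, true));  // stays assigned (the problem)
    System.out.println(process(false, true)); // unassign
  }
}
{code}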


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276609#comment-13276609
 ] 

chunhui shen commented on HBASE-5927:
-

@rajeshbabu
I'm sorry, I didn't see that the current logic in OpenedRegionHandler only 
calls unassign if deleting the znode failed.

So, what about changing ServerShutdownHandler#processDeadRegion to return false 
for the disabling table regions?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276610#comment-13276610
 ] 

rajeshbabu commented on HBASE-5927:
---

bq. So I think we could modify ServerShutdownHandler#processDeadRegion a little:
{code}
boolean disabling = assignmentManager.getZKTable().isDisablingTable(
    hri.getTableNameAsString());
{code}
In this case we are not adding the region info to the toAssignRegions list, so 
there is no need to remove it from that list:
{code}
toAssignRegions.remove(hri)
{code}

But even if we return false in the disabling/disabled case, there is a chance 
the znode and the RIT entry are still present.
We would then need to wait for the timeout monitor (presently 30 min) to 
trigger unassign and call
{code}
cancelClosingRegionIfDisabling(region)
{code}

That's why we need the statements below to remove the RIT entry and clear the 
znode:
{code}
am.deleteClosingOrClosedNode(hri);
am.regionOffline(hri);
{code}
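
Putting those pieces together, a rough sketch of the change being discussed 
(illustrative only, reusing the method names from the snippets above; not the 
committed patch):
{code}
// Illustrative sketch: inside ServerShutdownHandler#processDeadRegion, skip
// assignment for regions of a disabling/disabled table and clean up their
// RIT entry and znode right away, instead of waiting for the timeout monitor.
boolean disablingOrDisabled = assignmentManager.getZKTable()
    .isDisablingTable(hri.getTableNameAsString())
    || assignmentManager.getZKTable()
        .isDisabledTable(hri.getTableNameAsString());
if (disablingOrDisabled) {
  // The region must not be reassigned; clear any leftover state now.
  assignmentManager.deleteClosingOrClosedNode(hri);
  assignmentManager.regionOffline(hri);
  return false; // the caller will then not add hri to toAssignRegions
}
return true;
{code}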






[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276622#comment-13276622
 ] 

chunhui shen commented on HBASE-5927:
-

@rajeshbabu
bq. We would then need to wait for the timeout monitor (presently 30 min) to 
trigger unassign and call
So it would be better to cancel the closing of disabling regions as soon as we 
find them during ServerShutdownHandler processing.

I'm clear now, thanks.





[jira] [Created] (HBASE-6016) ServerShutdownHandler#processDeadRegion could return false for disabling table regions

2012-05-16 Thread chunhui shen (JIRA)
chunhui shen created HBASE-6016:
---

 Summary: ServerShutdownHandler#processDeadRegion could return 
false for disabling table regions
 Key: HBASE-6016
 URL: https://issues.apache.org/jira/browse/HBASE-6016
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen


{code}
   * @return Returns true if specified region should be assigned, false if not.
   * @throws IOException
   */
  public static boolean processDeadRegion(HRegionInfo hri, Result result,
      AssignmentManager assignmentManager, CatalogTracker catalogTracker)
{code}

For a disabling table's region, I think we needn't assign it, and 
processDeadRegion could return false.





[jira] [Updated] (HBASE-6016) ServerShutdownHandler#processDeadRegion could return false for disabling table regions

2012-05-16 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6016:


Attachment: HBASE-6016.patch





[jira] [Commented] (HBASE-6016) ServerShutdownHandler#processDeadRegion could return false for disabling table regions

2012-05-16 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276638#comment-13276638
 ] 

rajeshbabu commented on HBASE-6016:
---

@chunhui
Returning false in case the table is in disabling state in
{code}
public static boolean processDeadRegion(HRegionInfo hri, Result result,
    AssignmentManager assignmentManager, CatalogTracker catalogTracker)
{code}

will affect the processing of dead servers during master startup.

-> Let's suppose a table disable is in progress: DisableTableHandler sets the 
state to disabling and unassigns the regions.
-> If the master is restarted in the middle, then with this change
{code}
  // Process with existing RS shutdown code
  boolean assign = ServerShutdownHandler.processDeadRegion(
      regionInfo, result, this, this.catalogTracker);
{code}

returns false, and 'assign' becomes false.

Now we cannot set the znode to offline; it will stay in M_ZK_REGION_CLOSING 
state:
{code}
  if (assign) {
    ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
        master.getServerName());
    if (!nodes.contains(regionInfo.getEncodedName())) {
      nodes.add(regionInfo.getEncodedName());
    }
  }
{code}

We would then need to wait for the timeout monitor (30 min) to trigger unassign.

But if the znode is set to offline ('assign' stays true and the block above 
runs), then during processRegionsInTransition we will call assign, and there we 
remove the RIT entry and the znode if the table is in disabling or disabled 
state.

So it may be better not to handle this case. Please correct me if I'm wrong.





[jira] [Commented] (HBASE-6016) ServerShutdownHandler#processDeadRegion could return false for disabling table regions

2012-05-16 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276743#comment-13276743
 ] 

chunhui shen commented on HBASE-6016:
-

@rajeshbabu

When the master starts up after a restart, we do the following:
AssignmentManager#joinCluster
-> AssignmentManager#processDeadServersAndRegionsInTransition
-> ZKUtil.listChildrenAndWatchForNewChildren(watcher, watcher.assignmentZNode);
   AssignmentManager#processDeadServersAndRecoverLostRegions(deadServers, nodes);

The M_ZK_REGION_CLOSING or RS_ZK_REGION_CLOSED znode will then be handled by
AssignmentManager#processRegionsInTransition:

{code}
  case M_ZK_REGION_CLOSING:
    // If zk node of the region was updated by a live server skip this
    // region and just add it into RIT.
    if (isOnDeadServer(regionInfo, deadServers)
        && (sn == null || !isServerOnline(sn))) {
      // If was on dead server, its closed now. Force to OFFLINE and this
      // will get it reassigned if appropriate
      forceOffline(regionInfo, rt);
    } else {
      // Just insert region into RIT.
      // If this never updates the timeout will trigger new assignment
      regionsInTransition.put(encodedRegionName,
          getRegionState(regionInfo, RegionState.State.CLOSING, rt));
    }
    failoverProcessedRegions.put(encodedRegionName, regionInfo);
    break;

  case RS_ZK_REGION_CLOSED:
  case RS_ZK_REGION_FAILED_OPEN:
    // Region is closed, insert into RIT and handle it
    addToRITandCallClose(regionInfo, RegionState.State.CLOSED, rt);
    failoverProcessedRegions.put(encodedRegionName, regionInfo);
    break;
{code}

-> AssignmentManager#addToRITandCallClose

So we will close this region in the end.

Thanks for the review; correct me if I'm wrong.





[jira] [Commented] (HBASE-6016) ServerShutdownHandler#processDeadRegion could return false for disabling table regions

2012-05-16 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276764#comment-13276764
 ] 

rajeshbabu commented on HBASE-6016:
---

Yes, you are correct.

If we return false in the disabling case, we can also avoid creating the znode 
and the CloseRegionHandler/assign for already closed regions, because 'assign' 
is false:
{code}
  if (assign) {
    ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
        master.getServerName());
    if (!nodes.contains(regionInfo.getEncodedName())) {
      nodes.add(regionInfo.getEncodedName());
    }
  }
{code}

This way we only handle regions actually in transition. That's good.

Thanks.





[jira] [Created] (HBASE-6017) TestReplication fails occasionally

2012-05-16 Thread Devaraj Das (JIRA)
Devaraj Das created HBASE-6017:
--

 Summary: TestReplication fails occasionally
 Key: HBASE-6017
 URL: https://issues.apache.org/jira/browse/HBASE-6017
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: Devaraj Das


I see occasional failures in TestReplication on the 0.92 branch.

Running org.apache.hadoop.hbase.replication.TestReplication
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 240.118 sec <<< 
FAILURE!

Results :

Failed tests:   
queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too 
much time for queueFailover replication





[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276778#comment-13276778
 ] 

Zhihong Yu commented on HBASE-5920:
---

@Derek:
Hadoop QA only runs patches against trunk.

Please provide an svn patch for trunk.

> New Compactions Logic can silently prevent user-initiated compactions from 
> occurring
> 
>
> Key: HBASE-5920
> URL: https://issues.apache.org/jira/browse/HBASE-5920
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.92.1
>Reporter: Derek Wollenstein
>Priority: Minor
>  Labels: compaction
> Attachments: HBASE-5920-0.92.1-1.patch, HBASE-5920-0.92.1.patch
>
>
> There seem to be some tuning settings in which manually triggered major 
> compactions will do nothing, including the logic in Store.java in the function
>   List<StoreFile> compactSelection(List<StoreFile> candidates)
> When a user manually triggers a compaction, this follows the same logic as a 
> normal compaction check.  When a user manually triggers a major compaction, 
> something similar happens.  Putting this all together:
> 1. If a user triggers a major compaction, this is checked against a max files 
> threshold (hbase.hstore.compaction.max). If the number of storefiles to 
> compact is > max files, then we downgrade to a minor compaction
> 2. If we are in a minor compaction, we do the following checks:
>a. If the file is less than a minimum size 
> (hbase.hstore.compaction.min.size) we automatically include it
>b. Otherwise, we check how the size compares to the next largest size.  
> based on hbase.hstore.compaction.ratio.  
>   c. If the number of files included is less than a minimum count 
> (hbase.hstore.compaction.min) then don't compact.
> In many of the exit strategies, we aren't seeing an error message.
> The net-net of this is that if we have a mix of very large and very small 
> files, we may end up having too many files to do a major compact, but too few 
> files to do a minor compact.
> I'm trying to go through and see if I'm understanding things correctly, but 
> this seems like the bug
> To put it another way
> 2012-05-02 20:09:36,389 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction 
> requested: 
> regionName=str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.,
>  store
> Name=c, fileCount=15, fileSize=1.5g (20.2k, 362.5m, 155.3k, 3.0m, 30.7k, 
> 361.2m, 6.9m, 4.7m, 14.7k, 363.4m, 30.9m, 3.2m, 7.3k, 362.9m, 23.5m), 
> priority=-9, time=3175046817624398; Because: Recursive enqueue; 
> compaction_queue=(59:0), split_queue=0
> When we had a minimum compaction size of 128M, and default settings for 
> hbase.hstore.compaction.min,hbase.hstore.compaction.max,hbase.hstore.compaction.ratio,
>  we were not getting a compaction to run even if we ran
> major_compact 
> 'str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.' from 
> the ruby shell.  Note that we had many tiny regions (20k, 155k, 3m, 30k,..) 
> and several large regions (362.5m,361.2m,363.4m,362.9m).  I think the bimodal 
> nature of the sizes prevented us from doing a compaction.
> I'm not 100% sure where this errored out because when I manually triggered a 
> compaction, I did not see
> '  // if we don't have enough files to compact, just wait 
>   if (filesToCompact.size() < this.minFilesToCompact) {  
> if (LOG.isDebugEnabled()) {  
>   LOG.debug("Skipped compaction of " + this.storeNameStr 
> + ".  Only " + (end - start) + " file(s) of size "   
> + StringUtils.humanReadableInt(totalSize)
> + " have met compaction criteria."); 
> }
> ' 
> being printed in the logs (and I know DEBUG logging was enabled because I saw 
> this elsewhere).  
> I'd be happy with better error messages when we decide not to compact for 
> user enabled compactions.
> I'd also like to see some override that says "user triggered major compaction 
> always occurs", but maybe that's a bad idea for other reasons.





[jira] [Created] (HBASE-6018) hbck fails with a RejectedExecutionException

2012-05-16 Thread Jonathan Hsieh (JIRA)
Jonathan Hsieh created HBASE-6018:
-

 Summary: hbck fails with a RejectedExecutionException
 Key: HBASE-6018
 URL: https://issues.apache.org/jira/browse/HBASE-6018
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh


On a long-running 0.94.0rc3 cluster, we get to a point where hbck 
consistently encounters this error and fails:

{code}
Exception in thread "main" java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionInfos(HBaseFsck.java:633)
at 
org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:354)
at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)
{code}






[jira] [Commented] (HBASE-6018) hbck fails with a RejectedExecutionException

2012-05-16 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276787#comment-13276787
 ] 

Jonathan Hsieh commented on HBASE-6018:
---


This seems related to the attempt to enqueue a work item into the 
SynchronousQueue introduced in HBASE-4859.  I don't understand why a 
SynchronousQueue is used (it has no capacity!).

The problem goes away after this change:

{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
index 83aa316..8a050fd 100644
--- a/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
+++ b/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
@@ -33,7 +33,8 @@ import java.util.SortedSet;
 import java.util.TreeMap;
 import java.util.TreeSet;
 import java.util.concurrent.ConcurrentSkipListMap;
-import java.util.concurrent.SynchronousQueue;
+//import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.ThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
@@ -217,9 +218,9 @@ public class HBaseFsck {
     this.conf = conf;
 
     int numThreads = conf.getInt("hbasefsck.numthreads", MAX_NUM_THREADS);
-    executor = new ThreadPoolExecutor(1, numThreads,
+    executor = new ThreadPoolExecutor(numThreads, numThreads,
         THREADS_KEEP_ALIVE_SECONDS, TimeUnit.SECONDS,
-        new SynchronousQueue<Runnable>());
+        new LinkedBlockingQueue<Runnable>());
     executor.allowCoreThreadTimeOut(true);
   }
{code}
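
To see why the SynchronousQueue variant rejects work, here is a minimal 
standalone sketch (illustrative only; it mimics the pre-patch pool shape of 
one core thread, a capped max, and a zero-capacity queue):
{code}
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SyncQueueDemo {
  public static void main(String[] args) {
    // 1 core thread, at most 10 threads, and a SynchronousQueue, which can
    // never hold a task -- the same shape as the pre-patch hbck executor.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 10,
        60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
    try {
      // Once all 10 threads are busy there is nowhere to queue task 11, so
      // the default AbortPolicy throws RejectedExecutionException --
      // analogous to hbck running with more regions than MAX_NUM_THREADS.
      for (int i = 0; i < 20; i++) {
        executor.execute(new Runnable() {
          public void run() {
            try {
              Thread.sleep(1000);
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          }
        });
      }
    } catch (RejectedExecutionException e) {
      System.out.println("Rejected, as hbck saw: " + e);
    } finally {
      executor.shutdown();
    }
  }
}
{code}
A LinkedBlockingQueue, as in the patch, is unbounded, so excess tasks wait in 
the queue instead of being rejected.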





[jira] [Updated] (HBASE-6018) hbck fails with a RejectedExecutionException

2012-05-16 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6018:
--

Attachment: hbase-6018.patch





[jira] [Created] (HBASE-6019) [refGuide] ported pseudo-distributed.html page to RefGuide Config chapter

2012-05-16 Thread Doug Meil (JIRA)
Doug Meil created HBASE-6019:


 Summary: [refGuide] ported pseudo-distributed.html page to 
RefGuide Config chapter
 Key: HBASE-6019
 URL: https://issues.apache.org/jira/browse/HBASE-6019
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


Ported the separate pseudo-distributed.html page that existed outside the 
RefGuide into the Configuration chapter.

configuration.xml
* Added an example of a local pseudo-dist HDFS config file (the refguide didn't 
have this)
* Ported the pseudo-dist extras into the pseudo-dist section.

pseudo-distributed.xml
* This is the old page.  I'm leaving it for backward compatibility so that old 
links don't break, although the only thing it says now is that the content has 
been moved to the RefGuide.

site.xml
* Removed the pseudo-dist extras from the left-hand nav.







[jira] [Updated] (HBASE-6018) hbck fails with a RejectedExecutionException

2012-05-16 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6018:
--

Status: Patch Available  (was: Open)

Tested on 0.94, applies on trunk.





[jira] [Updated] (HBASE-6019) [refGuide] ported pseudo-distributed.html page to RefGuide Config chapter

2012-05-16 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-6019:
-

Status: Patch Available  (was: Open)





[jira] [Updated] (HBASE-6019) [refGuide] ported pseudo-distributed.html page to RefGuide Config chapter

2012-05-16 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-6019:
-

Attachment: hbase_hbase_6019.patch





[jira] [Updated] (HBASE-6019) [refGuide] ported pseudo-distributed.html page to RefGuide Config chapter

2012-05-16 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-6019:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)





[jira] [Commented] (HBASE-6018) hbck fails with a RejectedExecutionException

2012-05-16 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276813#comment-13276813
 ] 

Jonathan Hsieh commented on HBASE-6018:
---

When there are < MAX_NUM_THREADS regions, the SynchronousQueue version works; 
when there are > MAX_NUM_THREADS regions, it fails with the 
RejectedExecutionException.

A workaround is to set the hbasefsck.numthreads property in hbase-site.xml to a 
value larger than the number of regions in your HBase instance (you can 
purposely set it low to trigger the problem).
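
Concretely, the workaround amounts to something like the following in 
hbase-site.xml (a sketch: the value just needs to exceed the region count, so 
100000 here is an arbitrary example, not a recommendation):
{code}
<!-- Raise the hbck thread pool size above the number of regions. -->
<property>
  <name>hbasefsck.numthreads</name>
  <value>100000</value>
</property>
{code}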






[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Attachment: HBASE-5920-trunk.patch

Thanks @zhihong - I've re-created my patch against trunk.  I've also removed 
some of the refactoring of the unit tests, since it looks like HBaseTestCase is 
continuing to see expanded use (and I don't want to mix a test refactoring with 
a compaction change).





[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276839#comment-13276839
 ] 

Hadoop QA commented on HBASE-5920:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12527641/HBASE-5920-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1889//console

This message is automatically generated.


[jira] [Commented] (HBASE-6018) hbck fails with a RejectedExecutionException

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276844#comment-13276844
 ] 

Hadoop QA commented on HBASE-6018:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527633/hbase-6018.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 31 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1888//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1888//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1888//console

This message is automatically generated.





[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Attachment: (was: HBASE-5920-trunk.patch)





[jira] [Updated] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek Wollenstein updated HBASE-5920:
-

Attachment: HBASE-5920-trunk.patch

That was odd -- the patch in the Jenkins log was empty -- trying to upload it again.





[jira] [Commented] (HBASE-6019) [refGuide] ported pseudo-distributed.html page to RefGuide Config chapter

2012-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276878#comment-13276878
 ] 

Hudson commented on HBASE-6019:
---

Integrated in HBase-TRUNK #2888 (See 
[https://builds.apache.org/job/HBase-TRUNK/2888/])
hbase-6019.  porting pseudo-dist html page to RefGuide Config chapter. 
(Revision 1339225)

 Result = FAILURE

> [refGuide] ported pseudo-distributed.html page to RefGuide Config chapter
> -
>
> Key: HBASE-6019
> URL: https://issues.apache.org/jira/browse/HBASE-6019
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: hbase_hbase_6019.patch
>
>
> Ported the separate pseudo-distributed.html page that existed outside the 
> RefGuide into the Configuration chapter.
> configuration.xml
> * Added an example of a local pseudo-dist HDFS config file (the refguide 
> didn't have this)
> * Ported pseudo-dist extras to pseudo-dist section.
> pseudo-distributed.xml
> * This is the old page.  I'm leaving it for backward compatibility so that 
> old links don't break, although the only thing it says now is that the 
> content has been moved to the RefGuide.
> site.xml
> * Removing pseudo-dist extras from the left-hand nav.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5453) Switch on-disk formats (reference files, HFile meta fields, etc) to PB

2012-05-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5453:
-

Attachment: 5453v9.txt

Address Gregory's comments.

I've changed the format of .regioninfo and .tableinfo.  Instead of a 
serialized Writable followed by the toString of the serialized object, it is 
now just the serialized pb.

This removes our having a human-readable .regioninfo/.tableinfo file, but my 
guess is no one relied on this anyway.

Having just serialized content in the file means a check of file length should 
be enough to figure out whether the file serialized properly.  If there were 
ever a chance that a Writable + its toString + two '\n' characters was equal to 
a serialized pb, I'd think that a pathological state.  If such a state is not 
cleared up 'naturally' by splits or a schema change, then let's deal with it if 
it happens.

I only need this length-checking in one place on region open.  I want to avoid 
reading the .regioninfo file on region open.  The alternative means more load 
on NN and DNs at region open time which could be problematic at big-bang 
cluster start (Thinking 500 nodes w/ 80k regions, an actual known case).

Otherwise, Gregory's comments led me to check, and I found I was not 
converting fs files to pb in all cases.  This should be fixed now.

There are still some failing tests, but I'm running this by hadoopqa to see 
what it says anyway.  Also putting it up on rb for feedback in case there is a 
problem with this approach.
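
As a hedged sketch of the length-check idea: write the message with a varint 
length prefix so the declared length alone tells a reader whether the file 
holds a complete serialization. The helper name echoes the toDelimitedByteArray 
mentioned above, but the body here is illustrative, assuming a protobuf-java 
client where CodedInputStream tracks bytes read; it is not the patch itself.

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.Message;

public class DelimitedPbSketch {
  // Serialize with a varint length prefix, as a toDelimitedByteArray-style
  // helper might.
  static byte[] toDelimitedByteArray(Message m) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    m.writeDelimitedTo(out);
    return out.toByteArray();
  }

  // On read, compare declared length against actual content: a cheap
  // completeness check with no Writable-vs-pb ambiguity and no full parse.
  static boolean looksComplete(byte[] fileContent) throws IOException {
    CodedInputStream in = CodedInputStream.newInstance(fileContent);
    int declared = in.readRawVarint32();
    return fileContent.length == in.getTotalBytesRead() + declared;
  }
}
{code}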

> Switch on-disk formats (reference files, HFile meta fields, etc) to PB
> --
>
> Key: HBASE-5453
> URL: https://issues.apache.org/jira/browse/HBASE-5453
> Project: HBase
>  Issue Type: Sub-task
>  Components: ipc, master, migration, regionserver
>Reporter: Todd Lipcon
>Assignee: stack
> Attachments: 5453.txt, 5453v2.txt, 5453v3.txt, 5453v6.txt, 5453v9.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5453) Switch on-disk formats (reference files, HFile meta fields, etc) to PB

2012-05-16 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276887#comment-13276887
 ] 

jirapos...@reviews.apache.org commented on HBASE-5453:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5130/
---

(Updated 2012-05-16 17:02:35.527708)


Review request for hbase.


Changes
---

Gregory.  I addressed your comments.

I've changed the format of .regioninfo and .tableinfo. Instead of a serialized 
Writable followed by the toString of the serialized object, it is now just the 
serialized pb. This removes our having a human-readable .regioninfo/.tableinfo 
file, but my guess is no one relied on this anyway.

Having just serialized content in the file means a check of file length should 
be enough to figure out whether the file serialized properly. If there were 
ever a chance that a Writable + its toString + two '\n' characters was equal to 
a serialized pb, I'd think that a pathological state. If such a state is not 
cleared up 'naturally' by splits or a schema change, then let's deal with it if 
it happens.

I only need this length-checking in one place on region open. I want to avoid 
reading the .regioninfo file on region open. The alternative means more load on 
NN and DNs at region open time which could be problematic at big-bang cluster 
start (Thinking 500 nodes w/ 80k regions, an actual known case -- this is the 
case I have in mind when I'm trying to avoid more load on NN/DNs).

Otherwise, Gregory's comments led me to check, and I found I was not 
converting fs files to pb in all cases (I was just reading the clusterid and 
hbase.version files, not converting if still Writable). This should be fixed 
now.

There are still some failing tests, but I'm running this by hadoopqa to see 
what it says anyway. Also putting it up on rb for feedback in case there is a 
problem with this approach.


Summary
---

A b/src/main/java/org/apache/hadoop/hbase/ClusterId.java
  New class to hold the clusterid in.
M b/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
  Make it so can do pb serialization.  Deprecated Writable serialization.
M b/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
  Make it so methods in here follow the pb'ing pattern in HCD and HTD.
  Deprecated Writable serialization.
M b/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
  Make it so can do pb serialization.  Deprecated Writable serialization.
M b/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  ClusterId under ZK got renamed as ZKClusterId
M b/src/main/java/org/apache/hadoop/hbase/io/Reference.java
  Hide the Reference#Range enums.  Don't let them out of this class.
  Make it so can do pb serialization.
M b/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  Use new methods on Reference for getting top and bottom.
M b/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  ClusterId under zk has been renamed ZKClusterId.
  Use new ClusterId class too.
M b/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
  Use new clusterid class.
M b/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Move the RegionInfo convertion up into HRegionInfo instead of here.
  Added generic toDelimitedByteArray helper.
M b/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
  Use HRegionInfo convertions instead.
M b/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java
  Use HRegionInfo convertions instead.
M b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  Use new utility writing out .regioninfo files.
M b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Formatting.
M b/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
  Range in Reference is no longer public.
M b/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  Range in Reference is no longer public.
M 
b/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
M 
b/src/main/java/org/apache/hadoop/hbase/security/token/AuthenticationTokenSecretManager.java
  ClusterId got renamed ZKClusterId
M b/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
  Use new serialization utility in HTD.
M  b/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
  Generic method for writing dot file content.
M b/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Reference#Range is not public any more
M b/src/main/java/org/apache/hadoop/hbase/util/Writables.java
  Deprecated getHRegionInfo, etc.
D b/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterId.java
A b/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKClusterId.java
  Rename
A b/src/main/protobuf/ClusterId.proto
  Added file for ClusterId only since

[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276889#comment-13276889
 ] 

Hadoop QA commented on HBASE-5920:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12527649/HBASE-5920-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 31 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1890//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1890//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1890//console

This message is automatically generated.

> New Compactions Logic can silently prevent user-initiated compactions from 
> occurring
> 
>
> Key: HBASE-5920
> URL: https://issues.apache.org/jira/browse/HBASE-5920
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.92.1
>Reporter: Derek Wollenstein
>Priority: Minor
>  Labels: compaction
> Attachments: HBASE-5920-0.92.1-1.patch, HBASE-5920-0.92.1.patch, 
> HBASE-5920-trunk.patch
>
>
> There seem to be some tuning settings under which manually triggered major 
> compactions will do nothing, owing to the selection logic in Store.java, in 
> the function
>   List<StoreFile> compactSelection(List<StoreFile> candidates)
> When a user manually triggers a compaction, this follows the same logic as a 
> normal compaction check.  When a user manually triggers a major compaction, 
> something similar happens.  Putting this all together:
> 1. If a user triggers a major compaction, this is checked against a max files 
> threshold (hbase.hstore.compaction.max). If the number of storefiles to 
> compact is > max files, then we downgrade to a minor compaction
> 2. If we are in a minor compaction, we do the following checks:
>a. If the file is less than a minimum size 
> (hbase.hstore.compaction.min.size) we automatically include it
>   b. Otherwise, we check how the size compares to the next largest size, 
> based on hbase.hstore.compaction.ratio.  
>   c. If the number of files included is less than a minimum count 
> (hbase.hstore.compaction.min) then don't compact.
> In many of the exit strategies, we aren't seeing an error message.
> The net-net of this is that if we have a mix of very large and very small 
> files, we may end up having too many files to do a major compact, but too few 
> files to do a minor compact.
> I'm trying to go through and see if I'm understanding things correctly, but 
> this seems like the bug.
> To put it another way
> 2012-05-02 20:09:36,389 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction 
> requested: 
> regionName=str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.,
>  store
> Name=c, fileCount=15, fileSize=1.5g (20.2k, 362.5m, 155.3k, 3.0m, 30.7k, 
> 361.2m, 6.9m, 4.7m, 14.7k, 363.4m, 30.9m, 3.2m, 7.3k, 362.9m, 23.5m), 
> priority=-9, time=3175046817624398; Because: Recursive enqueue; 
> compaction_queue=(59:0), split_queue=0
> When we had a minimum compaction size of 128M, and default settings for 
> hbase.hstore.compaction.min,hbase.hstore.compaction.max,hbase.hstore.compaction.ratio,
>  we were not getting a compaction to run even if we ran
> major_compact 
> 'str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.' from 
> the ruby shell.  Note that we had many tiny store files (20k, 155k, 3m, 
> 30k, ...) and several large ones (362.5m, 361.2m, 363.4m, 362.9m).  I think 
> the bimodal nature of the sizes prevented us from doing a compaction.
> I'm not 100% sure where this errored out because when I manually triggered a 
> compaction, I did not see
> '  // if we don't have enough files to compact, just wait 
>   if (filesToCompact.size() < this.minFilesToCompact) {  
> if (LOG.isDebugEnabled()) {  

[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276891#comment-13276891
 ] 

Zhihong Yu commented on HBASE-5920:
---

@Nicolas:
Can you take a look at the patch?

> New Compactions Logic can silently prevent user-initiated compactions from 
> occurring
> 
>
> Key: HBASE-5920
> URL: https://issues.apache.org/jira/browse/HBASE-5920
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.92.1
>Reporter: Derek Wollenstein
>Priority: Minor
>  Labels: compaction
> Attachments: HBASE-5920-0.92.1-1.patch, HBASE-5920-0.92.1.patch, 
> HBASE-5920-trunk.patch
>
>
> There seem to be some tuning settings under which manually triggered major 
> compactions will do nothing, owing to the selection logic in Store.java, in 
> the function
>   List<StoreFile> compactSelection(List<StoreFile> candidates)
> When a user manually triggers a compaction, this follows the same logic as a 
> normal compaction check.  When a user manually triggers a major compaction, 
> something similar happens.  Putting this all together:
> 1. If a user triggers a major compaction, this is checked against a max files 
> threshold (hbase.hstore.compaction.max). If the number of storefiles to 
> compact is > max files, then we downgrade to a minor compaction
> 2. If we are in a minor compaction, we do the following checks:
>a. If the file is less than a minimum size 
> (hbase.hstore.compaction.min.size) we automatically include it
>   b. Otherwise, we check how the size compares to the next largest size, 
> based on hbase.hstore.compaction.ratio.  
>   c. If the number of files included is less than a minimum count 
> (hbase.hstore.compaction.min) then don't compact.
> In many of the exit strategies, we aren't seeing an error message.
> The net-net of this is that if we have a mix of very large and very small 
> files, we may end up having too many files to do a major compact, but too few 
> files to do a minor compact.
> I'm trying to go through and see if I'm understanding things correctly, but 
> this seems like the bug.
> To put it another way
> 2012-05-02 20:09:36,389 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction 
> requested: 
> regionName=str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.,
>  store
> Name=c, fileCount=15, fileSize=1.5g (20.2k, 362.5m, 155.3k, 3.0m, 30.7k, 
> 361.2m, 6.9m, 4.7m, 14.7k, 363.4m, 30.9m, 3.2m, 7.3k, 362.9m, 23.5m), 
> priority=-9, time=3175046817624398; Because: Recursive enqueue; 
> compaction_queue=(59:0), split_queue=0
> When we had a minimum compaction size of 128M, and default settings for 
> hbase.hstore.compaction.min,hbase.hstore.compaction.max,hbase.hstore.compaction.ratio,
>  we were not getting a compaction to run even if we ran
> major_compact 
> 'str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.' from 
> the ruby shell.  Note that we had many tiny store files (20k, 155k, 3m, 
> 30k, ...) and several large ones (362.5m, 361.2m, 363.4m, 362.9m).  I think 
> the bimodal nature of the sizes prevented us from doing a compaction.
> I'm not 100% sure where this errored out because when I manually triggered a 
> compaction, I did not see
> '  // if we don't have enough files to compact, just wait 
>   if (filesToCompact.size() < this.minFilesToCompact) {  
> if (LOG.isDebugEnabled()) {  
>   LOG.debug("Skipped compaction of " + this.storeNameStr 
> + ".  Only " + (end - start) + " file(s) of size "   
> + StringUtils.humanReadableInt(totalSize)
> + " have met compaction criteria."); 
> }
> ' 
> being printed in the logs (and I know DEBUG logging was enabled because I saw 
> this elsewhere).  
> I'd be happy with better error messages when we decide not to compact for 
> user-requested compactions.
> I'd also like to see some override that says "user triggered major compaction 
> always occurs", but maybe that's a bad idea for other reasons.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6011) Unable to start master in local mode

2012-05-16 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-6011:
--

Status: Open  (was: Patch Available)

The test failures are unrelated but I'll add a test.

> Unable to start master in local mode
> 
>
> Key: HBASE-6011
> URL: https://issues.apache.org/jira/browse/HBASE-6011
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Attachments: 6011.patch
>
>
> Got this trying to launch head of 0.94 branch in local mode from the build 
> tree but it happens with trunk and 0.92 too:
> {noformat}
> 12/05/15 19:35:45 ERROR master.HMasterCommandLine: Failed to start master
> java.lang.ClassCastException: org.apache.hadoop.hbase.master.HMaster cannot 
> be cast to org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
>   at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1761)
> {noformat}
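
For readers puzzling over the trace, a self-contained illustration of the 
failure mode: a factory instantiates the base class while the caller downcasts 
to the subclass it expected. The class names below are stand-ins for HMaster 
and HMasterCommandLine$LocalHMaster, and the fix shape shown is an assumption, 
not the attached patch.

{code}
public class CastSketch {
  static class Master {}
  static class LocalMaster extends Master {}

  static Master construct(Class<? extends Master> c) throws Exception {
    return c.getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    try {
      LocalMaster lm = (LocalMaster) construct(Master.class); // the bug
    } catch (ClassCastException e) {
      System.out.println("fails as in the report: " + e.getMessage());
    }
    // Fix shape: hand the factory the subclass the caller will cast to.
    LocalMaster ok = (LocalMaster) construct(LocalMaster.class);
    System.out.println("constructed " + ok.getClass().getSimpleName());
  }
}
{code}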

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5546) Master assigns region in the original region server when opening region failed

2012-05-16 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-5546:
---

Attachment: hbase-5546_3.patch

@Ted
Attached patch with the changes suggested.  With this change, if there is only 
one RS then a new plan will not be created.
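
A hedged sketch of that retry policy, with String standing in for ServerName 
and illustrative method names; the guard mirrors the "only one RS" condition 
above rather than the actual patch.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RetryPlanSketch {
  private final Random random = new Random();

  /** Pick a destination for the failed-open retry. */
  String chooseNewDestination(String failedServer, List<String> liveServers) {
    List<String> others = new ArrayList<String>(liveServers);
    others.remove(failedServer);
    if (others.isEmpty()) {
      // Only one RS in the cluster: keep the old plan, no new plan created.
      return failedServer;
    }
    // Otherwise build a new plan pointing at a different server.
    return others.get(random.nextInt(others.size()));
  }
}
{code}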

> Master assigns region in the original region server when opening region 
> failed  
> 
>
> Key: HBASE-5546
> URL: https://issues.apache.org/jira/browse/HBASE-5546
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: Ashutosh Jindal
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: hbase-5546.patch, hbase-5546_1.patch, 
> hbase-5546_2.patch, hbase-5546_3.patch
>
>
> Master assigns the region to the original region server when an 
> RS_ZK_REGION_FAILED_OPEN event comes in.
> Maybe we should choose another region server.
> [2012-03-07 10:14:21,750] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:31,826] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:41,903] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:51,975] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:02,056] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:12,167] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:22,231] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:32,303] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:42,375] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:52,447] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:02,528] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:12,600] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:22,676] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-16 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-5882:
---

Attachment: hbase_5882.patch

Submitted patch for 0.96. Please review and provide your suggestions/comments.

> Process RIT on master restart can try assigning the region if the region is 
> found on a dead server instead of waiting for Timeout Monitor
> -
>
> Key: HBASE-5882
> URL: https://issues.apache.org/jira/browse/HBASE-5882
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.6, 0.92.1
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: hbase_5882.patch
>
>
> Currently on master restart, when it processes RIT, any region found on a 
> dead server avoids a new assignment so that the timeout 
> monitor can take care of it.
> This case is more prominent if the node is found in RS_ZK_REGION_OPENING 
> state. I think we can handle this by triggering a new assignment with a new 
> plan.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-16 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-5882:
---

Fix Version/s: (was: 0.94.1)
   (was: 0.96.0)
   (was: 0.90.7)
 Hadoop Flags: Reviewed
   Status: Patch Available  (was: Open)

> Process RIT on master restart can try assigning the region if the region is 
> found on a dead server instead of waiting for Timeout Monitor
> -
>
> Key: HBASE-5882
> URL: https://issues.apache.org/jira/browse/HBASE-5882
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.1, 0.90.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: hbase_5882.patch
>
>
> Currently on master restart, when it processes RIT, any region found on a 
> dead server avoids a new assignment so that the timeout 
> monitor can take care of it.
> This case is more prominent if the node is found in RS_ZK_REGION_OPENING 
> state. I think we can handle this by triggering a new assignment with a new 
> plan.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-16 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276923#comment-13276923
 ] 

Phabricator commented on HBASE-5959:


tedyu has commented on the revision "HBASE-5959 [jira] Add other load 
balancers".

  More comments to follow.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:251
 Do we need to re-fetch these config parameters in each iteration?
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:44
 There're extraneous empty lines such as this one.

  Please remove them.
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:78
 'one cluster' -> 'cluster with one server'
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:80
 "it's" -> "cluster has"
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:197
 'use' -> 'uses'
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:202
 Please add javadoc
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:204
 Typo: 'pickRandmoRegion' -> 'pickRandomRegion'
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:144
 Remove empty line.
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:154
 'plan' -> 'plans'
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:224
 Please finish the sentence.
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:226
 Specify what is returned.
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:241
 'balancer.stochastic' -> 'stochastic.balancer'
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:256
 I think locality cost should be given higher weight.

REVISION DETAIL
  https://reviews.facebook.net/D3189

To: JIRA, eclark
Cc: tedyu


> Add other load balancers
> 
>
> Key: HBASE-5959
> URL: https://issues.apache.org/jira/browse/HBASE-5959
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
> HBASE-5959-2.patch, HBASE-5959-3.patch, HBASE-5959-6.patch, 
> HBASE-5959-7.patch, HBASE-5959.D3189.1.patch, HBASE-5959.D3189.2.patch, 
> HBASE-5959.D3189.3.patch, HBASE-5959.D3189.4.patch
>
>
> Now that balancers are pluggable we should give some options.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5546) Master assigns region in the original region server when opening region failed

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276932#comment-13276932
 ] 

Zhihong Yu commented on HBASE-5546:
---

Patch looks good.
Minor comment:
{code}
+  // Here a new region plan is formed with a different destination 
regionServer and 
+  // updated in the regionPlans. But a new regionPlan is formed only 
if more than
{code}
The above two sentences are not consistent. Please remove '. But a new 
regionPlan is formed'

> Master assigns region in the original region server when opening region 
> failed  
> 
>
> Key: HBASE-5546
> URL: https://issues.apache.org/jira/browse/HBASE-5546
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: Ashutosh Jindal
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: hbase-5546.patch, hbase-5546_1.patch, 
> hbase-5546_2.patch, hbase-5546_3.patch
>
>
> Master assigns the region to the original region server when an 
> RS_ZK_REGION_FAILED_OPEN event comes in.
> Maybe we should choose another region server.
> [2012-03-07 10:14:21,750] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:31,826] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:41,903] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:51,975] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:02,056] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:12,167] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:22,231] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:32,303] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:42,375] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:52,447] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:02,528] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:12,600] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:22,676] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-16 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276942#comment-13276942
 ] 

Phabricator commented on HBASE-5959:


eclark has commented on the revision "HBASE-5959 [jira] Add other load 
balancers".

  I'll get the config storing in the next version

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:241
 There are already config parameters with 'balancer' in them.  We should keep 
the hierarchy.
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:256
 And I actually think it should be given a lower weight, as the table cost 
represents an ongoing cost to the cluster while the locality cost is a 
one-time transfer.  The impetus for doing this work was that we saw production 
clusters that were not balancing because the old balancer did not want to move 
regions around to balance the number of regions per table (something I've seen 
on several different production clusters now; a real issue on anything that 
has big batch jobs).  Placing locality's weight above table cost would just 
mean that the same thing could happen.

  However, I think the middle ground, combined with the ability to change it 
per install, is enough without testing on production clusters.
  
src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java:251
 Nope I'll add that in my next version.

REVISION DETAIL
  https://reviews.facebook.net/D3189

To: JIRA, eclark
Cc: tedyu
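
A hedged sketch of the weighted-cost idea being debated: each cost function 
returns a normalized score and the balancer minimizes a weighted sum, so the 
weights decide whether an ongoing cost like table skew can outvote a one-time 
cost like locality. The names, weights, and example values here are 
illustrative, not the patch.

{code}
public class CostSketch {
  interface CostFunction { double cost(); }

  static double weightedCost(CostFunction[] fns, double[] weights) {
    double total = 0, weightSum = 0;
    for (int i = 0; i < fns.length; i++) {
      total += weights[i] * fns[i].cost();
      weightSum += weights[i];
    }
    return total / weightSum; // stays normalized if each cost is in [0,1]
  }

  public static void main(String[] args) {
    CostFunction tableSkew = new CostFunction() { // ongoing cost
      public double cost() { return 0.6; }
    };
    CostFunction locality = new CostFunction() {  // one-time transfer cost
      public double cost() { return 0.3; }
    };
    // Weight table skew above locality, per the reasoning above.
    System.out.println(weightedCost(
        new CostFunction[] { tableSkew, locality },
        new double[] { 2.0, 1.0 }));
  }
}
{code}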


> Add other load balancers
> 
>
> Key: HBASE-5959
> URL: https://issues.apache.org/jira/browse/HBASE-5959
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
> HBASE-5959-2.patch, HBASE-5959-3.patch, HBASE-5959-6.patch, 
> HBASE-5959-7.patch, HBASE-5959.D3189.1.patch, HBASE-5959.D3189.2.patch, 
> HBASE-5959.D3189.3.patch, HBASE-5959.D3189.4.patch
>
>
> Now that balancers are pluggable we should give some options.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276941#comment-13276941
 ] 

Phabricator commented on HBASE-5987:


mbautin has commented on the revision "[jira][89-fb] [HBASE-5987] 
HFileBlockIndex improvement".

  Looks good! A few minor comments inline. Also please submit the diff with 
lint (using "arc diff --preview" instead of "arc diff --only").

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/HConstants.java:545 Please add a 
comment that the actual value is irrelevant because this is always compared by 
reference.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:437-440 
This documentation is still confusing. Is i "the ith position", or is the 
actual key "the ith position"? I would say i is the "position" and the returned 
key is the "key at the ith position".
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:413 Clarify 
the meaning of "is equal", i.e. that it must be exactly the same object, not 
just an equal byte array.
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:63 
This is unnecessary (we don't use compression by default).
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:77 
It is not "schemMetricSnapshot", it is "schemaMetricSnapshot" ("schem" is not a 
word).

REVISION DETAIL
  https://reviews.facebook.net/D3237

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, 
> screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when 
> multiple requests are reading the same block of data or index. 
> From the profiling, one of the causes is the IdLock contention which has been 
> addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> about the data block location for each target key value during the scan 
> process(reSeekTo), even though the target key value has already been in the 
> current data block. This issue will cause certain index block very HOT, 
> especially when it is a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead one more block index entry so that the 
> HFileScanner would know the start key value of the next data block. So if the 
> target key value for the scan(reSeekTo) is "smaller" than that start kv of 
> next data block, it means the target key value has a very high possibility in 
> the current data block (if not in current data block, then the start kv of 
> next data block should be returned. +Indexing on the start key has some 
> defects here+) and it shall NOT query the HFileBlockIndex in this case. On 
> the contrary, if the target key value is "bigger", then it shall query the 
> HFileBlockIndex. This improvement shall help to reduce the hotness of 
> HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block 
> Cache lookup.
> Second, we propose to push this idea a little further: the 
> HFileBlockIndex shall index on the last key value of each data block instead 
> of indexing on the start key value. The motivation is to solve the HBASE-4443 
> issue (avoid seeking to the "previous" block when the key you are interested 
> in is the first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value of 
> data block N, there is no way to know for sure whether the target key value is 
> in data block N or N-1, so the seek has to start from data block N-1. However, 
> if the block index is based on the last key value of each data block and the 
> target key value is between the last key values of data block N-1 and data 
> block N, then the target key value must be in data block N for sure. 
> As long as HBase only supports forward scans, the last key value makes 
> more sense to index on than the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.
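
A hedged sketch of the first proposal: cache the next block's start key so 
reSeekTo can skip the index whenever the target still sorts inside the current 
block. Names are illustrative, not the HFileScanner API; keys compare as 
unsigned byte arrays.

{code}
public class LookaheadSketch {
  private byte[] nextBlockStartKey; // fetched once per block via lookahead

  /** True when a forward reSeekTo can stay in the current block. */
  boolean targetLikelyInCurrentBlock(byte[] targetKey) {
    // If the target sorts before the next block's start key, it is (almost
    // certainly) in the current block, so skip the hot block-index lookup.
    return nextBlockStartKey == null
        || compare(targetKey, nextBlockStartKey) < 0;
  }

  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return a.length - b.length;
  }
}
{code}

Indexing on the last key instead, as in the second proposal, turns the 
"almost certainly" into a certainty for forward scans.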

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6011) Unable to start master in local mode

2012-05-16 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-6011:
--

Status: Patch Available  (was: Open)

> Unable to start master in local mode
> 
>
> Key: HBASE-6011
> URL: https://issues.apache.org/jira/browse/HBASE-6011
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Attachments: 6011-v2.patch, 6011.patch
>
>
> Got this trying to launch head of 0.94 branch in local mode from the build 
> tree but it happens with trunk and 0.92 too:
> {noformat}
> 12/05/15 19:35:45 ERROR master.HMasterCommandLine: Failed to start master
> java.lang.ClassCastException: org.apache.hadoop.hbase.master.HMaster cannot 
> be cast to org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
>   at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1761)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6011) Unable to start master in local mode

2012-05-16 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-6011:
--

Attachment: 6011-v2.patch

v2 includes a test that passes locally for me.

> Unable to start master in local mode
> 
>
> Key: HBASE-6011
> URL: https://issues.apache.org/jira/browse/HBASE-6011
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Attachments: 6011-v2.patch, 6011.patch
>
>
> Got this trying to launch head of 0.94 branch in local mode from the build 
> tree but it happens with trunk and 0.92 too:
> {noformat}
> 12/05/15 19:35:45 ERROR master.HMasterCommandLine: Failed to start master
> java.lang.ClassCastException: org.apache.hadoop.hbase.master.HMaster cannot 
> be cast to org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
>   at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1761)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5954) Allow proper fsync support for HBase

2012-05-16 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276950#comment-13276950
 ] 

Luke Lu commented on HBASE-5954:


Thanks for the numbers, Lars! Are you using ext3? I wonder what the numbers 
would look like if you enable barrier=1 in the mount options or just use ext4 
(with barriers turned on by default). If the underlying fs doesn't do barriers, 
the result is somewhat meaningless (you might as well use hflush).
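
For context, the two durability calls being compared, assuming a Hadoop client 
with the HDFS-744 API on FSDataOutputStream; the path and payload are 
illustrative. hsync only buys real durability when the datanodes' local 
filesystems honor write barriers, which is the point above.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsyncSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/wal-sketch"));
    out.write("edit".getBytes("UTF-8"));
    out.hflush(); // visible to new readers; datanode memory only
    out.hsync();  // additionally asks each datanode to fsync to disk
    out.close();
  }
}
{code}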

> Allow proper fsync support for HBase
> 
>
> Key: HBASE-5954
> URL: https://issues.apache.org/jira/browse/HBASE-5954
> Project: HBase
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 5954-trunk-hdfs-trunk-v2.txt, 
> 5954-trunk-hdfs-trunk-v3.txt, 5954-trunk-hdfs-trunk.txt, hbase-hdfs-744.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6002) Possible chance of resource leak in HlogSplitter

2012-05-16 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HBASE-6002:


Attachment: HBASE-6002_trunk.patch
HBASE-6002_0.94_1.patch

> Possible chance of resource leak in HlogSplitter
> 
>
> Key: HBASE-6002
> URL: https://issues.apache.org/jira/browse/HBASE-6002
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.94.0
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HBASE-6002.patch, HBASE-6002_0.94_1.patch, 
> HBASE-6002_trunk.patch
>
>
> In HLogSplitter.splitLogFileToTemp, the Reader (in) is not closed, and in the 
> finally block, while closing the writers (wap.w) in a loop, if any exception 
> comes the other writers won't be closed.
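
A hedged sketch of the fix shape the description implies: close the reader in 
a finally block, and close each writer independently so one failing close() 
cannot leak the rest. Closeable stands in for the HLog reader and the wap.w 
writers.

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class CloseAllSketch {
  static void splitAndClose(Closeable reader, List<Closeable> writers)
      throws IOException {
    IOException first = null;
    try {
      // ... split the log here ...
    } finally {
      try { reader.close(); } catch (IOException e) { first = e; }
      for (Closeable w : writers) {
        try {
          w.close(); // each close is isolated from its neighbors' failures
        } catch (IOException e) {
          if (first == null) first = e;
        }
      }
    }
    if (first != null) throw first; // surface the first failure at the end
  }
}
{code}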

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6002) Possible chance of resource leak in HlogSplitter

2012-05-16 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HBASE-6002:


Affects Version/s: 0.96.0
   Status: Patch Available  (was: Open)

> Possible chance of resource leak in HlogSplitter
> 
>
> Key: HBASE-6002
> URL: https://issues.apache.org/jira/browse/HBASE-6002
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HBASE-6002.patch, HBASE-6002_0.94_1.patch, 
> HBASE-6002_trunk.patch
>
>
> In HLogSplitter.splitLogFileToTemp, the Reader (in) is not closed, and in the 
> finally block, while closing the writers (wap.w) in a loop, if any exception 
> comes the other writers won't be closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276955#comment-13276955
 ] 

Zhihong Yu commented on HBASE-5959:
---

bq. we saw production clusters that were not balancing because the old 
balancer did not want to move regions around to balance the number of regions 
per table
I think the above was due to the lack of per-table load balancing in 0.92.

> Add other load balancers
> 
>
> Key: HBASE-5959
> URL: https://issues.apache.org/jira/browse/HBASE-5959
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
> HBASE-5959-2.patch, HBASE-5959-3.patch, HBASE-5959-6.patch, 
> HBASE-5959-7.patch, HBASE-5959.D3189.1.patch, HBASE-5959.D3189.2.patch, 
> HBASE-5959.D3189.3.patch, HBASE-5959.D3189.4.patch
>
>
> Now that balancers are pluggable we should give some options.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276963#comment-13276963
 ] 

Zhihong Yu commented on HBASE-5882:
---

Idea is good.
{code}
+  private boolean wasOpeningOnDeadServer(ServerName sn,
+      Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers) {
+    if (deadServers.keySet().contains(sn)) {
{code}
The above method doesn't check whether regionInfo is in OPENING state, so the 
name of the method should be changed accordingly.
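
In other words, a sketch of the naming point, with String standing in for 
ServerName (the original's deadServers.keySet().contains(sn) is just a 
containsKey check): the method only tests dead-server membership, so either 
rename it to say that, or add the OPENING-state check the current name 
promises.

{code}
import java.util.Set;

public class DeadServerCheckSketch {
  /** Renamed from wasOpeningOnDeadServer: no OPENING state is checked. */
  static boolean isOnDeadServer(String sn, Set<String> deadServers) {
    return deadServers.contains(sn);
  }
}
{code}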

> Process RIT on master restart can try assigning the region if the region is 
> found on a dead server instead of waiting for Timeout Monitor
> -
>
> Key: HBASE-5882
> URL: https://issues.apache.org/jira/browse/HBASE-5882
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.6, 0.92.1
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: hbase_5882.patch
>
>
> Currently on master restart, when it processes RIT, any region found on a 
> dead server avoids a new assignment so that the timeout 
> monitor can take care of it.
> This case is more prominent if the node is found in RS_ZK_REGION_OPENING 
> state. I think we can handle this by triggering a new assignment with a new 
> plan.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6020) Publish Hbase jars compiled against Hadoop-23/2.0

2012-05-16 Thread Hari Shreedharan (JIRA)
Hari Shreedharan created HBASE-6020:
---

 Summary: Publish Hbase jars compiled against Hadoop-23/2.0
 Key: HBASE-6020
 URL: https://issues.apache.org/jira/browse/HBASE-6020
 Project: HBase
  Issue Type: Improvement
  Components: build
Reporter: Hari Shreedharan


Projects like Flume support both Hadoop 1.0 and Hadoop 23/2.0. But there are no 
hbase jars compiled against Hadoop 23/2.0 published, causing problems for our 
hadoop-23 profile builds/tests. Please publish jars compiled against hadoop-23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2012-05-16 Thread Mubarak Seyed (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276971#comment-13276971
 ] 

Mubarak Seyed commented on HBASE-4720:
--

Thanks Andy. Will fix your comments in the test code and submit a patch.

> Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
> client/server 
> 
>
> Key: HBASE-4720
> URL: https://issues.apache.org/jira/browse/HBASE-4720
> Project: HBase
>  Issue Type: Improvement
>Reporter: Daniel Lord
>Assignee: Mubarak Seyed
> Fix For: 0.94.1
>
> Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, 
> HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, 
> HBASE-4720.trunk.v5.patch, HBASE-4720.trunk.v6.patch, 
> HBASE-4720.trunk.v7.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch
>
>
> I have several large application/HBase clusters where an application node 
> will occasionally need to talk to HBase from a different cluster.  In order 
> to help ensure some of my consistency guarantees I have a sentinel table that 
> is updated atomically as users interact with the system.  This works quite 
> well for the "regular" hbase client but the REST client does not implement 
> the checkAndPut and checkAndDelete operations.  This exposes the application 
> to some race conditions that have to be worked around.  It would be ideal if 
> the same checkAndPut/checkAndDelete operations could be supported by the REST 
> client.
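
The sentinel pattern described here, written against the regular client API of 
that era (HTable.checkAndPut) to show the atomic primitive the REST client is 
being asked to expose; the table, family, and qualifier names are illustrative.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "sentinel");
    byte[] row = Bytes.toBytes("job-42");
    Put put = new Put(row);
    put.add(Bytes.toBytes("f"), Bytes.toBytes("state"), Bytes.toBytes("RUNNING"));
    // Atomic: write RUNNING only if state is still PENDING.
    boolean won = table.checkAndPut(row, Bytes.toBytes("f"),
        Bytes.toBytes("state"), Bytes.toBytes("PENDING"), put);
    System.out.println(won ? "acquired" : "lost the race");
    table.close();
  }
}
{code}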

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5920) New Compactions Logic can silently prevent user-initiated compactions from occurring

2012-05-16 Thread Derek Wollenstein (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276973#comment-13276973
 ] 

Derek Wollenstein commented on HBASE-5920:
--

Is there anything I can be doing better on my side?  I have been running tests 
on my side without running into these issues, but I'm clearly doing something 
wrong.

> New Compactions Logic can silently prevent user-initiated compactions from 
> occurring
> 
>
> Key: HBASE-5920
> URL: https://issues.apache.org/jira/browse/HBASE-5920
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.92.1
>Reporter: Derek Wollenstein
>Priority: Minor
>  Labels: compaction
> Attachments: HBASE-5920-0.92.1-1.patch, HBASE-5920-0.92.1.patch, 
> HBASE-5920-trunk.patch
>
>
> There seem to be some tuning settings under which manually triggered major 
> compactions will do nothing, owing to the selection logic in Store.java, in 
> the function
>   List<StoreFile> compactSelection(List<StoreFile> candidates)
> When a user manually triggers a compaction, this follows the same logic as a 
> normal compaction check.  When a user manually triggers a major compaction, 
> something similar happens.  Putting this all together:
> 1. If a user triggers a major compaction, this is checked against a max files 
> threshold (hbase.hstore.compaction.max). If the number of storefiles to 
> compact is > max files, then we downgrade to a minor compaction
> 2. If we are in a minor compaction, we do the following checks:
>a. If the file is less than a minimum size 
> (hbase.hstore.compaction.min.size) we automatically include it
>   b. Otherwise, we check how the size compares to the next largest size, 
> based on hbase.hstore.compaction.ratio.  
>   c. If the number of files included is less than a minimum count 
> (hbase.hstore.compaction.min) then don't compact.
> In many of the exit strategies, we aren't seeing an error message.
> The net-net of this is that if we have a mix of very large and very small 
> files, we may end up having too many files to do a major compact, but too few 
> files to do a minor compact.
> I'm trying to go through and see if I'm understanding things correctly, but 
> this seems like the bug
> To put it another way
> 2012-05-02 20:09:36,389 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction 
> requested: 
> regionName=str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.,
>  store
> Name=c, fileCount=15, fileSize=1.5g (20.2k, 362.5m, 155.3k, 3.0m, 30.7k, 
> 361.2m, 6.9m, 4.7m, 14.7k, 363.4m, 30.9m, 3.2m, 7.3k, 362.9m, 23.5m), 
> priority=-9, time=3175046817624398; Because: Recursive enqueue; 
> compaction_queue=(59:0), split_queue=0
> When we had a minimum compaction size of 128M, and default settings for 
> hbase.hstore.compaction.min, hbase.hstore.compaction.max, and 
> hbase.hstore.compaction.ratio, we were not getting a compaction to run even 
> if we ran
> major_compact 
> 'str,44594594594594592,1334939064521.f7aed25b55d4d7988af763bede9ce74e.' from 
> the ruby shell.  Note that we had many tiny store files (20k, 155k, 3m, 30k,..) 
> and several large ones (362.5m, 361.2m, 363.4m, 362.9m).  I think the bimodal 
> nature of the sizes prevented us from doing a compaction.
> I'm not 100% sure where this errored out because when I manually triggered a 
> compaction, I did not see
> {code}
> // if we don't have enough files to compact, just wait
> if (filesToCompact.size() < this.minFilesToCompact) {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Skipped compaction of " + this.storeNameStr
>         + ".  Only " + (end - start) + " file(s) of size "
>         + StringUtils.humanReadableInt(totalSize)
>         + " have met compaction criteria.");
>   }
> }
> {code}
> being printed in the logs (and I know DEBUG logging was enabled because I saw 
> this elsewhere).
> I'd be happy with better error messages when we decide not to compact for 
> user-initiated compactions.
> I'd also like to see some override that says "user triggered major compaction 
> always occurs", but maybe that's a bad idea for other reasons.
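
To make the silent downgrade concrete, here is a toy illustration (not the HBase source; only the first gate is modeled) using the fileCount from the log line above and the default hbase.hstore.compaction.max of 10:

{code}
public class MajorDowngradeSketch {
  public static void main(String[] args) {
    int maxFilesToCompact = 10; // hbase.hstore.compaction.max (default)
    int fileCount = 15;         // fileCount from the log line above
    boolean major = true;       // user ran major_compact from the shell

    // Gate 1: too many storefiles silently downgrades the request, and
    // the minor-compaction gates (2a-2c) then apply with no log output.
    if (major && fileCount > maxFilesToCompact) {
      major = false;
    }
    System.out.println(major ? "major compaction" : "minor compaction");
  }
}
{code}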

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5546) Master assigns region in the original region server when opening region failed

2012-05-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276972#comment-13276972
 ] 

ramkrishna.s.vasudevan commented on HBASE-5546:
---

@Ted
I can remove that on commit, if it is ok with you?

> Master assigns region in the original region server when opening region 
> failed  
> 
>
> Key: HBASE-5546
> URL: https://issues.apache.org/jira/browse/HBASE-5546
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: Ashutosh Jindal
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: hbase-5546.patch, hbase-5546_1.patch, 
> hbase-5546_2.patch, hbase-5546_3.patch
>
>
> Master assigns the region to the original region server when an 
> RS_ZK_REGION_FAILED_OPEN event arrives.
> Maybe we should choose another region server.
> [2012-03-07 10:14:21,750] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:31,826] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:41,903] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:51,975] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:02,056] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:12,167] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:22,231] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:32,303] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:42,375] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:52,447] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:02,528] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:12,600] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:22,676] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276975#comment-13276975
 ] 

Hadoop QA commented on HBASE-5882:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527664/hbase_5882.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 31 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1891//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1891//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1891//console

This message is automatically generated.

> Process RIT on master restart can try assigning the region if the region is 
> found on a dead server instead of waiting for Timeout Monitor
> -
>
> Key: HBASE-5882
> URL: https://issues.apache.org/jira/browse/HBASE-5882
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.6, 0.92.1
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: hbase_5882.patch
>
>
> Currently, on master restart, when it does processRIT, any region found on a 
> dead server avoids a new assignment so that the timeout monitor can take 
> care of it.
> This case is more prominent if the node is found in RS_ZK_REGION_OPENING 
> state. I think we can handle this by triggering a new assignment with a new 
> plan.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5546) Master assigns region in the original region server when opening region failed

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276976#comment-13276976
 ] 

Hadoop QA commented on HBASE-5546:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527663/hbase-5546_3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 31 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestMasterObserver

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1892//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1892//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1892//console

This message is automatically generated.

> Master assigns region in the original region server when opening region 
> failed  
> 
>
> Key: HBASE-5546
> URL: https://issues.apache.org/jira/browse/HBASE-5546
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: Ashutosh Jindal
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: hbase-5546.patch, hbase-5546_1.patch, 
> hbase-5546_2.patch, hbase-5546_3.patch
>
>
> Master assigns the region to the original region server when an 
> RS_ZK_REGION_FAILED_OPEN event arrives.
> Maybe we should choose another region server.
> [2012-03-07 10:14:21,750] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:31,826] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:41,903] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:51,975] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:02,056] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:12,167] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:22,231] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:32,303] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:42,375] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:52,447] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:02,528] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2

[jira] [Commented] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276981#comment-13276981
 ] 

ramkrishna.s.vasudevan commented on HBASE-5882:
---

@Ted
Is the name 'wasOnDeadServer' ok?
The name was given like that because this change is done for the RS_ZK_OPENING 
state. Based on your suggestion I can change it and commit it.

> Process RIT on master restart can try assigning the region if the region is 
> found on a dead server instead of waiting for Timeout Monitor
> -
>
> Key: HBASE-5882
> URL: https://issues.apache.org/jira/browse/HBASE-5882
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.6, 0.92.1
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: hbase_5882.patch
>
>
> Currently, on master restart, when it does processRIT, any region found on a 
> dead server avoids a new assignment so that the timeout monitor can take 
> care of it.
> This case is more prominent if the node is found in RS_ZK_REGION_OPENING 
> state. I think we can handle this by triggering a new assignment with a new 
> plan.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2012-05-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276983#comment-13276983
 ] 

Andrew Purtell commented on HBASE-4720:
---

While you are in there Mubarak please make sure that the tests also cover put 
and delete operations without '?check=...'.  

> Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
> client/server 
> 
>
> Key: HBASE-4720
> URL: https://issues.apache.org/jira/browse/HBASE-4720
> Project: HBase
>  Issue Type: Improvement
>Reporter: Daniel Lord
>Assignee: Mubarak Seyed
> Fix For: 0.94.1
>
> Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, 
> HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, 
> HBASE-4720.trunk.v5.patch, HBASE-4720.trunk.v6.patch, 
> HBASE-4720.trunk.v7.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch
>
>
> I have several large application/HBase clusters where an application node 
> will occasionally need to talk to HBase from a different cluster.  In order 
> to help ensure some of my consistency guarantees I have a sentinel table that 
> is updated atomically as users interact with the system.  This works quite 
> well for the "regular" hbase client but the REST client does not implement 
> the checkAndPut and checkAndDelete operations.  This exposes the application 
> to some race conditions that have to be worked around.  It would be ideal if 
> the same checkAndPut/checkAndDelete operations could be supported by the REST 
> client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6001) Upgrade slf4j to 1.6.1

2012-05-16 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276985#comment-13276985
 ] 

Jimmy Xiang commented on HBASE-6001:


@Andrew, does the patch look good to you?  I tried mvn -DskipTests clean package 
with the different hadoop profiles and the right slf4j jar was used.
Unit tests seem to be fine for the different profiles too (I tried 1.0.2, the 
default, and 0.23).

> Upgrade slf4j to 1.6.1
> --
>
> Key: HBASE-6001
> URL: https://issues.apache.org/jira/browse/HBASE-6001
> Project: HBase
>  Issue Type: Task
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: hbase-6001.patch
>
>
> We need to upgrade slf4j to 1.6.1 since other hadoop components use 1.6.1 now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5546) Master assigns region in the original region server when opening region failed

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276989#comment-13276989
 ] 

Zhihong Yu commented on HBASE-5546:
---

Modifying comment on commit is fine.

> Master assigns region in the original region server when opening region 
> failed  
> 
>
> Key: HBASE-5546
> URL: https://issues.apache.org/jira/browse/HBASE-5546
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: Ashutosh Jindal
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: hbase-5546.patch, hbase-5546_1.patch, 
> hbase-5546_2.patch, hbase-5546_3.patch
>
>
> Master assigns the region to the original region server when an 
> RS_ZK_REGION_FAILED_OPEN event arrives.
> Maybe we should choose another region server.
> [2012-03-07 10:14:21,750] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:31,826] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:41,903] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:14:51,975] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:02,056] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:12,167] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:22,231] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:32,303] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:42,375] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:15:52,447] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:02,528] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:12,600] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053
> [2012-03-07 10:16:22,676] [DEBUG] [main-EventThread] 
> [org.apache.hadoop.hbase.master.AssignmentManager 553] Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=158-1-130-11,20020,1331108408232, 
> region=c70e98bdca98a0657a56436741523053

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6001) Upgrade slf4j to 1.6.1

2012-05-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276990#comment-13276990
 ] 

Andrew Purtell commented on HBASE-6001:
---

+1 Jimmy, thanks.

> Upgrade slf4j to 1.6.1
> --
>
> Key: HBASE-6001
> URL: https://issues.apache.org/jira/browse/HBASE-6001
> Project: HBase
>  Issue Type: Task
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: hbase-6001.patch
>
>
> We need to upgrade slf4j to 1.6.1 since other hadoop components use 1.6.1 now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2012-05-16 Thread Mubarak Seyed (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276993#comment-13276993
 ] 

Mubarak Seyed commented on HBASE-4720:
--

Sure, will do Andy. Thanks.

> Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
> client/server 
> 
>
> Key: HBASE-4720
> URL: https://issues.apache.org/jira/browse/HBASE-4720
> Project: HBase
>  Issue Type: Improvement
>Reporter: Daniel Lord
>Assignee: Mubarak Seyed
> Fix For: 0.94.1
>
> Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, 
> HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, 
> HBASE-4720.trunk.v5.patch, HBASE-4720.trunk.v6.patch, 
> HBASE-4720.trunk.v7.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch
>
>
> I have several large application/HBase clusters where an application node 
> will occasionally need to talk to HBase from a different cluster.  In order 
> to help ensure some of my consistency guarantees I have a sentinel table that 
> is updated atomically as users interact with the system.  This works quite 
> well for the "regular" hbase client but the REST client does not implement 
> the checkAndPut and checkAndDelete operations.  This exposes the application 
> to some race conditions that have to be worked around.  It would be ideal if 
> the same checkAndPut/checkAndDelete operations could be supported by the REST 
> client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6010) Security audit logger configuration for log4j

2012-05-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276994#comment-13276994
 ] 

Andrew Purtell commented on HBASE-6010:
---

I'm going to commit this in a little while if there are no objections. I tested 
this on a Hadoop 1.0.2 + HBase 0.94.0 cluster and the audit messages showed up 
in the right file at the right location.

> Security audit logger configuration for log4j
> -
>
> Key: HBASE-6010
> URL: https://issues.apache.org/jira/browse/HBASE-6010
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: 6010.patch
>
>
> Set up a logger for security audit messages just as Hadoop core does.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5987:
---

Attachment: D3237.3.patch

Liyin updated the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex 
improvement".
Reviewers: Kannan, mbautin

  Address Mikhail's comments

REVISION DETAIL
  https://reviews.facebook.net/D3237

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, 
> screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when multiple 
> requests are reading the same data or index block. 
> From the profiling, one of the causes is IdLock contention, which has been 
> addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> for the data block location of each target key value during the scan 
> process (reSeekTo), even though the target key value is already in the 
> current data block. This makes certain index blocks very HOT, 
> especially during a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one block index entry so that the 
> HFileScanner knows the start key value of the next data block. If the 
> target key value for the scan (reSeekTo) is "smaller" than that start kv of 
> the next data block, the target key value is very probably in the current 
> data block (if it is not in the current data block, the start kv of the 
> next data block would be returned; +indexing on the start key has some 
> defects here+), and we shall NOT query the HFileBlockIndex in this case. On 
> the contrary, if the target key value is "bigger", then we shall query the 
> HFileBlockIndex. This improvement shall help reduce the hotness of the 
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block 
> cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex 
> shall index on the last key value of each data block instead of indexing on 
> the start key value. The motivation is to solve the HBASE-4443 issue (avoid 
> seeking to the "previous" block when the key you are interested in is the 
> first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value of 
> data block N, there is no way to know for sure whether the target key value 
> is in data block N or N-1, so the scanner has to seek from data block N-1. 
> However, if the block index is based on the last key value of each data block 
> and the target key value is between the last key values of data block N-1 and 
> data block N, then the target key value must be in data block N for sure. 
> As long as HBase only supports forward scans, the last key value makes more 
> sense to index on than the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.
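
A minimal sketch of the first proposal's short-circuit, assuming plain byte[] keys compared with org.apache.hadoop.hbase.util.Bytes (the class and method names here are illustrative, not the actual patch):

{code}
import org.apache.hadoop.hbase.util.Bytes;

public class ReseekSketch {
  // Decide whether reSeekTo(targetKey) must consult the (hot) block index,
  // given the look-ahead knowledge of the next block's first key.
  static boolean needsIndexLookup(byte[] targetKey, byte[] nextBlockFirstKey) {
    if (nextBlockFirstKey == null) {
      return true; // last block: no look-ahead info, fall back to the index
    }
    // If the target sorts before the next block's first key, it can only
    // live in the current block, so the index lookup is skipped entirely.
    return Bytes.compareTo(targetKey, nextBlockFirstKey) >= 0;
  }

  public static void main(String[] args) {
    byte[] next = Bytes.toBytes("row-500");
    System.out.println(needsIndexLookup(Bytes.toBytes("row-123"), next)); // false
    System.out.println(needsIndexLookup(Bytes.toBytes("row-777"), next)); // true
  }
}
{code}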

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6002) Possible chance of resource leak in HLogSplitter

2012-05-16 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276995#comment-13276995
 ] 

Chinna Rao Lalam commented on HBASE-6002:
-

@Ted: Updated the patch per the above comment.

> Possible chance of resource leak in HLogSplitter
> 
>
> Key: HBASE-6002
> URL: https://issues.apache.org/jira/browse/HBASE-6002
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HBASE-6002.patch, HBASE-6002_0.94_1.patch, 
> HBASE-6002_trunk.patch
>
>
> In HLogSplitter.splitLogFileToTemp, the Reader (in) is not closed, and in the 
> finally block, while closing the writers (wap.w) in a loop, if any exception 
> comes the other writers won't be closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277001#comment-13277001
 ] 

Phabricator commented on HBASE-5987:


mbautin has accepted the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex 
improvement".

  Just one minor comment (please address on commit).

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:413 
HContants -> HConstants (missed an "s")

REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, 
> screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when multiple 
> requests are reading the same data or index block. 
> From the profiling, one of the causes is IdLock contention, which has been 
> addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> for the data block location of each target key value during the scan 
> process (reSeekTo), even though the target key value is already in the 
> current data block. This makes certain index blocks very HOT, 
> especially during a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one block index entry so that the 
> HFileScanner knows the start key value of the next data block. If the 
> target key value for the scan (reSeekTo) is "smaller" than that start kv of 
> the next data block, the target key value is very probably in the current 
> data block (if it is not in the current data block, the start kv of the 
> next data block would be returned; +indexing on the start key has some 
> defects here+), and we shall NOT query the HFileBlockIndex in this case. On 
> the contrary, if the target key value is "bigger", then we shall query the 
> HFileBlockIndex. This improvement shall help reduce the hotness of the 
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block 
> cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex 
> shall index on the last key value of each data block instead of indexing on 
> the start key value. The motivation is to solve the HBASE-4443 issue (avoid 
> seeking to the "previous" block when the key you are interested in is the 
> first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value of 
> data block N, there is no way to know for sure whether the target key value 
> is in data block N or N-1, so the scanner has to seek from data block N-1. 
> However, if the block index is based on the last key value of each data block 
> and the target key value is between the last key values of data block N-1 and 
> data block N, then the target key value must be in data block N for sure. 
> As long as HBase only supports forward scans, the last key value makes more 
> sense to index on than the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6002) Possible chance of resource leak in HLogSplitter

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277010#comment-13277010
 ] 

Zhihong Yu commented on HBASE-6002:
---

The same construct is used in both places when closing a writer:
{code}
+  try {
+    wap.w.close();
+  } catch (IOException e) {
{code}
If the first close encountered some IOE, calling it a second time would most 
likely encounter a similar error.
My comment @ 15/May/12 21:25 applies in the above scenario.
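
For illustration, here is a minimal sketch of the close-everything-then-rethrow pattern under discussion, with java.io.Closeable standing in for the wap.w writers; this is a sketch of the idea, not the patch itself:

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class CloseAllSketch {
  // Close every writer even if one close() throws, so a single failure
  // cannot leak the remaining writers; the first failure is rethrown.
  static void closeAll(List<? extends Closeable> writers) throws IOException {
    IOException first = null;
    for (Closeable w : writers) {
      try {
        w.close();
      } catch (IOException e) {
        if (first == null) {
          first = e; // remember it, but keep closing the rest
        }
      }
    }
    if (first != null) {
      throw first; // surface the first failure to the caller
    }
  }
}
{code}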

> Possible chance of resource leak in HLogSplitter
> 
>
> Key: HBASE-6002
> URL: https://issues.apache.org/jira/browse/HBASE-6002
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HBASE-6002.patch, HBASE-6002_0.94_1.patch, 
> HBASE-6002_trunk.patch
>
>
> In HLogSplitter.splitLogFileToTemp, the Reader (in) is not closed, and in the 
> finally block, while closing the writers (wap.w) in a loop, if any exception 
> comes the other writers won't be closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6021) NullPointerException when running LoadTestTool without specifying compression type

2012-05-16 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-6021:
-

 Summary: NullPointerException when running LoadTestTool without 
specifying compression type
 Key: HBASE-6021
 URL: https://issues.apache.org/jira/browse/HBASE-6021
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.96.0, 0.94.1
 Environment: Hadoop 1.0.2, HBase 0.94.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor


If you don't specify a compression type on the LoadTestTool command line then 
this happens:

{noformat}
12/05/16 18:41:23 ERROR util.AbstractHBaseTool: Error running command-line tool
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.HColumnDescriptor.setCompressionType(HColumnDescriptor.java:535)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.createPreSplitLoadTestTable(HBaseTestingUtility.java:1885)
at 
org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:297)
at 
org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at 
org.apache.hadoop.hbase.util.AbstractHBaseTool.doStaticMain(AbstractHBaseTool.java:173)
at org.apache.hadoop.hbase.util.LoadTestTool.main(LoadTestTool.java:341)
{noformat}

This should be handled better.
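
One possible guard, sketched with assumed option handling (this is not the actual patch): map a missing compression argument to Compression.Algorithm.NONE before it reaches HColumnDescriptor.setCompressionType().

{code}
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CompressionDefaultSketch {
  // Fall back to NONE instead of handing a null compression type through.
  static Compression.Algorithm parse(String cliValue) {
    if (cliValue == null || cliValue.isEmpty()) {
      return Compression.Algorithm.NONE; // no compression flag given
    }
    return Compression.Algorithm.valueOf(cliValue.toUpperCase());
  }

  public static void main(String[] args) {
    System.out.println(parse(null)); // NONE
    System.out.println(parse("gz")); // GZ
  }
}
{code}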

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-16 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277013#comment-13277013
 ] 

Elliott Clark commented on HBASE-5959:
--

Yeah, though I've been testing this with per-table balancing turned off, as 
that's where this load balancer does its best: when it has knowledge of the 
whole cluster.

> Add other load balancers
> 
>
> Key: HBASE-5959
> URL: https://issues.apache.org/jira/browse/HBASE-5959
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
> HBASE-5959-2.patch, HBASE-5959-3.patch, HBASE-5959-6.patch, 
> HBASE-5959-7.patch, HBASE-5959.D3189.1.patch, HBASE-5959.D3189.2.patch, 
> HBASE-5959.D3189.3.patch, HBASE-5959.D3189.4.patch
>
>
> Now that balancers are pluggable we should give some options.
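
Since the balancer is pluggable, wiring in an alternative is a one-line config change. A hedged sketch: the property name follows the pluggable-balancer work on trunk, and the balancer class shown is illustrative of the ones proposed here, not a committed name.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BalancerConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Select the load balancer implementation the master should load.
    conf.set("hbase.master.loadbalancer.class",
        "org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer");
    System.out.println(conf.get("hbase.master.loadbalancer.class"));
  }
}
{code}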

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277017#comment-13277017
 ] 

Phabricator commented on HBASE-5987:


todd has commented on the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex 
improvement".

  Would be nice to have a simple benchmark - eg load a million rows and time 
"count 'table', { CACHE => 1000 }" from the shell with and without.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java:23 
typo: references wrong class name here
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java:28 
could do with a short javadoc, eg:
  /**
* The first key in the next block following this one in the HFile.
* If this key is unknown, this is reference-equal with 
HConstants.NO_NEXT_INDEXED_KEY
*/

  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:526 are you 
guaranteed that firstKey.arrayOffset() == 0 here? I would have assumed firstKey 
could be an array slice

REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, 
> screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when multiple 
> requests are reading the same data or index block. 
> From the profiling, one of the causes is IdLock contention, which has been 
> addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> for the data block location of each target key value during the scan 
> process (reSeekTo), even though the target key value is already in the 
> current data block. This makes certain index blocks very HOT, 
> especially during a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one block index entry so that the 
> HFileScanner knows the start key value of the next data block. If the 
> target key value for the scan (reSeekTo) is "smaller" than that start kv of 
> the next data block, the target key value is very probably in the current 
> data block (if it is not in the current data block, the start kv of the 
> next data block would be returned; +indexing on the start key has some 
> defects here+), and we shall NOT query the HFileBlockIndex in this case. On 
> the contrary, if the target key value is "bigger", then we shall query the 
> HFileBlockIndex. This improvement shall help reduce the hotness of the 
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block 
> cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex 
> shall index on the last key value of each data block instead of indexing on 
> the start key value. The motivation is to solve the HBASE-4443 issue (avoid 
> seeking to the "previous" block when the key you are interested in is the 
> first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value of 
> data block N, there is no way to know for sure whether the target key value 
> is in data block N or N-1, so the scanner has to seek from data block N-1. 
> However, if the block index is based on the last key value of each data block 
> and the target key value is between the last key values of data block N-1 and 
> data block N, then the target key value must be in data block N for sure. 
> As long as HBase only supports forward scans, the last key value makes more 
> sense to index on than the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6011) Unable to start master in local mode

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277018#comment-13277018
 ] 

Hadoop QA commented on HBASE-6011:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527668/6011-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 31 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1893//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1893//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1893//console

This message is automatically generated.

> Unable to start master in local mode
> 
>
> Key: HBASE-6011
> URL: https://issues.apache.org/jira/browse/HBASE-6011
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Attachments: 6011-v2.patch, 6011.patch
>
>
> Got this trying to launch head of 0.94 branch in local mode from the build 
> tree but it happens with trunk and 0.92 too:
> {noformat}
> 12/05/15 19:35:45 ERROR master.HMasterCommandLine: Failed to start master
> java.lang.ClassCastException: org.apache.hadoop.hbase.master.HMaster cannot 
> be cast to org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
>   at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
>   at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1761)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split

2012-05-16 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277025#comment-13277025
 ] 

jirapos...@reviews.apache.org commented on HBASE-5986:
--



bq.  On 2012-05-16 05:20:29, ramkrishna vasudevan wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 402
bq.  > 
bq.  >
bq.  > The blocking timeout is one factor on which we will be waiting.  So 
should this blockingTimeout be in line with 'fileSplitTimeout'?

bq. From my understanding, fileSplitTimeout is a regionserver property, while 
the timeout parameter in this patch is a client side property, since the 
blocking will happen on the client. I think "hbase.client.operation.timeout" 
can serve us well. HConstants states that it is "Default HBase client operation 
timeout, which is tantamount to a blocking call".


- enis


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5133/#review7928
---


On 2012-05-16 01:53:09, enis wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/5133/
bq.  ---
bq.  
bq.  (Updated 2012-05-16 01:53:09)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  We found this issue when running large scale ingestion tests for 
HBASE-5754. The problem is that the .META. table updates are not atomic while 
splitting a region. In SplitTransaction, there is a time gap between marking 
the parent offline and adding the daughters to the META table. This can 
result in clients using MetaScanner or HTable.getStartEndKeys (used by the 
TableInputFormat) missing regions which were just made offline, but whose 
daughters are not added yet.
bq.  
bq.  This patch is approach 2 mentioned in the issue comments: mainly, during 
a META scan, if we detect that a region is split, we block until the 
information for the child regions is available in META and manually feed those 
rows to the MetaScanner. Although approach 3 (using local region transactions) 
seems cleaner, those transactions are not available in branch 0.92, which I 
think should also incorporate this fix. I'll provide ports once we are clear 
for trunk. 
bq.  
bq.  Also this patch does not fix MetaReader (see 
https://issues.apache.org/jira/browse/HBASE-3475). 
bq.  
bq.  
bq.  This addresses bug HBASE-5986.
bq.  https://issues.apache.org/jira/browse/HBASE-5986
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java 8873512 
bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java b8290e4 
bq.src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f404999 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
 a8091e6 
bq.  
bq.  Diff: https://reviews.apache.org/r/5133/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  added extensive tests under TestEndToEndSplitTransaction, and ran existing 
unit tests. 
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  enis
bq.  
bq.



> Clients can see holes in the META table when regions are being split
> 
>
> Key: HBASE-5986
> URL: https://issues.apache.org/jira/browse/HBASE-5986
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.96.0, 0.94.1
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-5986-test_v1.patch
>
>
> We found this issue when running large scale ingestion tests for HBASE-5754. 
> The problem is that the .META. table updates are not atomic while splitting a 
> region. In SplitTransaction, there is a time gap between marking the 
> parent offline and adding the daughters to the META table. This can result in 
> clients using MetaScanner or HTable.getStartEndKeys (used by the 
> TableInputFormat) missing regions which were just made offline, but whose 
> daughters are not added yet. 
> This is also related to HBASE-4335. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split

2012-05-16 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277024#comment-13277024
 ] 

jirapos...@reviews.apache.org commented on HBASE-5986:
--



bq.  On 2012-05-16 02:52:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 391
bq.  > 
bq.  >
bq.  > null is always returned by getRegionResultBlocking() in case of time 
out.
bq.  > How do we deal with that ?
bq.  >

we are checking for the null return values for resultA and resultB, but I'll 
also add a LOG warning. 
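
For readers following along, a minimal sketch of the blocking lookup being discussed: poll .META. for the row until it appears or the timeout elapses, returning null on timeout so the caller can warn and bail out. The method shape is inferred from the review, not the committed code.

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class MetaPollSketch {
  static Result getRegionResultBlocking(HTable meta, byte[] regionName,
      long timeoutMs) throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      Result r = meta.get(new Get(regionName));
      if (r != null && !r.isEmpty()) {
        return r; // the daughter's row has shown up in .META.
      }
      Thread.sleep(50); // brief back-off before re-checking
    }
    return null; // timed out: the caller must handle (and log) this
  }
}
{code}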


bq.  On 2012-05-16 02:52:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 390
bq.  > 
bq.  >
bq.  > Do we have to instantiate HTable every time ?

This is a bit tricky. Ideally we should not. But there is no close() on 
MetaScannerVisitor, so we cannot close the HTable properly if we reuse the 
HTable across calls to processRow(). We can add a close() method and call it, 
or obtain the HTable from the context, but that would imply changing the class 
signature for MetaScannerVisitor. I assumed that since we are reusing the 
HConnection, HTable creation is cheap; is that not the case, wdyt? 


bq.  On 2012-05-16 02:52:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 360
bq.  > 
bq.  >
bq.  > Can we add a config param for blockingTimeout ?

please see below


bq.  On 2012-05-16 02:52:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 398
bq.  > 
bq.  >
bq.  > We shouldn't be passing blockingTimeout here.
bq.  > We need to consider the amount of time spent in the call @ line 388.

agreed


bq.  On 2012-05-16 02:52:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 430
bq.  > 
bq.  >
bq.  > Please restore interrupted state of the thread.

agreed


- enis


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5133/#review7925
---


On 2012-05-16 01:53:09, enis wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/5133/
bq.  ---
bq.  
bq.  (Updated 2012-05-16 01:53:09)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  We found this issue when running large scale ingestion tests for 
HBASE-5754. The problem is that the .META. table updates are not atomic while 
splitting a region. In SplitTransaction, there is a time gap between marking 
the parent offline and adding the daughters to the META table. This can 
result in clients using MetaScanner or HTable.getStartEndKeys (used by the 
TableInputFormat) missing regions which were just made offline, but whose 
daughters are not added yet.
bq.  
bq.  This patch is approach 2 mentioned in the issue comments: mainly, during 
a META scan, if we detect that a region is split, we block until the 
information for the child regions is available in META and manually feed those 
rows to the MetaScanner. Although approach 3 (using local region transactions) 
seems cleaner, those transactions are not available in branch 0.92, which I 
think should also incorporate this fix. I'll provide ports once we are clear 
for trunk. 
bq.  
bq.  Also this patch does not fix MetaReader (see 
https://issues.apache.org/jira/browse/HBASE-3475). 
bq.  
bq.  
bq.  This addresses bug HBASE-5986.
bq.  https://issues.apache.org/jira/browse/HBASE-5986
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java 8873512 
bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java b8290e4 
bq.src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f404999 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
 a8091e6 
bq.  
bq.  Diff: https://reviews.apache.org/r/5133/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Added extensive tests under TestEndToEndSplitTransaction, and ran 
existing unit tests. 
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  enis
bq.  
bq.
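
For orientation, a toy sketch of the blocking step in approach 2, with 
assumed helper names (the real logic lives in the MetaScanner patch):

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

// Toy sketch only: when the scan hits a split parent, poll META until the
// daughter's row appears, then hand that row to the visitor as if the
// scanner had returned it. Returns null on timeout, matching the behavior
// discussed above for getRegionResultBlocking().
final class DaughterWaitSketch {
  static Result blockUntilRegionRow(HTable meta, byte[] regionRow,
      long blockingTimeoutMs) throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + blockingTimeoutMs;
    while (System.currentTimeMillis() < deadline) {
      Result r = meta.get(new Get(regionRow));
      if (r != null && !r.isEmpty()) {
        return r;               // the daughter is now in META
      }
      Thread.sleep(100);        // daughters not written yet; retry
    }
    return null;                // timed out; callers must handle null
  }
}
{code}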



> Clients can see holes in the META table when regions are being split
> 
>
> Key: HBASE-5986
> 

[jira] [Created] (HBASE-6022) Include Junit in the libs when packaging so that TestAcidGaurntee can run

2012-05-16 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-6022:


 Summary: Include Junit in the libs when packaging so that 
TestAcidGaurntee can run
 Key: HBASE-6022
 URL: https://issues.apache.org/jira/browse/HBASE-6022
 Project: HBase
  Issue Type: Improvement
Reporter: Elliott Clark


If JUnit is not in the libs folder, running the test acid command fails.





[jira] [Commented] (HBASE-6002) Possible chance of resource leak in HlogSplitter

2012-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277029#comment-13277029
 ] 

Hadoop QA commented on HBASE-6002:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527670/HBASE-6002_trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 31 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
   org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1894//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1894//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1894//console

This message is automatically generated.

> Possible chance of resource leak in HlogSplitter
> 
>
> Key: HBASE-6002
> URL: https://issues.apache.org/jira/browse/HBASE-6002
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HBASE-6002.patch, HBASE-6002_0.94_1.patch, 
> HBASE-6002_trunk.patch
>
>
> In HLogSplitter.splitLogFileToTemp, the Reader (in) is not closed, and in 
> the finally block, while closing the writers (wap.w) in a loop, if any 
> exception occurs the remaining writers won't be closed.
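
A sketch of the usual fix pattern for this kind of leak (illustrative code, 
not the actual HLogSplitter change):

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Close each writer in its own try/catch so one failure cannot leak the
// remaining writers, and close the reader in a finally block so it is never
// skipped.
final class CloseAllSketch {
  static void closeAll(Closeable reader, List<? extends Closeable> writers)
      throws IOException {
    IOException first = null;
    try {
      for (Closeable writer : writers) {
        try {
          writer.close();
        } catch (IOException e) {
          if (first == null) {
            first = e;        // remember the failure but keep closing
          }
        }
      }
    } finally {
      reader.close();         // the Reader(in) the report says is leaked
    }
    if (first != null) {
      throw first;            // surface the first close failure
    }
  }
}
{code}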





[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split

2012-05-16 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277031#comment-13277031
 ] 

jirapos...@reviews.apache.org commented on HBASE-5986:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5133/#review7938
---



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java


I was asking how we convey to the client the fact that MetaScanner has timed 
out.

I don't see an exception being thrown which contains the timeout information.
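
A minimal illustration of what surfacing the timeout could look like; the 
helper and message below are hypothetical, not part of the patch:

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical shape only: instead of silently returning null on timeout,
// throw an exception that carries the timeout and the row being waited on.
final class TimeoutSurfacingSketch {
  static Result requireResult(Result r, long blockingTimeoutMs, byte[] row)
      throws IOException {
    if (r == null) {
      throw new IOException("Timed out after " + blockingTimeoutMs
          + "ms waiting for daughter region of row "
          + Bytes.toStringBinary(row) + " to appear in .META.");
    }
    return r;
  }
}
{code}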


- Ted


On 2012-05-16 01:53:09, enis wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/5133/
bq.  ---
bq.  
bq.  (Updated 2012-05-16 01:53:09)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  We found this issue when running large scale ingestion tests for 
HBASE-5754. The problem is that the .META. table updates are not atomic while 
splitting a region. In SplitTransaction, there is a time gap between marking 
the parent offline and adding the daughters to the META table. This can 
result in clients using MetaScanner, or HTable.getStartEndKeys (used by the 
TableInputFormat), missing regions which have just been made offline but 
whose daughters are not yet added.
bq.  
bq.  This patch is approach 2 mentioned in the issue comments: during the 
META scan, if we detect that a region is split, we block until the 
information for the child regions is available in META, and we manually feed 
those rows to the MetaScanner. Although approach 3 (using local region 
transactions) seems cleaner, it is not available under branch 0.92, which I 
think should also incorporate this fix. I'll provide ports once we are clear 
for trunk. 
bq.  
bq.  Also this patch does not fix MetaReader (see 
https://issues.apache.org/jira/browse/HBASE-3475). 
bq.  
bq.  
bq.  This addresses bug HBASE-5986.
bq.  https://issues.apache.org/jira/browse/HBASE-5986
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java 8873512 
bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java b8290e4 
bq.src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f404999 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
 a8091e6 
bq.  
bq.  Diff: https://reviews.apache.org/r/5133/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Added extensive tests under TestEndToEndSplitTransaction, and ran 
existing unit tests. 
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  enis
bq.  
bq.



> Clients can see holes in the META table when regions are being split
> 
>
> Key: HBASE-5986
> URL: https://issues.apache.org/jira/browse/HBASE-5986
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.96.0, 0.94.1
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-5986-test_v1.patch
>
>
> We found this issue when running large scale ingestion tests for HBASE-5754. 
> The problem is that the .META. table updates are not atomic while splitting 
> a region. In SplitTransaction, there is a time gap between marking the 
> parent offline and adding the daughters to the META table. This can result 
> in clients using MetaScanner, or HTable.getStartEndKeys (used by the 
> TableInputFormat), missing regions which have just been made offline but 
> whose daughters are not yet added. 
> This is also related to HBASE-4335. 





[jira] [Updated] (HBASE-5726) TestSplitTransactionOnCluster occasionally failing

2012-05-16 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5726:
--

Priority: Critical  (was: Major)

The test failure has become almost consistent in Hadoop QA builds.
Here is one recent example:
https://builds.apache.org/job/PreCommit-HBASE-Build/1894//testReport/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/

> TestSplitTransactionOnCluster occasionally failing
> --
>
> Key: HBASE-5726
> URL: https://issues.apache.org/jira/browse/HBASE-5726
> Project: HBase
>  Issue Type: Bug
>Reporter: Uma Maheswara Rao G
>Priority: Critical
> Attachments: Hbase.log_testExistingZnodeBlocksSplitAndWeRollback & 
> testShutdownFixupWhenDaughterHasSplit, 
> Hbase.log_testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling
>
>
> When I ran TestSplitTransactionOnCluster, some times tests are failing.
> {quote}
> java.lang.AssertionError: expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.getAndCheckSingleTableRegion(TestSplitTransactionOnCluster.java:89)
>   at 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit(TestSplitTransactionOnCluster.java:298)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
> {quote}
> Seems like the test is flaky; other cases also fail at random.





[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split

2012-05-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277035#comment-13277035
 ] 

stack commented on HBASE-5986:
--

Regards 1. and 2., what guarantees do we have that the daughter will not have 
split by the time we go into our wait on the daughters to come online?

> Clients can see holes in the META table when regions are being split
> 
>
> Key: HBASE-5986
> URL: https://issues.apache.org/jira/browse/HBASE-5986
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.96.0, 0.94.1
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-5986-test_v1.patch
>
>
> We found this issue when running large scale ingestion tests for HBASE-5754. 
> The problem is that the .META. table updates are not atomic while splitting 
> a region. In SplitTransaction, there is a time gap between marking the 
> parent offline and adding the daughters to the META table. This can result 
> in clients using MetaScanner, or HTable.getStartEndKeys (used by the 
> TableInputFormat), missing regions which have just been made offline but 
> whose daughters are not yet added. 
> This is also related to HBASE-4335. 





[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277055#comment-13277055
 ] 

Zhihong Yu commented on HBASE-5104:
---

I stumbled on the following test failure twice (with D2799.6.patch on MacBook):
{code}
testExecDeserialization(org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint)
  Time elapsed: 0.028 sec  <<< ERROR!
java.io.EOFException
  at java.io.DataInputStream.readFully(DataInputStream.java:180)
  at java.io.DataInputStream.readUTF(DataInputStream.java:592)
  at java.io.DataInputStream.readUTF(DataInputStream.java:547)
  at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:120)
  at 
org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization(TestCoprocessorEndpoint.java:201)
{code}

> Provide a reliable intra-row pagination mechanism
> -
>
> Key: HBASE-5104
> URL: https://issues.apache.org/jira/browse/HBASE-5104
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
>Assignee: Madhuwanti Vaidya
> Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, 
> D2799.4.patch, D2799.5.patch, D2799.6.patch, 
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch,
>  testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular 
> "offset") is currently supported via the ColumnPaginationFilter. However, it 
> is not a very clean way of supporting pagination.  Some of the problems with 
> it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
> the same results as (query with Filter(A)) INTERSECT (query with Filter(B)). 
> This is not the case for ColumnPaginationFilter, as its internal state gets 
> updated depending on whether Filter(A) returns TRUE/FALSE for a particular 
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND 
> with another filter using FilterList), the behavior of the query depends on 
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
> versions of the cell as separate values even if another filter upstream or 
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case 
> that prompted this JIRA is pagination within the same rowKey. For example, 
> for a given row key R, get columns with prefix P, starting at offset X (among 
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports 
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial 
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
> ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
> ColumnPaginationFilter filter2 =
>     new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
> FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
> filters.addFilter(filter1);
> filters.addFilter(filter2);
> Get get = new Get(USER);
> get.addFamily(COLUMN_FAMILY);
> get.setMaxVersions(1);
> get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1 
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases, 
> and the FilterList filter does not handle this return code properly (it 
> treats it as INCLUDE).
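
As a sketch of proposal 2 (pagination at the Get/Scan API level), the 
limit/offset setters below show the intended shape; treat the exact method 
names as assumptions for illustration rather than confirmed signatures from 
this patch:

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;

// Intra-row pagination expressed directly on the Get, so no filter-ordering
// or filter-state subtleties are involved.
final class IntraRowPaginationSketch {
  static Get pagedGet(byte[] row, byte[] family) throws IOException {
    Get get = new Get(row);
    get.addFamily(family);
    get.setMaxVersions(1);
    get.setRowOffsetPerColumnFamily(5);   // skip the first 5 matching columns
    get.setMaxResultsPerColumnFamily(5);  // then return at most 5 of them
    return get;
  }
}
{code}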





[jira] [Resolved] (HBASE-5941) improve multiDelete performance by grabbing locks ahead of time

2012-05-16 Thread Amitanand Aiyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer resolved HBASE-5941.


   Resolution: Fixed
Fix Version/s: 0.89-fb
 Hadoop Flags: Reviewed

> improve multiDelete performance by grabbing locks ahead of time
> ---
>
> Key: HBASE-5941
> URL: https://issues.apache.org/jira/browse/HBASE-5941
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.89.20100924, 0.89-fb, 0.94.1
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
>Priority: Minor
> Fix For: 0.89-fb
>
>
> Ning reported that the performance of deletes is slower than the performance 
> of puts. This should not be the case. 
> On digging in, it turns out that there is a difference between multiPut and 
> multiDelete in the way we grab locks.
> multiPut grabs all the locks optimistically and processes the puts one by 
> one. multiDelete grabs and releases locks one at a time, for each delete 
> operation, as if each were done separately. This may be causing a 
> performance slowdown for deletes. Trying to improve it.
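
A simplified illustration of the lock-ahead idea (toy code, not the actual 
HRegion implementation):

{code}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Take every row lock up front, apply all the deletes, then release the
// locks in one place, instead of lock/apply/unlock per delete.
final class LockAheadSketch {
  static void multiDelete(List<ReentrantLock> rowLocks, List<Runnable> deletes) {
    for (ReentrantLock lock : rowLocks) {
      lock.lock();                    // grab all locks ahead of time
    }
    try {
      for (Runnable delete : deletes) {
        delete.run();                 // process the deletes one by one
      }
    } finally {
      for (ReentrantLock lock : rowLocks) {
        lock.unlock();                // release together
      }
    }
  }
}
{code}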





[jira] [Resolved] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints

2012-05-16 Thread Amitanand Aiyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer resolved HBASE-4823.


   Resolution: Fixed
Fix Version/s: 0.89-fb
 Hadoop Flags: Reviewed

> long running scans lose benefit of bloomfilters and timerange hints
> ---
>
> Key: HBASE-4823
> URL: https://issues.apache.org/jira/browse/HBASE-4823
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
>Assignee: Amitanand Aiyer
> Fix For: 0.89-fb
>
> Attachments: HBASE-4823.D519.1.patch, TestScannerResets-89fb.txt
>
>
> When you have a long running scan due to say an MR job, you can lose the 
> benefit of timerange hints & bloom filters midway if your scanner gets reset. 
> [Note: The scanners can get reset say due to a flush or compaction].
> In one of our workloads, we periodically want to do rollups on the most 
> recent 15 minutes of data in a column family... but the timerange hint 
> benefit is lost midway when this resetScannerStack (shown below) happens. 
> The end result: we end up reading all the old HFiles rather than just the 
> recent ones.
> {code}
> private void resetScannerStack(KeyValue lastTopKey) throws IOException {
>   if (heap != null) {
>     throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
>   }
>   /* When we have the scan object, should we not pass it to getScanners()
>    * to get a limited set of scanners? We did so in the constructor and we
>    * could have done it now by storing the scan object from the constructor */
>   List<KeyValueScanner> scanners = getScanners();
> {code}
> The comment in the code seems to be aware of this issue and even has the 
> suggested fix!
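
A toy illustration of the fix the inline comment suggests (assumed shape, not 
the real StoreScanner): keep the scan object from construction time so a 
reset can rebuild a hint-limited scanner set instead of opening every HFile.

{code}
import java.util.List;

final class HintKeepingScannerSketch {
  interface Store {
    /** A null scan means "all files, no bloom filter or timerange pruning". */
    List<String> getScanners(Object scanOrNull);
  }

  private final Store store;
  private final Object scan;      // stored from the constructor
  private List<String> scanners;

  HintKeepingScannerSketch(Store store, Object scan) {
    this.store = store;
    this.scan = scan;
    this.scanners = store.getScanners(scan);   // pruned by the hints
  }

  void resetScannerStack() {
    // The reported behavior amounts to store.getScanners(null) here, which
    // drops the hints; reusing the stored scan object keeps them.
    this.scanners = store.getScanners(scan);
  }
}
{code}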





[jira] [Commented] (HBASE-5962) interop issue: RowMutations should be added at the end in HbaseObjectWriteable class

2012-05-16 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277065#comment-13277065
 ] 

Kannan Muthukkaruppan commented on HBASE-5962:
--

Yes, Amit confirms that this issue was introduced in the backport, and is 
specific to 89-fb. Going to mark it as resolved.

> interop issue: RowMutations should be added at the end in 
> HbaseObjectWriteable class
> 
>
> Key: HBASE-5962
> URL: https://issues.apache.org/jira/browse/HBASE-5962
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
> Fix For: 0.96.0, 0.89-fb
>
>
> In HbaseObjectWritable.java, new classes should be added at the end; 
> otherwise, old clients will not be able to talk to new HBase servers. This 
> is causing issues in a test cluster, with the following stack trace:
> 2012-05-08 11:24:32,416 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
> Can't find class
> java.lang.ClassNotFoundException: 
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:792)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:552)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:520)
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:136)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:953)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:895)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:471)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:371)
> 2012-05-08 11:24:33,766 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
> Can't find class
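
An illustration of why the ordering matters (assumed shape, not the real 
HbaseObjectWritable): each class is wired to a small integer code in 
registration order, and that code is what goes over the wire, so inserting a 
class in the middle shifts every later code and old clients decode the wrong 
class name.

{code}
import java.util.HashMap;
import java.util.Map;

final class ClassCodeSketch {
  private static final Map<Integer, Class<?>> CODE_TO_CLASS =
      new HashMap<Integer, Class<?>>();
  private static int nextCode = 1;

  private static void addToMap(Class<?> clazz) {
    CODE_TO_CLASS.put(nextCode++, clazz);   // the code depends on call order
  }

  static {
    addToMap(String.class);    // code 1 on both old and new sides
    addToMap(Integer.class);   // code 2 ...
    // New entries such as RowMutations must be appended here, at the end,
    // so every pre-existing code keeps its old value.
  }
}
{code}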





[jira] [Resolved] (HBASE-5962) interop issue: RowMutations should be added at the end in HbaseObjectWriteable class

2012-05-16 Thread Kannan Muthukkaruppan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan resolved HBASE-5962.
--

   Resolution: Fixed
Fix Version/s: 0.89-fb
   0.96.0

> interop issue: RowMutations should be added at the end in 
> HbaseObjectWriteable class
> 
>
> Key: HBASE-5962
> URL: https://issues.apache.org/jira/browse/HBASE-5962
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
> Fix For: 0.96.0, 0.89-fb
>
>
> In HbaseObjectWritable.java, new classes should be added at the end; 
> otherwise, old clients will not be able to talk to new HBase servers. This 
> is causing issues in a test cluster, with the following stack trace:
> 2012-05-08 11:24:32,416 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
> Can't find class
> java.lang.ClassNotFoundException: 
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:792)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:552)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:520)
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:136)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:953)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:895)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:471)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:371)
> 2012-05-08 11:24:33,766 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
> Can't find class





[jira] [Updated] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5987:
---

Attachment: D3237.4.patch

Liyin updated the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex 
improvement".
Reviewers: Kannan, mbautin

  Good point, Todd!

REVISION DETAIL
  https://reviews.facebook.net/D3237

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, 
> D3237.4.patch, screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when multiple 
> requests are reading the same block of data or index. 
> From the profiling, one of the causes is the IdLock contention, which has 
> been addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> for the data block location of each target key value during the scan process 
> (reSeekTo), even though the target key value is already in the current data 
> block. This issue can make certain index blocks very HOT, especially during 
> a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one block index entry so that the 
> HFileScanner knows the start key value of the next data block. If the target 
> key value for the scan (reSeekTo) is "smaller" than that start kv of the 
> next data block, the target key value is very likely in the current data 
> block (if it is not in the current data block, then the start kv of the next 
> data block should be returned. +Indexing on the start key has some defects 
> here+), and in that case the scanner shall NOT query the HFileBlockIndex. On 
> the contrary, if the target key value is "bigger", then it shall query the 
> HFileBlockIndex. This improvement shall help to reduce the hotness of the 
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block 
> cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex 
> shall index on the last key value of each data block instead of indexing on 
> the start key value. The motivation is to solve the HBASE-4443 issue (avoid 
> seeking to the "previous" block when the key you are interested in is the 
> first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value 
> of data block N, there is no way to know for sure whether the target key 
> value is in data block N or N-1, so it has to seek from data block N-1. 
> However, if the block index is based on the last key value of each data 
> block and the target key value is between the last key values of data blocks 
> N-1 and N, then the target key value must be in data block N. 
> As long as HBase only supports forward scans, it makes more sense to index 
> on the last key value than on the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.
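
A minimal sketch of the first proposal (illustrative names, not code from the 
patch): reSeekTo would run a check like this before touching the block index, 
with a raw byte comparison standing in for the real key comparator.

{code}
import org.apache.hadoop.hbase.util.Bytes;

final class BlockIndexLookaheadSketch {
  /** True only when the target can no longer be in the current data block. */
  static boolean needsIndexLookup(byte[] targetKey, byte[] nextBlockStartKey) {
    // If the target sorts before the next block's start key, stay in the
    // current block and skip the index lookup (and the IdLock it implies).
    return Bytes.compareTo(targetKey, nextBlockStartKey) >= 0;
  }
}
{code}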





[jira] [Updated] (HBASE-5826) Improve sync of HLog edits

2012-05-16 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5826:
--

Attachment: 5826-v2.txt

Patch rebased on trunk.

> Improve sync of HLog edits
> --
>
> Key: HBASE-5826
> URL: https://issues.apache.org/jira/browse/HBASE-5826
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhihong Yu
> Fix For: 0.96.0
>
> Attachments: 5826-v2.txt, 5826.txt
>
>
> HBASE-5782 solved the correctness issue for the sync of HLog edits.
> Todd provided a patch that would achieve higher throughput.
> This JIRA is a continuation of Todd's work submitted there.





[jira] [Updated] (HBASE-5826) Improve sync of HLog edits

2012-05-16 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5826:
--

Fix Version/s: 0.96.0

> Improve sync of HLog edits
> --
>
> Key: HBASE-5826
> URL: https://issues.apache.org/jira/browse/HBASE-5826
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhihong Yu
> Fix For: 0.96.0
>
> Attachments: 5826-v2.txt, 5826.txt
>
>
> HBASE-5782 solved the correctness issue for the sync of HLog edits.
> Todd provided a patch that would achieve higher throughput.
> This JIRA is a continuation of Todd's work submitted there.





[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277090#comment-13277090
 ] 

Phabricator commented on HBASE-5987:


todd has commented on the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex 
improvement".

  Thanks for fixing. I'm surprised the unit tests weren't failing before. Is 
that because the ByteBuffer usually does have arrayOffset() == 0, so the bug 
wasn't actually causing a problem? Or do we need more test coverage?
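
For reference, a generic illustration of the arrayOffset() pitfall in 
question (not the patched code): a sliced heap buffer shares its parent's 
backing array, and that array does not start at index 0 for the slice.

{code}
import java.nio.ByteBuffer;

final class ArrayOffsetSketch {
  static byte byteAtPosition(ByteBuffer buf) {
    // Buggy form: buf.array()[buf.position()] -- correct only while
    // arrayOffset() happens to be 0, which is why tests can stay green.
    return buf.array()[buf.arrayOffset() + buf.position()];
  }
}
{code}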

REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, 
> D3237.4.patch, screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when multiple 
> requests are reading the same block of data or index. 
> From the profiling, one of the causes is the IdLock contention, which has 
> been addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> for the data block location of each target key value during the scan process 
> (reSeekTo), even though the target key value is already in the current data 
> block. This issue can make certain index blocks very HOT, especially during 
> a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one block index entry so that the 
> HFileScanner knows the start key value of the next data block. If the target 
> key value for the scan (reSeekTo) is "smaller" than that start kv of the 
> next data block, the target key value is very likely in the current data 
> block (if it is not in the current data block, then the start kv of the next 
> data block should be returned. +Indexing on the start key has some defects 
> here+), and in that case the scanner shall NOT query the HFileBlockIndex. On 
> the contrary, if the target key value is "bigger", then it shall query the 
> HFileBlockIndex. This improvement shall help to reduce the hotness of the 
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block 
> cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex 
> shall index on the last key value of each data block instead of indexing on 
> the start key value. The motivation is to solve the HBASE-4443 issue (avoid 
> seeking to the "previous" block when the key you are interested in is the 
> first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value 
> of data block N, there is no way to know for sure whether the target key 
> value is in data block N or N-1, so it has to seek from data block N-1. 
> However, if the block index is based on the last key value of each data 
> block and the target key value is between the last key values of data blocks 
> N-1 and N, then the target key value must be in data block N. 
> As long as HBase only supports forward scans, it makes more sense to index 
> on the last key value than on the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.





[jira] [Updated] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5987:
---

Attachment: D3237.5.patch

Liyin updated the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex 
improvement".
Reviewers: Kannan, mbautin

REVISION DETAIL
  https://reviews.facebook.net/D3237

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, 
> D3237.4.patch, D3237.5.patch, screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when multiple 
> requests are reading the same block of data or index. 
> From the profiling, one of the causes is the IdLock contention, which has 
> been addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> for the data block location of each target key value during the scan process 
> (reSeekTo), even though the target key value is already in the current data 
> block. This issue can make certain index blocks very HOT, especially during 
> a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one block index entry so that the 
> HFileScanner knows the start key value of the next data block. If the target 
> key value for the scan (reSeekTo) is "smaller" than that start kv of the 
> next data block, the target key value is very likely in the current data 
> block (if it is not in the current data block, then the start kv of the next 
> data block should be returned. +Indexing on the start key has some defects 
> here+), and in that case the scanner shall NOT query the HFileBlockIndex. On 
> the contrary, if the target key value is "bigger", then it shall query the 
> HFileBlockIndex. This improvement shall help to reduce the hotness of the 
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block 
> cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex 
> shall index on the last key value of each data block instead of indexing on 
> the start key value. The motivation is to solve the HBASE-4443 issue (avoid 
> seeking to the "previous" block when the key you are interested in is the 
> first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value 
> of data block N, there is no way to know for sure whether the target key 
> value is in data block N or N-1, so it has to seek from data block N-1. 
> However, if the block index is based on the last key value of each data 
> block and the target key value is between the last key values of data blocks 
> N-1 and N, then the target key value must be in data block N. 
> As long as HBase only supports forward scans, it makes more sense to index 
> on the last key value than on the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.





[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement

2012-05-16 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277131#comment-13277131
 ] 

Phabricator commented on HBASE-5987:


Liyin has commented on the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex 
improvement".

  I don't think we have covered a seekBefore to the previous block followed 
by a reSeekTo within that same block. I shall create a unit test to cover 
that.


REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu


> HFileBlockIndex improvement
> ---
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, 
> D3237.4.patch, D3237.5.patch, screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: it is quite slow when multiple 
> requests are reading the same block of data or index. 
> From the profiling, one of the causes is the IdLock contention, which has 
> been addressed in HBASE-5898. 
> Another issue is that the HFileScanner will keep asking the HFileBlockIndex 
> for the data block location of each target key value during the scan process 
> (reSeekTo), even though the target key value is already in the current data 
> block. This issue can make certain index blocks very HOT, especially during 
> a sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one block index entry so that the 
> HFileScanner knows the start key value of the next data block. If the target 
> key value for the scan (reSeekTo) is "smaller" than that start kv of the 
> next data block, the target key value is very likely in the current data 
> block (if it is not in the current data block, then the start kv of the next 
> data block should be returned. +Indexing on the start key has some defects 
> here+), and in that case the scanner shall NOT query the HFileBlockIndex. On 
> the contrary, if the target key value is "bigger", then it shall query the 
> HFileBlockIndex. This improvement shall help to reduce the hotness of the 
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block 
> cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex 
> shall index on the last key value of each data block instead of indexing on 
> the start key value. The motivation is to solve the HBASE-4443 issue (avoid 
> seeking to the "previous" block when the key you are interested in is the 
> first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value 
> of data block N, there is no way to know for sure whether the target key 
> value is in data block N or N-1, so it has to seek from data block N-1. 
> However, if the block index is based on the last key value of each data 
> block and the target key value is between the last key values of data blocks 
> N-1 and N, then the target key value must be in data block N. 
> As long as HBase only supports forward scans, it makes more sense to index 
> on the last key value than on the start key value. 
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.





[jira] [Commented] (HBASE-5927) SSH and DisableTableHandler happening together does not clear the znode of the region and RIT map.

2012-05-16 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277132#comment-13277132
 ] 

Zhihong Yu commented on HBASE-5927:
---

I ran the 3 failed tests and they passed.

Will wait till tomorrow for more comments.

> SSH and DisableTableHandler happening together does not clear the znode of 
> the region and RIT map.
> --
>
> Key: HBASE-5927
> URL: https://issues.apache.org/jira/browse/HBASE-5927
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.96.0, 0.94.1
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-5927_94.patch, HBASE-5927_94_v2.patch, 
> HBASE-5927_trunk.patch, HBASE-5927_trunk_2.patch, TestCaseForReProduce.txt
>
>
> A possible exception scenario: if the related regionserver was just killed 
> (but HMaster has not yet perceived that), then we will get a local exception 
> "Connection reset by peer". If this region belongs to a disabling table, 
> what will happen?
> ServerShutdownHandler will remove this region from AM#regions, so this 
> region still exists in RIT. TimeoutMonitor will take care of it after it 
> times out and then invoke unassign again. Since this region has been removed 
> from AM#regions, unassign will return directly because of the code below:
> {code}
> synchronized (this.regions) {
>   // Check if this region is currently assigned
>   if (!regions.containsKey(region)) {
>     LOG.debug("Attempted to unassign region " +
>       region.getRegionNameAsString() + " but it is not " +
>       "currently assigned anywhere");
>     return;
>   }
> }
> {code}
> This leads to an endless loop.





[jira] [Commented] (HBASE-6004) Adding more logging to help debugging MR job

2012-05-16 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277140#comment-13277140
 ] 

jirapos...@reviews.apache.org commented on HBASE-6004:
--



bq.  On 2012-05-15 23:11:04, Andrew Purtell wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java, line 
54
bq.  > 
bq.  >
bq.  > We control logging via log4j.properties files. Here we also add new 
configuration in another file to switch some additional logging on and off. 
Would it make more sense to create a logger class, e.g. 
ScannerCallable.ActivityLog, that logs at TRACE level, and update 
log4j.properties with 
log4j.logger.org.apache.hadoop.hbase.client.ScannerCallable.ActivityLog = TRACE 
(default is INFO, i.e. disabled)? Just a thought. We did something like this 
for security audit logging.
bq.  
bq.  Jimmy Xiang wrote:
bq.  It is a good idea.  But because of the other parameters introduced, 
it's better to keep them all in the same place for now.

Fair enough; there isn't a way that I can see to read property values from 
the log4j.properties file using either the commons logging API or 
java.util.logging.
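
A sketch of the dedicated trace logger idea from the comment above 
(illustrative class, but the logger name and log4j.properties line follow the 
review suggestion):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// A named child logger that log4j.properties can switch on independently of
// the rest of the client logging.
final class ScannerActivityLogSketch {
  static final Log ACTIVITY_LOG = LogFactory.getLog(
      "org.apache.hadoop.hbase.client.ScannerCallable.ActivityLog");

  static void logRow(String regionServer, String lastRow) {
    if (ACTIVITY_LOG.isTraceEnabled()) {   // stays silent unless enabled
      ACTIVITY_LOG.trace("scan on " + regionServer + ", last row " + lastRow);
    }
  }
}
// In log4j.properties, per the review comment:
// log4j.logger.org.apache.hadoop.hbase.client.ScannerCallable.ActivityLog=TRACE
{code}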


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5131/#review7918
---


On 2012-05-16 02:56:16, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/5131/
bq.  ---
bq.  
bq.  (Updated 2012-05-16 02:56:16)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Added some logging for MR debugging in case the scanner times out.  The 
logging is disabled by default.
bq.  It will be helpful to know how much time is spent in the scanner and how 
much in the mapper task.
bq.  In case of a scanner issue, it is helpful to know the region server id, 
the last successful row, and so on.
bq.  
bq.  
bq.  This addresses bug HBASE-6004.
bq.  https://issues.apache.org/jira/browse/HBASE-6004
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 
46b1c56 
bq.src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java 
42569fb 
bq.
src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 
1c8a393 
bq.  
bq.  Diff: https://reviews.apache.org/r/5131/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



> Adding more logging to help debugging MR job
> 
>
> Key: HBASE-6004
> URL: https://issues.apache.org/jira/browse/HBASE-6004
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 0.94.0, 0.96.0
>
> Attachments: hbase-6004.patch
>
>
> MR jobs sometimes fail because the scanner expired. In this case, it is 
> helpful to know the last successful row, the IP of the region server, and 
> so on.





[jira] [Updated] (HBASE-6010) Security audit logger configuration for log4j

2012-05-16 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-6010:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527549/6010.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 31 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
   org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1883//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1883//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1883//console

This message is automatically generated.)

> Security audit logger configuration for log4j
> -
>
> Key: HBASE-6010
> URL: https://issues.apache.org/jira/browse/HBASE-6010
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: 6010.patch
>
>
> Set up a logger for security audit messages just as Hadoop core does.




