[jira] [Created] (HBASE-3723) Major compact should be done when there is only one storefile and some keyvalue is outdated.

2011-04-01 Thread zhoushuaifeng (JIRA)
Major compact should be done when there is only one storefile and some keyvalue 
is outdated.


 Key: HBASE-3723
 URL: https://issues.apache.org/jira/browse/HBASE-3723
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.1, 0.90.0
Reporter: zhoushuaifeng
 Fix For: 0.90.2


In the function store.isMajorCompaction:
  if (filesToCompact.size() == 1) {
    // Single file
    StoreFile sf = filesToCompact.get(0);
    long oldest =
        (sf.getReader().timeRangeTracker == null) ?
            Long.MIN_VALUE :
            now - sf.getReader().timeRangeTracker.minimumTimestamp;
    if (sf.isMajorCompaction() &&
        (this.ttl == HConstants.FOREVER || oldest < this.ttl)) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Skipping major compaction of " + this.storeNameStr +
            " because one (major) compacted file only and oldestTime " +
            oldest + "ms is < ttl=" + this.ttl);
      }
    }
  } else {
When there is only one storefile in the store and some KeyValues have outlived 
their TTL, the major compaction checker should send this region to the 
compaction queue and run a major compaction to clean out the expired data. But 
according to the code in 0.90.1, it does nothing. 
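
The behaviour being asked for can be sketched as below. This is only an
illustration, not the HBase Store class and not a committed fix: the class
name, the method name and the FOREVER constant (a stand-in for
HConstants.FOREVER) are all assumptions made for the example.

```java
// Simplified, hypothetical model of the single-storefile decision.
// It is not the HBase Store class and not the committed fix.
final class SingleFileCompactionSketch {
    // Stand-in for HConstants.FOREVER (assumed here to mean "no TTL").
    static final long FOREVER = Long.MAX_VALUE;

    /**
     * @param oldestCellAgeMs now minus the minimum timestamp in the file
     * @param ttlMs           column family TTL in milliseconds, or FOREVER
     * @return true if the lone storefile should be major compacted
     */
    static boolean shouldMajorCompact(long oldestCellAgeMs, long ttlMs) {
        if (ttlMs == FOREVER) {
            // Nothing can expire, so rewriting a single file gains nothing.
            return false;
        }
        // Some data has outlived the TTL: a major compaction would drop it.
        return oldestCellAgeMs > ttlMs;
    }
}
```

The point of the sketch is only that the single-file branch needs a case that
answers "yes" when the oldest data is older than the TTL.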

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1512:
--

Attachment: (was: AggregateProtocolImpl.java)

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facilities, generally anywhere you want to calculate some 
> meta info on your table, it seems like it wouldn't be too hard to make a 
> filter type that could run a function server-side and return ONLY the result 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  
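
The per-region count idea in the description can be pictured with plain Java
collections. This is a rough sketch under stated assumptions: each inner list
stands in for the row keys held by one region, and the class and method names
are made up for the example; none of this is HBase or coprocessor API.

```java
import java.util.List;

// Hypothetical model: one List<String> per region, holding its row keys.
final class RegionCountSketch {

    // "Server side": each region returns only its count, not its rows.
    static long countRowsInRegion(List<String> regionRowKeys) {
        return regionRowKeys.size();
    }

    // "Client side": add up the small per-region results.
    static long totalCount(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRowsInRegion(region);
        }
        return total;
    }
}
```

Only one number per region crosses the wire in this model, which is the saving
the description is after.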



[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1512:
--

Attachment: (was: AggregateCpProtocol.java)



[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1512:
--

Attachment: ColumnInterpreter.java
AggregationClient.java
AggregateProtocolImpl.java
AggregateCpProtocol.java

Attaching generic implementation.



[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1512:
--

Attachment: (was: AggregationClient.java)



[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1512:
--

Attachment: (was: ColumnInterpreter.java)



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014564#comment-13014564
 ] 

Ted Yu commented on HBASE-1512:
---

See my thoughts on 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html



[jira] [Created] (HBASE-3724) Load balancer improvements

2011-04-01 Thread stack (JIRA)
Load balancer improvements
--

 Key: HBASE-3724
 URL: https://issues.apache.org/jira/browse/HBASE-3724
 Project: HBase
  Issue Type: Umbrella
Reporter: stack


Umbrella issue under which we hang all issues related to the balancer



[jira] [Commented] (HBASE-3723) Major compact should be done when there is only one storefile and some keyvalue is outdated.

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014710#comment-13014710
 ] 

stack commented on HBASE-3723:
--

If you add a patch zhoushuaifeng, I'll apply it.  Thanks for reporting this bug.



[jira] [Commented] (HBASE-3719) Workload has to drain before hlog can be rolled

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014712#comment-13014712
 ] 

stack commented on HBASE-3719:
--

bq. It will be nice if we can continue to write new transactions to the new 
HLog (but maybe not commit those transactions) while the old HLog is being 
closed.

We'd hold up clients while this was going on?  We'd not return to clients until 
after the commit to the new HLog had gone through?

> Workload has to drain before hlog can be rolled
> ---
>
> Key: HBASE-3719
> URL: https://issues.apache.org/jira/browse/HBASE-3719
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> In the current implementation, the regionserver blocks new transactions from 
> occurring when the HLog is rolled. Closing the existing HLog sometimes takes 
> more than a few seconds and during this time all new puts/increments are 
> blocked. It will be nice if we can continue to write new transactions to the 
> new HLog (but maybe not commit those transactions) while the old HLog is 
> being closed.



[jira] [Updated] (HBASE-3713) Hmaster had crashed as disabling table

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3713:
-

Fix Version/s: 0.92.0

Thank you for digging in.  While I see the sequence described as being 
relatively 'rare' in operation, it does expose a 'hole' that others might fall 
into via a sequence other than the one described above.

> Hmaster had crashed as disabling table
> --
>
> Key: HBASE-3713
> URL: https://issues.apache.org/jira/browse/HBASE-3713
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.1
> Environment: startup cluster with HA master and 5 datanode.
>Reporter: gaojinchao
> Fix For: 0.92.0
>
>
> Operation steps:
> 1. Start up the cluster with HA master.
> 2. The active master crashed while it was creating a table with regions.
> 3. The backup master became active.
> 4. I tried to drop the table.
> 5. The active master crashed.
> So the issue is that if a region was closed and disabled when the first 
> master was running, it won't be assigned anywhere and won't be in transition 
> either (it's called being in RIT in the code). When the new master comes 
> around, and disable is called, it does a check to see if the region is in RIT 
> but not if it was already disabled, and fails on NPE because it's not 
> assigned to anyone.



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014733#comment-13014733
 ] 

Ted Yu commented on HBASE-1512:
---

In AggregateProtocolImpl, I think the boolean done should be renamed. It 
actually indicates whether more rows exist after the current one.
The following loop condition may confuse someone:
{code}
  } while (done);
{code}
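
To illustrate the naming point, here is a small self-contained sketch of the
same loop shape with the flag renamed. The RowScanner interface is a
hypothetical stand-in whose next() returns true while more rows remain; the
class, the helper over() and the name hasMoreRows are assumptions for the
example, not code from the attached AggregateProtocolImpl.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ScanLoopSketch {

    /** Hypothetical scanner: adds the next row to 'out', returns true if more rows remain. */
    interface RowScanner {
        boolean next(List<String> out);
    }

    static long countRows(RowScanner scanner) {
        long count = 0;
        List<String> row = new ArrayList<>();
        boolean hasMoreRows; // reads naturally in the loop condition
        do {
            row.clear();
            hasMoreRows = scanner.next(row);
            if (!row.isEmpty()) {
                count++;
            }
        } while (hasMoreRows);
        return count;
    }

    /** Test helper: a scanner over an in-memory list, one entry per row. */
    static RowScanner over(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return out -> {
            if (it.hasNext()) {
                out.add(it.next());
            }
            return it.hasNext();
        };
    }
}
```

With the flag named for what it reports, "while (hasMoreRows)" no longer looks
like a loop that keeps going once it is done.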




[jira] [Commented] (HBASE-1476) scaling compaction with multiple threads

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014735#comment-13014735
 ] 

stack commented on HBASE-1476:
--

OK.  I'm good w/ committing this to trunk with your extra comments.  Good stuff 
N.

> scaling compaction with multiple threads
> 
>
> Key: HBASE-1476
> URL: https://issues.apache.org/jira/browse/HBASE-1476
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Billy Pearson
>Assignee: Nicolas Spiegelberg
>  Labels: moved_from_0_20_5
> Fix For: 0.92.0
>
>
> Was thinking we should build in support for more than one compaction 
> thread. This would allow us to keep up with compactions when we get to the 
> point where we store TBs of data per node and many regions.
> Maybe a configurable setting for how many threads a region server can use 
> for compactions.
> With compression turned on, my compactions are limited by CPU speed. With 
> multiple cores, it would be nice to be able to scale compactions to 2 or more 
> cores.
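
A configurable number of compaction threads could be modelled with a fixed
thread pool, as in the sketch below. It is a generic java.util.concurrent
illustration, not the patch under review: the class and method names are
invented, and each Runnable merely stands in for one compaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CompactionPoolSketch {

    /**
     * Runs every queued compaction on a pool of 'threads' workers and
     * returns how many completed. 'threads' plays the role of the
     * configurable setting suggested in the issue.
     */
    static int runCompactions(int threads, List<Runnable> compactions) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Runnable compaction : compactions) {
                pending.add(pool.submit(compaction));
            }
            int completed = 0;
            for (Future<?> f : pending) {
                f.get(); // wait for this compaction to finish
                completed++;
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while compacting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("compaction failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With threads set to 1 this degenerates to the single compaction thread
described in the issue; larger values let CPU-bound compactions spread across
cores.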



[jira] [Commented] (HBASE-3707) Flush memstore after a configurable number of inserts not simply based on size

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014738#comment-13014738
 ] 

stack commented on HBASE-3707:
--

Patch looks fine Andrew but does seem like a bandaid for the issue you are 
seeing with CSLM.  Are you thinking of applying it with an unbounded number as 
the default?  You'd add config at the CF level for setting the max?

> Flush memstore after a configurable number of inserts not simply based on size
> --
>
> Key: HBASE-3707
> URL: https://issues.apache.org/jira/browse/HBASE-3707
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
> Attachments: HBASE-3707.patch
>
>
> Memstore upsert performance may be impacted by having a large number of 
> values in the map. Consider flushing the store after a configurable number of 
> inserts.
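
The idea of flushing on insert count can be sketched with a simple counter.
This is a toy model, not the attached patch: the class and its methods are
hypothetical, "flush" just resets the counter, and Long.MAX_VALUE is used here
to stand for an unbounded default.

```java
// Toy model of a store that flushes after a configurable number of inserts.
final class InsertCountFlushSketch {
    private final long maxInserts; // Long.MAX_VALUE models "unbounded"
    private long insertsSinceFlush;
    private int flushes;

    InsertCountFlushSketch(long maxInserts) {
        this.maxInserts = maxInserts;
    }

    /** Records one insert and returns true if it triggered a flush. */
    boolean insert() {
        insertsSinceFlush++;
        if (insertsSinceFlush >= maxInserts) {
            flush();
            return true;
        }
        return false;
    }

    private void flush() {
        insertsSinceFlush = 0;
        flushes++;
    }

    int flushCount() {
        return flushes;
    }
}
```

A real memstore would still flush on size as well; the counter would only be
an additional trigger.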



[jira] [Commented] (HBASE-1476) scaling compaction with multiple threads

2011-04-01 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014740#comment-13014740
 ] 

Nicolas Spiegelberg commented on HBASE-1476:


I did some minor refactoring in an internal peer review that's ongoing.  I 
think it's all wrapped up, but waiting to confirm.  I will re-post the patch 
and commit once done.



[jira] [Commented] (HBASE-3707) Flush memstore after a configurable number of inserts not simply based on size

2011-04-01 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014741#comment-13014741
 ] 

Andrew Purtell commented on HBASE-3707:
---

Yes unbounded by default.

Let me look at per-CF.

Agree it is a band aid. I don't suggest applying the patch, just keeping it 
around if use of it turns out to be expedient somehow.






[jira] [Created] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-01 Thread Nathaniel Cook (JIRA)
HBase increments from old value after delete and write to disk
--

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook


Deleted row values are sometimes used for starting points on new increments.

To reproduce:
Create a row "r". Set column "x" to some default value.
Force hbase to write that value to the file system (such as restarting the 
cluster).
Delete the row.
Call table.incrementColumnValue with "some_value"
Get the row.
The returned value in the column was incremented from the old value before the 
row was deleted instead of being initialized to "some_value".

Code to reproduce:
{code}

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement
{
    static String tableName  = "testIncrement";

    static byte[] infoCF = Bytes.toBytes("info");

    static byte[] rowKey = Bytes.toBytes("test-rowKey");

    static byte[] newInc = Bytes.toBytes("new");
    static byte[] oldInc = Bytes.toBytes("old");

    /**
     * This code reproduces a bug with increment column values in hbase
     * Usage: First run part one by passing '1' as the first arg
     *        Then restart the hbase cluster so it writes everything to disk
     *        Run part two by passing '2' as the first arg
     *
     * This will result in the old deleted data being found and used for
     * the increment calls
     *
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException
    {
        if("1".equals(args[0]))
            partOne();
        if("2".equals(args[0]))
            partTwo();
        if ("both".equals(args[0]))
        {
            partOne();
            partTwo();
        }
    }

    /**
     * Creates a table and increments a column value 10 times by 10 each
     * time.
     * Results in a value of 100 for the column
     *
     * @throws IOException
     */
    static void partOne()throws IOException
    {

        Configuration conf = HBaseConfiguration.create();


        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor tableDesc = new HTableDescriptor(tableName);
        tableDesc.addFamily(new HColumnDescriptor(infoCF));
        if(admin.tableExists(tableName))
        {
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        }
        admin.createTable(tableDesc);

        HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
        HTableInterface table = pool.getTable(Bytes.toBytes(tableName));

        //Increment uninitialized column
        for (int j = 0; j < 10; j++)
        {
            table.incrementColumnValue(rowKey, infoCF, oldInc,
                (long)10);
            Increment inc = new Increment(rowKey);
            inc.addColumn(infoCF, newInc, (long)10);
            table.increment(inc);
        }

        Get get = new Get(rowKey);
        Result r = table.get(get);
        System.out.println("initial values: new " +
            Bytes.toLong(r.getValue(infoCF, newInc)) + " old " +
            Bytes.toLong(r.getValue(infoCF, oldInc)));

    }

    /**
     * First deletes the data then increments the column 10 times by 1 each
     * time
     *
     * Should result in a value of 10 but it doesn't, it results in a
     * value of 110
     *
     * @throws IOException
     */
    static void partTwo()throws IOException
    {
        Configuration conf = HBaseConfiguration.create();

        HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
        HTableInterface table = pool.getTable(Bytes.toBytes(tableName));

        Delete delete = new Delete(rowKey);
        table.delete(delete);


        //Increment columns
        for (int j = 0; j < 10; j++)
        {

[jira] [Commented] (HBASE-3677) Generate a globally unique identifier for a cluster and store in /hbase/hbase.id

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014744#comment-13014744
 ] 

stack commented on HBASE-3677:
--

Posted review over on rb G.

> Generate a globally unique identifier for a cluster and store in 
> /hbase/hbase.id
> 
>
> Key: HBASE-3677
> URL: https://issues.apache.org/jira/browse/HBASE-3677
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
>
> We don't currently have a way to uniquely identify an HBase cluster, apart 
> from where it's stored in HDFS or configuration of the ZooKeeper quorum 
> managing it.  It would be generally useful to be able to identify a cluster 
> via API.
> The proposal here is pretty simple:
> # When master initializes the filesystem, generate a globally unique ID and 
> store in /hbase/hbase.id
> # For existing clusters, generate hbase.id on master startup if it does not 
> exist
> # Include unique ID in ClusterStatus returned from master
> For token authentication, this will be required to allow selecting the 
> correct token to pass to a cluster when a single client is communicating to 
> more than one HBase instance.
> Chatting with J-D, replication stores its own cluster id in place with each 
> HLog edit, so requires as small as possible an identifier, but I think we 
> could automate a mapping from unique cluster ID -> short ID if we had the 
> unique ID available.
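
The first two points of the proposal amount to "read the ID if the file
exists, otherwise generate and store one". A minimal sketch of that logic
follows, with loud assumptions: it writes to a local file through
java.nio.file rather than to HDFS, it uses a random UUID as the identifier,
and the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

final class ClusterIdSketch {

    /**
     * Returns the ID stored in idFile, creating the file with a fresh
     * UUID first if it does not exist yet (local filesystem stand-in
     * for /hbase/hbase.id).
     */
    static String readOrCreate(Path idFile) {
        try {
            if (Files.exists(idFile)) {
                byte[] raw = Files.readAllBytes(idFile);
                return new String(raw, StandardCharsets.UTF_8).trim();
            }
            String id = UUID.randomUUID().toString();
            Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The exists-then-write step is not atomic, so this sketch assumes a single
writer, such as one active master performing the check at startup.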



[jira] [Commented] (HBASE-3238) HBase needs to have the CREATE permission on the parent of its ZooKeeper parent znode

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014747#comment-13014747
 ] 

stack commented on HBASE-3238:
--

+1 on patch.  Mathias, what do you think?

> HBase needs to have the CREATE permission on the parent of its ZooKeeper 
> parent znode
> -
>
> Key: HBASE-3238
> URL: https://issues.apache.org/jira/browse/HBASE-3238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Mathias Herberts
>Assignee: Alex Newman
>Priority: Blocker
> Attachments: 1, HBASE-3238-v2.patch, HBASE-3238-v3.patch, 
> HBASE-3238.patch
>
>
> Upon startup, HBase attempts to create its zookeeper.parent.znode in 
> ZooKeeper. It does so using ZKUtil.createAndFailSilent which, as its name 
> seems to imply, will fail silently if the znode exists. But if HBase does not 
> have the CREATE permission on the parent of its zookeeper.parent.znode then 
> the create attempt will fail with a 
> org.apache.zookeeper.KeeperException$NoAuthException and will terminate the 
> process.
> In a production environment where ZooKeeper has a managed namespace it is not 
> possible to give HBase CREATE permission on the parent of its parent znode.
> ZKUtil.createAndFailSilent should therefore be modified to check that the 
> znode exists using ZooKeeper.exists prior to attempting to create it.
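
A minimal sketch of the check-then-create idea proposed in the description. It is not the attached patch. FakeZk is a hypothetical in-memory stand-in for a ZooKeeper handle, and ZnodeSetup.createIfAbsent is an illustrative name; neither is real ZKUtil or ZooKeeper client API. The point is only that an existing znode is detected before any create is attempted, so a missing CREATE permission on the parent never comes into play.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a ZooKeeper handle; not the real client API. */
class FakeZk {
  private final Set<String> nodes = new HashSet<>();
  private final Set<String> noCreateParents = new HashSet<>();

  /** Pre-populates a znode, as a namespace administrator would. */
  void preCreate(String path) { nodes.add(path); }

  /** Marks a parent path under which this client lacks CREATE permission. */
  void denyCreateUnder(String parent) { noCreateParents.add(parent); }

  boolean exists(String path) { return nodes.contains(path); }

  void create(String path) {
    String parent = path.substring(0, Math.max(1, path.lastIndexOf('/')));
    if (noCreateParents.contains(parent)) {
      throw new SecurityException("NoAuth for " + path);
    }
    if (!nodes.add(path)) {
      throw new IllegalStateException("NodeExists for " + path);
    }
  }
}

class ZnodeSetup {
  /** Creates the znode only if it is absent; returns true if it was created here. */
  static boolean createIfAbsent(FakeZk zk, String path) {
    if (zk.exists(path)) {
      return false; // already provisioned: no create, so the parent ACL is irrelevant
    }
    try {
      zk.create(path);
      return true;
    } catch (IllegalStateException raced) {
      return false; // another process created it between exists() and create()
    }
  }
}
```

The exists-then-create sequence is not atomic, so the sketch still tolerates a "node exists" failure from the create call. A permission failure on a znode that is genuinely missing still propagates, which seems the right outcome in a managed namespace.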



[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match

2011-04-01 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014748#comment-13014748
 ] 

Jonathan Gray commented on HBASE-3562:
--

Thanks for looking into this Evert.  This is definitely some tricky stuff.

A few comments on your patch...

- Our convention in conditionals is to put the variable first.  I find it a 
little tricky to read the code when the constant is first.  For example:
{code}
if (MatchCode.INCLUDE == mc)
{code}
should be
{code}
if (mc == MatchCode.INCLUDE)
{code}
(And all the other places where you have this type of logic)

- The unit test {{TestColumnMatchAndFilterOrder}} is clever in how it checks 
correctness, but I think it would be good to actually do a read query and 
verify the results for a few different combinations of the query to prove 
correctness of the overall algorithm.  Other changes to SQM down the road might 
change more behavior / order of operations, so this test may no longer apply or 
give full coverage for correctness.  Having some tests which don't rely on the 
precise server-side interactions but rather confirm the end results will be 
more applicable as we move forward.

- You have some lines that are > 80 characters, especially in some of the 
javadoc.  Just wrap that so all lines are <= 80 chars.

- There was a comment in SQM that described why the filter was checked first.  
Can you write some inline comments to describe how this works now?  There are a 
couple lines at the end but it will be useful to have some explanation on why 
this has changed and what the behavior is now.

- Is there any particular reason that you had includeLatestColumn take 
timestamp as a parameter?  The timestamp is passed in the check call, and we 
could just hang on to that.  It just feels a little strange to me since you 
should never pass a different timestamp, and the tracker can know which was the 
latest column.

Overall this is really solid!  Great work Evert!

> ValueFilter is being evaluated before performing the column match
> -
>
> Key: HBASE-3562
> URL: https://issues.apache.org/jira/browse/HBASE-3562
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.90.0
>Reporter: Evert Arckens
> Attachments: HBASE-3562.patch
>
>
> When performing a Get operation where both a column and a ValueFilter are 
> specified, the ValueFilter is evaluated before the column match is made. 
> This contradicts the javadoc of Get.setFilter(): " {@link 
> Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column 
> match, deletes and max versions have been run. "
> This is shown in the little test below, which uses a TestComparator extending 
> WritableByteArrayComparable.
> public void testFilter() throws Exception {
>   byte[] cf = Bytes.toBytes("cf");
>   byte[] row = Bytes.toBytes("row");
>   byte[] col1 = Bytes.toBytes("col1");
>   byte[] col2 = Bytes.toBytes("col2");
>   Put put = new Put(row);
>   put.add(cf, col1, new byte[]{(byte)1});
>   put.add(cf, col2, new byte[]{(byte)2});
>   table.put(put);
>   Get get = new Get(row);
>   get.addColumn(cf, col2); // We only want to retrieve col2
>   TestComparator testComparator = new TestComparator();
>   Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
>   get.setFilter(filter);
>   Result result = table.get(get);
> }
> public class TestComparator extends WritableByteArrayComparable {
>   /**
>    * Nullary constructor, for Writable
>    */
>   public TestComparator() {
>     super();
>   }
>
>   @Override
>   public int compareTo(byte[] theirValue) {
>     if (theirValue[0] == (byte)1) {
>       // If the column match was done before evaluating the filter, we
>       // should never get here.
>       throw new RuntimeException(
>           "I only expect (byte)2 in col2, not (byte)1 from col1");
>     }
>     if (theirValue[0] == (byte)2) {
>       return 0;
>     }
>     else return 1;
>   }
> }
> When only one column should be retrieved, this can be worked around by using 
> a SingleColumnValueFilter instead of the ValueFilter.



[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014750#comment-13014750
 ] 

stack commented on HBASE-3725:
--

Good one Nathaniel.  Any chance of a patch, preferably one that includes your 
nice test above?

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   //Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>* First deletes the data then increments the column 10 times by 1 each 
> time
>*
>* Should result in a value of 10 but it doesn't; it results in a 
>* value of 110
>*
>* @throws IOException
>*/
>   static void partTwo()throws IOException
>   {
>   Configuration conf = HBaseConfig

[jira] [Commented] (HBASE-3488) Allow RowCounter to retrieve multiple versions of rows

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014756#comment-13014756
 ] 

stack commented on HBASE-3488:
--

License says 2008.

Class comment seems wrong.  Says mapper-only job but I see a Reducer declared 
and specified in the job setup.

Otherwise, patch looks great to me.  Nice functionality.  Could make for crazy 
report on big table but could be just what the doctor ordered diagnosing state 
of an hbase table.

Do you want to fix above Subbu or should I on commit?

> Allow RowCounter to retrieve multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves latest version for each row.
> Some applications would store multiple versions for the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> Scan object would be configured with version parameter (for scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}



[jira] [Commented] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014762#comment-13014762
 ] 

stack commented on HBASE-3647:
--

To be clear, for this patch to be committed, we will need to up our rpc version 
on all proxy interfaces.  HServerLoad is buried inside HServerInfo at the 
moment.  It's looking like pressure is building such that we will have to up our 
version numbers -- hbase-1502 where we rejigger heartbeat includes deprecation 
of HSA and redo of HSI -- but also Gary has a patch to include cluster id in 
ClusterStatus.  So, the RPC version bump looks like it'll happen soon.  
We'll apply this patch then.

> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.
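
A rough sketch of what keeping separate read and write counts per region could look like. The class and method names are hypothetical and are not taken from the attached patch. It assumes plain AtomicLong counters and derives a read fraction that a balancer might consult when deciding how willing it is to move a region.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-region counters; names are illustrative only. */
class RegionRequestCounts {
  private final AtomicLong reads = new AtomicLong();
  private final AtomicLong writes = new AtomicLong();

  void onRead() { reads.incrementAndGet(); }
  void onWrite() { writes.incrementAndGet(); }

  long readCount() { return reads.get(); }
  long writeCount() { return writes.get(); }

  /** Share of requests that are reads, or 0.0 when the region has seen no requests. */
  double readFraction() {
    long r = reads.get();
    long w = writes.get();
    return (r + w) == 0 ? 0.0 : (double) r / (r + w);
  }
}
```

A read-heavy region (fraction near 1.0) would be one the balancer leaves in place to keep scanners happy, while a write-heavy one is a cheaper candidate for movement.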



[jira] [Commented] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014763#comment-13014763
 ] 

stack commented on HBASE-3647:
--

Oh, we should add versioning to HSL and to CS and to HSI as part of version 
bump so in future they can self-migrate if they change.

> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.



[jira] [Updated] (HBASE-3704) Show per region request count in table.jsp

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3704:
-

Fix Version/s: 0.92.0

> Show per region request count in table.jsp
> --
>
> Key: HBASE-3704
> URL: https://issues.apache.org/jira/browse/HBASE-3704
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3704.txt
>
>
> table.jsp should display per region request count.
> It should also display region count per region server.



[jira] [Assigned] (HBASE-3704) Show per region request count in table.jsp

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-3704:


Assignee: stack  (was: Ted Yu)

> Show per region request count in table.jsp
> --
>
> Key: HBASE-3704
> URL: https://issues.apache.org/jira/browse/HBASE-3704
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Ted Yu
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 3704.txt
>
>
> table.jsp should display per region request count.
> It should also display region count per region server.



[jira] [Updated] (HBASE-3704) Show per region request count in table.jsp

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3704:
-

Status: Patch Available  (was: In Progress)

> Show per region request count in table.jsp
> --
>
> Key: HBASE-3704
> URL: https://issues.apache.org/jira/browse/HBASE-3704
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Ted Yu
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 3704.txt
>
>
> table.jsp should display per region request count.
> It should also display region count per region server.



[jira] [Assigned] (HBASE-3704) Show per region request count in table.jsp

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-3704:


Assignee: Ted Yu  (was: stack)

> Show per region request count in table.jsp
> --
>
> Key: HBASE-3704
> URL: https://issues.apache.org/jira/browse/HBASE-3704
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3704.txt
>
>
> table.jsp should display per region request count.
> It should also display region count per region server.



[jira] [Updated] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3685:
-

Labels: noob  (was: )

> when multiple columns are combined with TimestampFilter, only one column is 
> returned
> 
>
> Key: HBASE-3685
> URL: https://issues.apache.org/jira/browse/HBASE-3685
> Project: HBase
>  Issue Type: Bug
>  Components: filters, regionserver
>Affects Versions: 0.89.20100924
>Reporter: Jerry Chen
>Priority: Minor
>  Labels: noob
>
> As reported by an Hbase user: 
> "I have a ThreadMetadata column family, and there are two columns in it: 
> v12:th: and v12:me. The following code only returns v12:me
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
> List<Long> threadIds = new ArrayList<Long>();
> threadIds.add(10709L);
> TimestampFilter filter = new TimestampFilter(threadIds);
> get.setFilter(filter);
> get.setMaxVersions();
> Result result = table.get(get);
> I checked hbase for the key/value, they are present. Also other combinations 
> like no timestampfilter, it returns both."
> Kannan was able to do a small repro of the issue and commented that if we 
> drop the get.setMaxVersions(), then the problem goes away. 



[jira] [Updated] (HBASE-3642) Web UI should be available during startup

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3642:
-

 Priority: Critical  (was: Minor)
Fix Version/s: 0.92.0
   Labels: noob  (was: )

A few of us were chatting yesterday and fingered this as an important issue 
(and it wouldn't be hard to do).  Upping priority and bringing into 0.92.

> Web UI should be available during startup
> -
>
> Key: HBASE-3642
> URL: https://issues.apache.org/jira/browse/HBASE-3642
> Project: HBase
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Priority: Critical
>  Labels: noob
> Fix For: 0.92.0
>
>
> Currently, HBase does not provide a web interface during its start-up period 
> -- while it's waiting for RSes to report in, replaying logs, etc. It would be 
> great if the Web UI was available and showed the current status.



[jira] [Commented] (HBASE-3660) HMaster will exit when starting with stale data in cached locations such as -ROOT- or .META.

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014782#comment-13014782
 ] 

stack commented on HBASE-3660:
--

Confirmation that this patch fixes the above issue beyond Cosmin's thumbs-up 
can be found here: 
http://search-hadoop.com/m/lsm232yCTxf/oRouteToHostException+causes+Master+abort+when+the+RegionServer+hosting+ROOT+is+not+available%2522&subj=NoRouteToHostException+causes+Master+abort+when+the+RegionServer+hosting+ROOT+is+not+available

> HMaster will exit when starting with stale data in cached locations such as 
> -ROOT- or .META.
> 
>
> Key: HBASE-3660
> URL: https://issues.apache.org/jira/browse/HBASE-3660
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.90.1
>Reporter: Cosmin Lehene
>Priority: Critical
> Fix For: 0.90.2
>
> Attachments: 3660.txt, HBASE-3660.patch
>
>
> later edit: I've mixed up two issues here. The main problem is that a client 
> (that could be HMaster) will read stale data from -ROOT- or .META. and not 
> deal correctly with the raised exceptions. 
> I've noticed this when the IP on my machine changed (it's even easier to 
> detect when LZO doesn't work)
> Master loads .META. successfully and then starts assigning regions.
> However LZO doesn't work so HRegionServer can't open the regions. 
> A client attempts to get data from a table so it reads the location from 
> .META. but goes to a totally different server (the old value in .META.)
> This could happen without the LZO story too.



[jira] [Updated] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3647:
--

Attachment: 3647-versioned.txt

Realigned patch with current trunk.
Also made HServerInfo and HServerLoad extend VersionedWritable.

> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3647-versioned.txt, hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.



[jira] [Created] (HBASE-3726) Allow coprocessor callback RPC calls to be batched at region server level

2011-04-01 Thread Ted Yu (JIRA)
Allow coprocessor callback RPC calls to be batched at region server level
-

 Key: HBASE-3726
 URL: https://issues.apache.org/jira/browse/HBASE-3726
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Ted Yu
 Fix For: 0.92.0


Currently the Callback.update() method is called for each Call.call() return 
value obtained from each region.  Each Call.call() invocation is a separate 
RPC, so there is currently one RPC per region. So there's no place at the 
moment for the region server to be involved in any aggregation across regions.

There is some preliminary support in 
HConnectionManager.HConnectionImplementation.processBatch() that would allow 
doing 1 RPC per region server, same as we do for multi-get and multi-put.

We should provide the ability to batch callback RPC calls.
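
To illustrate the grouping step such batching implies, here is a small sketch that buckets regions by the server hosting them, so that one request per server could carry all of that server's calls. CallBatcher and groupByServer are hypothetical names; this does not use or reproduce HConnectionManager.HConnectionImplementation.processBatch().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CallBatcher {
  /**
   * Groups region names by hosting server, given a region -> server map.
   * Each resulting list would be sent in a single RPC to its server.
   */
  static Map<String, List<String>> groupByServer(Map<String, String> regionToServer) {
    Map<String, List<String>> batches = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : regionToServer.entrySet()) {
      batches.computeIfAbsent(e.getValue(), s -> new ArrayList<>()).add(e.getKey());
    }
    return batches;
  }
}
```

With three regions spread over two servers, this yields two batches instead of three separate calls, which is also where a region server would get the chance to aggregate across its own regions.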



[jira] [Updated] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3647:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to TRUNK.  My dementia on versions above should be ignored.  We've 
already upped our rpc version across the board because of CoProcessors change.  
This change goes in under that same version change.  Thanks for the patch Ted 
and you did us a service Versioning the HSI and HSL classes.

> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3647-versioned.txt, hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.



[jira] [Commented] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014869#comment-13014869
 ] 

stack commented on HBASE-3647:
--

Ted, your patch is missing the new RegionLoad class.  Thanks boss.

> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3647-versioned.txt, hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.



[jira] [Updated] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3647:
--

Attachment: 3647-versioned-2.txt

Fix the error in HRegionServer.java

> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3647-versioned-2.txt, 3647-versioned.txt, hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.



[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss

2011-04-01 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014889#comment-13014889
 ] 

Liyin Tang commented on HBASE-3065:
---

Changing the zk data format raises a compatibility problem.
Right now, I put a MAGIC number at the beginning of the new-format zk data, 
so the new code can easily recognize and handle old-format zk data.

But the original code cannot handle the new format of zk data. 
Deployment therefore needs a fresh restart, with all the new jar files 
deployed together.
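
A sketch of the magic-prefix approach described above, under stated assumptions: the MAGIC bytes here are invented for illustration (the value used by the actual patch is not given in this thread), and ZkDataFormat is a hypothetical class name. New-format data carries the prefix; anything without it is treated as old-format and returned unchanged.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class ZkDataFormat {
  // Hypothetical magic prefix; the real patch's value is not stated in this thread.
  static final byte[] MAGIC = {(byte) 0xFF, 'Z', 'K', 1};

  /** Wraps a payload in the new format by prepending the magic prefix. */
  static byte[] encode(byte[] payload) {
    return ByteBuffer.allocate(MAGIC.length + payload.length)
        .put(MAGIC).put(payload).array();
  }

  /** True if the data starts with the magic prefix. */
  static boolean isNewFormat(byte[] data) {
    return data.length >= MAGIC.length
        && Arrays.equals(Arrays.copyOfRange(data, 0, MAGIC.length), MAGIC);
  }

  /** Returns the payload, stripping the prefix when present (old data passes through). */
  static byte[] decode(byte[] data) {
    return isNewFormat(data)
        ? Arrays.copyOfRange(data, MAGIC.length, data.length)
        : data;
  }
}
```

This shows why the compatibility is one-way: a reader with decode() accepts both formats, while an older reader that knows nothing of the prefix would interpret the magic bytes as payload.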


> Retry all 'retryable' zk operations; e.g. connection loss
> -
>
> Key: HBASE-3065
> URL: https://issues.apache.org/jira/browse/HBASE-3065
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Liyin Tang
> Fix For: 0.92.0
>
>
> The 'new' master refactored our zk code tidying up all zk accesses and 
> corralling them behind nice zk utility classes.  One improvement was letting 
> out all KeeperExceptions letting the client deal.  That's good generally 
> because in old days, we'd suppress important zk changes in state.  But 
> there is at least one case the new zk utility could handle for the 
> application and that's the class of retryable KeeperExceptions.  The one that 
> comes to mind is connection loss.  On connection loss we should retry the 
> just-failed operation.  Usually the retry will just work.  At worst, on 
> reconnect, we'll pick up the expired session event. 
> Adding in this change shouldn't be too bad given the refactor of zk corralled 
> all zk access into one or two classes only.
> One thing to consider though is how much we should retry.  We could retry on 
> a timer or we could retry for ever as long as the Stoppable interface is 
> passed so if another thread has stopped or aborted the hosting service, we'll 
> notice and give up trying.  Doing the latter is probably better than some 
> kinda timeout.
> HBASE-3062 adds a timed retry on the first zk operation.  This issue is about 
> generalizing what is over there across all zk access.
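
The "retry for ever unless stopped" option from the description can be sketched roughly as follows. StopCheck and ConnectionLossException are hypothetical stand-ins for HBase's Stoppable interface and ZooKeeper's connection-loss exception; the real types and the eventual patch may look quite different.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a stop flag owned by the hosting service. */
interface StopCheck { boolean isStopped(); }

/** Hypothetical stand-in for a retryable ZooKeeper failure. */
class ConnectionLossException extends Exception {}

class ZkRetry {
  /**
   * Runs op, retrying on connection loss until it succeeds or the
   * hosting service has been stopped. Other exceptions propagate at once.
   */
  static <T> T retryOnConnectionLoss(Callable<T> op, StopCheck stopper, long sleepMs)
      throws Exception {
    while (true) {
      try {
        return op.call();
      } catch (ConnectionLossException e) {
        if (stopper.isStopped()) {
          throw e; // service stopped or aborted: give up
        }
        Thread.sleep(sleepMs); // brief pause before retrying the same operation
      }
    }
  }
}
```

Only the retryable exception is caught, so non-retryable failures still reach the caller as they do today. The stop check is what keeps an unbounded retry loop from outliving the service that started it.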



[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-04-01 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014906#comment-13014906
 ] 

gaojinchao commented on HBASE-3373:
---

In HBase 0.20.6, contiguous (adjacent) regions are not assigned to the same 
region server. This splits up the daughters of a split across region servers 
and avoids hot spots, which can improve performance.

In 0.90.1, a daughter region is opened on the region server where its parent 
was open.
When a region server holds thousands of regions, random selection is unlikely 
to pick the contiguous regions, so that region server stays a hot spot. 

Should the balancer choose contiguous regions first and then fall back to 
random selection, or avoid hot spots some other way? (E.g. add a configuration 
parameter to choose the balancing method based on the application?)

> Allow regions of specific table to be load-balanced
> ---
>
> Key: HBASE-3373
> URL: https://issues.apache.org/jira/browse/HBASE-3373
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.20.6
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: HbaseBalancerTest2.java
>
>
> From our experience, cluster can be well balanced and yet, one table's 
> regions may be badly concentrated on few region servers.
> For example, one table has 839 regions (380 regions at time of table 
> creation) out of which 202 are on one server.
> It would be desirable for load balancer to distribute regions for specified 
> tables evenly across the cluster. Each of such tables has number of regions 
> many times the cluster size.



[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-1512:
---

Attachment: patch-1512-3.txt

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014908#comment-13014908
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Thanks for the suggestions Ted.

a) Added generics functionality to the AggregationClient. As suggested by Ted, 
there should be a ColumnInterpreter to give the client a chance to describe 
the cell value type. I made this generic, in the sense that the client is now 
expected to pass a column interpreter object along with the agg function 
calls. AggregationClient has such an implementation, where the client says 
that its cell value is a long. Other cell value types can be handled with a 
similar approach.

b) While the client can define the cell value type by implementing 
ColumnInterpreter, I still think the average and standard deviation will be a 
double value. So, I added a wrapper on these methods to support the generic 
functionality. Please refer to AggregationClient.getStdParams & getAvgParams. 
Let me know if it is "un-intuitive". I think it is right though :)

c) Added a filter to each of the agg functions. They are just passed along with 
the call, and are stuffed in the Scan object at the region level during 
scanning. In case of row count, if client provides a filter, that one will be 
used. If neither a filter nor a qualifier is provided, FirstKeyValueFilter is 
used.

d) Added more test cases for testing filter use cases (44 in total :)). 

e) refactored the "done" variable as suggested by Ted.
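
For readers without the patch at hand, the idea in (a) and (b) can be sketched roughly as follows. This is only an illustration under stated assumptions: CellInterpreter, LongCellInterpreter, and AggregationSketch are hypothetical names, not the ColumnInterpreter or AggregationClient classes from the attachments, and the cell value is assumed to be an 8-byte big-endian long.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the interpreter idea; not the interface from the patch. */
interface CellInterpreter<T> {
  T parse(byte[] cellValue);   // how the client describes its cell value type
  T add(T a, T b);             // null is treated as "no value yet"
  double toDouble(T value);    // averages are reported as double for every T
}

final class LongCellInterpreter implements CellInterpreter<Long> {
  public Long parse(byte[] cellValue) {
    return ByteBuffer.wrap(cellValue).getLong();
  }

  public Long add(Long a, Long b) {
    if (a == null) return b;
    if (b == null) return a;
    return a + b;
  }

  public double toDouble(Long value) {
    return value.doubleValue();
  }
}

final class AggregationSketch {
  static <T> T sum(List<byte[]> cells, CellInterpreter<T> ci) {
    T total = null;
    for (byte[] cell : cells) {
      total = ci.add(total, ci.parse(cell));
    }
    return total;
  }

  static <T> double average(List<byte[]> cells, CellInterpreter<T> ci) {
    if (cells.isEmpty()) {
      return Double.NaN;
    }
    return ci.toDouble(sum(cells, ci)) / cells.size();
  }

  static byte[] longCell(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  public static void main(String[] args) {
    List<byte[]> cells = Arrays.asList(longCell(1), longCell(2), longCell(4));
    CellInterpreter<Long> ci = new LongCellInterpreter();
    System.out.println("sum = " + sum(cells, ci));
    System.out.println("avg = " + average(cells, ci));
  }
}
```

The sum keeps the client's type T, while the average is always a double, which is the split described in (b).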

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  



[jira] [Commented] (HBASE-3709) HFile compression not sharing configuration

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014911#comment-13014911
 ] 

Hudson commented on HBASE-3709:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> HFile compression not sharing configuration
> ---
>
> Key: HBASE-3709
> URL: https://issues.apache.org/jira/browse/HBASE-3709
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.90.2, 0.92.0
>
> Attachments: HBASE-3709.patch
>
>
> In o.a.h.h.io.hfile.Compression, we defeat codec pooling. We also cause the 
> XML resources of the configuration to be read and parsed upon every reinit().



[jira] [Commented] (HBASE-3720) Book.xml - porting conceptual-view / physical-view sections of HBaseArchitecture wiki

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014910#comment-13014910
 ] 

Hudson commented on HBASE-3720:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> Book.xml - porting conceptual-view / physical-view sections of 
> HBaseArchitecture wiki
> -
>
> Key: HBASE-3720
> URL: https://issues.apache.org/jira/browse/HBASE-3720
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: book.xml.patch
>
>
> Ported the conceptual-view and physical-view examples from the 
> HBaseArchitecture wiki and added them to the Data Model section.  Docbook 
> tables turned out to be a lot easier to do by hand than I thought.
> I refactored these to be more consistent with terminology.  Instead of 
> referring to "Columns" per the wiki, I used "ColumnFamily".  Referred to the 
> data in the "contents:" CF in the same way as the "anchor:" CF with 
> "columnfamily:column" notation (in the wiki it didn't have a column name and 
> it looked a little odd).  I also dropped the "mime" column family to simplify 
> the example.



[jira] [Commented] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014912#comment-13014912
 ] 

Hudson commented on HBASE-3715:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> Book.xml - adding architecture section on client, adding section on spec-ex 
> under mapreduce
> ---
>
> Key: HBASE-3715
> URL: https://issues.apache.org/jira/browse/HBASE-3715
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: book.xml.patch
>
>
> Small changes to book.xml
> * Added a small section under MapReduce saying that it's generally advisable 
> to turn off speculative execution when using HBase as a source
> * Added a 'client' section under architecture that is a simplified port of 
> the client section in the HBaseArchitecture wiki page. 



[jira] [Commented] (HBASE-3717) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014914#comment-13014914
 ] 

Hudson commented on HBASE-3717:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods
> 
>
> Key: HBASE-3717
> URL: https://issues.apache.org/jira/browse/HBASE-3717
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.90.1
>Reporter: David Buttler
>Assignee: David Buttler
>Priority: Trivial
> Attachments: deprecate_HTable_isTableEnabled.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The static isTableEnabled() methods on HTable can lead to unintended 
> consequences if used naively without understanding potential side-effects.  
> Suggest deprecating these methods and pointing at the HBaseAdmin methods to 
> accomplish the same task instead.



[jira] [Commented] (HBASE-3716) Intermittent TestRegionRebalancing failure

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014913#comment-13014913
 ] 

Hudson commented on HBASE-3716:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> Intermittent TestRegionRebalancing failure
> --
>
> Key: HBASE-3716
> URL: https://issues.apache.org/jira/browse/HBASE-3716
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 3716-addendum.txt, 3716.txt
>
>
> See HBase-TRUNK build #1820
> This could be due to HBASE-3681
> In trunk, default value of "hbase.regions.slop" is 20%. It is possible for 
> load balancer to see region distribution which falls within 20% of optimal 
> distribution.
> However, assertRegionsAreBalanced() uses 10% slop.
> One solution is to align the slop in assertRegionsAreBalanced() with 
> "hbase.regions.slop" value.
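
The mismatch described above can be illustrated with a small sketch. It assumes slop is applied as a fraction around the average number of regions per server, giving a floor and a ceiling; the class name, method name, and region counts are hypothetical, and this is not the code from the load balancer or from assertRegionsAreBalanced().

```java
final class SlopDemo {
  /** True when every server's region count lies within average * (1 +/- slop). */
  static boolean isBalanced(int[] regionsPerServer, double slop) {
    int total = 0;
    for (int n : regionsPerServer) {
      total += n;
    }
    double average = (double) total / regionsPerServer.length;
    int floor = (int) Math.floor(average * (1 - slop));
    int ceiling = (int) Math.ceil(average * (1 + slop));
    for (int n : regionsPerServer) {
      if (n < floor || n > ceiling) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Made-up distribution: average is 100, extremes are 15% away from it.
    int[] load = {115, 100, 100, 85};
    System.out.println("balanced at 20% slop: " + isBalanced(load, 0.20));
    System.out.println("balanced at 10% slop: " + isBalanced(load, 0.10));
  }
}
```

With this distribution the 20% check passes and the 10% check fails, so a balancer using the first value would leave the cluster alone while a test asserting the second would still fail.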



[jira] [Commented] (HBASE-3705) Allow passing timestamp into importtsv

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014915#comment-13014915
 ] 

Hudson commented on HBASE-3705:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> Allow passing timestamp into importtsv
> --
>
> Key: HBASE-3705
> URL: https://issues.apache.org/jira/browse/HBASE-3705
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.90.1
>Reporter: Andy Sautins
>Priority: Minor
> Attachments: 3705.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> importtsv sets the timestamp for the imported records to the current time in 
> ImportTsv.TsvImporter.Mapper.setup.  It can be useful to be able to set the 
> specific timestamp used for the import.  This JIRA adds a 
> -Dimporttsv.timestamp parameter to the importtsv job.   
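
As a rough illustration of the behaviour described, the sketch below resolves the import timestamp from a configuration value and falls back to the current time when the option is absent. java.util.Properties stands in for the job configuration here, and the class and method names are hypothetical; the attached patch may do this differently.

```java
import java.util.Properties;

final class ImportTimestampSketch {
  // Key taken from the option named in the issue text.
  static final String TIMESTAMP_KEY = "importtsv.timestamp";

  /** Returns the configured timestamp, or 'now' when the option is not set. */
  static long resolveTimestamp(Properties conf, long now) {
    String raw = conf.getProperty(TIMESTAMP_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return now;
    }
    return Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    long now = System.currentTimeMillis();
    System.out.println("default:  " + resolveTimestamp(conf, now));
    conf.setProperty(TIMESTAMP_KEY, "1301616000000");
    System.out.println("explicit: " + resolveTimestamp(conf, now));
  }
}
```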



[jira] [Commented] (HBASE-3712) HTable.close() doesn't shutdown thread pool

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014917#comment-13014917
 ] 

Hudson commented on HBASE-3712:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> HTable.close() doesn't shutdown thread pool
> ---
>
> Key: HBASE-3712
> URL: https://issues.apache.org/jira/browse/HBASE-3712
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.90.3
>
> Attachments: 3712.txt
>
>
> HTable.close() doesn't shutdown thread pool
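
The general pattern at issue can be sketched as below: an object that owns an executor shuts it down in close(). PooledClient is a hypothetical class, not HTable, and the sketch assumes the object created the pool itself; whether a pool passed in by the caller should also be shut down is a separate design question that the issue text does not settle.

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PooledClient implements Closeable {
  // The client owns this pool, so it is also responsible for stopping it.
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  void submit(Runnable task) {
    pool.submit(task);
  }

  boolean isPoolShutdown() {
    return pool.isShutdown();
  }

  @Override
  public void close() {
    // Without this call the pool's worker threads can outlive the client.
    pool.shutdown();
  }

  public static void main(String[] args) throws InterruptedException {
    PooledClient client = new PooledClient();
    client.submit(() -> System.out.println("work done"));
    client.close();
    client.pool.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println("pool shut down: " + client.isPoolShutdown());
  }
}
```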



[jira] [Commented] (HBASE-3711) importtsv fails if rowkey length exceeds MAX_ROW_LENGTH

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014916#comment-13014916
 ] 

Hudson commented on HBASE-3711:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> importtsv fails if rowkey length exceeds MAX_ROW_LENGTH
> ---
>
> Key: HBASE-3711
> URL: https://issues.apache.org/jira/browse/HBASE-3711
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Kazuki Ohta
>Assignee: Kazuki Ohta
> Fix For: 0.92.0
>
> Attachments: HBASE-3711.patch
>
>
> The importtsv MapTask fails with this error when it is given a row key 
> longer than MAX_ROW_LENGTH.
> {quote}
> 11/03/30 04:59:16 INFO mapred.JobClient: Task Id : 
> attempt_201103252231_0077_m_03_0, Status : FAILED
> java.lang.IllegalArgumentException: Row key is invalid
>   at org.apache.hadoop.hbase.client.Put.<init>(Put.java:101)
>   at org.apache.hadoop.hbase.client.Put.<init>(Put.java:80)
>   at org.apache.hadoop.hbase.client.Put.<init>(Put.java:71)
>   at 
> org.apache.hadoop.hbase.mapreduce.ImportTsv$TsvImporter.map(ImportTsv.java:235)
>   at 
> org.apache.hadoop.hbase.mapreduce.ImportTsv$TsvImporter.map(ImportTsv.java:190)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>   at org.apache.hadoop.mapred.Child.main(Child.java:234)
> {quote}
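
One way to avoid failing the whole task is to check the row key length before building the Put and count the line as bad instead. The sketch below is only an illustration of that idea: RowKeyGuard and its methods are hypothetical, the limit value is an assumption standing in for MAX_ROW_LENGTH, and the attached patch may handle the case differently.

```java
import java.nio.charset.StandardCharsets;

final class RowKeyGuard {
  // Assumption: stands in for the MAX_ROW_LENGTH limit named in the report.
  static final int MAX_ROW_LENGTH = Short.MAX_VALUE;

  static boolean isUsableRowKey(byte[] rowKey) {
    return rowKey != null && rowKey.length <= MAX_ROW_LENGTH;
  }

  /** Counts TSV lines whose first field is too long to be used as a row key. */
  static int countBadLines(String[] lines) {
    int bad = 0;
    for (String line : lines) {
      String key = line.split("\t", 2)[0];
      if (!isUsableRowKey(key.getBytes(StandardCharsets.UTF_8))) {
        bad++;  // skip the line instead of letting the mapper throw
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    String longKey = new String(new char[40000]).replace('\0', 'k');
    String[] lines = {"row1\tvalue", longKey + "\tvalue"};
    System.out.println("bad lines: " + countBadLines(lines));
  }
}
```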



[jira] [Commented] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014918#comment-13014918
 ] 

Hudson commented on HBASE-3647:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])
HBASE-3647 Distinguish read and write request count in region -- reversed 
the patch because missing a file
HBASE-3647 Distinguish read and write request count in region


> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3647-versioned-2.txt, 3647-versioned.txt, hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.
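
A minimal sketch of keeping the two counts separately is shown below. RegionRequestCounters is a hypothetical class built on AtomicLong; it is not the counter implementation from HBASE-3507 or from the attached patches, and the read fraction is just one example of a figure a balancer could derive.

```java
import java.util.concurrent.atomic.AtomicLong;

final class RegionRequestCounters {
  private final AtomicLong reads = new AtomicLong();
  private final AtomicLong writes = new AtomicLong();

  void onRead() {
    reads.incrementAndGet();
  }

  void onWrite() {
    writes.incrementAndGet();
  }

  long readCount() {
    return reads.get();
  }

  long writeCount() {
    return writes.get();
  }

  /** Fraction of all requests that were reads; 0 when nothing was recorded. */
  double readFraction() {
    long r = reads.get();
    long total = r + writes.get();
    return total == 0 ? 0.0 : (double) r / total;
  }

  public static void main(String[] args) {
    RegionRequestCounters counters = new RegionRequestCounters();
    counters.onRead();
    counters.onRead();
    counters.onRead();
    counters.onWrite();
    System.out.println("reads=" + counters.readCount()
        + " writes=" + counters.writeCount()
        + " readFraction=" + counters.readFraction());
  }
}
```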



[jira] [Commented] (HBASE-3684) Support column range filter

2011-04-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014919#comment-13014919
 ] 

Hudson commented on HBASE-3684:
---

Integrated in HBase-TRUNK #1824 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1824/])


> Support column range filter 
> 
>
> Key: HBASE-3684
> URL: https://issues.apache.org/jira/browse/HBASE-3684
> Project: HBase
>  Issue Type: New Feature
>  Components: filters
>Reporter: Jerry Chen
>Priority: Minor
> Attachments: 3684.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently we have a ColumnPrefix filter which will seek to the proper column 
> prefix. We also need a column range filter to query a range of columns. The 
> proposed interface is the following: ColumnRangeFilter(final byte[] 
> minColumn, boolean minColumnInclusive, final byte[] maxColumn, boolean 
> maxColumnInclusive) 
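
The range test implied by the proposed constructor can be sketched as follows. This only shows the inclusion logic, not a filter that seeks; ColumnRangeSketch is a hypothetical class, the unsigned lexicographic byte comparison is an assumption about how columns are ordered, and treating a null bound as unbounded is also an assumption.

```java
import java.nio.charset.StandardCharsets;

final class ColumnRangeSketch {
  private final byte[] minColumn;
  private final boolean minColumnInclusive;
  private final byte[] maxColumn;
  private final boolean maxColumnInclusive;

  ColumnRangeSketch(final byte[] minColumn, boolean minColumnInclusive,
                    final byte[] maxColumn, boolean maxColumnInclusive) {
    this.minColumn = minColumn;
    this.minColumnInclusive = minColumnInclusive;
    this.maxColumn = maxColumn;
    this.maxColumnInclusive = maxColumnInclusive;
  }

  /** Lexicographic comparison treating bytes as unsigned values. */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /** True when the column falls inside the configured range. */
  boolean includes(byte[] column) {
    if (minColumn != null) {
      int c = compare(column, minColumn);
      if (c < 0 || (c == 0 && !minColumnInclusive)) {
        return false;
      }
    }
    if (maxColumn != null) {
      int c = compare(column, maxColumn);
      if (c > 0 || (c == 0 && !maxColumnInclusive)) {
        return false;
      }
    }
    return true;
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Range [b, d): "b" is included, "d" is not.
    ColumnRangeSketch range = new ColumnRangeSketch(bytes("b"), true, bytes("d"), false);
    for (String col : new String[] {"a", "b", "bb", "c", "d"}) {
      System.out.println(col + " -> " + range.includes(bytes(col)));
    }
  }
}
```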



[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014923#comment-13014923
 ] 

stack commented on HBASE-3065:
--

Requiring a restart is ok Liyin.  Going from 0.90.x to 0.92 will require a 
restart currently anyways (The RPC version changed because of the addition of 
coprocessors among other additions and removals).

> Retry all 'retryable' zk operations; e.g. connection loss
> -
>
> Key: HBASE-3065
> URL: https://issues.apache.org/jira/browse/HBASE-3065
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Liyin Tang
> Fix For: 0.92.0
>
>
> The 'new' master refactored our zk code tidying up all zk accesses and 
> corralling them behind nice zk utility classes.  One improvement was letting 
> out all KeeperExceptions letting the client deal.  That's good generally 
> because in old days, we'd suppress important zk state changes.  But 
> there is at least one case the new zk utility could handle for the 
> application and that's the class of retryable KeeperExceptions.  The one that 
> comes to mind is connection loss.  On connection loss we should retry the 
> just-failed operation.  Usually the retry will just work.  At worst, on 
> reconnect, we'll pick up the expired session event. 
> Adding in this change shouldn't be too bad given the refactor of zk corralled 
> all zk access into one or two classes only.
> One thing to consider though is how much we should retry.  We could retry on 
> a timer or we could retry for ever as long as the Stoppable interface is 
> passed so if another thread has stopped or aborted the hosting service, we'll 
> notice and give up trying.  Doing the latter is probably better than some 
> kinda timeout.
> HBASE-3062 adds a timed retry on the first zk operation.  This issue is about 
> generalizing what is over there across all zk access.
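
The "retry until stopped" option discussed above could look roughly like the sketch below. Everything in it is a stand-in: the Stoppable interface is a local one-method version, RetryableException plays the role of a retryable KeeperException such as connection loss, and the pause between attempts is an added assumption. It is not the code from the patch.

```java
import java.util.concurrent.Callable;

final class RetrySketch {
  /** Local stand-in for a stop signal; only the one method used here. */
  interface Stoppable {
    boolean isStopped();
  }

  /** Stand-in for the class of retryable zk failures. */
  static final class RetryableException extends Exception {
    RetryableException(String message) {
      super(message);
    }
  }

  /**
   * Runs op until it succeeds. On a retryable failure it gives up only if the
   * hosting service has been stopped; other exceptions propagate at once.
   */
  static <T> T retryUntilStopped(Callable<T> op, Stoppable stopper, long pauseMillis)
      throws Exception {
    while (true) {
      try {
        return op.call();
      } catch (RetryableException e) {
        if (stopper.isStopped()) {
          throw e;
        }
        Thread.sleep(pauseMillis);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] attempts = {0};
    String result = retryUntilStopped(() -> {
      if (++attempts[0] < 3) {
        throw new RetryableException("connection loss");
      }
      return "ok";
    }, () -> false, 10L);
    System.out.println(result + " after " + attempts[0] + " attempts");
  }
}
```

Compared with a fixed timeout, the loop has no retry budget to tune, but it depends on something actually flipping the stop flag when the service is aborted.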



[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014924#comment-13014924
 ] 

stack commented on HBASE-3065:
--

Oh, so, do you even need the MAGIC in that case? (Restart usually clears zk 
state)

> Retry all 'retryable' zk operations; e.g. connection loss
> -
>
> Key: HBASE-3065
> URL: https://issues.apache.org/jira/browse/HBASE-3065
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Liyin Tang
> Fix For: 0.92.0
>
>
> The 'new' master refactored our zk code tidying up all zk accesses and 
> corralling them behind nice zk utility classes.  One improvement was letting 
> out all KeeperExceptions letting the client deal.  That's good generally 
> because in old days, we'd suppress important zk state changes.  But 
> there is at least one case the new zk utility could handle for the 
> application and that's the class of retryable KeeperExceptions.  The one that 
> comes to mind is connection loss.  On connection loss we should retry the 
> just-failed operation.  Usually the retry will just work.  At worst, on 
> reconnect, we'll pick up the expired session event. 
> Adding in this change shouldn't be too bad given the refactor of zk corralled 
> all zk access into one or two classes only.
> One thing to consider though is how much we should retry.  We could retry on 
> a timer or we could retry for ever as long as the Stoppable interface is 
> passed so if another thread has stopped or aborted the hosting service, we'll 
> notice and give up trying.  Doing the latter is probably better than some 
> kinda timeout.
> HBASE-3062 adds a timed retry on the first zk operation.  This issue is about 
> generalizing what is over there across all zk access.



[jira] [Commented] (HBASE-3647) Distinguish read and write request count in region

2011-04-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014926#comment-13014926
 ] 

stack commented on HBASE-3647:
--

Applied Ted's v2.  Compiles.  Thanks again Ted.

> Distinguish read and write request count in region
> --
>
> Key: HBASE-3647
> URL: https://issues.apache.org/jira/browse/HBASE-3647
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3647-versioned-2.txt, 3647-versioned.txt, hbase-3647.txt
>
>
> Distinguishing read and write request counts, on top of HBASE-3507, would 
> benefit load balancer.
> The action for balancing read vs. write load should be different. For read 
> load, region movement should be low (to keep scanner happy). For write load, 
> region movement is allowed.
> Now that we have cheap(er) counters, it should not be too burdensome keeping 
> up the extra count.



[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-04-01 Thread zhoushuaifeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014927#comment-13014927
 ] 

zhoushuaifeng commented on HBASE-3373:
--

Agree with Gao's comments. When a region is splitting, it is usually under 
heavy write load. It's better to assign the daughters to different 
region servers to avoid a hot spot.

> Allow regions of specific table to be load-balanced
> ---
>
> Key: HBASE-3373
> URL: https://issues.apache.org/jira/browse/HBASE-3373
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.20.6
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: HbaseBalancerTest2.java
>
>
> From our experience, a cluster can be well balanced overall and yet one 
> table's regions may be badly concentrated on a few region servers.
> For example, one table has 839 regions (380 regions at time of table 
> creation) out of which 202 are on one server.
> It would be desirable for the load balancer to distribute the regions of 
> specified tables evenly across the cluster. Each such table has a number of 
> regions many times the cluster size.



[jira] [Updated] (HBASE-3704) Show per region request count in table.jsp

2011-04-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3704:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thanks for the patch Ted.

> Show per region request count in table.jsp
> --
>
> Key: HBASE-3704
> URL: https://issues.apache.org/jira/browse/HBASE-3704
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3704.txt
>
>
> table.jsp should display per region request count.
> It should also display region count per region server.



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014942#comment-13014942
 ] 

Ted Yu commented on HBASE-1512:
---

For LongColumnInterpreter.divide(), if l2 is null, I think we should return 
Double.NaN
I would write:
{code}
  if (l2 == null)
    return Double.NaN;
  if (l1 == null)
    return 0;
{code}
I think the following method can be named getAvgArgs (argument in place of 
parameter):
{code}
  private  List getAvgParams(final byte[] tableName,
{code}
But I don't have strong opinion here.

getAvgParamsAsArray() of AvgCallBack can be named getAvgParams() because its 
return type is List<>.

Overall, version 3 is great.
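
For reference, the null handling suggested for divide() above, placed in a complete method, might look like the sketch below. DivideSketch is a hypothetical class and the final division line is an assumption about what the rest of the method does; only the two null checks come from the comment.

```java
final class DivideSketch {
  /** Divides l1 by l2 as doubles, with the suggested null handling. */
  static double divide(Long l1, Long l2) {
    if (l2 == null)
      return Double.NaN;   // no divisor: the result is undefined
    if (l1 == null)
      return 0;            // no dividend: treated as zero
    return l1.doubleValue() / l2.doubleValue();
  }

  public static void main(String[] args) {
    System.out.println(divide(6L, 4L));
    System.out.println(divide(null, 4L));
    System.out.println(divide(6L, null));
  }
}
```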

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  



[jira] [Commented] (HBASE-3719) Workload has to drain before hlog can be rolled

2011-04-01 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014945#comment-13014945
 ] 

dhruba borthakur commented on HBASE-3719:
-

Of course, client requests will not get a response back until the old log is 
closed and the transactions newly written to the new HLog are also 
hflushed/synced.

> Workload has to drain before hlog can be rolled
> ---
>
> Key: HBASE-3719
> URL: https://issues.apache.org/jira/browse/HBASE-3719
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> In the current implementation, the regionserver blocks new transactions from 
> occurring when the HLog is rolled. Closing the existing HLog sometimes takes 
> more than a few seconds and during this time all new puts/increments are 
> blocked. It will be nice if we can continue to write new transactions to the 
> new HLog (but maybe not commit those transactions) while the old HLog is 
> being closed.



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014947#comment-13014947
 ] 

Ted Yu commented on HBASE-1512:
---

Also, I think it is time to move LongColumnInterpreter out into its own file 
under src/main/java/org/apache/hadoop/hbase/client/coprocessor/.


> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  
