[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015776#comment-13015776
 ] 

Eric Charles commented on HBASE-3729:
-

Right, get.rb was there.
I applied the patch and tested with:

hbase(main):001:0> create 'test', 'cf'
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):004:0> scan 'test'
ROW          COLUMN+CELL
 row1        column=cf:a, timestamp=1301984830139, value=value1
==> value1 ==> OK
1 row(s) in 0.0440 seconds
hbase(main):005:0> put 'test', 'row1', 'cf:a', 'value2'
0 row(s) in 0.0610 seconds
hbase(main):006:0> scan 'test'
ROW          COLUMN+CELL
 row1        column=cf:a, timestamp=1301984853863, value=value2
==> value2 ==> OK (the last one)
hbase(main):007:0> get 'test', 'row1'
COLUMN       CELL
 cf:a        timestamp=1301984853863, value=value2
1 row(s) in 0.1350 seconds
==> value2 ==> OK (the last one)
hbase(main):009:0> get 'test', 'row1', { TIMERANGE => [0, 3] }
COLUMN       CELL
 cf:a        timestamp=1301984853863, value=value2
1 row(s) in 0.0260 seconds
==> I would have expected a list of values, namely value1 and value2, because 
they both match the given time range predicate.

Or maybe I misunderstood what we were talking about?
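
For reference, the shell's get is backed by the Java client's Get, where the time range and the number of versions are set separately; without raising the version count, only the newest matching cell per column comes back. A minimal sketch using the table and row from the session above:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeGetExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");
    Get get = new Get(Bytes.toBytes("row1"));
    get.setTimeRange(0L, Long.MAX_VALUE);   // half-open [min, max) time range
    get.setMaxVersions();                   // default is 1 version per column
    Result result = table.get(get);
    for (KeyValue kv : result.raw()) {
      System.out.println(kv);               // one line per matching cell version
    }
    table.close();
  }
}
{code}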


> Get cells via shell with a time range predicate
> ---
>
> Key: HBASE-3729
> URL: https://issues.apache.org/jira/browse/HBASE-3729
> Project: HBase
>  Issue Type: New Feature
>  Components: shell
>Reporter: Eric Charles
>Assignee: Ted Yu
> Attachments: 3729-v2.txt, 3729.txt
>
>
> The HBase shell allows you to specify a timestamp to get a value:
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
> If you don't give the exact timestamp, you get nothing... so it's difficult 
> to get a cell's previous versions.
> It would be nice to have a "time range" predicate-based get.
> The shell syntax could be (depending on technical feasibility)
> - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
> end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-04 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani reassigned HBASE-1364:
--

Assignee: Prakash Khemani

> [performance] Distributed splitting of regionserver commit logs
> ---
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: stack
>Assignee: Prakash Khemani
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-1364.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash; 
> but it needs to run even faster.
> (Below is from HBASE-1008)
> In the Bigtable paper, the split is distributed. If we're going to have 1000 
> logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find one reconstruction log only. 
> Need to make it so they pick up a bunch of edit logs, and it should be fine that 
> logs are elsewhere in hdfs in an output directory written by all split 
> participants, whether multithreaded or a mapreduce-like distributed process 
> (Let's write our distributed sort first as an MR so we learn what's involved; 
> the distributed sort, as much as possible, should use MR framework pieces). On 
> startup, regions go to this directory and pick up the files written by split 
> participants, deleting and clearing the dir when all have been read in. Making 
> it so it can take multiple logs for input can also make the split process more 
> robust than the current tenuous process, which loses all edits if it 
> doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need 
> to fix that. Split can sort the edits by column family so each store only reads 
> its own edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3734) HBaseAdmin creates new configurations in getCatalogTracker

2011-04-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015762#comment-13015762
 ] 

Ted Yu commented on HBASE-3734:
---

HBaseAdmin.getCatalogTracker() calls HConnectionManager.getConnection(), which 
uses the Configuration as the key in HBASE_INSTANCES.
Since we want to reuse the connection to the same cluster, maybe we can do this:
{code}
  connection = HBASE_INSTANCES.get(conf.get("hbase.zookeeper.quorum"));
  if (connection == null) {
    connection = new HConnectionImplementation(conf);
    HBASE_INSTANCES.put(conf.get("hbase.zookeeper.quorum"), connection);
  }
{code}
Of course, the above would break deleteConnection(), which requires reference 
counting.
So maybe we can use a wrapper for the (refcount, connection) tuple and store the 
wrapper object as the value in HBASE_INSTANCES?
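
A minimal sketch of that wrapper idea, keyed by the quorum string as suggested above; the ConnectionInfo class and the map type are illustrative, not the actual patch:
{code}
// Illustrative sketch only: pair the connection with a reference count so that
// deleteConnection() still works once HBASE_INSTANCES is keyed by quorum string.
static class ConnectionInfo {
  final HConnection connection;
  int refCount = 0;
  ConnectionInfo(HConnection connection) {
    this.connection = connection;
  }
}

// Map<String, ConnectionInfo> HBASE_INSTANCES, inside HConnectionManager:
static synchronized HConnection getConnection(Configuration conf)
    throws ZooKeeperConnectionException {
  String quorum = conf.get("hbase.zookeeper.quorum");
  ConnectionInfo info = HBASE_INSTANCES.get(quorum);
  if (info == null) {
    info = new ConnectionInfo(new HConnectionImplementation(conf));
    HBASE_INSTANCES.put(quorum, info);
  }
  info.refCount++;
  return info.connection;
}

static synchronized void deleteConnection(Configuration conf) {
  String quorum = conf.get("hbase.zookeeper.quorum");
  ConnectionInfo info = HBASE_INSTANCES.get(quorum);
  if (info != null && --info.refCount <= 0) {
    HBASE_INSTANCES.remove(quorum);  // actual connection close would happen here
  }
}
{code}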

> HBaseAdmin creates new configurations in getCatalogTracker
> --
>
> Key: HBASE-3734
> URL: https://issues.apache.org/jira/browse/HBASE-3734
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> HBaseAdmin.getCatalogTracker creates a new Configuration every time it's 
> called; instead, HBA should reuse the same one and do the copy inside the 
> constructor.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3734) HBaseAdmin creates new configurations in getCatalogTracker

2011-04-04 Thread Jean-Daniel Cryans (JIRA)
HBaseAdmin creates new configurations in getCatalogTracker
--

 Key: HBASE-3734
 URL: https://issues.apache.org/jira/browse/HBASE-3734
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.3


HBaseAdmin.getCatalogTracker creates a new Configuration every time it's called; 
instead, HBA should reuse the same one and do the copy inside the constructor.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3733) MemStoreFlusher.flushOneForGlobalPressure() shouldn't be using TreeSet for HRegion

2011-04-04 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-3733.
---

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]

Committed to trunk, thanks Ted!

> MemStoreFlusher.flushOneForGlobalPressure() shouldn't be using TreeSet for 
> HRegion
> --
>
> Key: HBASE-3733
> URL: https://issues.apache.org/jira/browse/HBASE-3733
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3733.txt
>
>
> v-himanshu found that since HRegion doesn't implement Comparable, it cannot 
> be placed in TreeSet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3729:
--

Attachment: 3729-v2.txt

Added one more usage sample according to Eric's suggestion.

> Get cells via shell with a time range predicate
> ---
>
> Key: HBASE-3729
> URL: https://issues.apache.org/jira/browse/HBASE-3729
> Project: HBase
>  Issue Type: New Feature
>  Components: shell
>Reporter: Eric Charles
>Assignee: Ted Yu
> Attachments: 3729-v2.txt, 3729.txt
>
>
> The HBase shell allows you to specify a timestamp to get a value:
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
> If you don't give the exact timestamp, you get nothing... so it's difficult 
> to get a cell's previous versions.
> It would be nice to have a "time range" predicate-based get.
> The shell syntax could be (depending on technical feasibility)
> - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
> end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3733) MemStoreFlusher.flushOneForGlobalPressure() shouldn't be using TreeSet for HRegion

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3733:
--

Attachment: 3733.txt

Use HashSet in place of TreeSet
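
For reference, a plain TreeSet relies on natural ordering, so adding elements whose class does not implement Comparable fails at runtime, while HashSet has no such requirement. A minimal illustration (plain Object stands in for HRegion here):
{code}
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderingDemo {
  public static void main(String[] args) {
    Set<Object> hashSet = new HashSet<Object>();
    hashSet.add(new Object());   // fine: HashSet only needs hashCode()/equals()

    Set<Object> treeSet = new TreeSet<Object>();
    treeSet.add(new Object());   // throws ClassCastException: Object is not Comparable
  }
}
{code}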

> MemStoreFlusher.flushOneForGlobalPressure() shouldn't be using TreeSet for 
> HRegion
> --
>
> Key: HBASE-3733
> URL: https://issues.apache.org/jira/browse/HBASE-3733
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 3733.txt
>
>
> v-himanshu found that since HRegion doesn't implement Comparable, it cannot 
> be placed in TreeSet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3733) MemStoreFlusher.flushOneForGlobalPressure() shouldn't be using TreeSet for HRegion

2011-04-04 Thread Ted Yu (JIRA)
MemStoreFlusher.flushOneForGlobalPressure() shouldn't be using TreeSet for 
HRegion
--

 Key: HBASE-3733
 URL: https://issues.apache.org/jira/browse/HBASE-3733
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Ted Yu
Assignee: Ted Yu


v-himanshu found that since HRegion doesn't implement Comparable, it cannot be 
placed in TreeSet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3732) New configuration option for client-side compression

2011-04-04 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015753#comment-13015753
 ] 

Benoit Sigoure commented on HBASE-3732:
---

If you want {{Put}}/{{Result}} to do the conversion for you, that means the 
client needs to be aware of the schema of the table before it can start using 
it, right?  Because right now HBase clients don't know the schema, so it's 
something extra that they'd need to look up separately, unless we add new fields 
in the {{.META.}} table that go along with each and every region.
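
For context, the compression described in this issue can already be done manually on the client before the Put is issued; a minimal sketch using java.util.zip (the class and method names here are illustrative):
{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

import org.apache.hadoop.hbase.client.Put;

public class ClientSideCompression {
  // Deflate a value before handing it to HBase; the server just stores opaque bytes.
  static byte[] deflate(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DeflaterOutputStream dos = new DeflaterOutputStream(bos);
    dos.write(raw);
    dos.close();   // finishes the deflater and flushes the stream
    return bos.toByteArray();
  }

  static Put compressedPut(byte[] row, byte[] family, byte[] qualifier, byte[] value)
      throws IOException {
    Put put = new Put(row);
    put.add(family, qualifier, deflate(value));   // 0.90-era Put.add(family, qualifier, value)
    return put;
  }
}
{code}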

> New configuration option for client-side compression
> 
>
> Key: HBASE-3732
> URL: https://issues.apache.org/jira/browse/HBASE-3732
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jean-Daniel Cryans
> Fix For: 0.92.0
>
>
> We have a case here where we have to store very fat cells (arrays of 
> integers) which can amount into the hundreds of KBs that we need to read 
> often, concurrently, and possibly keep in cache. Compressing the values on 
> the client using java.util.zip's Deflater before sending them to HBase proved 
> to be in our case almost an order of magnitude faster.
> The reasons are evident: less data sent to HBase, the memstore contains 
> compressed data, the block cache contains compressed data too, etc.
> I was thinking that it might be something useful to add to a family schema, 
> so that Put/Result do the conversion for you. The actual compression algo 
> should also be configurable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015751#comment-13015751
 ] 

Ted Yu commented on HBASE-3729:
---

I think 0.90.x has get.rb:
{code}
tyumac:hbase-0.90.2 tyu$ find . -name get.rb
./lib/ruby/shell/commands/get.rb
./src/main/ruby/shell/commands/get.rb
{code}

TIMERANGE => [ts1, ts2], VERSIONS => 4 would limit the output for the same rowkey 
to 4 versions.


> Get cells via shell with a time range predicate
> ---
>
> Key: HBASE-3729
> URL: https://issues.apache.org/jira/browse/HBASE-3729
> Project: HBase
>  Issue Type: New Feature
>  Components: shell
>Reporter: Eric Charles
>Assignee: Ted Yu
> Attachments: 3729.txt
>
>
> The HBase shell allows you to specify a timestamp to get a value:
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
> If you don't give the exact timestamp, you get nothing... so it's difficult 
> to get a cell's previous versions.
> It would be nice to have a "time range" predicate-based get.
> The shell syntax could be (depending on technical feasibility)
> - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
> end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3730) DEFAULT_VERSIONS should be 1

2011-04-04 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015748#comment-13015748
 ] 

Eric Charles commented on HBASE-3730:
-

I've copied below the comment made by tsuna on 
http://markmail.org/message/ifphk6plab3uoi5b.

tsuna has summarized what I have in mind: it does not cost much (well, some 
disk), and gives a nice way to recover values. 3 is a good number, but it 
should be mentioned more prominently in the docs, tutorials, etc.

tsuna comment: "Personally I think that 3 is a good reasonable default. Maybe 
most people don't really need 3 versions, but most of the time I'm sure they 
can pay for it, they won't even notice. It can be a life-saver after you screw 
up to be able to get back to older versions... If you truly have a "big data" 
problem (few people really do), then you probably will know what you're doing, 
and you'll tune the number of versions appropriately for your needs."


> DEFAULT_VERSIONS should be 1
> 
>
> Key: HBASE-3730
> URL: https://issues.apache.org/jira/browse/HBASE-3730
> Project: HBase
>  Issue Type: Improvement
>Reporter: Joe Pallas
>Priority: Minor
>
> The current DEFAULT_VERSIONS (in HColumnDescriptor) is 3, but there is no 
> particular reason for this.  Many uses require only 1, and having a default 
> that is different makes people confused (e.g., "Do I need multiple versions 
> to support deletes properly?").
> Reasonable values for the default are 1 and max int.  1 is the better choice.
> Discussion on the mailing list suggests that the current value of 3 may have 
> been derived from an example in the Bigtable paper.  The example does not 
> suggest that there is anything special about 3, it's just an illustration.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver

2011-04-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3071:
-

   Resolution: Fixed
Fix Version/s: 0.92.0
   Status: Resolved  (was: Patch Available)

Applied to branch and trunk.  Added doc to the book on how to decommission a node and 
how you could do a rolling restart using this script.

> Graceful decommissioning of a regionserver
> --
>
> Key: HBASE-3071
> URL: https://issues.apache.org/jira/browse/HBASE-3071
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 3071-v5.txt, 3071.txt, 3701-v2.txt, 3701-v3.txt
>
>
> Currently if you stop a regionserver nicely, it'll put up its stopping flag 
> and then close all hosted regions.  While the stopping flag is in place all 
> region requests are rejected.  If this server was under load, closing could 
> take a while.  Only after all is closed is the master informed and it'll 
> restart assigning (in the old master, the master would get a report with a list of all 
> regions closed; in the new master, the zk expiry is triggered and we'll run the 
> shutdown handler).
> At least in the new master, we have means of disabling the balancer and then moving 
> the regions off the server one by one via HBaseAdmin methods -- we should 
> write a script to do this at least for rolling restarts -- but we need 
> something better.
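
A rough sketch of the manual approach described above (disable the balancer, then move the regions off one by one); the region list is illustrative and error handling is omitted:
{code}
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DrainRegionServer {
  // regionsOnServer: regions currently hosted by the server being decommissioned
  static void drain(List<HRegionInfo> regionsOnServer) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.balanceSwitch(false);   // keep the balancer from undoing the moves
    for (HRegionInfo region : regionsOnServer) {
      // a null destination lets the master pick a new server for the region
      admin.move(Bytes.toBytes(region.getEncodedName()), null);
    }
  }
}
{code}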

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015745#comment-13015745
 ] 

Eric Charles commented on HBASE-3729:
-

Hi Ted,
Thanks a lot for this!
I applied the patch to the scripts of 0.90.1, but I think the scripts have 
evolved in trunk (for example, I have no get.rb in 0.90.1).
I will try the trunk a bit later tomorrow with the TIMERANGE option to see if 
it returns more than 1 cell (in your example, only one cell is returned).
Btw, does TIMERANGE => [ts1, ts2], VERSIONS => 4 limit the output to 4?
It would be useful to add this to the description shown when you type "get help" (also on 
http://wiki.apache.org/hadoop/Hbase/Shell):
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
** hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']


> Get cells via shell with a time range predicate
> ---
>
> Key: HBASE-3729
> URL: https://issues.apache.org/jira/browse/HBASE-3729
> Project: HBase
>  Issue Type: New Feature
>  Components: shell
>Reporter: Eric Charles
>Assignee: Ted Yu
> Attachments: 3729.txt
>
>
> The HBase shell allows you to specify a timestamp to get a value:
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
> If you don't give the exact timestamp, you get nothing... so it's difficult 
> to get a cell's previous versions.
> It would be nice to have a "time range" predicate-based get.
> The shell syntax could be (depending on technical feasibility)
> - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
> end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver

2011-04-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3071:
-

Attachment: 3071-v5.txt

More testing.  Added some waits on machines to come up, a --debug flag, and 
some more helpful logging.

> Graceful decommissioning of a regionserver
> --
>
> Key: HBASE-3071
> URL: https://issues.apache.org/jira/browse/HBASE-3071
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: stack
> Attachments: 3071-v5.txt, 3071.txt, 3701-v2.txt, 3701-v3.txt
>
>
> Currently if you stop a regionserver nicely, it'll put up its stopping flag 
> and then close all hosted regions.  While the stopping flag is in place all 
> region requests are rejected.  If this server was under load, closing could 
> take a while.  Only after all is closed is the master informed and it'll 
> restart assigning (in the old master, the master would get a report with a list of all 
> regions closed; in the new master, the zk expiry is triggered and we'll run the 
> shutdown handler).
> At least in the new master, we have means of disabling the balancer and then moving 
> the regions off the server one by one via HBaseAdmin methods -- we should 
> write a script to do this at least for rolling restarts -- but we need 
> something better.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3731) NPE in HTable.getRegionsInfo()

2011-04-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015740#comment-13015740
 ] 

stack commented on HBASE-3731:
--

Yeah, IIRC, that was the intent.

> NPE in HTable.getRegionsInfo()
> --
>
> Key: HBASE-3731
> URL: https://issues.apache.org/jira/browse/HBASE-3731
> Project: HBase
>  Issue Type: Bug
>Reporter: Liyin Tang
>
> In HTable.getRegionInfo
> 
> HRegionInfo info = Writables.getHRegionInfo(
> rowResult.getValue(HConstants.CATALOG_FAMILY,
> HConstants.REGIONINFO_QUALIFIER));
> 
> But the rowResult.getValue() may return null, and Writables.getHRegionInfo 
> will throw a NullPointerException when the parameter is null.
> 2 fixes here: 
> 1) In Writables.getHRegionInfo(). We need to check whether the data is null 
> before using data.length.
> 2)
> HRegionInfo info = Writables.getHRegionInfoOrNull(
> rowResult.getValue(HConstants.CATALOG_FAMILY,
> HConstants.REGIONINFO_QUALIFIER));
> if (info == null)
>   return false;
> 
> Any thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3731) NPE in HTable.getRegionsInfo()

2011-04-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015692#comment-13015692
 ] 

Liyin Tang commented on HBASE-3731:
---

Sounds good to me.
But from the code, maybe the original purpose of getHRegionInfo is to throw an 
IllegalArgumentException if the parameter is null, while 
getHRegionInfoOrNull just returns null if the parameter is null.
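
A minimal sketch of the two calling patterns under discussion, assuming the Writables helpers behave as described above (this is a fragment; the surrounding visitor code is abbreviated):
{code}
byte[] value = rowResult.getValue(HConstants.CATALOG_FAMILY,
    HConstants.REGIONINFO_QUALIFIER);

// Fix 1: guard before the strict helper, which rejects a null argument.
HRegionInfo info = (value == null) ? null : Writables.getHRegionInfo(value);

// Fix 2: use the lenient helper and check its result instead.
HRegionInfo infoOrNull = Writables.getHRegionInfoOrNull(value);
if (infoOrNull == null) {
  return false;   // no info:regioninfo cell for this row, so skip it
}
{code}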
 


> NPE in HTable.getRegionsInfo()
> --
>
> Key: HBASE-3731
> URL: https://issues.apache.org/jira/browse/HBASE-3731
> Project: HBase
>  Issue Type: Bug
>Reporter: Liyin Tang
>
> In HTable.getRegionInfo
> 
> HRegionInfo info = Writables.getHRegionInfo(
> rowResult.getValue(HConstants.CATALOG_FAMILY,
> HConstants.REGIONINFO_QUALIFIER));
> 
> But the rowResult.getValue() may return null, and Writables.getHRegionInfo 
> will throw a NullPointerException when the parameter is null.
> 2 fixes here: 
> 1) In Writables.getHRegionInfo(). We need to check whether the data is null 
> before using data.length.
> 2)
> HRegionInfo info = Writables.getHRegionInfoOrNull(
> rowResult.getValue(HConstants.CATALOG_FAMILY,
> HConstants.REGIONINFO_QUALIFIER));
> if (info == null)
>   return false;
> 
> Any thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-04 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015683#comment-13015683
 ] 

Jonathan Gray commented on HBASE-3725:
--

Hey Nathaniel.  Thanks for posting the unit test!

I will take a look at this sometime this week and try to get a fix out for it.

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
> Attachments: HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   // Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>* First deletes the data then increments the column 10 times by 1 each 
> time
>*
> Should result in a value of 10, but it doesn't; it results in a 
> value of 110
>*
>* @throws IOException
>*/
>   static 

[jira] [Created] (HBASE-3732) New configuration option for client-side compression

2011-04-04 Thread Jean-Daniel Cryans (JIRA)
New configuration option for client-side compression


 Key: HBASE-3732
 URL: https://issues.apache.org/jira/browse/HBASE-3732
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


We have a case here where we have to store very fat cells (arrays of integers) 
which can amount into the hundreds of KBs that we need to read often, 
concurrently, and possibly keep in cache. Compressing the values on the client 
using java.util.zip's Deflater before sending them to HBase proved to be in our 
case almost an order of magnitude faster.

The reasons are evident: less data sent to HBase, the memstore contains 
compressed data, the block cache contains compressed data too, etc.

I was thinking that it might be something useful to add to a family schema, so 
that Put/Result do the conversion for you. The actual compression algo should 
also be configurable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk

2011-04-04 Thread Nathaniel Cook (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathaniel Cook updated HBASE-3725:
--

Attachment: HBASE-3725.patch

Here is a patch with a simple unit test for this bug.

As far as a fix for the bug goes, any help would be appreciated. I don't know enough 
about HBase's internals to even know where to begin.

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
> Attachments: HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   // Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>* First deletes the data then increments the column 10 times by 1 each 
> time
>*
> Should result in a value of 10, but it doesn't; it results in a 
> value of 110
>*
>* @throws IO

[jira] [Updated] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-04 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated HBASE-3685:
--

Attachment: 3685-missing-column.patch

Bug fix. This patch has been submitted to the review board as well.

> when multiple columns are combined with TimestampFilter, only one column is 
> returned
> 
>
> Key: HBASE-3685
> URL: https://issues.apache.org/jira/browse/HBASE-3685
> Project: HBase
>  Issue Type: Bug
>  Components: filters, regionserver
>Reporter: Jerry Chen
>Priority: Minor
>  Labels: noob
> Attachments: 3685-missing-column.patch
>
>
> As reported by an Hbase user: 
> "I have a ThreadMetadata column family, and there are two columns in it: 
> v12:th: and v12:me. The following code only returns v12:me
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
> List threadIds = new ArrayList();
> threadIds.add(10709L);
> TimestampFilter filter = new TimestampFilter(threadIds);
> get.setFilter(filter);
> get.setMaxVersions();
> Result result = table.get(get);
> I checked hbase for the key/value, they are present. Also other combinations 
> like no timestampfilter, it returns both."
> Kannan was able to do a small repro of the issue and commented that if we 
> drop the get.setMaxVersions(), then the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015656#comment-13015656
 ] 

stack commented on HBASE-3685:
--

@Jerry Is there a patch attached?  I don't see it.

> when multiple columns are combined with TimestampFilter, only one column is 
> returned
> 
>
> Key: HBASE-3685
> URL: https://issues.apache.org/jira/browse/HBASE-3685
> Project: HBase
>  Issue Type: Bug
>  Components: filters, regionserver
>Reporter: Jerry Chen
>Priority: Minor
>  Labels: noob
>
> As reported by an Hbase user: 
> "I have a ThreadMetadata column family, and there are two columns in it: 
> v12:th: and v12:me. The following code only returns v12:me
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
> List threadIds = new ArrayList();
> threadIds.add(10709L);
> TimestampFilter filter = new TimestampFilter(threadIds);
> get.setFilter(filter);
> get.setMaxVersions();
> Result result = table.get(get);
> I checked hbase for the key/value, they are present. Also other combinations 
> like no timestampfilter, it returns both."
> Kannan was able to do a small repro of the issue and commented that if we 
> drop the get.setMaxVersions(), then the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3731) NPE in HTable.getRegionsInfo()

2011-04-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015654#comment-13015654
 ] 

stack commented on HBASE-3731:
--

2, unless you think getHRegionInfoOrNull is too awkward to use and that we 
should get rid of it and turn all calls to getHRegionInfo into 
getHRegionInfoOrNull.

> NPE in HTable.getRegionsInfo()
> --
>
> Key: HBASE-3731
> URL: https://issues.apache.org/jira/browse/HBASE-3731
> Project: HBase
>  Issue Type: Bug
>Reporter: Liyin Tang
>
> In HTable.getRegionInfo
> 
> HRegionInfo info = Writables.getHRegionInfo(
> rowResult.getValue(HConstants.CATALOG_FAMILY,
> HConstants.REGIONINFO_QUALIFIER));
> 
> But the rowResult.getValue() may return null, and Writables.getHRegionInfo 
> will throw a NullPointerException when the parameter is null.
> 2 fixes here: 
> 1) In Writables.getHRegionInfo(). We need to check whether the data is null 
> before using data.length.
> 2)
> HRegionInfo info = Writables.getHRegionInfoOrNull(
> rowResult.getValue(HConstants.CATALOG_FAMILY,
> HConstants.REGIONINFO_QUALIFIER));
> if (info == null)
>   return false;
> 
> Any thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-04 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated HBASE-3685:
--

Affects Version/s: (was: 0.89.20100924)
 Release Note: bug fix. 
   Status: Patch Available  (was: Open)

> when multiple columns are combined with TimestampFilter, only one column is 
> returned
> 
>
> Key: HBASE-3685
> URL: https://issues.apache.org/jira/browse/HBASE-3685
> Project: HBase
>  Issue Type: Bug
>  Components: filters, regionserver
>Reporter: Jerry Chen
>Priority: Minor
>  Labels: noob
>
> As reported by an Hbase user: 
> "I have a ThreadMetadata column family, and there are two columns in it: 
> v12:th: and v12:me. The following code only returns v12:me
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
> get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
> List threadIds = new ArrayList();
> threadIds.add(10709L);
> TimestampFilter filter = new TimestampFilter(threadIds);
> get.setFilter(filter);
> get.setMaxVersions();
> Result result = table.get(get);
> I checked hbase for the key/value, they are present. Also other combinations 
> like no timestampfilter, it returns both."
> Kannan was able to do a small repro of the issue and commented that if we 
> drop the get.setMaxVersions(), then the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3345) Master crash with NPE during table disable

2011-04-04 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015651#comment-13015651
 ] 

Alex Newman commented on HBASE-3345:


Most likely it wasn't that it was null but that the key didn't exist.

> Master crash with NPE during table disable
> --
>
> Key: HBASE-3345
> URL: https://issues.apache.org/jira/browse/HBASE-3345
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.0
>Reporter: Todd Lipcon
>Priority: Blocker
>
> Running on a config that triggers lots of splits, I attempted to disable a 
> table while it was getting a lot of load and injected failures. Got the 
> following NPE in master, followed by an abort:
> 2010-12-13 12:52:27,323 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region 
> usertable,user1182862181,1292273503885.d223f1dc4d9003508f2db7566518b05d. 
> (offlining)
> 2010-12-13 12:52:27,323 FATAL org.apache.hadoop.hbase.master.HMaster: Remote 
> unexpected exception
> java.lang.NullPointerException: Passed server is null
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:581)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1085)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1032)
> at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler$1.run(DisableTableHandler.java:132)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3441) Add means of graceful decommission of node where it sheds load slowly in a non-disruptive manner until its carrying none

2011-04-04 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015645#comment-13015645
 ] 

Jean-Daniel Cryans commented on HBASE-3441:
---

(I guess Jean is me so I'm going to answer)

The region moving stuff is orthogonal to decommissioning IMO, and IIRC Benoit 
either opened an issue about it or at least he described on the dev list a way 
to make it much less disruptive to move a region.

> Add means of graceful decommission of node where it sheds load slowly in a 
> non-disruptive manner until its carrying none
> 
>
> Key: HBASE-3441
> URL: https://issues.apache.org/jira/browse/HBASE-3441
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Blocker
> Fix For: 0.92.0
>
>
> Needed decommissioning nodes.  Currently we effectively crash them out.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3731) NPE in HTable.getRegionsInfo()

2011-04-04 Thread Liyin Tang (JIRA)
NPE in HTable.getRegionsInfo()
--

 Key: HBASE-3731
 URL: https://issues.apache.org/jira/browse/HBASE-3731
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang


In HTable.getRegionInfo

HRegionInfo info = Writables.getHRegionInfo(
rowResult.getValue(HConstants.CATALOG_FAMILY,
HConstants.REGIONINFO_QUALIFIER));

But the rowResult.getValue() may return null, and Writables.getHRegionInfo will 
throw a NullPointerException when the parameter is null.

2 fixes here: 
1) In Writables.getHRegionInfo(). We need to check whether the data is null 
before using data.length.
2)
HRegionInfo info = Writables.getHRegionInfoOrNull(
rowResult.getValue(HConstants.CATALOG_FAMILY,
HConstants.REGIONINFO_QUALIFIER));
if (info == null)
  return false;


Any thoughts?


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3441) Add means of graceful decommission of node where it sheds load slowly in a non-disruptive manner until its carrying none

2011-04-04 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015639#comment-13015639
 ] 

Alex Newman commented on HBASE-3441:


Jean, where are we on Nicolas's suggestion? Do they have to move in lockstep? 
What's the guarantee between the two WALs?

> Add means of graceful decommission of node where it sheds load slowly in a 
> non-disruptive manner until its carrying none
> 
>
> Key: HBASE-3441
> URL: https://issues.apache.org/jira/browse/HBASE-3441
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Blocker
> Fix For: 0.92.0
>
>
> Needed decommissioning nodes.  Currently we effectively crash them out.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3721) Speedup LoadIncrementalHFiles

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3721:
--

Attachment: 3721-v3.txt

Renamed getRegionName()

> Speedup LoadIncrementalHFiles
> -
>
> Key: HBASE-3721
> URL: https://issues.apache.org/jira/browse/HBASE-3721
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 3721-v2.txt, 3721-v3.txt, 3721.txt
>
>
> From Adam Phelps:
> from the logs it looks like <1% of the hfiles we're loading have to be split. 
>  Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually 
> thinking our problem is that this code loads the hfiles sequentially.  Our 
> largest table has over 2500 regions and the data being loaded is fairly well 
> distributed across them, so there end up being around 2500 HFiles for each 
> load period.  At 1-2 seconds per HFile that means the loading process is very 
> time consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can utilize ExecutorService to achieve better parallelism.
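
A minimal sketch of the parallel approach (the per-file loader interface and the pool size are illustrative; the real change would live inside LoadIncrementalHFiles):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.Path;

public class ParallelBulkLoadSketch {
  // loadOneHFile would wrap the existing blocking server.bulkLoadHFile() call.
  interface HFileLoader {
    void loadOneHFile(Path hfile) throws Exception;
  }

  static void loadAll(List<Path> hfiles, final HFileLoader loader) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(16);   // pool size is a guess
    List<Future<Void>> futures = new ArrayList<Future<Void>>();
    for (final Path hfile : hfiles) {
      futures.add(pool.submit(new Callable<Void>() {
        public Void call() throws Exception {
          loader.loadOneHFile(hfile);
          return null;
        }
      }));
    }
    for (Future<Void> f : futures) {
      f.get();   // wait for completion and surface any failure
    }
    pool.shutdown();
  }
}
{code}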

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3488) Add CellCounter to count multiple versions of rows

2011-04-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015589#comment-13015589
 ] 

Ted Yu commented on HBASE-3488:
---

Sure I can.

> Add CellCounter to count multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
>Assignee: Subbu M Iyer
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves latest version for each row.
> Some applications would store multiple versions for the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> Scan object would be configured with version parameter (for scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}
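
A minimal sketch of the counting loop described above (table setup is illustrative):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class CellCountSketch {
  static long countCells(String tableName) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, tableName);
    Scan scan = new Scan();
    scan.setMaxVersions();            // fetch every stored version, not just the latest
    long cells = 0;
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
      cells += result.raw().length;   // raw() returns all KeyValues in the row
    }
    scanner.close();
    table.close();
    return cells;
  }
}
{code}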

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3488) Add CellCounter to count multiple versions of rows

2011-04-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015587#comment-13015587
 ] 

stack commented on HBASE-3488:
--

It's a feature, Ted.  Hard to argue that features should be backported.  Can you 
apply it to your local HBase version?

> Add CellCounter to count multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
>Assignee: Subbu M Iyer
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves latest version for each row.
> Some applications would store multiple versions for the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> Scan object would be configured with version parameter (for scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2963) based on hadoop0.21.0

2011-04-04 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015572#comment-13015572
 ] 

Alex Newman commented on HBASE-2963:


I think this can be closed out.

> based on hadoop0.21.0
> -
>
> Key: HBASE-2963
> URL: https://issues.apache.org/jira/browse/HBASE-2963
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.20.6
>Reporter: chenjiajun
>Priority: Blocker
> Fix For: 0.92.0
>
>
> I upgraded my Hadoop from 0.20.2 to 0.21.0, but HBase doesn't work!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3488) Add CellCounter to count multiple versions of rows

2011-04-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015569#comment-13015569
 ] 

Ted Yu commented on HBASE-3488:
---

Is it possible to port this to 0.90.3?
We need this feature. It wouldn't introduce a regression.

> Add CellCounter to count multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
>Assignee: Subbu M Iyer
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves latest version for each row.
> Some applications would store multiple versions for the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> Scan object would be configured with version parameter (for scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3727) MultiHFileOutputFormat

2011-04-04 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015566#comment-13015566
 ] 

Andrew Purtell commented on HBASE-3727:
---

I want to run this for a bit and see if more is needed.



> MultiHFileOutputFormat
> --
>
> Key: HBASE-3727
> URL: https://issues.apache.org/jira/browse/HBASE-3727
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Purtell
>Priority: Minor
> Attachments: MultiHFileOutputFormat.java, MultiHFileOutputFormat.java
>
>
> Like MultiTableOutputFormat, but outputting HFiles. Key is tablename as an 
> IBW. Creates sub-writers (code cut and pasted from HFileOutputFormat) on 
> demand that produce HFiles in per-table subdirectories of the configured 
> output path. Does not currently support partitioning for existing tables / 
> incremental update.
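
A schematic sketch of the on-demand, per-table sub-writer dispatch the description outlines (this is a fragment, not the attached class; createHFileWriter is a hypothetical helper standing in for the code borrowed from HFileOutputFormat):
{code}
// Schematic only: route each record to a per-table writer, creating writers lazily.
private final Map<ImmutableBytesWritable, RecordWriter<ImmutableBytesWritable, KeyValue>>
    tableWriters =
        new HashMap<ImmutableBytesWritable, RecordWriter<ImmutableBytesWritable, KeyValue>>();

RecordWriter<ImmutableBytesWritable, KeyValue> writerFor(ImmutableBytesWritable tableName,
    Path outputPath, TaskAttemptContext context) throws IOException, InterruptedException {
  RecordWriter<ImmutableBytesWritable, KeyValue> writer = tableWriters.get(tableName);
  if (writer == null) {
    // one sub-directory per table under the configured output path
    Path tableDir = new Path(outputPath, Bytes.toString(tableName.get()));
    writer = createHFileWriter(tableDir, context);   // hypothetical helper
    tableWriters.put(tableName, writer);
  }
  return writer;
}
{code}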

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-2118) Fix our wikipedia page; says we're slow among other errors

2011-04-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2118.
--

Resolution: Fixed

Resolving.  Page is a bit better now.

> Fix our wikipedia page; says we're slow among other errors
> --
>
> Key: HBASE-2118
> URL: https://issues.apache.org/jira/browse/HBASE-2118
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-3660) HMaster will exit when starting with stale data in cached locations such as -ROOT- or .META.

2011-04-04 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HBASE-3660:
--

Assignee: stack

> HMaster will exit when starting with stale data in cached locations such as 
> -ROOT- or .META.
> 
>
> Key: HBASE-3660
> URL: https://issues.apache.org/jira/browse/HBASE-3660
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 0.90.1
>Reporter: Cosmin Lehene
>Assignee: stack
>Priority: Critical
> Fix For: 0.90.2
>
> Attachments: 3660.txt, HBASE-3660.patch
>
>
> later edit: I've mixed up two issues here. The main problem is that a client 
> (that could be HMaster) will read stale data from -ROOT- or .META. and not 
> deal correctly with the raised exceptions. 
> I've noticed this when the IP on my machine changed (it's even easier to 
> detect when LZO doesn't work)
> Master loads .META. successfully and then starts assigning regions.
> However LZO doesn't work so HRegionServer can't open the regions. 
> A client attempts to get data from a table so it reads the location from 
> .META. but goes to a totally different server (the old value in .META.)
> This could happen without the LZO story too.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2404) native fast compression codec

2011-04-04 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015561#comment-13015561
 ] 

Alex Newman commented on HBASE-2404:


Dup of HBASE-3691

> native fast compression codec
> -
>
> Key: HBASE-2404
> URL: https://issues.apache.org/jira/browse/HBASE-2404
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Purtell
>
> We often recommend enabling LZO on tables, most users see big wins. LZO is 
> roughly comparable to BigTable LZW, and we get something like prefix 
> compression on keys. However, LZO is GPL licensed, so a series of install 
> steps are required: http://wiki.apache.org/hadoop/UsingLzoCompression . It's 
> easy to miss a step or get it wrong. If so, all writes on a table 
> (re)configured to use LZO will fail. 
> Hadoop, well, Java, has native support for gzip compression, but it is generally too 
> slow; it is a good option, however, for archival tables. 
> This issue is about considering bundling or creating a comparable alternate 
> to LZO which is ASF 2.0 license compatible. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3727) MultiHFileOutputFormat

2011-04-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015558#comment-13015558
 ] 

stack commented on HBASE-3727:
--

It's 2011 (for the copyright).  Add a class comment saying what this output format 
does.  Can you factor out the common code, Andrew, rather than copy/paste?  You 
know that there will be a bug in the copied code as soon as you commit!  Else 
looks good.  Are you going to commit?

> MultiHFileOutputFormat
> --
>
> Key: HBASE-3727
> URL: https://issues.apache.org/jira/browse/HBASE-3727
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Purtell
>Priority: Minor
> Attachments: MultiHFileOutputFormat.java, MultiHFileOutputFormat.java
>
>
> Like MultiTableOutputFormat, but outputting HFiles. Key is tablename as an 
> IBW. Creates sub-writers (code cut and pasted from HFileOutputFormat) on 
> demand that produce HFiles in per-table subdirectories of the configured 
> output path. Does not currently support partitioning for existing tables / 
> incremental update.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2118) Fix our wikipedia page; says we're slow among other errors

2011-04-04 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015560#comment-13015560
 ] 

Alex Newman commented on HBASE-2118:


Can we close this out?

> Fix our wikipedia page; says we're slow among other errors
> --
>
> Key: HBASE-2118
> URL: https://issues.apache.org/jira/browse/HBASE-2118
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3488) Add CellCounter to count multiple versions of rows

2011-04-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3488.
--

  Resolution: Fixed
Assignee: Subbu M Iyer
Hadoop Flags: [Reviewed]

Committed to TRUNK. Thanks for the nice patch, Subbu. I added a mention of this 
task to our mapreduce Driver too.

> Add CellCounter to count multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
>Assignee: Subbu M Iyer
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves the latest version of each row.
> Some applications store multiple versions of the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> The Scan object would be configured with a version parameter (scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}
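
For illustration, here is a minimal sketch (not the attached patch) of a scan 
configured to return every cell version so that cells can be counted via 
raw(); the table name is illustrative and the standard client API is assumed:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class CellCountSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "planet");   // table name is illustrative
    Scan scan = new Scan();
    scan.setMaxVersions();                       // retrieve all versions, not just the latest
    ResultScanner scanner = table.getScanner(scan);
    long cells = 0;
    for (Result result : scanner) {
      // raw() exposes every KeyValue, i.e. every cell version, in the row
      cells += result.raw().length;
    }
    scanner.close();
    System.out.println("total cells: " + cells);
  }
}
{code}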

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3488) Add CellCounter to count multiple versions of rows

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3488:
--

Summary: Add CellCounter to count multiple versions of rows  (was: Allow 
RowCounter to retrieve multiple versions of rows)

> Add CellCounter to count multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves the latest version of each row.
> Some applications store multiple versions of the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> The Scan object would be configured with a version parameter (scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3488) Allow RowCounter to retrieve multiple versions of rows

2011-04-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015544#comment-13015544
 ] 

Ted Yu commented on HBASE-3488:
---

I think version 2 looks good.
Thanks for the hard work Subbu.

> Allow RowCounter to retrieve multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves the latest version of each row.
> Some applications store multiple versions of the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> The Scan object would be configured with a version parameter (scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function

2011-04-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3694.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]

Committed to TRUNK.  Thanks for the nice patch Liyin (and to all who reviewed).

> high multiput latency due to checking global mem store size in a synchronized 
> function
> --
>
> Key: HBASE-3694
> URL: https://issues.apache.org/jira/browse/HBASE-3694
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.92.0
>
> Attachments: 3694-cliffs-counter.txt, Hbase-3694[r1085306], 
> Hbase-3694[r1085306]_2.patch, Hbase-3694[r1085306]_3.patch, 
> Hbase-3694[r1085508]_4.patch, Hbase-3694[r1085592]_7.patch, 
> Hbase-3694[r1085593]_5.patch, Hbase-3694[r1085593]_6.patch
>
>
> The problem is that we found the multiput latency to be very high.
> In our case, we have almost 22 regions in each RS and no flushes happened 
> during these puts.
> After investigation, we believe that the root cause is the function 
> getGlobalMemStoreSize, which checks the high water mark of the mem store. 
> When we instrumented the code with metrics, this function took almost 40% of 
> the total execution time of a multiput.
> The actual percentage may be even higher. The execution time is spent on 
> synchronization contention.
> One solution is to keep a static variable in HRegion holding the global 
> MemStore size instead of recalculating it every time.
> Why use a static variable?
> Since all the HRegion objects in the same JVM share the same memory heap, 
> they need to share fate as well.
> The static variable, globalMemStoreSize, naturally shows the total memory 
> usage in this shared heap for this JVM.
> If multiple RSs need to run in the same JVM, they still need only one 
> globalMemStoreSize.
> If multiple RSs run on different JVMs, everything is fine.
> After this change, in our case, the average multiput latency decreased from 
> 60ms to 10ms.
> I will submit a patch based on the current trunk.
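
A minimal sketch of the idea described above, assuming an AtomicLong counter 
rather than the exact mechanism in the committed patch:

{code}
import java.util.concurrent.atomic.AtomicLong;

public class GlobalMemStoreSizeSketch {
  // One counter shared by every region in this JVM/heap, adjusted as memstores
  // grow and flush, instead of summing all regions inside a synchronized method.
  private static final AtomicLong globalMemStoreSize = new AtomicLong(0);

  // Called from a region whenever its memstore grows or shrinks by delta bytes.
  public static long addAndGetGlobalMemStoreSize(long delta) {
    return globalMemStoreSize.addAndGet(delta);
  }

  // Cheap, lock-free high-water-mark check on the write path.
  public static boolean aboveHighWaterMark(long limit) {
    return globalMemStoreSize.get() > limit;
  }
}
{code}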

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3488) Allow RowCounter to retrieve multiple versions of rows

2011-04-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015534#comment-13015534
 ] 

stack commented on HBASE-3488:
--

I'm going to apply this (will wait a little in case Ted wants to do a last 
review).

I will also add it to our 'driver' so it shows as one of the MR jobs hbase 
ships with; see src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java

Subbu, for the future: it's 2011 when it comes to copyright notices in src files 
(smile), and by convention we wrap lines at 80 characters. I can fix this on 
commit, np, but keep it in mind going forward.
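
As a hedged sketch of what adding the job to the mapreduce Driver typically 
looks like (the entry names and description strings below are assumptions, not 
the actual Driver.java diff):

{code}
import org.apache.hadoop.util.ProgramDriver;

public class DriverSketch {
  public static void main(String[] args) throws Throwable {
    ProgramDriver pgd = new ProgramDriver();
    pgd.addClass("rowcounter", org.apache.hadoop.hbase.mapreduce.RowCounter.class,
        "Count rows in an HBase table");
    // The new entry: counts individual cells/versions rather than rows.
    pgd.addClass("cellcounter", org.apache.hadoop.hbase.mapreduce.CellCounter.class,
        "Count cells (all versions) in an HBase table");
    pgd.driver(args);
  }
}
{code}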



> Allow RowCounter to retrieve multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves the latest version of each row.
> Some applications store multiple versions of the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> The Scan object would be configured with a version parameter (scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3730) DEFAULT_VERSIONS should be 1

2011-04-04 Thread Joe Pallas (JIRA)
DEFAULT_VERSIONS should be 1


 Key: HBASE-3730
 URL: https://issues.apache.org/jira/browse/HBASE-3730
 Project: HBase
  Issue Type: Improvement
Reporter: Joe Pallas
Priority: Minor


The current DEFAULT_VERSIONS (in HColumnDescriptor) is 3, but there is no 
particular reason for this. Many uses require only 1, and a different default 
confuses people (e.g., "Do I need multiple versions to support deletes 
properly?").

Reasonable values for the default are 1 and max int; 1 is the better choice.

Discussion on the mailing list suggests that the current value of 3 may have 
been derived from an example in the Bigtable paper. The example does not 
suggest that there is anything special about 3; it's just an illustration.
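
Whatever the default ends up being, a specific number of versions can always be 
set per column family; a minimal sketch with illustrative table and family 
names:

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class VersionsSketch {
  public static HTableDescriptor singleVersionTable() {
    HTableDescriptor table = new HTableDescriptor("test");  // illustrative name
    HColumnDescriptor family = new HColumnDescriptor("cf"); // illustrative family
    family.setMaxVersions(1);  // keep only the latest cell version
    table.addFamily(family);
    return table;
  }
}
{code}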

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3488) Allow RowCounter to retrieve multiple versions of rows

2011-04-04 Thread Subbu M Iyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015525#comment-13015525
 ] 

Subbu M Iyer commented on HBASE-3488:
-

1. To run the cellcounter with default row/cf/qualifier separator ':' for table 
'planet'

java -cp 
./conf:./hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar
 org.apache.hadoop.hbase.mapreduce.CellCounter planet 
/work/HBaseExport/cellcounter30


2. To run the cellcounter with row/cf/qualifier separator '%' for table 'planet'

java -cp 
./conf:./hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar
 org.apache.hadoop.hbase.mapreduce.CellCounter planet 
/work/HBaseExport/cellcounter31 % 

3. To run the cellcounter with row/cf/qualifier separator '%' for table 
'planet' with a prefix filter row_55

java -cp 
./conf:./hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar
 org.apache.hadoop.hbase.mapreduce.CellCounter planet 
/work/HBaseExport/cellcounter31 % row_55 

4. To run the cellcounter with row/cf/qualifier separator '%' for table 
'planet' with a regex filter ^11

java -cp 
./conf:./hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar
 org.apache.hadoop.hbase.mapreduce.CellCounter planet 
/work/HBaseExport/cellcounter31 % ^11

> Allow RowCounter to retrieve multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves the latest version of each row.
> Some applications store multiple versions of the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> The Scan object would be configured with a version parameter (scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3728) NPE in HTablePool.closeTablePool

2011-04-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3728.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]

Committed. Thanks for the patch Ted.

> NPE in HTablePool.closeTablePool
> 
>
> Key: HBASE-3728
> URL: https://issues.apache.org/jira/browse/HBASE-3728
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.1
>Reporter: Andruschuk Borislav
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 3728.txt
>
>
> When I use HTablePool and try to close it on application shutdown, I get an 
> NPE from the closeTablePool method because I never borrowed any tables with 
> the given name. Could you please add a null check for the queue in 
> closeTablePool, add the ability to get all table names used in a pool, or 
> just add a destroy method that closes all existing tables in a pool.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1861) Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)

2011-04-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015517#comment-13015517
 ] 

Todd Lipcon commented on HBASE-1861:


Hi Nichole. I think it's probably best to open a new JIRA for the bugfix since 
this one has been closed for a while.

> Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
> -
>
> Key: HBASE-1861
> URL: https://issues.apache.org/jira/browse/HBASE-1861
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 0.20.0
>Reporter: Jonathan Gray
>Assignee: Nicolas Spiegelberg
> Fix For: 0.92.0
>
> Attachments: HBASE1861-incomplete.patch
>
>
> Add multi-family support to bulk upload tools from HBASE-48.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-3587) Eliminate use of ReadWriteLock in RegionObserver coprocessor invocation

2011-04-04 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling reassigned HBASE-3587:


Assignee: Gary Helmling

> Eliminate use of ReadWriteLock in RegionObserver coprocessor invocation
> ---
>
> Key: HBASE-3587
> URL: https://issues.apache.org/jira/browse/HBASE-3587
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>
> Follow-up to a discussion on the dev list: 
> http://search-hadoop.com/m/jOovV1uAJBP
> The CoprocessorHost ReentrantReadWriteLock is imposing some overhead on data 
> read/write operations, even when no coprocessors are loaded.  Currently, 
> execution of the RegionCoprocessorHost pre/postXXX() methods is guarded by 
> acquiring the coprocessor read lock.  This is used to prevent coprocessor 
> registration from modifying the coprocessor collection while upcall hooks are 
> in progress.
> On further discussion, and looking at the locking in HRegion, it should be 
> sufficient to just use a CopyOnWriteArrayList for the coprocessor collection. 
>  We can then remove the coprocessor lock and eliminate the associated 
> overhead without having to special case the "no loaded coprocessors" 
> condition.
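
A minimal sketch of the proposal (an assumption about the shape of the code, 
not the actual CoprocessorHost): hold the hooks in a CopyOnWriteArrayList so 
upcalls iterate lock-free while registration safely mutates the list:

{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CoprocessorHostSketch {
  interface RegionObserverHook { void prePut(); }

  private final List<RegionObserverHook> coprocessors =
      new CopyOnWriteArrayList<RegionObserverHook>();

  public void load(RegionObserverHook hook) {
    coprocessors.add(hook);  // copy-on-write: readers never block on registration
  }

  public void prePut() {
    // Iteration sees a consistent snapshot; no ReentrantReadWriteLock needed,
    // and the "no loaded coprocessors" case is just an empty loop.
    for (RegionObserverHook hook : coprocessors) {
      hook.prePut();
    }
  }
}
{code}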

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3488) Allow RowCounter to retrieve multiple versions of rows

2011-04-04 Thread Subbu M Iyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015491#comment-13015491
 ] 

Subbu M Iyer commented on HBASE-3488:
-

Submitted version 2 of the patch.

1. Fixed the comments per Stack's suggestion.
2. Added a regex/prefix-based row filter to limit the counter to a smaller 
subset of rows.
3. Changed Bytes.toString to Bytes.toStringBinary.
4. Parameterized the row/cf/qualifier separator string.

> Allow RowCounter to retrieve multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves the latest version of each row.
> Some applications store multiple versions of the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> The Scan object would be configured with a version parameter (scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3488) Allow RowCounter to retrieve multiple versions of rows

2011-04-04 Thread Subbu M Iyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subbu M Iyer updated HBASE-3488:


Attachment: 
3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch

> Allow RowCounter to retrieve multiple versions of rows
> --
>
> Key: HBASE-3488
> URL: https://issues.apache.org/jira/browse/HBASE-3488
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.90.0
>Reporter: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows-version2.patch, 
> 3488-Allow_RowCounter_to_retrieve_multiple_versions_of_rows.patch
>
>
> Currently RowCounter only retrieves the latest version of each row.
> Some applications store multiple versions of the same row.
> RowCounter should accept a new parameter for the number of versions to return.
> The Scan object would be configured with a version parameter (scan.maxVersions).
> Then the following API should be called:
> {code}
>   public KeyValue[] raw() {
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3729:
--

Attachment: 3729.txt

Add support for TIMERANGE
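
For context, a hedged sketch of what a TIMERANGE option corresponds to on the 
client API side (not the patch itself); the table, row, and timestamps are 
illustrative:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeGetSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");     // illustrative table
    Get get = new Get(Bytes.toBytes("row1"));    // illustrative row
    long startTs = 0L;                           // start of range, inclusive
    long endTs = Long.MAX_VALUE;                 // end of range, exclusive
    get.setTimeRange(startTs, endTs);
    get.setMaxVersions();                        // without this only the newest matching cell per column is returned
    Result result = table.get(get);
    System.out.println(result);
  }
}
{code}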

> Get cells via shell with a time range predicate
> ---
>
> Key: HBASE-3729
> URL: https://issues.apache.org/jira/browse/HBASE-3729
> Project: HBase
>  Issue Type: New Feature
>  Components: shell
>Reporter: Eric Charles
>Assignee: Ted Yu
> Attachments: 3729.txt
>
>
> The HBase shell allows you to specify a timestamp to get a value:
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
> If you don't give the exact timestamp, you get nothing... so it's difficult 
> to get a cell's previous versions.
> It would be nice to have a "time range" predicate-based get.
> The shell syntax could be (depending on technical feasibility):
> - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
> end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015485#comment-13015485
 ] 

Ted Yu commented on HBASE-3729:
---

Here is sample output:
{code}
hbase(main):001:0> get 'SECOND_2-1299913025789', 
'F871915C83D0D046642369B88904F8CD', { TIMERANGE => [0,1] }
COLUMN   CELL   
 
0 row(s) in 0.3130 seconds
get 'SECOND_2-1299913025789', 'F871915C83D0D046642369B88904F8CD', { TIMERANGE 
=> [0,1] }
COLUMN   CELL   
 
 v:_ timestamp=1299913025789, value=\x00\x069\xFC 
F871915C83D0D046642369B88904F8CD\x1670gCcSu8wbc
 
ZOOpN65azZw\x00\x00\x00\x00\x01"\xDD\x8E4\x00\x00\x00\x00\x00   

1 row(s) in 0.0470 seconds
{code}

> Get cells via shell with a time range predicate
> ---
>
> Key: HBASE-3729
> URL: https://issues.apache.org/jira/browse/HBASE-3729
> Project: HBase
>  Issue Type: New Feature
>  Components: shell
>Reporter: Eric Charles
>Assignee: Ted Yu
>
> The HBase shell allows you to specify a timestamp to get a value:
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
> If you don't give the exact timestamp, you get nothing... so it's difficult 
> to get a cell's previous versions.
> It would be nice to have a "time range" predicate-based get.
> The shell syntax could be (depending on technical feasibility):
> - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
> end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-3729:
-

Assignee: Ted Yu

> Get cells via shell with a time range predicate
> ---
>
> Key: HBASE-3729
> URL: https://issues.apache.org/jira/browse/HBASE-3729
> Project: HBase
>  Issue Type: New Feature
>  Components: shell
>Reporter: Eric Charles
>Assignee: Ted Yu
>
> The HBase shell allows you to specify a timestamp to get a value:
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
> If you don't give the exact timestamp, you get nothing... so it's difficult 
> to get a cell's previous versions.
> It would be nice to have a "time range" predicate-based get.
> The shell syntax could be (depending on technical feasibility):
> - get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
> end_timestamp)}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3729) Get cells via shell with a time range predicate

2011-04-04 Thread Eric Charles (JIRA)
Get cells via shell with a time range predicate
---

 Key: HBASE-3729
 URL: https://issues.apache.org/jira/browse/HBASE-3729
 Project: HBase
  Issue Type: New Feature
  Components: shell
Reporter: Eric Charles


The HBase shell allows you to specify a timestamp to get a value:
- get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}

If you don't give the exact timestamp, you get nothing... so it's difficult to 
get a cell's previous versions.

It would be nice to have a "time range" predicate-based get.
The shell syntax could be (depending on technical feasibility):
- get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => (start_timestamp, 
end_timestamp)}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3728) NPE in HTablePool.closeTablePool

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3728:
--

Attachment: 3728.txt

Added null check for queue
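
A minimal sketch of such a null check; the field and method shapes below are 
assumptions about HTablePool internals, not the attached 3728.txt:

{code}
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

public class TablePoolSketch {
  interface PooledTable { void close(); }

  // Per-table queues of pooled tables; only populated once a table is borrowed.
  private final Map<String, LinkedList<PooledTable>> tables =
      new HashMap<String, LinkedList<PooledTable>>();

  public void closeTablePool(String tableName) {
    LinkedList<PooledTable> queue = tables.get(tableName);
    if (queue == null) {
      // Nothing was ever borrowed for this table name; without this guard the
      // subsequent poll() would throw a NullPointerException.
      return;
    }
    PooledTable table = queue.poll();
    while (table != null) {
      table.close();
      table = queue.poll();
    }
    tables.remove(tableName);
  }
}
{code}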

> NPE in HTablePool.closeTablePool
> 
>
> Key: HBASE-3728
> URL: https://issues.apache.org/jira/browse/HBASE-3728
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.1
>Reporter: Andruschuk Borislav
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 3728.txt
>
>
> When I use HTablePool and try to close it on application shutdown, I get an 
> NPE from the closeTablePool method because I never borrowed any tables with 
> the given name. Could you please add a null check for the queue in 
> closeTablePool, add the ability to get all table names used in a pool, or 
> just add a destroy method that closes all existing tables in a pool.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-3728) NPE in HTablePool.closeTablePool

2011-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-3728:
-

Assignee: Ted Yu

> NPE in HTablePool.closeTablePool
> 
>
> Key: HBASE-3728
> URL: https://issues.apache.org/jira/browse/HBASE-3728
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.1
>Reporter: Andruschuk Borislav
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 3728.txt
>
>
> When I use HTablePool and try to close it on application shutdown, I get an 
> NPE from the closeTablePool method because I never borrowed any tables with 
> the given name. Could you please add a null check for the queue in 
> closeTablePool, add the ability to get all table names used in a pool, or 
> just add a destroy method that closes all existing tables in a pool.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira