[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205895#comment-13205895
 ] 

stack commented on HBASE-5313:
--

How do I read the above?  Is it the same number of kvs in each of the files?

> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This would allow us to 
> skip decompressing the values entirely when we are scanning and skipping 
> over rows in the block.
> Any other ideas? 
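The proposed layout can be sketched as follows. This is a hypothetical illustration of the key-section idea, not the actual HFile code; the class and method names are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed block layout: all keys are written
// first in a "key-section", all values after it. A scanner that only needs
// to compare keys can stop reading before the value region, so the values
// never have to be decompressed or even touched.
public class KeySectionBlockSketch {

    public static byte[] encode(List<byte[]> keys, List<byte[]> values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(keys.size());
            for (byte[] k : keys) {          // key-section: similar keys sit
                out.writeInt(k.length);      // next to each other and should
                out.write(k);                // compress well
            }
            for (byte[] v : values) {        // value-section, appended after
                out.writeInt(v.length);      // all the keys
                out.write(v);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads only the key-section; the value bytes are never consumed.
    public static List<byte[]> decodeKeys(byte[] block) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            int n = in.readInt();
            List<byte[]> keys = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                byte[] k = new byte[in.readInt()];
                in.readFully(k);
                keys.add(k);
            }
            return keys;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Grouping the keys also means a compression codec sees one homogeneous run of keys and one of values, which is where the better ratios would come from.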

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205909#comment-13205909
 ] 

Zhihong Yu commented on HBASE-5313:
---

@Yongqiang:
Thanks for sharing the results.
Can you also list the time it took to write the HFile for each of the three 
schemes?

If you can characterize the row keys and values, that would be nice too.

> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This would allow us to 
> skip decompressing the values entirely when we are scanning and skipping 
> over rows in the block.
> Any other ideas? 





[jira] [Created] (HBASE-5386) [usability] Soft limit for eager region splitting of young tables

2012-02-10 Thread Jean-Daniel Cryans (Created) (JIRA)
[usability] Soft limit for eager region splitting of young tables
-

 Key: HBASE-5386
 URL: https://issues.apache.org/jira/browse/HBASE-5386
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0


Coming out of HBASE-2375, we need new functionality much like Hypertable's, 
where we would use a lower split size for new tables and grow it up to a 
certain hard limit. This helps usability in several ways:

 - We can set the default split size much higher and users will still have 
good data distribution
 - No more messing with force splits
 - No need to pre-split your table in order to get good out-of-the-box 
performance

The way Doug Judd described how it works for them, they start with a low value 
and then double it on every split. For example, if we started with a soft size 
of 32MB and a hard size of 2GB, you wouldn't hit the ceiling until you had 64 
regions.

On the implementation side, we could add a new qualifier in .META. that has 
that soft limit. When that field doesn't exist, this feature doesn't kick in. 
It would be written by the region servers after a split and by the master when 
the table is created with 1 region.
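The doubling scheme above can be sketched as a small calculation. This is a hypothetical illustration of the arithmetic only, not the eventual HBase implementation; the names are made up:

```java
// Hypothetical sketch of the "double the split size on each split" scheme:
// the effective split size starts at the soft limit and doubles with every
// split a table's regions have gone through, capped at the hard limit.
public class EagerSplitSketch {
    static final long MB = 1024L * 1024L;

    // splits = how many times this table's regions have already split.
    public static long effectiveSplitSize(long softMb, long hardMb, int splits) {
        long size = softMb;
        for (int i = 0; i < splits && size < hardMb; i++) {
            size *= 2;                       // double after each split
        }
        return Math.min(size, hardMb) * MB;  // never exceed the hard limit
    }
}
```

Starting at 32MB, six doublings reach the 2GB ceiling, which matches the 64 regions (2^6) mentioned above.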





[jira] [Updated] (HBASE-5368) Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible in HBase installs

2012-02-10 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5368:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review.

> Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible 
> in HBase installs
> -
>
> Key: HBASE-5368
> URL: https://issues.apache.org/jira/browse/HBASE-5368
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 5368.txt
>
>
> Very simple change to make PrefixSplitKeyPolicy accessible in HBase installs 
> (the user still needs to set up the table(s) accordingly).
> Right now it is in src/test/org.apache.hadoop.hbase.regionserver, I propose 
> moving it to src/org.apache.hadoop.hbase.regionserver (alongside 
> ConstantSizeRegionSplitPolicy), and maybe renaming it too.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205921#comment-13205921
 ] 

dhruba borthakur commented on HBASE-5313:
-

The same number of kvs in each file; 3 million kvs in total for this 
experiment. The block size is 16 KB.

> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This would allow us to 
> skip decompressing the values entirely when we are scanning and skipping 
> over rows in the block.
> Any other ideas? 





[jira] [Commented] (HBASE-5363) Automatically run rat check on mvn release builds

2012-02-10 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205924#comment-13205924
 ] 

Jonathan Hsieh commented on HBASE-5363:
---

JonathanHsieh





> Automatically run rat check on mvn release builds
> -
>
> Key: HBASE-5363
> URL: https://issues.apache.org/jira/browse/HBASE-5363
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.90.5, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, 
> hbase-5363.patch
>
>
> Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
> should add checks, likely in the mvn package phase, so that this becomes a 
> non-issue in the future.
> Here's an example from Whirr:
> https://github.com/apache/whirr/blob/trunk/pom.xml (see line 388).
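Binding the rat check to the package phase might look roughly like the Whirr example linked above. This is an illustrative sketch; the plugin version and configuration shown are assumptions, not taken from the attached patches:

```xml
<!-- Illustrative binding of the Apache RAT license check to the
     package phase, modeled on the Whirr pom referenced above. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <version>0.8</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With something like this in place, `mvn package` fails when a file is missing its license header, instead of the problem surfacing only at release time.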





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205931#comment-13205931
 ] 

Jesse Yates commented on HBASE-5313:


Those compression numbers are pretty nice, however. I worry a little about 
introducing an HFile v3 so soon on the heels of the last version, leading to a 
proliferation of versions. My other concern is that columnar storage doesn't 
make sense for all cases; Dremel targets a specific use case.

That being said, I would love to see the ability to do Dremel-style queries in 
HBase. How about, along with the new version/columnar data support, adding the 
ability to select the storage file format on a per-table basis? That would 
enable some tables to be optimized for certain use cases and other tables for 
others, rather than having to use completely different clusters (continuing 
the multi-tenancy story).

> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This would allow us to 
> skip decompressing the values entirely when we are scanning and skipping 
> over rows in the block.
> Any other ideas? 





[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205936#comment-13205936
 ] 

stack commented on HBASE-5209:
--

bq. I have a multi-master HBase set up, and I'm trying to programmatically 
determine which of the masters is currently active. But the API does not allow 
me to do this. There is a getMaster() method in the HConnection class, but it 
returns an HMasterInterface, whose methods do not allow me to find out which 
master won the last race. The API should have a getActiveMasterHostname() or 
something to that effect.

If you do a getMaster, I'd think that you should get the active master, only, 
in HConnection.  Are you saying that it'll give you an Interface on the 
non-active Master?  That's broken, I'd say.  For the name of the Master, yeah, 
getServerName should be part of HMasterInterface.

On the patch:

{code}
+  private boolean isMasterRunning, isActiveMaster;
{code}

The above are the names of methods, not data members.  Should be masterRunning 
and activeMaster.

What's going on here:

{code}
+this.master = master;
+this.isMasterRunning = isMasterRunning;
+this.isActiveMaster = isActiveMaster;
{code}

So, we could be reporting a master that is not running and not the active 
master?  Why would we even care about it in that case?

getMasterInfo as method name returning master ServerName seems off.  Is this 
the 'active' master or non-running master?

I think we need to be clear that ClusterStatus reports on the active master 
only (unless you want to add a list of all running masters, which I don't 
think is yet possible since they do not register until they assume mastership 
--- hmmm... looking further down in your patch, it looks like you are adding 
this facility to zk).

Is this of any use?

+  public boolean isMasterRunning() {

I mean, if master is not running, can you even get a ClusterStatus from the 
cluster?

Ditto for +  public boolean isActiveMaster() {

Won't this just be true anytime you get a ClusterStatus?

You up the ClusterStatus version number but you don't act on it (what if you 
are asked to deserialize an earlier version of ClusterStatus?)
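The version check being asked for might look roughly like this. It is a hypothetical sketch of the Writable-style versioning pattern, with made-up fields, not the real ClusterStatus code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of acting on a serialized version number: the payload
// starts with a version byte, and the reader branches on it so data written
// by an older version still deserializes. Fields are illustrative only.
public class VersionedStatusSketch {
    static final byte VERSION = 2;

    String masterName = "";
    boolean hasBackupMasters;          // field added in version 2

    public byte[] serialize() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(VERSION);
            out.writeUTF(masterName);
            out.writeBoolean(hasBackupMasters);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static VersionedStatusSketch deserialize(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            byte v = in.readByte();
            if (v > VERSION) {
                throw new IOException("Cannot read future version " + v);
            }
            VersionedStatusSketch s = new VersionedStatusSketch();
            s.masterName = in.readUTF();
            // Only read fields that exist in the serialized version.
            s.hasBackupMasters = (v >= 2) && in.readBoolean();
            return s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```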

On MasterInterface, I'd suggest you don't bother upping the version number -- 
just add the new method on the end.  That's usually ok.  Also, is 
isActiveMaster of any use even?  (You could ask zk directly?  Have hbaseadmin 
go ask zk rather than go via the master at all?  Isn't the master znode name 
its ServerName?  Isn't that what you need?)

I like your registering backup masters... and adding the list to the zk report.



> HConnection/HMasterInterface should allow for way to get hostname of 
> currently active master in multi-master HBase setup
> 
>
> Key: HBASE-5209
> URL: https://issues.apache.org/jira/browse/HBASE-5209
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.94.0, 0.90.5, 0.92.0
>Reporter: Aditya Acharya
>Assignee: David S. Wang
> Fix For: 0.94.0, 0.90.7, 0.92.1
>
> Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff
>
>
> I have a multi-master HBase set up, and I'm trying to programmatically 
> determine which of the masters is currently active. But the API does not 
> allow me to do this. There is a getMaster() method in the HConnection class, 
> but it returns an HMasterInterface, whose methods do not allow me to find out 
> which master won the last race. The API should have a 
> getActiveMasterHostname() or something to that effect.





[jira] [Commented] (HBASE-5363) Automatically run rat check on mvn release builds

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205937#comment-13205937
 ] 

stack commented on HBASE-5363:
--

Try it now boss

> Automatically run rat check on mvn release builds
> -
>
> Key: HBASE-5363
> URL: https://issues.apache.org/jira/browse/HBASE-5363
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.90.5, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, 
> hbase-5363.patch
>
>
> Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
> should add checks, likely in the mvn package phase, so that this becomes a 
> non-issue in the future.
> Here's an example from Whirr:
> https://github.com/apache/whirr/blob/trunk/pom.xml (see line 388).





[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed

2012-02-10 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205939#comment-13205939
 ] 

Jonathan Hsieh commented on HBASE-5327:
---

@Jimmy.  Thanks for looking into this.  I'm +1 on v2; I plan on committing 
tomorrow unless I hear otherwise.

> Print a message when an invalid hbase.rootdir is passed
> ---
>
> Key: HBASE-5327
> URL: https://issues.apache.org/jira/browse/HBASE-5327
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.5
>Reporter: Jean-Daniel Cryans
>Assignee: Jimmy Xiang
> Fix For: 0.94.0, 0.90.7, 0.92.1
>
> Attachments: hbase-5327.txt, hbase-5327_v2.txt
>
>
> As seen on the mailing list: 
> http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24124
> If hbase.rootdir doesn't specify a folder on hdfs we crash while opening a 
> path to .oldlogs:
> {noformat}
> 2012-02-02 23:07:26,292 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unhandled exception. Starting shutdown.
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: hdfs://sv4r11s38:9100.oldlogs
> at org.apache.hadoop.fs.Path.initialize(Path.java:148)
> at org.apache.hadoop.fs.Path.<init>(Path.java:71)
> at org.apache.hadoop.fs.Path.<init>(Path.java:50)
> at 
> org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> hdfs://sv4r11s38:9100.oldlogs
> at java.net.URI.checkPath(URI.java:1787)
> at java.net.URI.<init>(URI.java:735)
> at org.apache.hadoop.fs.Path.initialize(Path.java:145)
> ... 6 more
> {noformat}
> It could also crash anywhere else; this just happens to be the first place 
> we use hbase.rootdir. We need to verify that it points to an actual folder.
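The proposed verification could be sketched like this. The class name and message are hypothetical, but the check mirrors the failure above: `hdfs://sv4r11s38:9100` parses as a URI with an empty path, so a derived path like `.oldlogs` comes out malformed.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of failing fast on a bad hbase.rootdir: report a
// clear message when the configured value has no directory component,
// instead of crashing later on a derived path such as .oldlogs.
public class RootDirCheckSketch {

    // Returns null if the rootdir looks usable, else an error message.
    public static String validate(String rootdir) {
        try {
            URI uri = new URI(rootdir);
            if (uri.getPath() == null || uri.getPath().isEmpty()) {
                return "hbase.rootdir must point to a directory, e.g. "
                     + rootdir + "/hbase";
            }
            return null;
        } catch (URISyntaxException e) {
            return "hbase.rootdir is not a valid URI: " + rootdir;
        }
    }
}
```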





[jira] [Commented] (HBASE-5363) Automatically run rat check on mvn release builds

2012-02-10 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205941#comment-13205941
 ] 

Jonathan Hsieh commented on HBASE-5363:
---

Thanks!  Works.  Updated wiki. 

> Automatically run rat check on mvn release builds
> -
>
> Key: HBASE-5363
> URL: https://issues.apache.org/jira/browse/HBASE-5363
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.90.5, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, 
> hbase-5363.patch
>
>
> Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
> should add checks, likely in the mvn package phase, so that this becomes a 
> non-issue in the future.
> Here's an example from Whirr:
> https://github.com/apache/whirr/blob/trunk/pom.xml (see line 388).





[jira] [Commented] (HBASE-5377) Fix licenses on the 0.90 branch.

2012-02-10 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205943#comment-13205943
 ] 

Jonathan Hsieh commented on HBASE-5377:
---

I actually backported HBASE-4647, wasn't sure about those edits, and then 
added a few more.  I'll remove those lines, test, and commit if all is good.


> Fix licenses on the 0.90 branch.
> 
>
> Key: HBASE-5377
> URL: https://issues.apache.org/jira/browse/HBASE-5377
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-5377.patch
>
>
> There are a handful of empty files and several files missing Apache licenses 
> on the 0.90 branch.  This patch fixes all of them and, in conjunction with 
> HBASE-5363, will allow the branch to pass RAT checks.





[jira] [Updated] (HBASE-5377) Fix licenses on the 0.90 branch.

2012-02-10 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5377:
--

Attachment: hbase-5377.v2.patch

> Fix licenses on the 0.90 branch.
> 
>
> Key: HBASE-5377
> URL: https://issues.apache.org/jira/browse/HBASE-5377
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-5377.patch, hbase-5377.v2.patch
>
>
> There are a handful of empty files and several files missing Apache licenses 
> on the 0.90 branch.  This patch fixes all of them and, in conjunction with 
> HBASE-5363, will allow the branch to pass RAT checks.





[jira] [Commented] (HBASE-5377) Fix licenses on the 0.90 branch.

2012-02-10 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205962#comment-13205962
 ] 

Jonathan Hsieh commented on HBASE-5377:
---

Things check out after the edits.  Thanks for the review, stack.  Committed.

> Fix licenses on the 0.90 branch.
> 
>
> Key: HBASE-5377
> URL: https://issues.apache.org/jira/browse/HBASE-5377
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.90.6
>
> Attachments: hbase-5377.patch, hbase-5377.v2.patch
>
>
> There are a handful of empty files and several files missing Apache licenses 
> on the 0.90 branch.  This patch fixes all of them and, in conjunction with 
> HBASE-5363, will allow the branch to pass RAT checks.





[jira] [Updated] (HBASE-5377) Fix licenses on the 0.90 branch.

2012-02-10 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5377:
--

   Resolution: Fixed
Fix Version/s: 0.90.6
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> Fix licenses on the 0.90 branch.
> 
>
> Key: HBASE-5377
> URL: https://issues.apache.org/jira/browse/HBASE-5377
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.90.6
>
> Attachments: hbase-5377.patch, hbase-5377.v2.patch
>
>
> There are a handful of empty files and several files missing Apache licenses 
> on the 0.90 branch.  This patch fixes all of them and, in conjunction with 
> HBASE-5363, will allow the branch to pass RAT checks.





[jira] [Updated] (HBASE-5363) Automatically run rat check on mvn release builds

2012-02-10 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5363:
--

Fix Version/s: 0.92.1
   0.90.6
   0.94.0

> Automatically run rat check on mvn release builds
> -
>
> Key: HBASE-5363
> URL: https://issues.apache.org/jira/browse/HBASE-5363
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.90.5, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.94.0, 0.90.6, 0.92.1
>
> Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, 
> hbase-5363.patch
>
>
> Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
> should add checks, likely in the mvn package phase, so that this becomes a 
> non-issue in the future.
> Here's an example from Whirr:
> https://github.com/apache/whirr/blob/trunk/pom.xml (see line 388).





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205978#comment-13205978
 ] 

Zhihong Yu commented on HBASE-5313:
---

There are only two weeks before we branch 0.94.
I think HFile v3 would be in 0.96, containing this feature and HBASE-5347.

> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This would allow us to 
> skip decompressing the values entirely when we are scanning and skipping 
> over rows in the block.
> Any other ideas? 





[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-02-10 Thread David S. Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205979#comment-13205979
 ] 

David S. Wang commented on HBASE-5209:
--

I'm going to take out isActiveMaster and isMasterRunning, which I think will 
address the bulk of the comments.  Agreed that we should be able to assume 
that if the master is not null, it is the active one.

When I have a new patch ready, I'll use reviewboard.

Thanks for the helpful comments; I appreciate them and ask for your 
understanding as this is my first upstream commit.

> HConnection/HMasterInterface should allow for way to get hostname of 
> currently active master in multi-master HBase setup
> 
>
> Key: HBASE-5209
> URL: https://issues.apache.org/jira/browse/HBASE-5209
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.94.0, 0.90.5, 0.92.0
>Reporter: Aditya Acharya
>Assignee: David S. Wang
> Fix For: 0.94.0, 0.90.7, 0.92.1
>
> Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff
>
>
> I have a multi-master HBase set up, and I'm trying to programmatically 
> determine which of the masters is currently active. But the API does not 
> allow me to do this. There is a getMaster() method in the HConnection class, 
> but it returns an HMasterInterface, whose methods do not allow me to find out 
> which master won the last race. The API should have a 
> getActiveMasterHostname() or something to that effect.





[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-10 Thread Gregory Chanan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205982#comment-13205982
 ] 

Gregory Chanan commented on HBASE-5317:
---

Ok, did some more digging.

In 0.23 the MRAppMaster tries to load up the OutputFileFormat, so we have to 
load that, which we didn't have to do prior to 0.23.
It looks like:  TableMapReduceUtil.addDependencyJars(job)
is supposed to do this, but when I run it, no jars can be found for the classes 
associated with the job.

By default, it looks like the classpath in the test is:
{basedir}/target/test-classes:{basedir}/target/classes:...
but the relevant jars are in:
target/
target/hbase-{VERSION}/hbase-{VERSION}/

By the way, when I add hbase jars manually to "tmpjars" [which is what 
addDependencyJars does], the test passes.

So it seems like the solution is either:
1) Modify the classpath to include those jars 
2) Build a jar myself of the relevant classes

I'm not sure what the correct answer is; please advise.
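For context, the "no jars can be found" symptom is consistent with how dependency jars are located: each class is resolved to the jar it was loaded from, and classes loaded from a plain classes/ directory (as in the test classpath above) have no containing jar to ship. A simplified sketch of that lookup follows; it is an assumption-laden illustration, not the actual Hadoop/HBase implementation:

```java
import java.net.URL;

// Simplified sketch of find-the-containing-jar logic: resolve a class to
// its .class resource, and if the resource URL is a jar: URL, extract the
// jar path. Classes loaded from a directory (not a jar) yield null, which
// matches the "no jars found" symptom described above.
public class JarFinderSketch {

    // e.g. java.lang.String -> "java/lang/String.class"
    public static String classResource(Class<?> clazz) {
        return clazz.getName().replace('.', '/') + ".class";
    }

    public static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null;                       // bootstrap class, no jar to ship
        }
        URL url = loader.getResource(classResource(clazz));
        if (url == null || !"jar".equals(url.getProtocol())) {
            return null;                       // loaded from a directory, not a jar
        }
        String path = url.getPath();           // e.g. file:/x/foo.jar!/pkg/C.class
        return path.substring(0, path.indexOf('!'));
    }
}
```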

> Fix TestHFileOutputFormat to work against hadoop 0.23
> -
>
> Key: HBASE-5317
> URL: https://issues.apache.org/jira/browse/HBASE-5317
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.0, 0.92.0
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch
>
>
> Running
> mvn -Dhadoop.profile=23 test -P localTests 
> -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
> yields this on 0.92:
> Failed tests:   
> testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  HFile for column family info-A not found
> Tests in error: 
>   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
> /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
>  (Is a directory)
>   
> testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  TestTable
>   
> testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  TestTable
> It looks like on trunk, this also results in an error:
>   
> testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  TestTable
> I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
> haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205983#comment-13205983
 ] 

Zhihong Yu commented on HBASE-5317:
---

@Gregory:
I think we should try approach #1 first.

Thanks for your persistence.

> Fix TestHFileOutputFormat to work against hadoop 0.23
> -
>
> Key: HBASE-5317
> URL: https://issues.apache.org/jira/browse/HBASE-5317
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.0, 0.92.0
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch
>
>
> Running
> mvn -Dhadoop.profile=23 test -P localTests 
> -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
> yields this on 0.92:
> Failed tests:   
> testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  HFile for column family info-A not found
> Tests in error: 
>   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
> /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
>  (Is a directory)
>   
> testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  TestTable
>   
> testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  TestTable
> It looks like on trunk, this also results in an error:
>   
> testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
>  TestTable
> I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
> haven't fixed the other 3 yet.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Jerry Chen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205987#comment-13205987
 ] 

Jerry Chen commented on HBASE-5313:
---

Yongqiang, which delta encoding algorithm did you use? The default 
algorithm only does a simple encoding. Do we have results using the prefix 
and fast-diff algorithms with the current HFile v2?

I suppose this is only for the on-disk representation. How do we plan to 
represent it in the block cache?
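To make the key-section idea from the issue description concrete, here is a hedged, self-contained sketch (illustrative only, not the actual HFile format; all names are invented) of a block that stores all keys first and all values after, so that a key-only scan never reads the value section:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KeySectionLayoutDemo {
    // Block layout (illustrative): [kv count][key-section length]
    // [len+key ...][len+value ...]
    static byte[] writeBlock(String[][] kvs) {
        try {
            ByteArrayOutputStream keys = new ByteArrayOutputStream();
            ByteArrayOutputStream vals = new ByteArrayOutputStream();
            DataOutputStream kOut = new DataOutputStream(keys);
            DataOutputStream vOut = new DataOutputStream(vals);
            for (String[] kv : kvs) {
                byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
                byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
                kOut.writeInt(k.length); kOut.write(k);
                vOut.writeInt(v.length); vOut.write(v);
            }
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(block);
            out.writeInt(kvs.length);
            out.writeInt(keys.size()); // lets a reader skip past all keys at once
            keys.writeTo(block);
            vals.writeTo(block);
            return block.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for in-memory streams
        }
    }

    // Reads only the key section; the value bytes are never touched, which is
    // what would let us avoid decompressing values while skipping over rows.
    static List<String> scanKeys(byte[] block) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            int count = in.readInt();
            in.readInt(); // key-section length, unused in this scan
            List<String> result = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                byte[] k = new byte[in.readInt()];
                in.readFully(k);
                result.add(new String(k, StandardCharsets.UTF_8));
            }
            return result;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] block = writeBlock(new String[][] {{"row1", "v1"}, {"row2", "v2"}});
        System.out.println(scanKeys(block)); // [row1, row2]
    }
}
```

In the real proposal the key section and value section would each be compressed, and the question above about the block-cache representation is whether the cached form keeps the two sections separate as well.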

Sent from my iPhone




> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on the disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys in the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This will allow us to 
> not-even decompress the values when we are scanning and skipping over rows in 
> the block.
> Any other ideas? 





[jira] [Created] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Mikhail Bautin (Created) (JIRA)
Reuse compression streams in HFileBlock.Writer
--

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin


We need to reuse compression streams in HFileBlock.Writer instead of 
allocating them every time. The motivation is that when using Java's built-in 
implementation of Gzip, we allocate a new GZIPOutputStream object and an 
associated native data structure every time. This is one suspected cause of 
recent TestHFileBlock failures on Hadoop QA: 
https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
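The reuse idea can be illustrated with plain java.util.zip (a hedged sketch, not the attached patch): keep one Deflater and one output buffer per writer and reset() them for each block, instead of constructing a new GZIPOutputStream whose native deflater state lingers until finalization:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class ReusableCompressorDemo {
    // One Deflater (and its native state) per writer, reused across blocks.
    private final Deflater deflater = new Deflater();
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    byte[] compress(byte[] data) {
        try {
            buf.reset();      // keep the backing array, drop old contents
            deflater.reset(); // reuse the native structure, don't reallocate
            DeflaterOutputStream out = new DeflaterOutputStream(buf, deflater);
            out.write(data);
            out.finish();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory stream: cannot happen
        }
    }

    static boolean demo() {
        ReusableCompressorDemo c = new ReusableCompressorDemo();
        byte[] a = c.compress("block one block one block one".getBytes());
        byte[] b = c.compress("block two block two block two".getBytes());
        return a.length > 0 && b.length > 0;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

The raw deflate output here lacks the gzip header and trailer that GZIPOutputStream adds; a real reusable gzip codec has to emit those itself around the reused deflater.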






[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5387:
--

Attachment: Fix-deflater-leak-2012-02-10_18_48_45.patch

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time. This is one suspected cause of 
> recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5371) Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API

2012-02-10 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205998#comment-13205998
 ] 

jirapos...@reviews.apache.org commented on HBASE-5371:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3829/
---

(Updated 2012-02-11 02:58:35.137932)


Review request for hbase.


Changes
---

Added more tests! 


Summary
---

We need to introduce something like 
AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
that clients can check access rights before carrying out the operations. We 
need this kind of operation for HCATALOG-245, which introduces authorization 
providers for hbase over hcat. We cannot use getUserPermissions() since it 
requires ADMIN permissions on the global/table level.


This addresses bug HBASE-5371.
https://issues.apache.org/jira/browse/HBASE-5371


Diffs (updated)
-

  
security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
 5091b7d 
  
security/src/main/java/org/apache/hadoop/hbase/security/access/AccessControllerProtocol.java
 5fa2edb 
  
security/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
 f864373 

Diff: https://reviews.apache.org/r/3829/diff


Testing
---


Thanks,

enis



> Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) 
> API
> 
>
> Key: HBASE-5371
> URL: https://issues.apache.org/jira/browse/HBASE-5371
> Project: HBase
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.94.0, 0.92.1
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>
> We need to introduce something like 
> AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
> that clients can check access rights before carrying out the operations. We 
> need this kind of operation for HCATALOG-245, which introduces authorization 
> providers for hbase over hcat. We cannot use getUserPermissions() since it 
> requires ADMIN permissions on the global/table level.





[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5387:
--

Status: Patch Available  (was: Open)

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time. This is one suspected cause of 
> recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Assigned] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5387:
-

Assignee: Mikhail Bautin

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time. This is one suspected cause of 
> recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5387:
--

Description: 
We need to reuse compression streams in HFileBlock.Writer instead of 
allocating them every time. The motivation is that when using Java's built-in 
implementation of Gzip, we allocate a new GZIPOutputStream object and an 
associated native data structure every time we create a compression stream. The 
native data structure is only deallocated in the finalizer. This is one 
suspected cause of recent TestHFileBlock failures on Hadoop QA: 
https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.


  was:
We need to to reuse compression streams in HFileBlock.Writer instead of 
allocating them every time. The motivation is that when using Java's built-in 
implementation of Gzip, we allocate a new GZIPOutputStream object and an 
associated native data structure any time. This is one suspected cause of 
recent TestHFileBlock failures on Hadoop QA: 
https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.



> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206006#comment-13206006
 ] 

Hadoop QA commented on HBASE-5387:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12514195/Fix-deflater-leak-2012-02-10_18_48_45.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -136 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 157 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/943//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/943//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/943//console

This message is automatically generated.

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206009#comment-13206009
 ] 

Zhihong Yu commented on HBASE-5387:
---

Looks like TestHFileBlock passes this time. 

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206012#comment-13206012
 ] 

Mikhail Bautin commented on HBASE-5387:
---

@Ted: I think this addresses the root cause of TestHFileBlock and 
TestForceCacheImportantBlocks failures. As you suspected, Hadoop QA was 
pointing to a real bug in HBase. However, I think we have had this issue for a 
while (even in HFile v1), and it just got exposed as I increased the volume of 
IO happening within a single unit test. I will add a ulimit setting to our 
internal test runs so that we catch memory leaks like this in the future.


> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206017#comment-13206017
 ] 

Lars Hofhansl commented on HBASE-5387:
--

Patch looks good to me.
Can you mark the overridden methods in ResetableGZIPOutputStream, 
ReusableGzipOutputStream, and ReusableStreamGzipCodec with @Override?

Also, I would probably move the static code that gets the GZIP_HEADER into 
ResetableGZIPOutputStream, and put a few more comments around it (it took 
me a bit to figure out what this was doing).
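For reference, one known way to obtain the fixed GZIP_HEADER bytes is to let an empty GZIPOutputStream write its header into a byte-array stream (a hedged reconstruction; the patch may capture it differently). Java's GZIPOutputStream emits the 10-byte RFC 1952 header from its constructor, before any data is written:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderDemo {
    // GZIPOutputStream writes the 10-byte gzip header as soon as it is
    // constructed, so the buffer already holds the header with no data written.
    static byte[] header() {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            new GZIPOutputStream(baos); // side effect: header lands in baos
            return baos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory stream: cannot happen
        }
    }

    public static void main(String[] args) {
        byte[] h = header();
        // RFC 1952: magic bytes 0x1f 0x8b, then compression method 8 (deflate)
        System.out.println(h.length + " " + (h[0] & 0xff) + " " + (h[1] & 0xff) + " " + h[2]);
    }
}
```

This is also why stack's question below about header extensions matters less here: the header being captured is the one Java's own implementation writes, which is always the fixed 10-byte form.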


> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206018#comment-13206018
 ] 

Lars Hofhansl commented on HBASE-5313:
--

I agree with Ted, this is 0.96 material.

> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on the disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys in the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This will allow us to 
> not-even decompress the values when we are scanning and skipping over rows in 
> the block.
> Any other ideas? 





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206020#comment-13206020
 ] 

Zhihong Yu commented on HBASE-5387:
---

I ran TestSplitTransactionOnCluster twice on my MacBook and it passed.
I used a script to check for hung tests and found none.

I think the patch, after slight revision, should be good to go.

Thanks for the quick turnaround, Mikhail.

I think we should try to push ReusableStreamGzipCodec upstream to Hadoop.

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache

2012-02-10 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206021#comment-13206021
 ] 

Lars Hofhansl commented on HBASE-5347:
--

Nice patch.
How does this interact with the off-heap cache effort?

> GC free memory management in Level-1 Block Cache
> 
>
> Key: HBASE-5347
> URL: https://issues.apache.org/jira/browse/HBASE-5347
> Project: HBase
>  Issue Type: Improvement
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
>
> On eviction of a block from the block-cache, instead of waiting for the 
> garbage collecter to reuse its memory, reuse the block right away.
> This will require us to keep reference counts on the HFile blocks. Once we 
> have the reference counts in place we can do our own simple 
> blocks-out-of-slab allocation for the block-cache.
> This will help us with
> * reducing gc pressure, especially in the old generation
> * making it possible to have non-java-heap memory backing the HFile blocks
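The reference-counting scheme the description proposes can be sketched as follows (a hedged illustration only; the class names, block size, and free-list policy are all invented): a block returns to a free list as soon as its count drops to zero, so its memory is reused immediately rather than waiting for the garbage collector.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedBlockDemo {
    static class Block {
        final byte[] data = new byte[64 * 1024]; // illustrative block size
        final AtomicInteger refCount = new AtomicInteger();
    }

    private final ArrayDeque<Block> freeList = new ArrayDeque<>();

    Block allocate() {
        Block b = freeList.isEmpty() ? new Block() : freeList.poll();
        b.refCount.set(1); // the cache itself holds one reference
        return b;
    }

    void retain(Block b) {
        b.refCount.incrementAndGet();
    }

    void release(Block b) {
        if (b.refCount.decrementAndGet() == 0) {
            freeList.add(b); // buffer reused right away, no GC involved
        }
    }

    static boolean demoReuse() {
        RefCountedBlockDemo cache = new RefCountedBlockDemo();
        Block b1 = cache.allocate();
        cache.retain(b1);  // a scanner pins the block
        cache.release(b1); // block evicted from the cache
        cache.release(b1); // scanner done: count hits 0, block is freed
        return cache.allocate() == b1; // the same buffer comes back
    }

    public static void main(String[] args) {
        System.out.println(demoReuse()); // true
    }
}
```

The same counting discipline is what makes the second bullet possible: once every reader goes through retain/release, the backing memory no longer needs to live on the Java heap at all.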





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206026#comment-13206026
 ] 

stack commented on HBASE-5387:
--

Any reason for hardcoding 32K as the buffer size?

+  ((Configurable)codec).getConf().setInt("io.file.buffer.size", 32 * 1024);

Give this an initial reasonable size?

+compressedByteStream = new ByteArrayOutputStream();

So we'll keep around the largest thing we ever wrote into this 
ByteArrayOutputStream? Should we resize it from time to time? Or I 
suppose we can just wait till it's a problem?

Is the gzip stuff brittle? The header can be bigger than 10 bytes (the 
spec allows extensions, IIRC), but I suppose it's safe because we presume 
Java or the underlying native compression.

Good stuff Mikhail.  +1 on patch.

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206030#comment-13206030
 ] 

Zhihong Yu commented on HBASE-5387:
---

w.r.t. the second comment above, I see the following in doCompression() :
{code}
+compressedByteStream.reset();
{code}
According to 
http://docs.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#reset%28%29,
 after the above call, 'all currently accumulated output in the output stream 
is discarded.'
The output stream can be used again, reusing the already allocated buffer space.
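A tiny check of the cited behavior (a sketch; the sizes are arbitrary): reset() drops the contents but keeps the grown backing array, so repeated compression passes reuse the same allocation.

```java
import java.io.ByteArrayOutputStream;

public class ResetDemo {
    // Returns {size before reset, size after reset}.
    static int[] sizes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(new byte[1024], 0, 1024); // grow the backing array
        int before = buf.size();
        buf.reset(); // discard accumulated output, keep the capacity
        return new int[] { before, buf.size() };
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println(s[0] + " -> " + s[1]); // 1024 -> 0
    }
}
```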

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-02-10 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4218:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

With HBASE-5387, this issue can be resolved.

> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
> D1659.1.patch, D1659.2.patch, D447.1.patch, D447.10.patch, D447.11.patch, 
> D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, 
> D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, 
> D447.21.patch, D447.22.patch, D447.23.patch, D447.24.patch, D447.25.patch, 
> D447.26.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, 
> D447.7.patch, D447.8.patch, D447.9.patch, 
> Data-block-encoding-2011-12-23.patch, 
> Delta-encoding-2012-01-17_11_09_09.patch, 
> Delta-encoding-2012-01-25_00_45_29.patch, 
> Delta-encoding-2012-01-25_16_32_14.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta-encoding.patch-2012-01-05_15_16_43.patch, 
> Delta-encoding.patch-2012-01-05_16_31_44.patch, 
> Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
> Delta-encoding.patch-2012-01-05_18_50_47.patch, 
> Delta-encoding.patch-2012-01-07_14_12_48.patch, 
> Delta-encoding.patch-2012-01-13_12_20_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression scheme for keys. Keys are sorted in an HFile and they are 
> usually very similar. Because of that, it is possible to design better 
> compression than general-purpose algorithms.
> It is an additional step designed to be used in memory. It aims to save 
> memory in the cache as well as to speed up seeks within HFileBlocks. It 
> should improve performance a lot if key lengths are larger than value 
> lengths. For example, it makes a lot of sense to use it when the value is a 
> counter.
> Initial tests on real data (key length ~90 bytes, value length = 8 bytes) 
> show that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> While having much better performance (decompression 20-80% faster than LZO). 
> Moreover, it should allow far more efficient seeking, which should improve 
> performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important design changes will be 
> needed:
> - solidify the interface to the HFileBlock / HFileReader scanner to provide 
> seeking and iterating; access to the uncompressed buffer in HFileBlock will 
> have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression
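The core prefix-compression idea above can be sketched in a few lines, assuming nothing about the actual HBASE-4218 encoding format: because keys are sorted, each key can be stored as the length of the prefix it shares with the previous key plus its remaining suffix bytes. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Toy prefix encoder over sorted keys (not the real HBASE-4218 encoder).
public class PrefixEncoder {
    public static final class Entry {
        final int commonPrefix;   // bytes shared with the previous key
        final byte[] suffix;      // remaining bytes of this key
        Entry(int commonPrefix, byte[] suffix) {
            this.commonPrefix = commonPrefix;
            this.suffix = suffix;
        }
    }

    public static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int common = 0;
            int max = Math.min(prev.length, key.length);
            while (common < max && prev[common] == key[common]) common++;
            byte[] suffix = new byte[key.length - common];
            System.arraycopy(key, common, suffix, 0, suffix.length);
            out.add(new Entry(common, suffix));
            prev = key;
        }
        return out;
    }

    public static List<byte[]> decode(List<Entry> encoded) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : encoded) {
            byte[] key = new byte[e.commonPrefix + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.commonPrefix);
            System.arraycopy(e.suffix, 0, key, e.commonPrefix, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

With ~90-byte keys that differ only in a short tail, each entry stores a one-number prefix length and a few suffix bytes, which is where the reported savings come from.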





[jira] [Resolved] (HBASE-5319) TRUNK broke since hbase-4218 went in? TestHFileBlock OOMEs

2012-02-10 Thread Zhihong Yu (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu resolved HBASE-5319.
---

Resolution: Duplicate

This issue should be fixed by HBASE-5387.

> TRUNK broke since hbase-4218 went in?  TestHFileBlock OOMEs
> ---
>
> Key: HBASE-5319
> URL: https://issues.apache.org/jira/browse/HBASE-5319
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>
> Check it out...https://builds.apache.org/job/HBase-TRUNK/  Mikhail, you might 
> know what's up.  Else, will have a looksee...





[jira] [Commented] (HBASE-5382) Test that we always cache index and bloom blocks

2012-02-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206035#comment-13206035
 ] 

Hudson commented on HBASE-5382:
---

Integrated in HBase-TRUNK-security #108 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/108/])
[jira] [HBASE-5382] Test that we always cache index and bloom blocks

Summary: This is a unit test that should have been part of HBASE-4683 but was
not committed. The original test was reviewed as part of
https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and
patch, and extending the scope of the test to also handle the case when block
cache is enabled for the column family.

Test Plan: Run unit tests

Reviewers: JIRA, jdcryans, lhofhansl, Liyin

Reviewed By: jdcryans

CC: jdcryans

Differential Revision: https://reviews.facebook.net/D1695

mbautin : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestForceCacheImportantBlocks.java


> Test that we always cache index and bloom blocks
> 
>
> Key: HBASE-5382
> URL: https://issues.apache.org/jira/browse/HBASE-5382
> Project: HBase
>  Issue Type: Test
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch
>
>
> This is a unit test that should have been part of HBASE-4683 but was not 
> committed. The original test was reviewed as part of 
> https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA 
> and patch, and extending the scope of the test to also handle the case when 
> block cache is enabled for the column family. The new review is at 
> https://reviews.facebook.net/D1695.





[jira] [Commented] (HBASE-5384) Up heap used by hadoopqa

2012-02-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206034#comment-13206034
 ] 

Hudson commented on HBASE-5384:
---

Integrated in HBase-TRUNK-security #108 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/108/])
HBASE-5384 Up heap used by hadoopqa

stack : 
Files : 
* /hbase/trunk/dev-support/test-patch.properties
* /hbase/trunk/dev-support/test-patch.sh


> Up heap used by hadoopqa
> 
>
> Key: HBASE-5384
> URL: https://issues.apache.org/jira/browse/HBASE-5384
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.94.0
>
> Attachments: hadoopqa_mavenopts.txt
>
>






[jira] [Commented] (HBASE-5380) [book] book.xml - KeyValue, adding comment about KeyValue's not being split across blocks

2012-02-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206036#comment-13206036
 ] 

Hudson commented on HBASE-5380:
---

Integrated in HBase-TRUNK-security #108 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/108/])
hbase-5380.  book.xml, comment about KeyValue instances not being split 
across blocks


> [book] book.xml - KeyValue, adding comment about KeyValue's not being split 
> across blocks
> -
>
> Key: HBASE-5380
> URL: https://issues.apache.org/jira/browse/HBASE-5380
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: book_hbase_5380.xml.patch
>
>
> book.xml
> * Adding a comment in the KeyValue section about KVs not being split across 
> blocks.  This was a recent question on the dist-list.





[jira] [Commented] (HBASE-4683) Always cache index and bloom blocks

2012-02-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206033#comment-13206033
 ] 

Hudson commented on HBASE-4683:
---

Integrated in HBase-TRUNK-security #108 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/108/])
[jira] [HBASE-5382] Test that we always cache index and bloom blocks

Summary: This is a unit test that should have been part of HBASE-4683 but was
not committed. The original test was reviewed as part of
https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and
patch, and extending the scope of the test to also handle the case when block
cache is enabled for the column family.

Test Plan: Run unit tests

Reviewers: JIRA, jdcryans, lhofhansl, Liyin

Reviewed By: jdcryans

CC: jdcryans

Differential Revision: https://reviews.facebook.net/D1695

mbautin : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestForceCacheImportantBlocks.java


> Always cache index and bloom blocks
> ---
>
> Key: HBASE-4683
> URL: https://issues.apache.org/jira/browse/HBASE-4683
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Mikhail Bautin
>Priority: Minor
> Fix For: 0.94.0, 0.92.0
>
> Attachments: 0001-Cache-important-block-types.patch, 4683-v2.txt, 
> 4683.txt, D1695.1.patch, D807.1.patch, D807.2.patch, D807.3.patch, 
> HBASE-4683-0.92-v2.patch, HBASE-4683-v3.patch
>
>
> This would add a new boolean config option: hfile.block.cache.datablocks
> Default would be true.
> Setting this to false allows HBase to run in a mode where only index blocks 
> are cached, which is useful for analytical scenarios where a useful working 
> set of the data cannot be expected to fit into the (aggregate) cache.
> This is the equivalent of setting cacheBlocks to false on all scans 
> (including scans on behalf of gets).
> I would like to get a general feeling about what folks think about this.
> The change itself would be simple.
> Update (Mikhail): we probably don't need a new conf option. Instead, we will 
> make index blocks cached by default.





[jira] [Commented] (HBASE-5378) [book] book.xml - added link to coprocessor blog entry

2012-02-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206037#comment-13206037
 ] 

Hudson commented on HBASE-5378:
---

Integrated in HBase-TRUNK-security #108 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/108/])
hbase-5378  book.xml - adding new section for coprocessors in 
Arch/RegionServer


> [book] book.xml - added link to coprocessor blog entry 
> ---
>
> Key: HBASE-5378
> URL: https://issues.apache.org/jira/browse/HBASE-5378
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Trivial
> Attachments: book_hbase_5378.xml.patch
>
>
> book.xml
> * added section under Arch/RegionServer for Coprocessors, and a link to the 
> blog entry on this subject.
> * updated the schema design chapter that mentioned coprocessors link to this 
> new section.
> * minor update to compaction explanation in the 3rd example.





[jira] [Commented] (HBASE-5368) Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible in HBase installs

2012-02-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206039#comment-13206039
 ] 

Hudson commented on HBASE-5368:
---

Integrated in HBase-TRUNK-security #108 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/108/])
HBASE-5368 Move PrefixSplitKeyPolicy out of the src/test into src, so it is 
accessible in HBase installs

larsh : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/PrefixSplitKeyPolicy.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java


> Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible 
> in HBase installs
> -
>
> Key: HBASE-5368
> URL: https://issues.apache.org/jira/browse/HBASE-5368
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 5368.txt
>
>
> Very simple change to make PrefixSplitKeyPolicy accessible in HBase installs 
> (the user still needs to set up the table(s) accordingly).
> Right now it is in src/test/org.apache.hadoop.hbase.regionserver, I propose 
> moving it to src/org.apache.hadoop.hbase.regionserver (alongside 
> ConstantSizeRegionSplitPolicy), and maybe renaming it too.





[jira] [Commented] (HBASE-5364) Fix source files missing licenses in 0.92 and trunk

2012-02-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206038#comment-13206038
 ] 

Hudson commented on HBASE-5364:
---

Integrated in HBase-TRUNK-security #108 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/108/])
HBASE-5364 Fix source files missing license in 0.92 and trunk (Elliott 
Clark)

jmhsieh : 
Files : 
* /hbase/trunk/bin/hbase-jruby
* /hbase/trunk/dev-support/findHangingTest.sh
* /hbase/trunk/src/main/python/hbase/merge_conf.py
* /hbase/trunk/src/packages/deb/conf-pseudo.control/control
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestHTableDescriptor.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/constraint/RuntimeFailConstraint.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java


> Fix source files missing licenses in 0.92 and trunk
> ---
>
> Key: HBASE-5364
> URL: https://issues.apache.org/jira/browse/HBASE-5364
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Elliott Clark
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5364-1.patch, hbase-5364-0.92.patch, 
> hbase-5364-v2.patch
>
>
> Running 'mvn rat:check' shows that a few files have snuck in that do not have 
> proper Apache licenses.  Ideally we should fix these before we cut another 
> release/release candidate.
> This is a blocker for 0.94, and probably should be for the other branches as 
> well.





[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5387:
--

 Priority: Critical  (was: Major)
Affects Version/s: 0.94.0
Fix Version/s: 0.94.0
 Hadoop Flags: Reviewed

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206040#comment-13206040
 ] 

Zhihong Yu commented on HBASE-5387:
---

See 
http://crawler.archive.org/apidocs/org/archive/io/GzipHeader.html#MINIMAL_GZIP_HEADER_LENGTH
 for where header length came from.
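The minimal header length under discussion can be observed directly against Java's built-in gzip implementation, which writes a fixed 10-byte member header (magic 0x1f 0x8b, method 8 = deflate, flags, 4-byte mtime, extra flags, OS byte). A small self-contained probe, illustrative rather than HBase code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderDemo {
    // Gzip a byte array and return the full member: 10-byte header,
    // deflate body, and 8-byte CRC32/ISIZE trailer.
    public static byte[] gzipBytes(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }
}
```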

> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.94.0
>
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Issue Comment Edited] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206040#comment-13206040
 ] 

Zhihong Yu edited comment on HBASE-5387 at 2/11/12 6:11 AM:


See 
http://crawler.archive.org/apidocs/org/archive/io/GzipHeader.html#MINIMAL_GZIP_HEADER_LENGTH
 for where header length came from.

http://kickjava.com/src/java/util/zip/GZIPOutputStream.java.htm, line 109 gives 
us a better idea of the header length.

  was (Author: zhi...@ebaysf.com):
See 
http://crawler.archive.org/apidocs/org/archive/io/GzipHeader.html#MINIMAL_GZIP_HEADER_LENGTH
 for where header length came from.
  
> Reuse compression streams in HFileBlock.Writer
> --
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of 
> allocating them every time. The motivation is that when using Java's built-in 
> implementation of Gzip, we allocate a new GZIPOutputStream object and an 
> associated native data structure every time we create a compression stream. 
> The native data structure is only deallocated in the finalizer. This is one 
> suspected cause of recent TestHFileBlock failures on Hadoop QA: 
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.





[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache

2012-02-10 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206043#comment-13206043
 ] 

Lars Hofhansl commented on HBASE-5347:
--

Seems to me that we should target this for 0.96.

> GC free memory management in Level-1 Block Cache
> 
>
> Key: HBASE-5347
> URL: https://issues.apache.org/jira/browse/HBASE-5347
> Project: HBase
>  Issue Type: Improvement
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
>
> On eviction of a block from the block-cache, instead of waiting for the 
> garbage collecter to reuse its memory, reuse the block right away.
> This will require us to keep reference counts on the HFile blocks. Once we 
> have the reference counts in place we can do our own simple 
> blocks-out-of-slab allocation for the block-cache.
> This will help us with
> * reducing gc pressure, especially in the old generation
> * making it possible to have non-java-heap memory backing the HFile blocks
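The reference-counting scheme described above can be sketched in a few lines, with hypothetical names (the real design would involve the block cache and a slab allocator):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a cached block whose backing buffer returns to a free list as soon
// as the last reader releases it, instead of waiting for the garbage collector.
public class RefCountedBlock {
    private final AtomicInteger refCount = new AtomicInteger(1); // cache's ref
    private final byte[] buffer;
    private final ConcurrentLinkedQueue<byte[]> freeList;

    public RefCountedBlock(byte[] buffer, ConcurrentLinkedQueue<byte[]> freeList) {
        this.buffer = buffer;
        this.freeList = freeList;
    }

    public void retain() {              // a reader takes a reference
        refCount.incrementAndGet();
    }

    public void release() {             // reader done, or cache evicts
        if (refCount.decrementAndGet() == 0) {
            freeList.offer(buffer);     // memory is reusable immediately
        }
    }
}
```

On eviction the cache calls `release()` for its own reference; the buffer only reaches the free list once in-flight scanners have released theirs too, which is the invariant that makes non-heap backing memory feasible.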




