[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2022-06-12 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553331#comment-17553331
 ] 

Andrew Kyle Purtell commented on HBASE-5313:


This one seems valuable to retain for a potential HFileV4 effort.

> Restructure hfiles layout for better compression
> 
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile, io
>Reporter: Dhruba Borthakur
>Priority: Major
>
> An HFile block contains a stream of key-values. Can we organize these KVs 
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This will allow us to 
> avoid decompressing the values at all when we are scanning and skipping over 
> rows in the block.
> Any other ideas? 
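
For readers following the thread, the key-section / value-section idea in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the HFile format or any HBase API: the class `SplitBlockSketch`, its methods, and the byte layout are all hypothetical, and `java.util.zip` deflate stands in for whatever block compression (lzo, gz) a real table would use. It assumes Java 9 or later for `readAllBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical layout (an assumption for illustration, NOT the HFile format):
// [int compressedKeyLen][compressed key-section][compressed value-section]
final class SplitBlockSketch {

    // Packs entries as [int count]([int length][bytes])*.
    static byte[] pack(List<byte[]> entries) {
        int size = 4;
        for (byte[] e : entries) size += 4 + e.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(entries.size());
        for (byte[] e : entries) buf.putInt(e.length).put(e);
        return buf.array();
    }

    // Inverse of pack().
    static List<byte[]> unpack(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        int count = buf.getInt();
        List<byte[]> entries = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] e = new byte[buf.getInt()];
            buf.get(e);
            entries.add(e);
        }
        return entries;
    }

    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] inflate(byte[] block, int offset, int length) {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(block, offset, length))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compresses the two sections independently so each can be inflated alone.
    static byte[] encode(List<byte[]> keys, List<byte[]> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must pair up");
        }
        byte[] k = deflate(pack(keys));
        byte[] v = deflate(pack(values));
        return ByteBuffer.allocate(4 + k.length + v.length)
                .putInt(k.length).put(k).put(v).array();
    }

    // Scanning or skipping rows only inflates the key-section.
    static List<byte[]> readKeys(byte[] block) {
        int keyLen = ByteBuffer.wrap(block).getInt();
        return unpack(inflate(block, 4, keyLen));
    }

    // The value-section is inflated only when a value is actually needed.
    static List<byte[]> readValues(byte[] block) {
        int keyLen = ByteBuffer.wrap(block).getInt();
        return unpack(inflate(block, 4 + keyLen, block.length - 4 - keyLen));
    }
}
```

A scanner that is only skipping over rows would call `readKeys` and never pay for `readValues`; whether that saving matters in practice depends on value sizes and the codec, which this sketch does not measure.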



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2018-02-16 Thread Mark Hale (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367326#comment-16367326
 ] 

Mark Hale commented on HBASE-5313:
--

This would be of particular interest to us too. We have some tables where we 
can pack the entirety of our data into (small) composite row keys (with no 
values) and take advantage of the lexical key ordering to scan on the first 
component of the composite key to return the set of second components.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2016-07-04 Thread Robert James (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361344#comment-15361344
 ] 

Robert James commented on HBASE-5313:
-

This ticket seems to have been abandoned.  Why? The results posted by 
[~he yongqiang] show a lot of performance gain: half the disk usage.  Has it 
just been forgotten, or has a decision been made not to do this? Why?



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2013-07-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710526#comment-13710526
 ] 

Jean-Daniel Cryans commented on HBASE-5313:
---

After some more investigation, I don't think it will be easy to do. 
[~mcorgan]'s HBASE-7162 relies on that code too. So we have to make 
HFileBlockDefaultEncodingContext thread safe it seems.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2013-07-15 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708969#comment-13708969
 ] 

Mikhail Bautin commented on HBASE-5313:
---

[~jdcryans]: I'm OK with reverting HBASE-5521 because it does not look like 
HBASE-5313 is moving forward.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2013-07-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708886#comment-13708886
 ] 

Jean-Daniel Cryans commented on HBASE-5313:
---

[~he yongqiang], [~dhruba], [~mikhail]

Guys, I need your help to understand what's going on with this jira. HBASE-5521 
was committed more than a year ago and nothing has moved since. Moreover, 
the code breaks encoding by making it not thread safe. See HBASE-8732.

This makes me think that the code in 5521 was not seriously tested (maybe 
waiting on this jira to tie all the loose ends?) and since we are trying to 
release 0.96.0 soonish, I'm currently in favor of reverting it.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-04-05 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247497#comment-13247497
 ] 

He Yongqiang commented on HBASE-5313:
-

Hi Kannan,

We are still experimenting with this. The initial results show a reduction of 
less than one quarter, which is not big enough for us. The timestamp issue is 
low-hanging fruit, which can cut 8%. 
We will post a diff ASAP, once we finalize our experiments.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-04-03 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245093#comment-13245093
 ] 

Kannan Muthukkaruppan commented on HBASE-5313:
--

Yongqiang: Any updates on this effort/investigation? I noticed HBASE-5674 that 
you had created which is sort of going after a specific part (timestamps)... 
but was curious where things are with respect to this JIRA.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-03-19 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233244#comment-13233244
 ] 

Hudson commented on HBASE-5313:
---

Integrated in HBase-TRUNK-security #143 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/143/])
HBASE-5521 [jira] Move compression/decompression to an encoder specific 
encoding context

Author: Yongqiang He

Summary:
https://issues.apache.org/jira/browse/HBASE-5521

As part of working on HBASE-5313, we want to add a new columnar encoder/decoder.
It makes sense to move compression to be part of encoder/decoder:
1) a scanner for a columnar encoded block can do lazy decompression to a
specific part of a key value object
2) avoid an extra bytes copy from encoder to hblock-writer.

If there is no encoder specified for a writer, the HBlock.Writer will use a
default compression-context to do something very similar to today's code.

Test Plan: existing unit tests verified by mbautin and tedyu. And no new test
added here since this code is just a preparation for columnar encoder. Will add
testcase later in that diff.

Reviewers: dhruba, tedyu, sc, mbautin

Reviewed By: mbautin

Differential Revision: https://reviews.facebook.net/D2097 (Revision 1302602)

 Result = FAILURE
mbautin : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockDecodingContext.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockDefaultDecodingContext.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockDefaultEncodingContext.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockEncodingContext.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/DataBlockEncodingTool.java




[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-03-09 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226351#comment-13226351
 ] 

dhruba borthakur commented on HBASE-5313:
-

I am guessing that initially, we keep this new "columnar encoding" completely 
isolated inside an HFileBlock. At table creation time, one can specify that the 
table be stored in columnar-encoded fashion.

An HFile will have information in the FixedFileTrailer that specifies whether 
the data inside the hfile is in columnar format. A single HFileBlock will have 
six "column-entities": all the rowkeys will be laid out first, followed by all 
the column families, followed by all the "column names", followed by the 
timestamps, followed by the memstoreTS, followed by all the values.

If 'prefix-encoding' is enabled, then each column-entity will be prefix encoded 
individually. If compression (lzo, gz, etc) is enabled, the entire HFileBlock 
will be compressed accordingly.
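
As a rough illustration of prefix-encoding a single column-entity, the sketch below stores, for each sorted key, only the number of leading characters it shares with the previous key plus the remaining suffix. The names `PrefixEncodingSketch` and `Entry` are hypothetical, and it works on `String` rather than raw bytes to stay short; it is not HBase's PREFIX data block encoder.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names and representation are assumptions, not HBase code.
final class PrefixEncodingSketch {

    // One encoded entry: how many leading chars are shared with the
    // previous key, plus the part that differs.
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    // Keys are assumed to be sorted, as rowkeys inside a block are.
    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int max = Math.min(prev.length(), key.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    // Rebuilds each key from the previous key's prefix and the stored suffix.
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.shared) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For the keys `user1000`, `user1001`, `user2000` this yields the entries (0, `user1000`), (7, `1`), (4, `2000`), which is why sorted, similar rowkeys shrink so well under this kind of encoding.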

Prefix-encoding will work well for the rowkey entity and the column-family 
entity. The column name entity may need to be sorted and then prefix encoded. 
The timestamp entity may need a special kind of encoding. One option (suggested 
by a co-worker, Chip Turner) is to take the first timestamp as the base, xor 
it with each of the following timestamps (thus zeroing out the common bits), 
and then store the result. Another option is to find the minimum timestamp in 
the block and then store diffs from that minimum value. Yet another option is to 
use Jan-01-2012 as the hbase-epoch and store the difference from that number.
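
The first two timestamp options above can be sketched as follows. This is a minimal illustration with hypothetical names (`TimestampEncodingSketch` and its methods); it assumes non-negative millisecond timestamps held in a `long[]` and only shows the transforms, not how the shortened values would be written to disk. The third option (a fixed epoch) would look like the min-delta variant with a constant base instead of the block minimum.

```java
// Illustrative only: names are assumptions, not HBase code.
final class TimestampEncodingSketch {

    // Option 1: keep the first timestamp as the base and XOR every later
    // timestamp with it, so the bits they have in common become zero.
    static long[] xorWithFirst(long[] ts) {
        long[] out = ts.clone();
        for (int i = 1; i < out.length; i++) {
            out[i] = ts[i] ^ ts[0];
        }
        return out;
    }

    // XOR is its own inverse, so decoding repeats the same operation
    // against the stored base in slot 0.
    static long[] undoXorWithFirst(long[] encoded) {
        return xorWithFirst(encoded);
    }

    // Option 2: store the block minimum in slot 0, then each timestamp's
    // (non-negative) difference from that minimum.
    static long[] deltaFromMin(long[] ts) {
        if (ts.length == 0) {
            return new long[0];
        }
        long min = ts[0];
        for (long t : ts) {
            min = Math.min(min, t);
        }
        long[] out = new long[ts.length + 1];
        out[0] = min;
        for (int i = 0; i < ts.length; i++) {
            out[i + 1] = ts[i] - min;
        }
        return out;
    }

    static long[] undoDeltaFromMin(long[] encoded) {
        if (encoded.length == 0) {
            return new long[0];
        }
        long[] ts = new long[encoded.length - 1];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = encoded[0] + encoded[i + 1];
        }
        return ts;
    }

    // Bytes needed to hold a non-negative value without its leading zero
    // bytes; a rough proxy for how much a variable-length writer would save.
    static int significantBytes(long v) {
        return (64 - Long.numberOfLeadingZeros(v) + 7) / 8;
    }
}
```

With timestamps that are close together, both transforms turn most 8-byte values into values that fit in one to three bytes, which a variable-length integer format or a general-purpose compressor can then exploit.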




[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-03-05 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222550#comment-13222550
 ] 

Matt Corgan commented on HBASE-5313:


Just noticed this jira.  I've been working on 
https://issues.apache.org/jira/browse/HBASE-4676.  In this trie format all the 
values are concatenated at the end of the block.  I haven't done anything with 
compressing them because they are generally small in my use cases, but seems 
like it would eventually be a good option.  I would think that the compression 
savings would be similar to the on-disk compression savings, but the benefit is 
that you have access to scan the keys while the data part of the block is still 
compressed.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-03-05 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222512#comment-13222512
 ] 

He Yongqiang commented on HBASE-5313:
-

As part of working on HBASE-5313, we first tried to write a 
HFileWriter/HFileReader to do it. After finishing some work, it seems this 
requires a lot of code refactoring in order to reuse existing code as much as 
possible.

Then we found that adding a new columnar encoder/decoder would be easy to do. 
We opened https://issues.apache.org/jira/browse/HBASE-5521 to do the 
encoder/decoder-specific compression work.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-22 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214032#comment-13214032
 ] 

He Yongqiang commented on HBASE-5313:
-

As a first step, we will go ahead with a simple columnar layout implementation 
and leave more advanced features (like nested column layout) for a follow-up. 





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-13 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207070#comment-13207070
 ] 

He Yongqiang commented on HBASE-5313:
-

> However, those compression numbers are pretty nice. I worry a little bit 
> about having now an hfileV3, so soon on the heels of the last, leading to a 
> proliferation of versions. My other concern is that the columnar storage 
> doesn't make sense for all cases - Dremel is for a specific use case.
> That being said, I would love to see the ability to do Dremel in HBase. How 
> about along with a new version/columnar data support comes the ability to 
> select storage files on a per-table basis? That would enable some tables to be 
> optimized for certain use cases, other tables for others, rather than having to 
> use completely different clusters (continuing the multi-tenancy story).

@Jesse Yates, yes, agreed. One big question we need to answer is how to 
integrate with the current HFile implementation. We want to reuse code as much 
as possible. I guess a nested columnar structure like Dremel is what we 
ultimately want for HBase. But we first need to figure out a good story for how 
applications will use it.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-13 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207057#comment-13207057
 ] 

He Yongqiang commented on HBASE-5313:
-

>>Can you also list the time it took writing the HFile for each of the three 
>>schemes ?
@Zhihong, we are still exploring more ideas here. Once we have a 
finalized plan, I will get the cpu/latency numbers. 

>>Yongqiang, which delta encoding algorithm did you use? The default 
>>algorithm only does a simple encoding. Do we have results using prefix with 
>>the fast diff algorithm for the current hfile v2?

@jerry, I tried all three delta encodings, and Diff with HFileWriterV2 produced 
the smallest file in my test. 



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-11 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206237#comment-13206237
 ] 

dhruba borthakur commented on HBASE-5313:
-

yq: can we get some numbers on how good the compression is if we just do 
columnar and delta-compression (no lzo)? This will tell us if there is a benefit 
in storing data in columnar form in the cache.

We still have to measure the overhead of a read/scan when data is stored in 
columnar fashion. It is very early to say whether this is 0.96 or something 
further out.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206018#comment-13206018
 ] 

Lars Hofhansl commented on HBASE-5313:
--

I agree with Ted, this is 0.96 material.



[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Jerry Chen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205987#comment-13205987
 ] 

Jerry Chen commented on HBASE-5313:
---

Yongqiang, which delta encoding algorithm did you use? The default 
algorithm only does a simple encoding. Do we have results using prefix with the 
fast diff algorithm for the current HFile v2? 

I suppose this is only for the on-disk representation. How do we plan to 
represent it in the block cache?  









[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205978#comment-13205978
 ] 

Zhihong Yu commented on HBASE-5313:
---

There are only two weeks before we branch 0.94.
I think HFile v3 would be in 0.96, containing this feature and HBASE-5347.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205931#comment-13205931
 ] 

Jesse Yates commented on HBASE-5313:


However, those compression numbers are pretty nice. I worry a little bit about 
having an HFile v3 now, so soon on the heels of the last version, leading to a 
proliferation of versions. My other concern is that columnar storage 
doesn't make sense for all cases - Dremel is for a specific use case.

That being said, I would love to see the ability to do Dremel in HBase. How 
about, along with a new version and columnar data support, adding the ability to 
select storage files on a per-table basis? That would enable some tables to be 
optimized for certain use cases and other tables for others, rather than having 
to use completely different clusters (continuing the multi-tenancy story).





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205921#comment-13205921
 ] 

dhruba borthakur commented on HBASE-5313:
-

The same number of kvs is in each file. A total of 3 million kvs were used for 
this experiment. The block size is 16 KB.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205909#comment-13205909
 ] 

Zhihong Yu commented on HBASE-5313:
---

@Yongqiang:
Thanks for sharing the results.
Can you also list the time it took to write the HFile for each of the three 
schemes?

If you can characterize the row keys and values, that would be nice too.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205895#comment-13205895
 ] 

stack commented on HBASE-5313:
--

How do I read the above?  Is it the same number of kvs in each of the files?





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205892#comment-13205892
 ] 

He Yongqiang commented on HBASE-5313:
-

@Todd, with such a small block size and the data already sorted, I was also 
thinking it would be very hard to optimize the space.

So we did some experiments by modifying today's HFileWriter. It turns out we 
can still save a lot if we play more tricks.

Here are the test results (block size is 16KB):

*42MB HFile, with Delta compression and with LZO compression* (with default 
setting on Apache trunk)

*30MB HFile, with Columnar, with Delta compression, and with LZO compression.*

Inside one block, first put all row keys in that block and apply delta 
compression followed by LZO compression. After the row keys, put all column 
family data in that block and apply Delta+LZO to it. Then do the same for 
column_qualifier, etc.

*24MB HFile, with Columnar, Sort value column, Sort column_qualifier column, 
and with LZO compression.*

Inside one block, first put all row keys in that block and apply delta 
compression followed by LZO compression. After the row keys, put all column 
family data in that block and apply Delta+LZO to it. Then put column_qualifier, 
sort it, and apply Delta+LZO. The TS column and Code column are processed the 
same way as column family. The value column is processed the same way as 
column_qualifier. So the disk format is the same as for the 30MB HFile, except 
that all data for 'column_qualifier' and 'value' are sorted separately.

Out of the 24MB file, 6MB is used to store row keys, 7MB to store 
column_qualifier, and 6MB to store value.
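
For readers who want the relative savings, here is a quick check of the 
arithmetic. It only uses the sizes reported above; the variable and function 
names are illustrative and not from any HBase code.

```python
# Sizes in MB, as reported in the comment above.
baseline = 42          # row-oriented layout, Delta + LZO
columnar = 30          # columnar layout, Delta + LZO
columnar_sorted = 24   # columnar layout with sorted qualifier/value columns


def reduction(before, after):
    """Fractional size reduction going from `before` to `after`."""
    return (before - after) / before


# Breakdown of the 24 MB file as reported. The comment does not itemize
# the remainder, so it is only computed here, not attributed.
row_keys, qualifiers, values = 6, 7, 6
not_itemized = columnar_sorted - (row_keys + qualifiers + values)

print(f"columnar vs baseline:          {reduction(baseline, columnar):.1%}")
print(f"columnar + sorted vs baseline: {reduction(baseline, columnar_sorted):.1%}")
print(f"MB not itemized in breakdown:  {not_itemized}")
```

So the columnar layout is roughly 29% smaller than the default, and the 
sorted variant roughly 43% smaller, for this particular data set.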

More ideas are welcome! 






[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-08 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204031#comment-13204031
 ] 

Todd Lipcon commented on HBASE-5313:


I'm curious what the expected compression gain would be. Has anyone tried 
"rearranging" an example of a production hfile block and recompressing to see 
the difference?

My thinking is that typical LZ-based compression (e.g. snappy) uses a hash table 
of up to 16K entries or so for common substring identification. So I 
don't know that it would do a particularly better job with the common keys if 
they were all grouped at the front of the block: so long as the keyval pairs 
are less than a few hundred bytes apart, it should still find them OK.

Of course the other gains (storing large values compressed in RAM for example) 
seem good.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-08 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203935#comment-13203935
 ] 

He Yongqiang commented on HBASE-5313:
-

"I suppose we could use the value length from the key, then know we have nth 
key and by using the value length of all 1 to n-1 keys to find the value."
Yes. The value length is stored in the key header. The key header is cheap and 
can always be decompressed without a big CPU cost.
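
A minimal sketch of that lookup, assuming the values are laid out in the same 
order as the keys. The block structure, `key_headers` and both function names 
are hypothetical and only illustrate the running-sum idea; this is not the 
HFile format.

```python
def value_offsets(value_lengths):
    """Offset of each value inside the value section: the sum of the
    lengths of all values that precede it."""
    offsets, pos = [], 0
    for length in value_lengths:
        offsets.append(pos)
        pos += length
    return offsets


def nth_value(value_section, value_lengths, n):
    """Return the n-th value (0-based) by slicing at its derived offset."""
    start = value_offsets(value_lengths)[n]
    return value_section[start:start + value_lengths[n]]


# Hypothetical block: each key header carries (key, value_length).
key_headers = [(b"row1", 5), (b"row2", 0), (b"row3", 11), (b"row4", 1)]
value_section = b"alpha" + b"" + b"gamma-gamma" + b"d"

lengths = [vlen for _, vlen in key_headers]
print(value_offsets(lengths))
print(nth_value(value_section, lengths, 2))
```

No per-value index is stored; the cost is a pass over the key headers that 
precede the key of interest.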





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-08 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203918#comment-13203918
 ] 

Prakash Khemani commented on HBASE-5313:


The values can be kept compressed in memory. We can uncompress them on
demand when writing out the key-values during rpc or compactions.

The key has to have a pointer to the values. The pointer can be implicit
and can be derived from value lengths if all the values are stored in the
same order as keys.

The value pointer has to be explicit if the values are stored in a
different order than the keys. We might want to write out the values in a
different order if we want to do per column compression. While writing out
the HFileBlock the following can be done - group all the values by their
column identifier, independently compress and write out each group of
values, go back to the keys and update the value pointers.
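
A rough sketch of the per-column variant with explicit pointers. Everything 
here is an assumption for illustration: `write_block` and `read_value` are 
made-up names, zlib stands in for LZO (which is not in the Python standard 
library), and the pointer is computed in a single pass rather than by going 
back to patch the keys afterwards.

```python
import zlib
from collections import defaultdict


def write_block(kvs):
    """kvs: list of (key, column_id, value) in key order.

    Returns (key_entries, groups). Each key entry carries an explicit
    pointer (column_id, offset, length) into the uncompressed bytes of its
    column group; each group is compressed independently.
    """
    raw = defaultdict(bytearray)
    key_entries = []
    for key, column_id, value in kvs:
        buf = raw[column_id]
        key_entries.append((key, (column_id, len(buf), len(value))))
        buf += value  # in-place append to this column's group
    groups = {col: zlib.compress(bytes(buf)) for col, buf in raw.items()}
    return key_entries, groups


def read_value(key_entry, groups):
    """Decompress only the group that the key's pointer refers to."""
    _, (column_id, offset, length) = key_entry
    data = zlib.decompress(groups[column_id])
    return data[offset:offset + length]


kvs = [
    (b"r1", b"name", b"ann"),
    (b"r1", b"city", b"oslo"),
    (b"r2", b"name", b"bob"),
    (b"r2", b"city", b"rome"),
]
entries, groups = write_block(kvs)
print(entries[2], read_value(entries[2], groups))
```

The explicit pointer costs space per key, which is the trade-off against the 
implicit, length-derived pointer described above.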










[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-08 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203899#comment-13203899
 ] 

Lars Hofhansl commented on HBASE-5313:
--

Presumably storing the keys together might lend itself to better compression.
Do we need to index values then? In that case we'd use up more space. Or how 
would we find the value belonging to a key?
I suppose we could use the value length from the key: knowing we are at the nth 
key, we sum the value lengths of keys 1 to n-1 to find the value.
Or store the lengths with the values and scan the keys and values in "parallel".
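
To make the second option concrete, here is a small sketch of storing each 
length next to its value and walking both sections in lockstep. The 4-byte 
big-endian length prefix and the function names are assumptions chosen for 
the example, not an existing HBase encoding.

```python
import struct


def encode_values(values):
    """Value section in which each value is prefixed by its 4-byte length."""
    return b"".join(struct.pack(">I", len(v)) + v for v in values)


def scan(keys, value_section):
    """Walk the keys and the value section in parallel, yielding pairs."""
    pos = 0
    for key in keys:
        (length,) = struct.unpack_from(">I", value_section, pos)
        pos += 4
        yield key, value_section[pos:pos + length]
        pos += length


keys = [b"a", b"b", b"c"]
values = [b"x", b"", b"yz"]
pairs = list(scan(keys, encode_values(values)))
print(pairs)
```

With this variant the key section does not need to carry value lengths, but 
a reader that skips rows still has to hop over the length prefixes.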





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-08 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203836#comment-13203836
 ] 

Nicolas Spiegelberg commented on HBASE-5313:


Storing all keys together would just help on CPU, correct?  We wouldn't get any 
disk size savings or IO savings with the current approach.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-07 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203324#comment-13203324
 ] 

He Yongqiang commented on HBASE-5313:
-

As discussed earlier, one thing we can try is to use something like Hive's 
RCFile. The difference from Hive is that an HBase row's value is not a single 
type. If it turns out that the columnar file format helps, we can employ a 
nested columnar format for the value (like what Dremel does). There is a thread 
on Quora about Dremel: 
http://www.quora.com/How-will-Googles-Dremel-change-future-Hadoop-releases.





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-06 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202127#comment-13202127
 ] 

dhruba borthakur commented on HBASE-5313:
-

One option listed above is to keep all the keys at the beginning of the block 
and all the values at the end of the block. The keys will still be 
delta-encoded. The values can be LZO-compressed.
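
A toy illustration of that layout, under loud assumptions: the shared-prefix 
encoding below is a simple stand-in for HBase's delta encoders (not their 
actual format), zlib stands in for LZO, and `build_block` / `scan_keys` are 
hypothetical names.

```python
import zlib


def prefix_encode(sorted_keys):
    """Encode each key as (shared_prefix_len, suffix) vs. the previous key."""
    out, prev = [], b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decode(encoded):
    """Rebuild the full keys from (shared_prefix_len, suffix) pairs."""
    keys, prev = [], b""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


def build_block(kvs):
    """kvs sorted by key. Keys go first, compressed values go last."""
    keys = [k for k, _ in kvs]
    values = [v for _, v in kvs]
    return prefix_encode(keys), [len(v) for v in values], zlib.compress(b"".join(values))


def scan_keys(block):
    """Read only the key section; the value section stays compressed."""
    key_section, _, _ = block
    return prefix_decode(key_section)


kvs = [(b"user1/a", b"v1"), (b"user1/b", b"v22"), (b"user2/a", b"v333")]
block = build_block(kvs)
print(block[0])
print(scan_keys(block))
```

A scan that only skips over rows calls `scan_keys` and never decompresses 
the value section.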

Any other ideas out there?
