[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145322#comment-13145322
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

Review request for Eli Collins and Todd Lipcon.


Summary
---

Here's what I have so far. Things are written, and "should work". I need to 
rework the test cases to test this, and put something in the config file to 
enable/disable it. Obviously this isn't ready for commit at the moment, but I 
can get those two things done pretty quickly.

The dictionary is incredibly simple at the moment; I'll come up with something 
cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
> Issue Type: New Feature
> Reporter: Li Pi
> Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145323#comment-13145323
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-11-07 10:28:10.774971)


Review request for Eli Collins and Todd Lipcon.






[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145321#comment-13145321
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-11-07 10:27:33.538403)


Review request for Eli Collins and Todd Lipcon.






[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145566#comment-13145566
 ] 

Todd Lipcon commented on HBASE-4608:


I haven't looked at the patch yet, but it would be great if you could build a 
tool to go along with this, for testing, that compresses/decompresses logs. E.g.:

bin/hbase org.apache.hadoop.hbase.HLogTool -compress /path/to/hlog /path/to/hlog.compressed
bin/hbase org.apache.hadoop.hbase.HLogTool -uncompress /path/to/hlog.compressed /path/to/hlog

... or something like that.

Then real users could see what kind of compression ratio they could expect (and 
it serves as a decent test that compress/uncompress yields the original file).
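
A minimal sketch, in Java, of the round-trip check such a tool could perform. 
HLogTool above is only a suggestion, and nothing here assumes it exists: the 
class name RoundTripCheck and its methods are hypothetical, and java.util.zip's 
Deflater/Inflater stand in for the patch's dictionary compression purely so the 
example runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: checks that compress/uncompress yields the original
// bytes and reports the compression ratio. Deflate is a stand-in codec only.
final class RoundTripCheck {

  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] uncompress(byte[] input) {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        // No progress and not finished: the input is truncated or unusable.
        if (n == 0 && !inflater.finished()
            && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or unsupported input");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt input", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  // True if uncompress(compress(x)) reproduces x byte for byte.
  static boolean roundTrips(byte[] original) {
    return Arrays.equals(original, uncompress(compress(original)));
  }

  // Compressed size divided by original size; smaller is better.
  static double ratio(byte[] original) {
    return (double) compress(original).length / original.length;
  }
}
```

A real tool would read and write log files rather than byte arrays, but the 
two checks (identical output after a round trip, and the size ratio) would be 
the same.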

> HLog Compression
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
> Issue Type: New Feature
> Reporter: Li Pi
> Assignee: Li Pi
> Attachments: 4608v1.txt




[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145924#comment-13145924
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-11-07 23:12:37.111204)


Review request for hbase, Eli Collins and Todd Lipcon.






[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145950#comment-13145950
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
---


Cool stuff.

I am probably just missing something... But when is the dictionary itself 
stored? Don't we need it to read the logs back again?

Just so I understand: we build up the dictionary as we go along. In the 
beginning most things won't be in the dictionary, so we write them out and add 
them to the dict, and from then on, when we encounter them again, we just 
write the index.
On the read side we can also build up the dict as we go along: values that 
weren't in the dictionary were written into the file, so we can recreate the 
dictionary as we read. Right?

(As I said, I am probably missing something).
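
The scheme described above can be sketched as follows. This is a minimal 
illustration in Java, not the code from the patch: the class name 
StreamingDictionarySketch, the one-byte LITERAL/REFERENCE tags, the int-sized 
index and the use of strings for dictionary entries are all assumptions made 
for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the dictionary is never stored. Writer and reader both
// start empty and grow the same dictionary in the same order.
final class StreamingDictionarySketch {
  private static final byte LITERAL = 0;    // followed by length + bytes
  private static final byte REFERENCE = 1;  // followed by dictionary index

  static byte[] write(List<String> values) {
    Map<String, Integer> dict = new HashMap<>();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String v : values) {
        Integer idx = dict.get(v);
        if (idx != null) {
          // Seen before: write only the index.
          out.writeByte(REFERENCE);
          out.writeInt(idx);
        } else {
          // First occurrence: write the value itself and remember it.
          byte[] raw = v.getBytes(StandardCharsets.UTF_8);
          out.writeByte(LITERAL);
          out.writeInt(raw.length);
          out.write(raw);
          dict.put(v, dict.size());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static List<String> read(byte[] encoded) {
    List<String> dict = new ArrayList<>();  // index -> value, rebuilt on read
    List<String> values = new ArrayList<>();
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(encoded))) {
      while (in.available() > 0) {
        byte tag = in.readByte();
        if (tag == REFERENCE) {
          values.add(dict.get(in.readInt()));
        } else {
          byte[] raw = new byte[in.readInt()];
          in.readFully(raw);
          String v = new String(raw, StandardCharsets.UTF_8);
          dict.add(v);  // literals arrive in the order the writer added them
          values.add(v);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return values;
  }
}
```

Because every literal appears in the file before any reference to it, the 
reader assigns the same indexes the writer did without the dictionary ever 
being written out separately.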

See minor comments inline.


src/main/java/org/apache/hadoop/hbase/KeyValue.java


This is functionally the same as before, but less readable. I don't think 
this leads to much performance improvement.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


I think we leave out the line with the year now.
Lots of leading whitespace and weird indentation in this file.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Passing 0 here? I might be missing something, but looking down at 
readCompressed, that looks wrong.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java


Could we have a no-op compressor instead?


- Lars






[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145965#comment-13145965
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > Cool stuff.
bq.  >
bq.  > I am probably just missing something... But when is the dictionary
bq.  > itself stored? Don't we need it to read the logs back again?
bq.  >
bq.  > Just so I understand: we build up the dictionary as we go along. In the
bq.  > beginning most things won't be in the dictionary, so we write them out
bq.  > and add them to the dict, and from then on, when we encounter them again,
bq.  > we just write the index.
bq.  > On the read side we can also build up the dict as we go along: values
bq.  > that weren't in the dictionary were written into the file, so we can
bq.  > recreate the dictionary as we read. Right?
bq.  >
bq.  > (As I said, I am probably missing something).
bq.  >
bq.  > See minor comments inline.

You aren't missing anything! That's exactly how it works.

Each WAL starts off with a brand new shiny dictionary. We build up the 
dictionary as we write, and when we read, we start off with a shiny new 
dictionary again. The dictionary is recreated upon read.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 1088
bq.  >
bq.  > This is functionally the same as before, but less readable. I don't
bq.  > think this leads to much performance improvement.

Good point, I can get rid of this.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 2
bq.  >
bq.  > I think we leave out the line with the year now.
bq.  > Lots of leading whitespace and weird indentation in this file.

I need to fix my Eclipse autoformatter. Will take care of this and the 
formatting bugs.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 62
bq.  >
bq.  > Passing 0 here? I might be missing something, but looking down at
bq.  > readCompressed, that looks wrong.

We pass a 0 because we don't encode the length of the qualifier. I don't know 
why we don't, but that's how KeyValue does it.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157
bq.  >
bq.  > Could we have a no-op compressor instead?

No-op compressor? As in one that does nothing?


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
---



[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145984#comment-13145984
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  Li Pi wrote:
bq.  You aren't missing anything! That's exactly how it works.
bq.
bq.  Each WAL starts off with a brand new shiny dictionary. We build up the
bq.  dictionary as we write, and when we read, we start off with a shiny new
bq.  dictionary again. The dictionary is recreated upon read.

Ok... What I cannot find, then, is the code that builds the dictionary during 
read :)

Also, as a general concern... We write these WALs (in part) for redundancy. 
Compression is the opposite of redundancy... So if, say, the beginning of a WAL 
file gets garbled, the entire file will be useless to us... I don't think that 
is a big deal, though. As the WAL entries are variable length, this is mostly 
true even today.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157
bq.  >
bq.  > Could we have a no-op compressor instead?
bq.
bq.  Li Pi wrote:
bq.  No-op compressor? As in one that does nothing?

Yep... So compression will never be null, and we can save if-statements (and 
make the code more readable) :)
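
A minimal sketch of that idea in Java. The names WalCompressor, NoOpCompressor 
and EditWriterSketch are hypothetical and are not taken from the patch; the 
point is only that callers always hold a non-null compressor, so the write and 
read paths need no null checks.

```java
// Hypothetical interface: anything that can encode and decode an edit.
interface WalCompressor {
  byte[] compress(byte[] raw);
  byte[] uncompress(byte[] stored);
}

// The "does nothing" implementation used when compression is disabled.
final class NoOpCompressor implements WalCompressor {
  @Override
  public byte[] compress(byte[] raw) {
    return raw;
  }

  @Override
  public byte[] uncompress(byte[] stored) {
    return stored;
  }
}

// Hypothetical caller: the null check happens once, in the constructor.
final class EditWriterSketch {
  private final WalCompressor compressor;

  EditWriterSketch(WalCompressor compressor) {
    this.compressor = compressor == null ? new NoOpCompressor() : compressor;
  }

  byte[] encode(byte[] edit) {
    return compressor.compress(edit);  // no "if (compressor != null)" here
  }

  byte[] decode(byte[] stored) {
    return compressor.uncompress(stored);
  }
}
```

With compression turned off, `new EditWriterSketch(null)` behaves exactly like 
the uncompressed path, and a real dictionary compressor could be dropped in 
later without touching encode or decode.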


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
---



[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-07 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146017#comment-13146017
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  Lars Hofhansl wrote:
bq.  Ok... What I cannot find, then, is the code that builds the dictionary
bq.  during read :)
bq.
bq.  Also, as a general concern... We write these WALs (in part) for
bq.  redundancy. Compression is the opposite of redundancy... So if, say, the
bq.  beginning of a WAL file gets garbled, the entire file will be useless to
bq.  us... I don't think that is a big deal, though. As the WAL entries are
bq.  variable length, this is mostly true even today.

Oops, somehow I deleted that line. There are comments for it. Added it back in.

//if this isn't in the dictionary, we need to add to the dictionary.

As for the more general concern: HBase won't return a write to the client until 
the WALEdit write is completely done. So aborting midway won't be an issue - 
and even if we abort midway, we can recover everything that's been written so 
far.

As for the beginning of the file getting garbled: true, but we'd lose some 
information with or without compression. With compression we lose more 
information, but that's the nature of compression. Fully recovering a partially 
garbled WAL is impossible no matter what approach we use. Either way, a partial 
recovery after all WAL replicas have been corrupted is not a contingency the 
WAL is built to handle.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157
bq.  >
bq.  > Could we have a no-op compressor instead?
bq.
bq.  Li Pi wrote:
bq.  No-op compressor? As in one that does nothing?
bq.
bq.  Lars Hofhansl wrote:
bq.  Yep... So compression will never be null, and we can save
bq.  if-statements (and make the code more readable) :)

Sure. I should probably define a compressor interface as well.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
---



[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-08 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146269#comment-13146269
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  Li Pi wrote:
bq.  Oops, somehow I deleted that line. There are comments for it. Added it
bq.  back in.
bq.
bq.  //if this isn't in the dictionary, we need to add to the dictionary.
bq.
bq.  As for the more general concern: HBase won't return a write to the
bq.  client until the WALEdit write is completely done. So aborting midway
bq.  won't be an issue - and even if we abort midway, we can recover everything
bq.  that's been written so far.
bq.
bq.  As for the beginning of the file getting garbled: true, but we'd lose
bq.  some information with or without compression. With compression we lose
bq.  more information, but that's the nature of compression. Fully recovering a
bq.  partially garbled WAL is impossible no matter what approach we use. Either
bq.  way, a partial recovery after all WAL replicas have been corrupted is not a
bq.  contingency the WAL is built to handle.

Well, in the non-compressed WAL case, we can re-sync to a SequenceFile "SYNC" 
marker and continue reading from there in the face of arbitrary corruption.

Perhaps the compression mechanism should have some kind of "maximum lookback" - 
i.e. when a dictionary is being built, keep the file offset where each dictionary 
word was used. Then, when deciding to use a dict reference vs a literal, if 
curOffset - lastUsedOffset > MAX_LOOKBACK_THRESHOLD, we re-write the entry. 
This would bound the size of unrecoverable WAL portions while still providing 
good compression (similar to what we have today).
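
A rough sketch of the "maximum lookback" idea, for illustration only. The 
class and method names (LookbackDictionary, encode) are hypothetical and not 
taken from the patch, and String keys stand in for the byte arrays a real WAL 
would use. The sketch records the offset at which each word was last written 
as a literal, which is one possible reading of "lastUsedOffset", and falls back 
to a literal once that offset is too far behind the current one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical writer-side dictionary with a bounded lookback (not the patch's SimpleDictionary). */
final class LookbackDictionary {
    /** What the writer should emit for a word. */
    enum Kind { LITERAL, REFERENCE }

    private static final class Entry {
        long lastLiteralOffset;
        Entry(long offset) { this.lastLiteralOffset = offset; }
    }

    private final long maxLookback;
    private final Map<String, Entry> entries = new HashMap<>();

    LookbackDictionary(long maxLookback) { this.maxLookback = maxLookback; }

    /** Decides whether the word at curOffset is written as a literal or as a dictionary reference. */
    Kind encode(String word, long curOffset) {
        Entry e = entries.get(word);
        if (e == null) {
            // First sighting: write the literal and remember where it went.
            entries.put(word, new Entry(curOffset));
            return Kind.LITERAL;
        }
        if (curOffset - e.lastLiteralOffset > maxLookback) {
            // The last literal is too far back: re-write it so a reader that
            // resynchronizes within the window can still rebuild this entry.
            e.lastLiteralOffset = curOffset;
            return Kind.LITERAL;
        }
        return Kind.REFERENCE;
    }
}
```

With a threshold of 100, a word first written at offset 0 is a reference at 
offset 50 and a literal again at offset 101. The reader side (how to resync 
and discard stale entries) is not covered here.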


- Todd


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
---



[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-09 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147225#comment-13147225
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  Todd Lipcon wrote:
bq.  Well, in the non-compressed WAL case, we can re-sync to a SequenceFile 
"SYNC" marker and continue reading from there in the face of arbitrary 
corruption.
bq.  
bq.  Perhaps the compression mechanism should have some kind of "maximum 
lookback" - i.e. when a dictionary is being built, keep the file offset where 
each dictionary word was used. Then, when deciding to use a dict reference vs a 
literal, if curOffset - lastUsedOffset > MAX_LOOKBACK_THRESHOLD, we 
re-write the entry. This would bound the size of unrecoverable WAL portions 
while still providing good compression (similar to what we have today).

That makes sense. Maybe file a separate jira and use this one to get the 
compression in?


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
---



[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-09 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147238#comment-13147238
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3134
---


Overall, an interesting idea, especially for the counter workload case.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java


Dictionaries will probably get a little more complex (threshold size, 
different types of dictionaries). You would also need to write code to 
persist dictionary state at the beginning of the HLog.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java


You're planning to write the code to threshold the size, correct? This should 
probably be user-configurable, with a setting of 0 disabling compression.


- Nicolas





> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-09 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147245#comment-13147245
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3136
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java


Should compression be added to the HLogKey as well to compress regionName & 
table?  It seems like the biggest wins will come from table + region + family, 
which all have user-bounded values.  It might even make sense to have these 
values in a different dictionary from row & column qualifier, which can be 
unbounded and might accidentally dominate the dictionary.
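
To illustrate the concern, here is a small sketch assuming a size-capped 
dictionary with LRU eviction. LruDictionary and SplitDictionaries are 
hypothetical names, String keys stand in for byte arrays, and the patch's 
SimpleDictionary may well behave differently. With a single shared dictionary, 
a run of distinct row keys pushes out the table name; with one dictionary for 
the bounded fields and another for row/qualifier, it does not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-capped dictionary with least-recently-used eviction. */
final class LruDictionary {
    private final LinkedHashMap<String, Boolean> map;

    LruDictionary(final int capacity) {
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry drops the LRU word once capacity is exceeded.
        this.map = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true if the word was already present; otherwise adds it and returns false. */
    boolean lookupOrAdd(String word) {
        if (map.get(word) != null) {
            return true;
        }
        map.put(word, Boolean.TRUE);
        return false;
    }
}

/** One dictionary for table/region/family, another for row and column qualifier. */
final class SplitDictionaries {
    final LruDictionary bounded;
    final LruDictionary unbounded;

    SplitDictionaries(int capacity) {
        this.bounded = new LruDictionary(capacity);
        this.unbounded = new LruDictionary(capacity);
    }
}
```

The reader would need the same split and the same eviction rule so that both 
sides stay in step; that part is left out of the sketch.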


- Nicolas






[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-09 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147345#comment-13147345
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3142
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


The entry returned may be null, right?


- Ted






[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-09 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147363#comment-13147363
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2011-11-09 20:00:52, Nicolas Spiegelberg wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, 
lines 122-124
bq.  > 
bq.  >
bq.  > should compression be added to the HLogKey as well to compress 
regionName & table?  It seems like the biggest wins will come from table + 
region + family, which all have user-bounded values.  It might even make sense 
to have these values in a different dictionary from row & column qualifier, 
which can be unbounded and might accidentally dominate the dictionary

Yes. I'm adding compression for regionname, table, and family as well. For this 
kind of simple 1-way associative dictionary, it's likely that those two factors 
will end up dominating, but other more complex dictionaries can be used, 
perhaps with more interesting eviction strategies.

I do agree using multiple dictionaries is a simple strategy.
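
For readers unfamiliar with the term, a 1-way associative (direct-mapped) 
dictionary can be sketched as below. This is an assumption about the general 
technique, not a copy of SimpleDictionary; DirectMappedDictionary and 
findOrInsert are made-up names. Each word hashes to exactly one slot, and a 
colliding word simply overwrites the previous occupant, which is why frequent 
unbounded values can crowd out the bounded ones.

```java
import java.util.Arrays;

/** Hypothetical direct-mapped dictionary: one candidate slot per word, no probing. */
final class DirectMappedDictionary {
    private final byte[][] slots;

    DirectMappedDictionary(int size) {
        this.slots = new byte[size][];
    }

    /** The single slot a word may occupy. */
    int slotFor(byte[] word) {
        return Math.floorMod(Arrays.hashCode(word), slots.length);
    }

    /**
     * Returns the slot index if the word is already stored there (a reference
     * can be written), or -1 after storing it and evicting any other occupant
     * (a literal must be written).
     */
    int findOrInsert(byte[] word) {
        int slot = slotFor(word);
        if (slots[slot] != null && Arrays.equals(slots[slot], word)) {
            return slot;
        }
        slots[slot] = word.clone();
        return -1;
    }
}
```

A set-associative or LRU-based dictionary would trade a little more 
bookkeeping for fewer of these collisions.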


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3136
---






[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-11 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148679#comment-13148679
 ] 

Kannan Muthukkaruppan commented on HBASE-4608:
--

Li wrote: <<< The current bottleneck to HBase write speed is replicating the 
WAL appends across different datanodes. We can speed up this process by 
compressing the HLog. >>> 

Compression potentially adds some time, but then, yes, you save somewhere else 
in the amount of work DFS has to do. I am curious what kind of improvement you 
are seeing with your changes. Without "sync" (deferred log flushing) the win 
might be even greater. Could you share some numbers with and without "sync"?







[jira] [Commented] (HBASE-4608) HLog Compression

2011-11-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150139#comment-13150139
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2011-11-09 20:00:52, Nicolas Spiegelberg wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, 
lines 122-124
bq.  > 
bq.  >
bq.  > should compression be added to the HLogKey as well to compress 
regionName & table?  It seems like the biggest wins will come from table + 
region + family, which all have user-bounded values.  It might even make sense 
to have these values in a different dictionary from row & column qualifier, 
which can be unbounded and might accidentally dominate the dictionary
bq.  
bq.  Li Pi wrote:
bq.  Yes. I'm adding compression for regionname, table, and family as well. 
For this kind of simple 1-way associative dictionary, it's likely that those two 
factors will end up dominating, but other more complex dictionaries can be 
used, perhaps with more interesting eviction strategies.
bq.  
bq.  I do agree using multiple dictionaries is a simple strategy.

FYI I already compress CF name along with column qualifier. Are regionname and 
table stored as part of the row key?


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3136
---






[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-13 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169114#comment-13169114
 ] 

Lars Hofhansl commented on HBASE-4608:
--

Are you still working on this, Li?
I think this is an important feature to get into HBase, especially if we want 
to do log archival for backups and PIT restores.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-13 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169117#comment-13169117
 ] 

Li Pi commented on HBASE-4608:
--

Yup. Just finished finals. So I have time again.




[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175296#comment-13175296
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-23 06:00:24.065183)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

Some new things, for WALCompress.

I've modified TestWALReplay to test compression - this is a quick hack to have 
effective test cases. I'm building my own subset later. 

Integration is done, including config, but it doesn't all work yet. It worked 
before I tried compressing HLogKeys; SequenceFile seems to try to read them out 
of order, causing it to hit empty dictionary entries. I'm not sure what to do 
about this. Any advice?

If you only compress KeyValues/WALEdits, it works fine.
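
As an illustration of why read order matters for this kind of dictionary 
(and not a diagnosis of what SequenceFile actually does), here is a minimal 
sketch. DictionaryRoundTrip and Token are hypothetical names, and Strings 
stand in for the byte arrays in the real WAL. The writer emits a literal the 
first time it sees a word and an index afterwards; the reader rebuilds the 
same dictionary only if it sees the tokens in the order they were written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DictionaryRoundTrip {
    /** Either a literal word (index == -1) or a reference to a dictionary index. */
    static final class Token {
        final int index;
        final String literal;
        Token(int index, String literal) { this.index = index; this.literal = literal; }
    }

    /** Writer side: starts with an empty dictionary and grows it as words are written. */
    static List<Token> write(List<String> words) {
        Map<String, Integer> dict = new HashMap<>();
        List<Token> out = new ArrayList<>();
        for (String w : words) {
            Integer idx = dict.get(w);
            if (idx == null) {
                dict.put(w, dict.size());       // new word gets the next index
                out.add(new Token(-1, w));      // and is written out in full
            } else {
                out.add(new Token(idx, null));  // known word: write only the index
            }
        }
        return out;
    }

    /** Reader side: also starts empty and adds every literal it encounters. */
    static List<String> read(List<Token> tokens) {
        List<String> dict = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (t.index < 0) {
                dict.add(t.literal);
                out.add(t.literal);
            } else if (t.index < dict.size()) {
                out.add(dict.get(t.index));
            } else {
                // A reference arrived before its literal: the entry is still empty.
                throw new IllegalStateException("reference to empty dictionary entry " + t.index);
            }
        }
        return out;
    }
}
```

Reading the tokens in write order returns the original words; reading them in 
any order that puts a reference ahead of its literal ends in the exception 
above, which looks like the symptom described.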


Summary
---

Here's what I have so far. Things are written, and "should work". I need to 
rework the test cases to test this, and put something in the config file to 
enable/disable. Obviously this isn't ready for commit at the moment, but I can 
get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment; I'll come up with 
something cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175299#comment-13175299
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4098
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Apache headers go here.


- Li





> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175310#comment-13175310
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
---


Maybe just do this for WALEdits/KeyValues for now and tackle HLogKey later.
Looks like hash collisions in SimpleDictionary could be nasty.

Other than that mostly whitespace.

Cool stuff.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Should remove the year line.
Also some extra whitespace in this file.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java


Bunch of whitespace in here.
As said above, maybe do HLogKey in a separate jira.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


bunch of whitespace in here.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


whitespace



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


I know this is not done yet, but this needs to be a fully qualified config 
name.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


LOG.debug?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


Hardcoding SimpleDictionary here?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java


year...



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java


What if you have a hash collision?
You now overwrite the old value that just happens to have the same hash 
code. Is that OK?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java


Here too; what happens for hash collisions?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java


Year... And trailing whitespace in here.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java


bunch of extra leading whitespace in this file



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java


Would sure be nice if we had a KeyValue interface and the implementations 
would just do the right thing.



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


I assume you'll test with and without compression.


- Lars



[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175314#comment-13175314
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 73
bq.  > 
bq.  >
bq.  > What if you have a hash collision?
bq.  > You now overwrite the old value that just happens to have the same 
hash code. Is that OK?

I overwrite the old value. As long as we do it for both reads and writes, that's 
okay! (The state of the dictionary must be consistent.)


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 82
bq.  > 
bq.  >
bq.  > Here too; what happens for hash collisions?

The old value would have been evicted by the latest value.


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > 
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java, line 
84
bq.  > 
bq.  >
bq.  > I assume you'll test with and without compression.

I'm going to write better tests; this is just a sort of hackish way to make it 
work.


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, 
line 130
bq.  > 
bq.  >
bq.  > Would sure be nice if we had a KeyValue interface and the 
implementations would just do the right thing.

I didn't want to create a new KeyValue, or rather modify the existing one, 
hence the CompressedKeyValue class.

I can refactor this.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
---



[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175315#comment-13175315
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 73
bq.  > 
bq.  >
bq.  > What if you have a hash collision?
bq.  > You now overwrite the old value that just happens to have the same 
hash code. Is that OK?
bq.  
bq.  Li Pi wrote:
bq.  I overwrite the old value. As long as we do it for both reads and 
writes, that's okay! (The state of the dictionary must be consistent.)

I see, because read and write would do that in the same order.
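
To make the point above concrete, here is a minimal sketch of a hash-indexed 
dictionary in which a colliding entry simply overwrites its slot. It is not the 
SimpleDictionary from the patch: the class name, the slot count, and the method 
names are assumptions made up for illustration. The idea it shows is only that 
if the writer and the reader apply the same puts in the same order, both sides 
hold identical slots, so an index written after an eviction still resolves to 
the right bytes.

```java
import java.util.Arrays;

// Hypothetical stand-in for a WAL dictionary; not the class from the patch.
class SlotDictionary {
  private final byte[][] slots;

  SlotDictionary(int size) {
    slots = new byte[size][];
  }

  // The slot is derived from the content hash; colliding values share a slot.
  int slotFor(byte[] value) {
    return Math.floorMod(Arrays.hashCode(value), slots.length);
  }

  // Returns the slot index if the slot holds exactly this value, else -1.
  int find(byte[] value) {
    int slot = slotFor(value);
    return Arrays.equals(slots[slot], value) ? slot : -1;
  }

  // Stores the value, evicting whatever occupied its slot before.
  int put(byte[] value) {
    int slot = slotFor(value);
    slots[slot] = value.clone();
    return slot;
  }

  byte[] get(int slot) {
    return slots[slot];
  }
}
```

Note that find() compares the stored bytes and not just the hash, so a 
collision shows up as a miss followed by an overwrite instead of a wrong match.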


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, 
line 130
bq.  > 
bq.  >
bq.  > Would sure be nice if we had a KeyValue interface and the 
implementations would just do the right thing.
bq.  
bq.  Li Pi wrote:
bq.  Didn't want to create a new KeyValue, or modify it, rather - thus the 
CompressedKeyValue thing.
bq.  
bq.  I can refactor this.

That was just a general comment. I've often thought how much nicer our lives 
would be if KeyValue were just an interface rather than a concrete class. Fixing 
that would be a huge PITA... Different jira :)


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
---






[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175317#comment-13175317
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > Maybe just do this for WALEdits/KeyValues for now and tackle HLogKey 
later.
bq.  > Looks like hash collisions in SimpleDictionary could be nasty.
bq.  > 
bq.  > Other than that mostly whitespace.
bq.  > 
bq.  > Cool stuff.

Just did another test. It looks like SequenceFile doesn't actually do it out of 
order; there's another bug making HLogKey break.


I'll figure it out later, probably after Christmas.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
---






[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-25 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175855#comment-13175855
 ] 

Li Pi commented on HBASE-4608:
--

Okay. I'm confused.

I disabled compression, went back to trunk, and changed these lines of code in 
HLogKey

System.out.println("Writing region: " + this.encodedRegionName.hashCode());
Bytes.writeByteArray(out, this.encodedRegionName);
System.out.println("Writing table: " + this.tablename.hashCode());
Bytes.writeByteArray(out, this.tablename);

And

in.readFully(this.encodedRegionName);
System.out.println("Reading region: " + this.encodedRegionName.hashCode());
this.tablename = Bytes.readByteArray(in);
System.out.println("Reading table: " + this.tablename.hashCode());

Then I ran the test for replay after a partial flush.

Got this as output

PositionWritten 124
Writing region: 1251181435
Writing table: 446506621
PositionWritten 319
Writing region: 1251181435
Writing table: 446506621
PositionWritten 514
Writing region: 1251181435
Writing table: 446506621
PositionWritten 709
Writing region: 1251181435
Writing table: 446506621
PositionWritten 904
Writing region: 1251181435
Writing table: 446506621
PositionWritten 1099
Writing region: 1251181435
Writing table: 446506621
PositionWritten 1294
Writing region: 1251181435
Writing table: 446506621
PositionWritten 1489
Writing region: 1251181435
Writing table: 446506621
PositionWritten 1684
Writing region: 1251181435
Writing table: 446506621
PositionWritten 1879
Writing region: 1251181435
Writing table: 446506621
PositionWritten 2074
Writing region: 1251181435
Writing table: 446506621
PositionWritten 2289
Writing region: 1251181435
Writing table: 446506621
PositionWritten 2484
Writing region: 1251181435
Writing table: 446506621
PositionWritten 2679
Writing region: 1251181435
Writing table: 446506621
PositionWritten 2874
Writing region: 1251181435
Writing table: 446506621
PositionWritten 3069
Writing region: 1251181435
Writing table: 446506621
PositionWritten 3264
Writing region: 1251181435
Writing table: 446506621
PositionWritten 3459
Writing region: 1251181435
Writing table: 446506621
PositionWritten 3654
Writing region: 1251181435
Writing table: 446506621
PositionWritten 3849
Writing region: 1251181435
Writing table: 446506621
PositionWritten 4044
Writing region: 1251181435
Writing table: 446506621
PositionWritten 4239
Writing region: 1251181435
Writing table: 446506621
PositionWritten 4454
Writing region: 1251181435
Writing table: 446506621
PositionWritten 4649
Writing region: 1251181435
Writing table: 446506621
PositionWritten 4844
Writing region: 1251181435
Writing table: 446506621
PositionWritten 5039
Writing region: 1251181435
Writing table: 446506621
PositionWritten 5234
Writing region: 1251181435
Writing table: 446506621
PositionWritten 5429
Writing region: 1251181435
Writing table: 446506621
PositionWritten 5624
Writing region: 1251181435
Writing table: 446506621
PositionWritten 5819
Writing region: 1251181435
Writing table: 446506621
PositionWritten 124
Writing region: 736259394
Writing table: 510860944
PositionWritten 319
Writing region: 1336786910
Writing table: 403681456
PositionWritten 514
Writing region: 1336786910
Writing table: 403681456
PositionWritten 709
Writing region: 1336786910
Writing table: 403681456
PositionWritten 904
Writing region: 1336786910
Writing table: 403681456
PositionWritten 1099
Writing region: 1336786910
Writing table: 403681456
PositionWritten 1294
Writing region: 1336786910
Writing table: 403681456
PositionWritten 1489
Writing region: 1336786910
Writing table: 403681456
PositionWritten 1684
Writing region: 1336786910
Writing table: 403681456
PositionWritten 1879
Writing region: 1336786910
Writing table: 403681456
PositionWritten 2074
Writing region: 1336786910
Writing table: 403681456
PositionWritten 2289
Writing region: 1336786910
Writing table: 403681456
PositionWritten 2484
Writing region: 1336786910
Writing table: 403681456
PositionWritten 2679
Writing region: 1336786910
Writing table: 403681456
PositionWritten 2874
Writing region: 1336786910
Writing table: 403681456
PositionWritten 3069
Writing region: 1336786910
Writing table: 403681456
PositionWritten 3264
Writing region: 1336786910
Writing table: 403681456
PositionWritten 3459
Writing region: 1336786910
Writing table: 403681456
PositionWritten 3654
Writing region: 1336786910
Writing table: 403681456
PositionWritten 3849
Writing region: 1336786910
Writing table: 403681456
PositionWritten 4044
Writing region: 1336786910
Writing table: 403681456
PositionWritten 4239
Writing region: 1336786910
Writing table: 403681456
PositionWritten 4454
Writing region: 1336786910
Writing table: 403681456
PositionWritten 4649
Writing region: 1336786910
Writing table: 403681456
PositionWritten 4844
Writing region: 1336786910
Writing table: 403681456
PositionWritten 5039
Writing region: 1336786910
Writing table: 403681456
PositionWritten 5234
Writing region: 13367
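
A note that may help when reading output like the above: in Java, hashCode() 
on an array is inherited from Object and is identity-based, so the printed 
number depends on which array object is hashed and not only on the bytes it 
holds. Whether that explains the confusion here is not something the thread 
confirms. A minimal sketch of the difference, with a made-up class name and 
sample bytes:

```java
import java.util.Arrays;

class ByteArrayHashDemo {
  // Identity-based: depends on the array object, not on its contents.
  static int identityHash(byte[] b) {
    return b.hashCode();
  }

  // Content-based: arrays with equal contents give equal hashes.
  static int contentHash(byte[] b) {
    return Arrays.hashCode(b);
  }

  public static void main(String[] args) {
    byte[] written = "region-1".getBytes();
    byte[] readBack = written.clone(); // same bytes, different object
    // The identity hashes of two distinct arrays are not required to match.
    System.out.println(identityHash(written) == identityHash(readBack));
    // The content hashes of equal arrays always match.
    System.out.println(contentHash(written) == contentHash(readBack));
  }
}
```

For debugging prints that are meant to compare what was written with what was 
read back, a content-based hash or a printout of the bytes themselves is the 
safer choice.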

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176240#comment-13176240
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4121
---


good start.

A general design thing: rather than using these static 
readCompressed/writeCompressed methods, we can introduce an interface something 
like:
interface WALCompression {
  public int encodeKeyValue(KeyValue kv, byte[] out, int offset);
  ...
}

and then have the current non-compressed code path just be the default 
implementation of WALCompression -- and add a configuration which specifies the 
class to use as the implementor of this interface. We can also store the class 
name in the WAL metadata so that you can read compressed HLogs even if you are 
writing non-compressed ones (useful for replication if one cluster uses 
compression and the other doesn't, for example)
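
A rough sketch of the shape suggested above, under stated assumptions: the 
names WalCodec, NoCompressionCodec, and WalCodecFactory are hypothetical, and a 
plain byte[] stands in for KeyValue so the example is self-contained. It shows 
the pass-through code path as the default implementation and the implementation 
being picked by class name, as a configuration value or WAL metadata entry 
might supply it.

```java
// Hypothetical codec interface; byte[] stands in for KeyValue here.
interface WalCodec {
  // Writes the encoded form of kv into out at offset; returns bytes written.
  int encode(byte[] kv, byte[] out, int offset);
}

// Default code path: copy the bytes through unchanged.
class NoCompressionCodec implements WalCodec {
  public int encode(byte[] kv, byte[] out, int offset) {
    System.arraycopy(kv, 0, out, offset, kv.length);
    return kv.length;
  }
}

class WalCodecFactory {
  // Instantiates the implementation named by a configuration value.
  static WalCodec fromClassName(String className)
      throws ReflectiveOperationException {
    return (WalCodec) Class.forName(className)
        .getDeclaredConstructor()
        .newInstance();
  }
}
```

A reader could call fromClassName() with the name stored alongside the log, 
which is what would let it read a compressed log even when its own writer is 
configured without compression.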


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


no need to wrap lines



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


this should be // comments inside the function, rather than javadoc style 
comments above



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


we should probably use vints here - most keys and many values are <100 bytes 
long, so we could store the lengths in 1 byte instead of the 4 used here
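
To illustrate the saving, here is a minimal sketch of a variable-length int 
encoding that uses 7 payload bits per byte. This is a generic varint, not 
necessarily the vint format the review has in mind (Hadoop's WritableUtils has 
its own vint routines with a different byte layout); the class and method names 
are made up for the example.

```java
import java.io.ByteArrayOutputStream;

class VarInts {
  // Writes a non-negative int using 7 bits per byte, low bits first;
  // the high bit of each byte signals that more bytes follow.
  static void writeVarInt(ByteArrayOutputStream out, int value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative length: " + value);
    }
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  // Reads a value written by writeVarInt, starting at offset.
  static int readVarInt(byte[] buf, int offset) {
    int result = 0;
    int shift = 0;
    while (true) {
      int b = buf[offset++] & 0xFF;
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
  }

  // Convenience wrapper returning the encoded bytes.
  static byte[] encode(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarInt(out, value);
    return out.toByteArray();
  }
}
```

With this layout any length below 128 fits in a single byte, and lengths up to 
16383 fit in two, compared with a fixed four bytes per length.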



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


extra space



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


extra word "designed"?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


example should use arguments like "-u compressed-hlog uncompressed-hlog" 
rather than "filename" twice



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


check args.length first and print help if it doesn't have 3 args



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


should be an 'else if' -- and have a final 'else' clause that gives usage
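
The argument handling suggested in the last few comments could look roughly 
like the sketch below. Only the "-u" flag appears in the review; the "-c" flag, 
the usage string, and the class and method names are assumptions for 
illustration, not the actual Compressor code.

```java
class CompressorArgs {
  // Hypothetical usage text; only "-u" is taken from the review comments.
  static final String USAGE =
      "usage: Compressor <-u | -c> <input-hlog> <output-hlog>";

  // Returns "uncompress", "compress", or "usage" for the given arguments.
  static String parseMode(String[] args) {
    if (args.length != 3) {
      return "usage"; // wrong argument count: print help
    } else if (args[0].equals("-u")) {
      return "uncompress";
    } else if (args[0].equals("-c")) {
      return "compress";
    } else {
      return "usage"; // unknown flag: print help
    }
  }
}
```

Checking the length first means args[0] is never touched on an empty argument 
list, and the final else keeps an unknown flag from silently doing nothing.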



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


TODO: need to change this config key to match our others



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


this assumes the whole log's content fits in memory, which shouldn't be 
necessary... why not loop reading one record from reader and writing one to 
writer?
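
A sketch of the record-at-a-time loop suggested above. The EntryReader and 
EntryWriter interfaces are hypothetical minimal shapes invented for this 
example; the real HLog.Reader and HLog.Writer interfaces have different 
signatures.

```java
// Hypothetical reader shape: returns the next entry, or null at end of log.
interface EntryReader {
  byte[] next();
}

// Hypothetical writer shape: appends one entry to the output log.
interface EntryWriter {
  void append(byte[] entry);
}

class LogCopier {
  // Copies entries one at a time, so only one record is held in memory.
  static int copy(EntryReader in, EntryWriter out) {
    int count = 0;
    byte[] entry;
    while ((entry = in.next()) != null) {
      out.append(entry);
      count++;
    }
    return count;
  }
}
```

Memory use then depends on the largest single record and not on the size of 
the whole log.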



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


should have a finally { in.close(); } probably



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


should go in finally clause. Also use IOUtils.closeStream as long as "out" 
implements Closeable (I think it does?)
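
A minimal sketch of the try/finally pattern being asked for in the two comments 
above. The helper below is a local stand-in written in the spirit of Hadoop's 
IOUtils.closeStream, not a call to it, and the class and method names are made 
up for the example. It assumes both ends implement java.io.Closeable; a type 
that only has a close() method would need a small adapter.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseUtil {
  // Closes the stream if it is non-null and ignores any IOException.
  static void closeQuietly(Closeable c) {
    if (c == null) {
      return;
    }
    try {
      c.close();
    } catch (IOException ignored) {
      // nothing useful to do while cleaning up
    }
  }

  // Runs the body and always closes both ends, even if the body throws.
  static void runAndClose(Closeable in, Closeable out, Runnable body) {
    try {
      body.run();
    } finally {
      closeQuietly(in);
      closeQuietly(out);
    }
  }
}
```

Putting both close calls in the finally clause means a failure halfway through 
the copy does not leak the reader or the writer.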



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


why not combine this with the if/else above?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


most of this byte is wasted - we're only using 2 of the 6 bits... and I 
think we could actually get rid of EMPTY as well.

If we limit the dictionaries to 32k entries, then we could use the 
following:

If bit 0 == 0: dictionary reference
  bits 1 through 15: the dictionary index
if bit 0 == 1: new value
  start a varint encoding in this byte

but let's leave this as is for now just to get the rest of the code-level 
issues cleaned up
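
For reference, the dictionary-reference half of the layout proposed above could 
be sketched as follows. This reads "bit 0" as the most significant bit of a 
two-byte big-endian value, which is an assumption, and it leaves out the 
new-value branch entirely because the comment does not spell out how the varint 
would share the first byte. All names are hypothetical.

```java
class DictRefCodec {
  static final int MAX_ENTRIES = 1 << 15; // 32k dictionary entries

  // Encodes a dictionary index into two bytes with the top bit clear.
  static byte[] encodeRef(int index) {
    if (index < 0 || index >= MAX_ENTRIES) {
      throw new IllegalArgumentException("index out of range: " + index);
    }
    return new byte[] {(byte) (index >>> 8), (byte) index};
  }

  // True if the first byte marks a dictionary reference (top bit clear).
  static boolean isRef(byte first) {
    return (first & 0x80) == 0;
  }

  // Decodes the 15-bit index from a two-byte reference.
  static int decodeRef(byte hi, byte lo) {
    return ((hi & 0x7F) << 8) | (lo & 0xFF);
  }
}
```

Because the index is capped at 15 bits, the top bit of the first byte is always 
free to act as the reference-versus-new-value flag.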



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


rather than this, why not use varints here so you don't have to specify up 
front what the size is?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


use constant



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


since we have several methods that take all

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176259#comment-13176259
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4125
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


public class SequenceFileLogWriter implements HLog.Writer {
And
  public interface Writer {

Neither declaration mentions Closeable, although Writer has a close() method.


- Ted






[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176984#comment-13176984
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-29 04:38:25.385999)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

Added tests. Fixed code issues as mentioned by Todd.


Summary
---

Here's what I have so far. Things are written, and "should work". I need to 
rework the test cases to test this, and put something in the config file to 
enable/disable. Obviously this isn't ready for commit at the moment, but I can 
get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment; I'll come up with 
something cooler sooner. Let me know how this looks.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-28 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176989#comment-13176989
 ] 

Li Pi commented on HBASE-4608:
--

This should be a good time to mention that, at this point, the patch is working.

There is still some refactoring to do to make it prettier, and room for
optimization, but please test out the compressor with a realistic
load and see how much improvement it gains.

Compressor.java contains a command line compression tool that you can
use. Just run this against an HLog and check the differing sizes of the
outputs.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176983#comment-13176983
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 161
bq.  >
bq.  > use constant

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 48
bq.  >
bq.  > LOG.isDebugEnabled -- or maybe this should even be TRACE level

removed this completely, not needed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 34
bq.  >
bq.  > private final

removed completely.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 32
bq.  >
bq.  > this should be all caps -- but also probably something from the configuration

changed


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 23
bq.  >
bq.  > does it have to be public?

now default.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 57
bq.  >
bq.  > hashCode() on a byte[] is identity-based - you should use Bytes.hashCode()

yup. i just figured this out. cost me a ton of pain. was wondering why things 
weren't compressing the way they should.
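
The pitfall described above can be reproduced with plain JDK classes. The sketch below is illustrative only and is not taken from the patch: it uses `java.util.Arrays` and `java.nio.ByteBuffer` in place of HBase's `Bytes` helpers so that it runs without HBase on the classpath, and the class name `ByteArrayHashDemo` and its methods are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the patch under review.
class ByteArrayHashDemo {
  // True if two arrays produce the same content-based hash.
  static boolean contentHashMatches(byte[] a, byte[] b) {
    return Arrays.hashCode(a) == Arrays.hashCode(b);
  }

  // Looks up a copy of `key` in a map keyed by raw byte[]. The lookup misses
  // because byte[] inherits identity-based equals()/hashCode() from Object.
  static boolean rawArrayLookupHits(byte[] key) {
    Map<byte[], Short> dict = new HashMap<>();
    dict.put(key, (short) 0);
    return dict.containsKey(key.clone());
  }

  // Same lookup with ByteBuffer keys, whose equals()/hashCode() compare contents.
  static boolean wrappedLookupHits(byte[] key) {
    Map<ByteBuffer, Short> dict = new HashMap<>();
    dict.put(ByteBuffer.wrap(key), (short) 0);
    return dict.containsKey(ByteBuffer.wrap(key.clone()));
  }

  public static void main(String[] args) {
    byte[] cf = "cf1".getBytes(StandardCharsets.UTF_8);
    System.out.println("content hash matches: " + contentHashMatches(cf, cf.clone()));
    System.out.println("raw byte[] lookup hits: " + rawArrayLookupHits(cf));
    System.out.println("wrapped lookup hits: " + wrappedLookupHits(cf));
  }
}
```

With identity-based hashing, every repeated table or family name looks like a new entry to the dictionary, which would explain the poor compression mentioned in the reply.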


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, lines 82-85
bq.  >
bq.  > indentation

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, lines 144-150
bq.  >
bq.  > again the Context object here would make things a little cleaner to integrate:
bq.  > - you can drop "compression" boolean and just check "if (compressionContext != null)"
bq.  > - you only add one integration point to the existing code instead of lots of new member vars

will do in a refactoring pass.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 90
bq.  >
bq.  > I'd call this clear()

done.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 64
bq.  >
bq.  > equals is identity based here... should use Bytes.equals()
bq.  > 
bq.  > Also Bytes.equals I believe handles nulls, so you can collapse two of these three clauses together

also just figured this out.
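
As a rough illustration of the simplification suggested above, the sketch below uses `java.util.Arrays.equals`, which returns true when both references are null and false when only one is. Whether `Bytes.equals()` behaves the same way is what the review comment believes, but that is not verified here. `ByteArrayEqualsDemo`, `sameEntryVerbose`, and `sameEntry` are hypothetical names, not code from the patch.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the patch under review.
class ByteArrayEqualsDemo {
  // Three-clause version: two explicit null checks, then a content comparison.
  static boolean sameEntryVerbose(byte[] a, byte[] b) {
    if (a == null && b == null) {
      return true;
    }
    if (a == null || b == null) {
      return false;
    }
    return Arrays.equals(a, b);
  }

  // Collapsed version: Arrays.equals already treats (null, null) as equal
  // and (null, non-null) as unequal.
  static boolean sameEntry(byte[] a, byte[] b) {
    return Arrays.equals(a, b);
  }

  public static void main(String[] args) {
    byte[] x = {1, 2};
    byte[] y = {1, 2};
    System.out.println("x.equals(y) (identity): " + x.equals(y));
    System.out.println("sameEntry(x, y) (content): " + sameEntry(x, y));
    System.out.println("sameEntry(null, null): " + sameEntry(null, null));
    System.out.println("sameEntry(x, null): " + sameEntry(x, null));
  }
}
```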


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1655
bq.  >
bq.  > since we have several methods that take all these parameters, and we might want to change the compression scheme in the future, I think it makes sense to introduce a class WALCompressionContext with getters for each of the dictionaries

Will make a compression context during refactoring.
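
One possible shape for such a context object is sketched below. It is a guess at the idea, not the class that ended up in the patch: `ToyDictionary`, `ToyCompressionContext`, `ToyLogWriter`, and every method on them are hypothetical, and the dictionary has no eviction so that the example stays short. It also shows the "null context means compression is off" check suggested in the SequenceFileLogReader comment.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the patch's dictionary; no eviction, for brevity.
class ToyDictionary {
  private final Map<ByteBuffer, Short> index = new HashMap<>();

  // Returns the entry's index if it is already known; otherwise remembers it
  // and returns -1 so the caller knows to write the raw bytes.
  short findOrAdd(byte[] data) {
    ByteBuffer key = ByteBuffer.wrap(data.clone());
    Short existing = index.get(key);
    if (existing != null) {
      return existing;
    }
    if (index.size() < Short.MAX_VALUE) {
      index.put(key, (short) index.size());
    }
    return -1;
  }

  void clear() {
    index.clear();
  }
}

// Bundles the per-field dictionaries so callers pass one object around
// instead of one parameter per dictionary.
class ToyCompressionContext {
  private final ToyDictionary regionDict = new ToyDictionary();
  private final ToyDictionary tableDict = new ToyDictionary();
  private final ToyDictionary familyDict = new ToyDictionary();

  ToyDictionary getRegionDict() { return regionDict; }
  ToyDictionary getTableDict() { return tableDict; }
  ToyDictionary getFamilyDict() { return familyDict; }

  void clear() {
    regionDict.clear();
    tableDict.clear();
    familyDict.clear();
  }
}

// A writer with a single integration point: a null context means "no compression".
class ToyLogWriter {
  private final ToyCompressionContext compressionContext;

  ToyLogWriter(ToyCompressionContext compressionContext) {
    this.compressionContext = compressionContext;
  }

  boolean isCompressed() {
    return compressionContext != null;
  }

  // Returns a dictionary index, or -1 when the table name must be written raw.
  short encodeTable(byte[] tableName) {
    if (compressionContext == null) {
      return -1;
    }
    return compressionContext.getTableDict().findOrAdd(tableName);
  }
}
```

Swapping the compression scheme later would then only touch the context class and its dictionaries, which is the motivation given in the review comment.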


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, lines 57-58
bq.  >
bq.  > we should probably use vints here - most keys and many values are <100bytes long, so we could store the lengths in 1 byte instead of the 4 used here

Will do. I didn't bother compressing the size values in KeyValue. Should do 
that as well - squeeze out extra space.
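
To make the space saving concrete, here is a minimal variable-length integer sketch. It is a generic base-128 encoding (7 data bits per byte, high bit set when more bytes follow) written for this illustration; it is not the byte format produced by Hadoop's `WritableUtils.writeVInt`, and `VarIntDemo` is a hypothetical name. The point is only that a length below 128 takes one byte instead of four.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class; not the encoding used by the patch or by Hadoop.
class VarIntDemo {
  // Encodes a non-negative int using 7 data bits per byte, low-order group first.
  static byte[] encode(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("lengths must be non-negative");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80); // more bytes follow
      value >>>= 7;
    }
    out.write(value); // final byte, high bit clear
    return out.toByteArray();
  }

  // Decodes a value produced by encode().
  static int decode(byte[] buf) {
    int result = 0;
    int shift = 0;
    for (byte b : buf) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
    throw new IllegalArgumentException("truncated varint");
  }

  public static void main(String[] args) {
    for (int len : new int[] {99, 300, 70000}) {
      System.out.println(len + " -> " + encode(len).length + " byte(s), fixed-width int -> 4 bytes");
    }
  }
}
```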


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 70
bq.  > 

[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177845#comment-13177845
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-31 00:20:40.770066)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

WritableContext makes things cleaner. Some space optimizations to make 
compression even more efficient.


Summary
---

Here's what I have so far. Things are written, and "should work". I need to 
rework the test cases to test this, and put something in the config file to 
enable/disable. Obviously this isn't ready for commit at the moment, but I can 
get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment; I'll come up with 
something cooler sooner. Let me know how this looks.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177852#comment-13177852
 ] 

Zhihong Yu commented on HBASE-4608:
---

@Li:
Do you want to submit the latest patch to Hadoop QA?

Thanks





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177873#comment-13177873
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-31 02:06:00.510532)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

fixed a failing test.


Summary
---

Here's what I have so far. Things are written, and "should work". I need to 
rework the test cases to test this, and put something in the config file to 
enable/disable. Obviously this isn't ready for commit at the moment, but I can 
get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment; I'll come up with 
something cooler sooner. Let me know how this looks.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt




[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-30 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177874#comment-13177874
 ] 

Li Pi commented on HBASE-4608:
--

Yup. Good time to do it.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178033#comment-13178033
 ] 

Zhihong Yu commented on HBASE-4608:
---

@Li:
Please use '--no-prefix' to generate diff.
Otherwise Hadoop QA won't be able to apply your patch.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178075#comment-13178075
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2011-12-31 20:19:11.951711)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary (updated)
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt




[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178078#comment-13178078
 ] 

Zhihong Yu commented on HBASE-4608:
---

@Li:
Hadoop QA is taking a vacation :-) See 
https://builds.apache.org/job/PreCommit-HBASE-Build/

I ran patch v5 on Linux and didn't observe any notable issues. But then you have a 
new patch.
Please try to run your latest patch through the test suite.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178079#comment-13178079
 ] 

Lars Hofhansl commented on HBASE-4608:
--

@Li: How big do you expect the in-memory dictionary to grow?
I was wondering if the reading or writing process could give the compressor 
hints about when would be a good time to reset the dictionary (for example, when 
a memstore flush entry is found).
The compressor could choose to ignore the hints and use some internal logic, or 
reset the dictionary when hinted.






[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178084#comment-13178084
 ] 

Li Pi commented on HBASE-4608:
--

Max size = 64k entries * around 100-200 bytes each, i.e. roughly 6-13 megabytes. 
Really not that big. Well under 100 megabytes.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178085#comment-13178085
 ] 

Li Pi commented on HBASE-4608:
--

I was thinking of replacing the 1-way associative dictionary with a 127-entry LRU 
dictionary. That should allow us to save a few bytes, and also be far more efficient 
with our eviction strategy.





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178086#comment-13178086
 ] 

Zhihong Yu commented on HBASE-4608:
---

How about using Guava's MapMaker?
From SingleSizeCache.java:
{code}
backingMap = new MapMaker().maximumSize(numBlocks - 1)
.evictionListener(listener).makeMap();
{code}





[jira] [Commented] (HBASE-4608) HLog Compression

2011-12-31 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178087#comment-13178087
 ] 

Li Pi commented on HBASE-4608:
--

Guava's MapMaker doesn't guarantee consistent eviction. You'd want to either 
use two LinkedHashMaps or your own LRU-style system.
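
A small sketch of what a deterministic LRU dictionary could look like with the JDK's access-ordered `LinkedHashMap` follows. Everything in it is an assumption made for illustration: `LruDictionarySketch` and `findOrAdd` are hypothetical names, it covers only the writer-side lookup (bytes to slot), and a reader would need a mirrored slot-to-bytes structure, presumably the second map mentioned above. The assumed reason eviction has to be consistent is that the writer and the reader must evict the same entry at the same point in the stream, otherwise slot numbers stop matching. A capacity of 127 would keep every slot number within a single byte.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the dictionary from the patch under review.
class LruDictionarySketch {
  private final int capacity;
  // accessOrder = true: iteration runs from least to most recently used.
  private final LinkedHashMap<ByteBuffer, Integer> slots =
      new LinkedHashMap<>(16, 0.75f, true);

  LruDictionarySketch(int capacity) {
    if (capacity < 1) {
      throw new IllegalArgumentException("capacity must be at least 1");
    }
    this.capacity = capacity;
  }

  // Returns the slot of a known entry (and marks it recently used), or -1 after
  // inserting a new entry. When full, the least recently used entry is evicted
  // and its slot number is reused, so the outcome depends only on the input order.
  int findOrAdd(byte[] data) {
    ByteBuffer key = ByteBuffer.wrap(data.clone());
    Integer slot = slots.get(key); // get() counts as an access
    if (slot != null) {
      return slot;
    }
    int newSlot;
    if (slots.size() < capacity) {
      newSlot = slots.size();
    } else {
      Iterator<Map.Entry<ByteBuffer, Integer>> it = slots.entrySet().iterator();
      Map.Entry<ByteBuffer, Integer> eldest = it.next();
      newSlot = eldest.getValue();
      it.remove();
    }
    slots.put(key, newSlot);
    return -1;
  }
}
```

Because the eviction choice is a pure function of the sequence of `findOrAdd` calls, two instances fed the same sequence end up in the same state, which is the property a size-bounded concurrent cache does not promise.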





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-01 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178131#comment-13178131
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508998/4608v6.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -149 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 78 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/645//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/645//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/645//console

This message is automatically generated.

> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-02 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178580#comment-13178580
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4172
---



src/main/java/org/apache/hadoop/hbase/HConstants.java


This name may refer to the compression algorithm.
I think the word 'enable' should be part of the name.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


No year needed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


This javadoc should be combined with above block.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Should read 'Compresses and ...'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java


Add license, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java


Add javadoc for this class, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


License, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Add javadoc for the parameters, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


Should there be disableCompression ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java


Remove this year line, please.


- Ted


On 2011-12-31 20:19:11, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  ---
bq.  
bq.  (Updated 2011-12-31 20:19:11)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.  https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
f067221 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
e1117ef 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java
 PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also

[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-05 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180967#comment-13180967
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-06 00:01:44.856233)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

Added an LRU dictionary. It should be more efficient than a 1-way associative cache.
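
As a rough illustration of the LRU approach mentioned above (not the LRUDictionary in the diff, whose internals are not shown in this thread), the sketch below keeps a fixed number of slots and, when full, hands the slot of the least recently used value to the new value. The class name `LruDictionarySketch` and the `find`/`add` methods are hypothetical. A 1-way associative cache would instead map each value to exactly one slot and overwrite whatever was there, even if that entry was still in heavy use.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a fixed-size dictionary with least-recently-used eviction. */
final class LruDictionarySketch {
    private final int capacity;
    // accessOrder = true: iteration starts at the least recently accessed entry.
    private final LinkedHashMap<String, Integer> slots;

    LruDictionarySketch(int capacity) {
        this.capacity = capacity;
        this.slots = new LinkedHashMap<>(16, 0.75f, true);
    }

    /** Returns the slot of a known value (marking it recently used), or -1. */
    int find(String value) {
        Integer slot = slots.get(value);
        return slot == null ? -1 : slot;
    }

    /** Adds a value, reusing the least recently used slot when full; returns its slot. */
    int add(String value) {
        Integer existing = slots.get(value);
        if (existing != null) {
            return existing;
        }
        int slot;
        if (slots.size() < capacity) {
            slot = slots.size();                       // still filling up: next free slot
        } else {
            Iterator<Map.Entry<String, Integer>> it = slots.entrySet().iterator();
            slot = it.next().getValue();               // eldest entry gives up its slot
            it.remove();
        }
        slots.put(value, slot);
        return slot;
    }
}
```

With a capacity of 2, adding "a" and "b", touching "a", then adding "c" evicts "b" and leaves "a" in slot 0. The writer and the reader must apply the same eviction rule so that slot numbers mean the same thing on both sides.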


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java
 PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180999#comment-13180999
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509633/4608v7.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -149 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 80 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/677//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/677//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/677//console

This message is automatically generated.

> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181756#comment-13181756
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-07 01:25:20.762498)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

Addressed Ted Yu's comments. Also switched SimpleDictionary to LRUDictionary, 
which has a much smarter eviction algorithm.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181757#comment-13181757
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 
1655
bq.  > 
bq.  >
bq.  > Should there be disableCompression ?

Compression is always enabled if it is set in the config. Otherwise the 
decompressor won't know whether to try to decompress the log or not.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 2
bq.  > 
bq.  >
bq.  > Remove this year line, please.

Done.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 137
bq.  > 
bq.  >
bq.  > Add javadoc for the parameters, please.

Added.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 1
bq.  > 
bq.  >
bq.  > License, please.

Done


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/HConstants.java, line 579
bq.  > 
bq.  >
bq.  > This name may refer to the compression algorithm.
bq.  > I think the word 'enable' should be part of the name.

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, 
line 2
bq.  > 
bq.  >
bq.  > No year needed.

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, 
line 51
bq.  > 
bq.  >
bq.  > This javadoc should be combined with above block.

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, 
line 87
bq.  > 
bq.  >
bq.  > Should read 'Compresses and ...'

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, 
line 1
bq.  > 
bq.  >
bq.  > Add license, please.

fixed.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4172
---


On 2012-01-07 01:25:20, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  ---
bq.  
bq.  (Updated 2012-01-07 01:25:20)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.  https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
f067221 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
e1117ef 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/Te

[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181790#comment-13181790
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-07 03:13:33.237858)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
> 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181831#comment-13181831
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509763/4608v8fixed.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -149 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 82 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.regionserver.wal.TestHLog
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/693//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/693//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/693//console

This message is automatically generated.

> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
> 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181846#comment-13181846
 ] 

Zhihong Yu commented on HBASE-4608:
---

From 
https://builds.apache.org/job/PreCommit-HBASE-Build/693//testReport/org.apache.hadoop.hbase.regionserver.wal/TestHLog/testAppendClose/:
{code}
java.net.BindException: Problem binding to localhost/127.0.0.1:50150 : Address 
already in use
at org.apache.hadoop.ipc.Server.bind(Server.java:227)
{code}
Strange, was the above caused by parallel test case execution ?

> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
> 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181869#comment-13181869
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4232
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Add javadoc please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Please give this config parameter a better name.
How about 'hbase.regionserver.wal.compressed' ?
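
To make the suggestion concrete, a minimal sketch of reading such a flag follows. It uses `java.util.Properties` as a stand-in for the Hadoop configuration object so that it runs on its own; the class name, the constant name, and the default of `false` are assumptions, and only the key string comes from this comment.

```java
import java.util.Properties;

/** Hypothetical sketch: reads the WAL compression switch from a config object. */
final class WalCompressionConfigSketch {
    /** Key proposed in this review comment. */
    static final String ENABLE_WAL_COMPRESSION = "hbase.regionserver.wal.compressed";

    /** Assumed default is false, so an unset key means logs are written uncompressed. */
    static boolean isCompressionEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(ENABLE_WAL_COMPRESSION, "false"));
    }
}
```

Both the writer and the reader would consult the same key, which is why the flag and the on-disk format have to stay in step.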



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Would this be able to hold a large number of HLog.Entry objects in memory?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Since short is signed, how do I know that the return value would be 
positive ?
e.g. (short)0xFE00 == -512
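
The concern can be reproduced with a small standalone sketch. The class and method names below are hypothetical and are not taken from Compressor.java; the point is only that assembling two bytes into a `short` sign-extends when the high bit is set, while keeping the result in an `int` does not.

```java
/** Hypothetical sketch: signed versus unsigned reading of two big-endian bytes. */
final class ShortSignSketch {
    /** Combines two bytes and narrows to short; negative when the high bit is set. */
    static short toSignedShort(byte hi, byte lo) {
        return (short) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }

    /** Combines the same two bytes into an int in the range 0..65535. */
    static int toUnsignedShort(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }
}
```

For the bytes 0xFE and 0x00, the signed form is -512 and the unsigned form is 65024. A method that returns `short` can therefore only promise a non-negative result if the high bit of the first byte is known to be clear.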



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java


I suggest naming this class Node.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java


Is compressed a better name ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java


White space makes indentation look weird.



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java


Please add Apache license.



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java


Add short javadoc and test category, please.



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java


Please remove year.



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java


You will add the real test, right ?

Also, missing test category.


- Ted


On 2012-01-07 03:13:33, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  ---
bq.  
bq.  (Updated 2012-01-07 03:13:33)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.  https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
f067221 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
e1117ef 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.tx

[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-09 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183023#comment-13183023
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 35
bq.  > 
bq.  >
bq.  > Add javadoc please.

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 84
bq.  > 
bq.  >
bq.  > Please give this config parameter better name.
bq.  > How about 'hbase.regionserver.wal.compressed' ?

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 91
bq.  > 
bq.  >
bq.  > Would this be able to hold large number of HLog.Entry's in memory ?

An HLog is at most 400 MB, so this should be okay?


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 229
bq.  > 
bq.  >
bq.  > Since short is signed, how do I know that the return value would be 
positive ?
bq.  > e.g. (short)0xFE00 == -512

If the high bit is set (we read that first), the value is negative and we do 
something else, because it's not part of the dictionary. Added an assert anyway.
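
One way to picture the scheme described in this reply is the sketch below. It is an assumption about the general shape, not the format the patch writes: the marker value `-1`, the 2-byte length prefix, and the class name `DictionaryEntryCodecSketch` are all made up for illustration. The reader looks at the first byte; a set high bit means a literal follows, otherwise the byte is the top half of a dictionary index that is guaranteed to be non-negative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a first byte with the high bit set marks a non-dictionary entry. */
final class DictionaryEntryCodecSketch {
    static final byte NOT_IN_DICTIONARY = -1;   // assumed marker; any negative byte would do

    /** Writes a literal on first sight of a value and a 2-byte index afterwards. */
    static byte[] encode(List<String> values) {
        List<String> dictionary = new ArrayList<>();   // linear search, for brevity only
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (String value : values) {
                int index = dictionary.indexOf(value);
                if (index > Short.MAX_VALUE) {
                    throw new IllegalStateException("index would set the high bit");
                }
                if (index >= 0) {
                    out.writeShort(index);             // high bit of the first byte is 0
                } else {
                    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                    out.writeByte(NOT_IN_DICTIONARY);
                    out.writeShort(bytes.length);
                    out.write(bytes);
                    dictionary.add(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads entries back, rebuilding the dictionary in the same order as the writer. */
    static List<String> decode(byte[] data) {
        List<String> dictionary = new ArrayList<>();
        List<String> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte first = in.readByte();
                if (first < 0) {                       // high bit set: literal follows
                    byte[] bytes = new byte[in.readUnsignedShort()];
                    in.readFully(bytes);
                    String value = new String(bytes, StandardCharsets.UTF_8);
                    dictionary.add(value);
                    values.add(value);
                } else {                               // non-negative: rest of the index follows
                    int index = (first << 8) | in.readUnsignedByte();
                    values.add(dictionary.get(index));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

Because the literal path is taken before any index is assembled, the index path never sees a negative value, which is the property the added assert would guard.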


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
166
bq.  > 
bq.  >
bq.  > I suggest naming this class Node.

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, 
line 74
bq.  > 
bq.  >
bq.  > Is compressed a better name ?

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, 
line 127
bq.  > 
bq.  >
bq.  > White space makes indentation look weird.

fixed.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > 
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java, 
line 1
bq.  > 
bq.  >
bq.  > Please add Apache license.

fixed.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > 
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java, 
line 12
bq.  > 
bq.  >
bq.  > Add short javadoc and test category, please.

test category - small?


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > 
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java,
 line 2
bq.  > 
bq.  >
bq.  > Please remove year.

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > 
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java,
 line 29
bq.  > 
bq.  >
bq.  > You will add the real test, right ?
bq.  > 
bq.  > Also, missing test category.

This is actually a really good test. If testWALReplay works after compression 
is enabled, then the compression/decompression is working. This is the real 
test.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4232
---



[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-09 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183024#comment-13183024
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-10 02:34:06.162265)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
> 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-10 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183495#comment-13183495
 ] 

Zhihong Yu commented on HBASE-4608:
---

I got the following on my MacBook for 4608v9.txt:
{code}
testReplayEditsWrittenViaHRegion(org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed)  Time elapsed: 2.009 sec  <<< FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.hadoop.hbase.regionserver.wal.TestWALReplay.testReplayEditsWrittenViaHRegion(TestWALReplay.java:289)
{code}





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185361#comment-13185361
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-13 00:58:40.183584)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

fixed failing test. added a few new ones to detect LRU dictionary failure.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185377#comment-13185377
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-13 01:34:31.569679)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

fixed bug in dictionary causing another test to fail. passes small tests now. 
running med tests.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185378#comment-13185378
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-13 01:37:35.790343)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

removed debug printf.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-19 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189520#comment-13189520
 ] 

Zhihong Yu commented on HBASE-4608:
---

I tried to run TestWALReplayCompressed:
{code}
Running org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.008 sec

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
[INFO] Tests are skipped.
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 3:50.838s
{code}
Looks like the ShutdownHooks took a long time to finish:
{code}
"main" prio=5 tid=104000800 nid=0x100601000 in Object.wait() [10060]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1210)
    - locked <78e887470> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
    at java.lang.Thread.join(Thread.java:1263)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
    at java.lang.Shutdown.runHooks(Shutdown.java:79)
    at java.lang.Shutdown.sequence(Shutdown.java:123)
    at java.lang.Shutdown.exit(Shutdown.java:168)
    - locked <7faf9d288> (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Runtime.exit(Runtime.java:90)
    at java.lang.System.exit(System.java:921)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:73)
{code}





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-19 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189530#comment-13189530
 ] 

Li Pi commented on HBASE-4608:
--

Are the shutdown hooks slower than TestWALReplay without compression?







[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-19 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189534#comment-13189534
 ] 

Zhihong Yu commented on HBASE-4608:
---

Similar result:
{code}
Running org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.865 sec

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
[INFO] Tests are skipped.
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 3:46.105s
{code}





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190189#comment-13190189
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4508
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


'/less' should be removed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


javadoc needs update.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Either remove the word 'a' or change it into 'an'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Please change ourKV to keyval or something similar.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Update javadoc to match the context parameter.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


I think adding 'the effect of compression would be good' at the end would 
make the sentence more easily understandable.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


Remove whitespace.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


This javadoc is more suitable for the init() method.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


Please include e in new IOE.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


Please include e in the new IOE.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java


Please remove year.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java


Please put this line at the end of line 34.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java


'ad' should be 'add'


- Ted



[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190573#comment-13190573
 ] 

Lars Hofhansl commented on HBASE-4608:
--

It occurred to me yesterday that we should clear the dictionaries after each 
successful memstore flush...?
Otherwise we might have to go further back in the log than necessary in order 
to replay.

I realize memstore flushes are per region, whereas the WAL is per region server; 
still it seems prudent to reset the dictionary after each flush. Thoughts?
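
A minimal sketch of the reset idea, under loud assumptions: the class below is a hypothetical, unbounded toy dictionary (not the patch's LRUDictionary or WALDictionary), values are plain strings, and a token is either a literal ("=value") or a back-reference ("#index"). It only shows why a reset point helps: once writer and reader both clear at the same log position, everything after that position can be decoded without reading what came before.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy dictionary; not the HBase LRUDictionary. */
final class ResettableDictionarySketch {
  private final Map<String, Integer> indexOf = new HashMap<>();
  private final List<String> entries = new ArrayList<>();

  /** Emits "#<index>" if the value was seen since the last reset, else the literal "=<value>". */
  String encode(String value) {
    Integer index = indexOf.get(value);
    if (index != null) {
      return "#" + index;
    }
    indexOf.put(value, entries.size());
    entries.add(value);
    return "=" + value;
  }

  /** Decodes a token produced by encode(), learning literals in the same order the writer did. */
  String decode(String token) {
    if (token.charAt(0) == '#') {
      return entries.get(Integer.parseInt(token.substring(1)));
    }
    String value = token.substring(1);
    indexOf.put(value, entries.size());
    entries.add(value);
    return value;
  }

  /** Forgets all entries; writer and reader must both do this at the same log position. */
  void clear() {
    indexOf.clear();
    entries.clear();
  }

  public static void main(String[] args) {
    ResettableDictionarySketch writer = new ResettableDictionarySketch();
    List<String> log = new ArrayList<>();
    log.add(writer.encode("tableA")); // literal
    log.add(writer.encode("tableA")); // back-reference
    writer.clear();                   // hypothetical reset point, e.g. after a flush
    int resetPosition = log.size();
    log.add(writer.encode("tableA")); // literal again, because the dictionary is empty
    log.add(writer.encode("tableA")); // back-reference

    // A reader with an empty dictionary can start at the reset point.
    ResettableDictionarySketch reader = new ResettableDictionarySketch();
    for (int i = resetPosition; i < log.size(); i++) {
      System.out.println(reader.decode(log.get(i)));
    }
  }
}
```

The cost of resetting is visible in the sketch too: the first occurrence after each reset is written as a literal again, so frequent resets reduce the compression ratio.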





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192024#comment-13192024
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 34
bq.  > 
bq.  >
bq.  > '/less' should be removed.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 42
bq.  > 
bq.  >
bq.  > javadoc needs update.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 43
bq.  > 
bq.  >
bq.  > Either remove the word 'a' or change it into 'an'

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 78
bq.  > 
bq.  >
bq.  > Please change ourKV to keyval or something similar.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 82
bq.  > 
bq.  >
bq.  > Update javadoc to match the context parameter.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 94
bq.  > 
bq.  >
bq.  > I think adding 'the effect of compression would be good' at the end would make the sentence more easily understandable.

fixed


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 60
bq.  > 
bq.  >
bq.  > Remove whitespace.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 154
bq.  > 
bq.  >
bq.  > This javadoc is more suitable for the init() method.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 186
bq.  > 
bq.  >
bq.  > Please include e in new IOE.

fixed. I assume you mean store it as the cause.
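
For readers unfamiliar with the idiom, "store it as the cause" means passing the caught exception to the two-argument IOException constructor so the original stack trace survives the rethrow. The sketch below is illustrative only: the class, method names, and message text are hypothetical and are not taken from SequenceFileLogReader or SequenceFileLogWriter.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical illustration of exception chaining; not code from the patch. */
final class ChainedIOExceptionSketch {
  /** Stand-in for a low-level call that fails. */
  static void openUnderlyingReader() throws IOException {
    throw new EOFException("unexpected end of file");
  }

  /** Wraps the low-level failure but keeps it as the cause. */
  static void init() throws IOException {
    try {
      openUnderlyingReader();
    } catch (IOException e) {
      // Passing e as the second argument records it as the cause of the new exception.
      throw new IOException("Cannot initialize log reader", e);
    }
  }

  public static void main(String[] args) {
    try {
      init();
    } catch (IOException wrapped) {
      // getCause() returns the original EOFException, including its stack trace.
      System.out.println(wrapped.getMessage() + " <- " + wrapped.getCause());
    }
  }
}
```

Without the second argument, the caller would see only the wrapper's message and lose the location of the original failure.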


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 93
bq.  > 
bq.  >
bq.  > Please include e in the new IOE.

fixed above.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 2
bq.  > 
bq.  >
bq.  > Please remove year.

fixed above.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 35
bq.  > 
bq.  >
bq.  > Please put this line at the end of line 34.

fixed


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 53
bq.  > 
bq.  >
bq.  > 'ad' should be 'add'

fixed.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4508
---



[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192025#comment-13192025
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-24 09:00:37.768707)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  CHANGES.txt 1d7238e 
  bin/hbase 350abef 
  bin/hbase-daemon.sh 5c42ac1 
  dev-support/findHangingTest.sh PRE-CREATION 
  pom.xml 6566a1c 
  src/docbkx/book.xml c67ca06 
  src/docbkx/configuration.xml 7fd90e7 
  src/docbkx/ops_mgt.xml f93c9f2 
  src/docbkx/performance.xml e61248f 
  src/docbkx/preface.xml 10fa755 
  src/docbkx/troubleshooting.xml 0b7c93a 
  src/docbkx/upgrading.xml c0642f5 
  src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon 24caabd 
  src/main/java/org/apache/hadoop/hbase/HBaseConfiguration.java 0477be8 
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 8ec5042 
  src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java 6cdeec1 
  src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/client/Delete.java 51bbc63
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8cd9bd0
  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 0e78d96
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 852a810
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 839d79b
  src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 0bc9577
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 4135e55
  src/main/java/org/apache/hadoop/hbase/client/RowMutation.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 9b568e3
  src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java 0d4a9e4
  src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java ba3414d
  src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java f25ba11
  src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java b47423c
  src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 9002a0f
  src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java 3ad6cd5
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 07ddbca
  src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4327a44
  src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java 39c73f5
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java bd574b2
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormat.java 3dcbf74
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java e6f8a6e
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java cb2f084
  src/main/java/org/apache/hadoop/hbase/master/LoadBalancerFactory.java 89685bb
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 3938fa7
  src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 9de1784
  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 667a8b1
  src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 2dfc3e7
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 2dd497b
  src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java 493dcdb
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java fb4ec05
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 3917d40
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionThriftServer.java 18b6c13
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java c840e7c
  src/main/java/org/apache/hadoop/hbase/regionserver/OperationStatus.java b6f7456
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 7cee17c
  src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java 41f5dff
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java b928731
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java bd6f70d
  
src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseMetaHandler.java
 e8e95ed 
  
src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
 a25ca32 
  
src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRootHandler.java
 fa38ad6 
  
src/main/java/org/apache/hadoop/hb

[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192033#comment-13192033
 ] 

Li Pi commented on HBASE-4608:
--

@Lars

Unless we know exactly when the dictionary is flushed, we can't rebuild the 
original HLog, can we?

> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
> 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192252#comment-13192252
 ] 

Lars Hofhansl commented on HBASE-4608:
--

The flush will place a special WAL entry. See HLog.completeCacheFlush(...).
The compressor could take this as a flag to reset the dictionary.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192420#comment-13192420
 ] 

Todd Lipcon commented on HBASE-4608:


Why reset on flush? Seems to me we need to reset on log roll, but not flush.
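
A rough sketch of the reset-on-roll idea, to make the trade-off concrete: the dictionary lives exactly as long as one log file, so a reader can decode any file starting from its first entry without knowing anything about flushes. The class and method names here (RollScopedDictionary, encode, decode, onLogRoll) are made up for illustration and are not taken from the patch; String values stand in for the byte[] fields a real WAL would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a dictionary whose lifetime is a single log file. */
final class RollScopedDictionary {
  /** Returned by encode() when the value had not been seen in this file yet. */
  static final int NOT_IN_DICTIONARY = -1;

  private final Map<String, Integer> indexByValue = new HashMap<>();
  private final List<String> valueByIndex = new ArrayList<>();

  /**
   * Returns the index of a previously seen value. On first sight the value
   * is added and NOT_IN_DICTIONARY is returned, so the writer knows it has
   * to emit the raw value this one time.
   */
  int encode(String value) {
    Integer idx = indexByValue.get(value);
    if (idx != null) {
      return idx;
    }
    indexByValue.put(value, valueByIndex.size());
    valueByIndex.add(value);
    return NOT_IN_DICTIONARY;
  }

  /** Reader side: resolve an index found in the log back to its value. */
  String decode(int index) {
    return valueByIndex.get(index);
  }

  /** Called when a new log file is started; no state carries over. */
  void onLogRoll() {
    indexByValue.clear();
    valueByIndex.clear();
  }
}
```

With this shape, the cost of a roll is that each repeated value is written raw once more at the start of the new file.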





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192431#comment-13192431
 ] 

Lars Hofhansl commented on HBASE-4608:
--

On recovery we'd always have to scan the entire log from the beginning. Maybe 
that's not a big deal, because log size is limited?





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192494#comment-13192494
 ] 

Todd Lipcon commented on HBASE-4608:


Don't we already have to scan the entire log from the beginning on recovery? 
Log splitting splits entire segments, afaik. Am I forgetting about some index 
structure or something?





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192580#comment-13192580
 ] 

Lars Hofhansl commented on HBASE-4608:
--

You know more about that than I do :)
I'm saying that we do not need to scan the entire log, especially if we add 
some custom log replaying tools (for example, replaying for a single region).
If we're not careful now, we shut ourselves out from future optimizations.
Might not be a big deal, as the logs are rolled anyway and that naturally limits 
the number of WALEdits we have to scan back through to find a dictionary.






[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192604#comment-13192604
 ] 

Nicolas Spiegelberg commented on HBASE-4608:


I think, if we want to avoid scanning the entire log and seek as an 
optimization, we should put more effort into rolling logs at a lower size 
threshold and having log GC be size-based and get rid of (or greatly raise) the 
file-count-based pressure.

In production, the major bottleneck for us in log replay (after distributed log 
splitting) has been IO.  We normally don't max out CPU.  Anything we can do to 
minimize IO size at the expense of CPU would help reduce replay time.

As an aside, do we currently compress the output of our log split?  Having the 
output of the resulting per-region logs be in LZO or GZ format will decrease 
our replay time, perhaps more than this optimization will.  That said, this 
feature is very useful, just want to make sure that we're not missing less cool 
but potentially more beneficial optimizations.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192613#comment-13192613
 ] 

Todd Lipcon commented on HBASE-4608:


Nope, we don't currently compress the log-split output. Good idea, Nicolas. We 
can use both compression mechanisms there - LZO/Snappy on top of the dictionary 
compression should be very good. The dictionary compression alone will be a big 
improvement there, though, since we'll save len(region key) bytes per edit 
guaranteed.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192619#comment-13192619
 ] 

Li Pi commented on HBASE-4608:
--

I need to run a test against LZO or GZ. I wouldn't be surprised if 4608 is more 
efficient on some inputs - it's very well tailored for certain kinds of data.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192622#comment-13192622
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-24 22:26:21.830142)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

last diff was against the wrong (non-trunk) branch.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java c92cc02 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192625#comment-13192625
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-24 22:27:32.723446)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
  src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java c92cc02 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192627#comment-13192627
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-01-24 22:29:18.791094)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
  
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
  
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li







[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-24 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192878#comment-13192878
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4585
---


Nice work.
Will try out the Compressor tool.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Should we verify that length is larger than pos?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


I would expect different implementations to be instantiated based on the 
prefix of path.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Why do we instantiate Configuration again (there is already one @ line 113)?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Typo, should read 'to start reading from'.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


NOT_IN_DICTIONARY should be used here.


- Ted









[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-31 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197521#comment-13197521
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4732
---


Only got about halfway through. Will continue to look soon. Overall looking 
pretty good!


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


I'd rename this class to KeyValueCompression or even KVCompression. Then 
rename readFields to just "read" -- since this is just utility functions, not 
actually an instance of a compressed keyvalue.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java


Rather than using keyVal.getRow(), keyVal.getFamily(), 
keyVal.getQualifier(), you should use the versions of those functions that just 
return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the 
writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're 
making needless copies/garbage here.
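
To illustrate the (byte[] buf, int off, int len) shape suggested above, here is a minimal sketch that writes a field straight out of its backing array. SliceWriteSketch and writeSlice are hypothetical names, and the 4-byte length prefix and the use of java.nio.ByteBuffer are assumptions for the example, not the patch's actual wire format.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of an offset/length write path (not the patch's API). */
final class SliceWriteSketch {
  /**
   * Writes buf[off .. off+len) as a length-prefixed field. No temporary
   * array is allocated: the bytes go from the backing array directly
   * into the output buffer.
   */
  static void writeSlice(ByteBuffer out, byte[] buf, int off, int len) {
    out.putInt(len);         // assumed 4-byte length prefix
    out.put(buf, off, len);  // bulk copy of the slice, no intermediate byte[]
  }
}
```

A caller would pass the KeyValue's backing buffer together with the offset and length of the row, family, or qualifier, rather than a freshly copied array per field.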



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java


Since this is so simple, I'd move it to be a static inner class of 
KVCompression above



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


I think we can merge this with the other class that just has static methods 
as well.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


this function requires that the whole log data fit in RAM - not a great 
assumption



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


why is this split into two if/elses? looks like the top clauses can be 
combined, as can the bottom clauses



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


switch order of "in" and "offset" here.

Perhaps clearer to name this as "uncompressIntoArray"?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Worth a comment here to explain that the "status" byte actually holds the 
high-order byte of the dictionary index when the value is in the dictionary.
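
A small sketch of the encoding this comment describes, under stated assumptions: a miss is written as a marker byte followed by the raw value, and a hit is written as a 2-byte big-endian index, so the first byte the reader sees is either the marker or the high-order byte of the index. StatusByteSketch, the 4-byte length field, and the reader adding misses to its own dictionary are illustrative choices, not a copy of Compressor.java.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical sketch of the status-byte scheme (assumed layout). */
final class StatusByteSketch {
  static final byte NOT_IN_DICTIONARY = -1;

  /** Hit: write the 2-byte index. ByteBuffer is big-endian by default. */
  static void writeIndex(ByteBuffer out, short index) {
    if (index < 0) {
      // A negative short could have 0xFF as its high byte, colliding with the marker.
      throw new IllegalArgumentException("index must be non-negative");
    }
    out.putShort(index);
  }

  /** Miss: write the marker, then the length, then the raw bytes. */
  static void writeRaw(ByteBuffer out, byte[] data) {
    out.put(NOT_IN_DICTIONARY);
    out.putInt(data.length);
    out.put(data);
  }

  /** Reads one entry and returns its uncompressed value. */
  static byte[] read(ByteBuffer in, List<byte[]> dictionary) {
    byte status = in.get();
    if (status == NOT_IN_DICTIONARY) {
      byte[] data = new byte[in.getInt()];
      in.get(data);
      dictionary.add(data); // assumption: the reader mirrors the writer's additions
      return data;
    }
    // Not the marker, so "status" is really the high-order byte of the index.
    int index = ((status & 0xFF) << 8) | (in.get() & 0xFF);
    return dictionary.get(index);
  }
}
```

Restricting indexes to non-negative shorts keeps the high-order byte in the range 0x00 to 0x7F, which is what lets a single byte double as both marker and index prefix.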



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


*un*compressed value, right?


- Todd



[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-31 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197529#comment-13197529
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4736
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


If we use
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29,
we should be able to tell that the queue is full.
This implies that readFile() would be called multiple times for a single file.
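
To make the suggestion concrete, here is a minimal sketch of how the timed offer() reports a full queue. The class name, method name, and the String element type are hypothetical and only for illustration; the actual queue element type in Compressor.java is not shown in this thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the patch under review.
class BoundedOfferDemo {
    /**
     * Tries to enqueue every item, waiting up to timeoutMillis for each one.
     * Returns how many items were accepted before the queue stayed full.
     */
    static int offerAll(ArrayBlockingQueue<String> queue, String[] items, long timeoutMillis) {
        int accepted = 0;
        try {
            for (String item : items) {
                // offer(E, long, TimeUnit) returns false if no space frees up in time.
                if (!queue.offer(item, timeoutMillis, TimeUnit.MILLISECONDS)) {
                    break;
                }
                accepted++;
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt status and report what was accepted so far.
            Thread.currentThread().interrupt();
        }
        return accepted;
    }

    public static void main(String[] args) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<String>(2);
        int accepted = offerAll(queue, new String[] {"edit-1", "edit-2", "edit-3"}, 10);
        System.out.println("accepted " + accepted + " of 3 entries");
    }
}
```

A caller that sees fewer accepted items than it offered knows the queue filled up and has to resume later, which is where the repeated readFile() calls would come from.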


- Ted





> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
> 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-01-31 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197591#comment-13197591
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-02-01 02:50:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100
bq.  >
bq.  > If we use http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29, we should be able to tell that the queue is full.
bq.  > This implies that readFile() would be called multiple times for a single file.

That's beside the point. Using a queue here is just silly. Reading a file 
should probably be a different interface altogether rather than writing to a 
queue, i.e., it should be a pull interface, not a push.

I also mentioned to Li offline that it would make sense to add a metadata 
header to the HLog sequencefiles which indicates that they're compressed. In 
that case, this code could just use the existing log reader code and log writer 
code, but vary the output between compressed/uncompressed using the 
configuration flag.
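
As a rough illustration of the pull-style interface described above, the sketch below lets the consumer ask for the next entry instead of having a producer push entries into a queue. All names here (EntryReader, readerOver, copyAll) are hypothetical, and entries are plain strings rather than real log entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical names throughout; the real log reader classes are not shown here.
class PullReaderSketch {
    /** Pull-style reader: the caller asks for the next entry; null means end of log. */
    interface EntryReader {
        String next();
    }

    /** Toy reader backed by an in-memory list, standing in for a log file. */
    static EntryReader readerOver(List<String> entries) {
        final Iterator<String> it = entries.iterator();
        return new EntryReader() {
            @Override
            public String next() {
                return it.hasNext() ? it.next() : null;
            }
        };
    }

    /** The consumer drives the loop, so no queue or producer thread is needed. */
    static List<String> copyAll(EntryReader reader) {
        List<String> out = new ArrayList<String>();
        for (String entry = reader.next(); entry != null; entry = reader.next()) {
            out.add(entry);
        }
        return out;
    }

    public static void main(String[] args) {
        EntryReader reader = readerOver(Arrays.asList("edit-1", "edit-2"));
        System.out.println(copyAll(reader));
    }
}
```

With this shape there is no "queue is full" condition to handle at all, because the reader only produces an entry when the caller is ready for it.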


- Todd


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4736
---






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-03 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199925#comment-13199925
 ] 

Kannan Muthukkaruppan commented on HBASE-4608:
--

Li: Is there a writeup/description of the scheme that this patch is 
implementing? If not, would you mind giving a quick overview? Thanks much.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201997#comment-13201997
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4852
---


I tried to use the command line tool to compress an HLog written by 0.92 and 
got the following:

Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.close(SequenceFileLogReader.java:192)
        at org.apache.hadoop.hbase.regionserver.wal.Compressor.readFile(Compressor.java:104)
        at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:64)

Also, if you use the command line tool with no arguments, it should print its 
help (right now it prints an IndexOutOfBoundsException).

I'll try again with an hlog written by trunk - I'm guessing the hlog 
serialization version might have changed or something.

- Todd






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202032#comment-13202032
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
---


I tried the compression tool on a log created by YCSB in "load" mode with the 
standard dataset. Since the values are fairly large here (100 bytes), it didn't 
get a huge compression ratio: from about 64MB down to 52MB (~20%). But still 
not bad. I looked at the resulting data using xxd, and it looks like there are 
still a number of places where we could use variable-length integers instead of 
fixed-length ones. I wrote a quick C program to count the number of 0x00 
bytes in the log and found about 3MB worth (~5%). Since the actual table data 
is all human-readable text in this case, all of the 0x00s should be able to be 
compressed away, I think.

I also tested on a YCSB workload where each row has 1000 columns of 4 bytes 
each (similar to an indexing workload) and the compression ratio was 60% (64M 
down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.
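
For anyone who wants to repeat the zero-byte measurement, the following is a small Java sketch of the same idea. It is not the C program mentioned above, and the class and method names are made up for illustration; the sample buffer just shows how a small number stored as a fixed-width 4-byte int contributes leading 0x00 bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; a Java stand-in for the quick C program mentioned above.
class ZeroByteCounter {
    /** Counts how many bytes in the buffer are exactly 0x00. */
    static long countZeroBytes(byte[] data) {
        long zeros = 0;
        for (byte b : data) {
            if (b == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) {
        // A small value written as a fixed-width 4-byte int is mostly zero bytes.
        byte[] fixedWidth = ByteBuffer.allocate(4).putInt(7).array();
        System.out.println("zero bytes: " + countZeroBytes(fixedWidth)
            + " of " + fixedWidth.length);
    }
}
```

To measure a real log, the bytes would have to be read from the file first; that part is left out here to keep the sketch free of I/O handling.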


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Invert the order of these || clauses. Otherwise you get an out-of-bounds 
exception just running the tool with no arguments.
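
A minimal sketch of why the clause order matters, assuming the condition looks roughly like a length check combined with a test on args[0]. The actual condition in Compressor.java is not quoted in this review, so the "--help" flag and the names below are hypothetical.

```java
// Hypothetical example; the real condition in the patch is not shown in the review.
class ArgCheckOrder {
    /** Returns true when the usage text should be printed. */
    static boolean needsHelp(String[] args) {
        // The length check comes first: || short-circuits, so args[0] is only
        // evaluated when at least one argument is present.
        return args.length == 0 || args[0].equals("--help");
    }

    public static void main(String[] args) {
        if (needsHelp(args)) {
            System.out.println("usage text would be printed here");
        } else {
            System.out.println("first argument: " + args[0]);
        }
    }
}
```

Written the other way around, args[0] would be evaluated first and an empty argument array would throw before the length check is ever reached.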



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


I think the better way of expressing this usage would be:

WALCompressor [-u | -c]  

  -u - uncompresses the input log
  -c - compresses the output log

Exactly one of -u or -c must be specified
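
One possible way to enforce the "exactly one of -u or -c" rule is sketched below. The class, enum, and method names are hypothetical, and the handling of the remaining arguments is left out because they are not spelled out in the usage text above.

```java
// Hypothetical argument handling; only the -u / -c flags come from the review.
class ModeParser {
    enum Mode { COMPRESS, UNCOMPRESS }

    /** Returns the selected mode, or null if zero or several mode flags were given. */
    static Mode parseMode(String[] args) {
        Mode mode = null;
        int flags = 0;
        for (String arg : args) {
            if ("-c".equals(arg)) {
                mode = Mode.COMPRESS;
                flags++;
            } else if ("-u".equals(arg)) {
                mode = Mode.UNCOMPRESS;
                flags++;
            }
        }
        return flags == 1 ? mode : null;
    }

    public static void main(String[] args) {
        Mode mode = parseMode(args);
        if (mode == null) {
            System.err.println("Exactly one of -u or -c must be specified");
            return;
        }
        System.out.println("mode: " + mode);
    }
}
```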





src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


This code doesn't work properly. Here's what you want to do:

  Configuration conf = new Configuration();
  FileSystem fs = path.getFileSystem(conf);



- Todd



[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-07 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202158#comment-13202158
 ] 

Li Pi commented on HBASE-4608:
--

The compression uses 2 byte dictionary indices, so the first 255 entries should 
start off with 0x00. This might be causing it.
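
A small sketch of the effect, assuming the 2-byte index is written high byte first (the byte order is an assumption, as are the class and method names): every index small enough to fit in one byte gets a 0x00 as its first byte.

```java
// Hypothetical encoding helper; big-endian byte order is an assumption.
class TwoByteIndex {
    /** Encodes a dictionary index as two bytes, high byte first. */
    static byte[] encode(int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("index out of 2-byte range: " + index);
        }
        return new byte[] {(byte) (index >>> 8), (byte) index};
    }

    /** Decodes two bytes written by encode() back into the index. */
    static int decode(byte[] bytes) {
        return ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
    }

    public static void main(String[] args) {
        for (int index : new int[] {0, 5, 255, 256, 4660}) {
            byte[] b = encode(index);
            System.out.printf("index %d -> %02x %02x%n", index, b[0] & 0xFF, b[1] & 0xFF);
        }
    }
}
```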

@Karthik, I'll try to get documentation out when I'm less busy. This quarter is 
pretty painful so far.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-13 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207444#comment-13207444
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5066
---


Nice patch and good job! I have two questions inline; maybe I just 
misunderstood the code.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


WritableUtils.getVIntSize could help you to decide how many bytes are needed 
for the entry. So you don't need to pass down sizeBytes in this function.
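
For reference, here is a standalone sketch of the kind of size rule getVIntSize applies. It is written from memory of Hadoop's variable-length encoding (one byte for values from -112 to 127, otherwise a marker byte plus the significant bytes), so the exact thresholds should be treated as an assumption and checked against the Hadoop source. The class name is hypothetical.

```java
// Standalone sketch; thresholds are assumed to match Hadoop's vint encoding.
class VIntSizeSketch {
    /** Returns the number of bytes a variable-length encoding of value would take. */
    static int vintSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1; // small values fit in a single byte
        }
        if (value < 0) {
            value = ~value; // negative values are stored as their one's complement
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        return (dataBits + 7) / 8 + 1; // significant bytes plus one marker byte
    }

    public static void main(String[] args) {
        for (long v : new long[] {0L, 127L, 128L, 255L, 256L, 65535L}) {
            System.out.println(v + " -> " + vintSize(v) + " byte(s)");
        }
    }
}
```

Because the size can be derived from the value itself, the caller would not need to carry a separate sizeBytes parameter around.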



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


Should the data be added back to the dict in this case?
dict.addEntry(data) ?
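
The diff line this question refers to is not quoted here, so the following is only a general illustration of the concern: in a dictionary scheme the reader has to add the same entries, in the same order, as the writer, or later indices will not resolve. The toy dictionary, the "#"/"=" token format, and all names are hypothetical and are not the patch's WALDictionary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of writer/reader dictionary symmetry; not the patch's actual classes.
class DictionarySyncSketch {
    static final int NOT_IN_DICT = -1;

    /** Minimal append-only dictionary. */
    static final class SimpleDict {
        private final Map<String, Integer> indexOf = new HashMap<String, Integer>();
        private final List<String> entries = new ArrayList<String>();

        int find(String value) {
            Integer index = indexOf.get(value);
            return index == null ? NOT_IN_DICT : index;
        }

        int add(String value) {
            entries.add(value);
            indexOf.put(value, entries.size() - 1);
            return entries.size() - 1;
        }

        String get(int index) {
            return entries.get(index);
        }
    }

    /** Writer: emit "#index" on a hit; on a miss emit "=literal" and remember it. */
    static String write(SimpleDict dict, String value) {
        int index = dict.find(value);
        if (index != NOT_IN_DICT) {
            return "#" + index;
        }
        dict.add(value);
        return "=" + value;
    }

    /** Reader: a literal must be added too, so that later "#index" tokens resolve. */
    static String read(SimpleDict dict, String token) {
        if (token.charAt(0) == '#') {
            return dict.get(Integer.parseInt(token.substring(1)));
        }
        String value = token.substring(1);
        dict.add(value);
        return value;
    }

    public static void main(String[] args) {
        SimpleDict writerDict = new SimpleDict();
        SimpleDict readerDict = new SimpleDict();
        for (String value : new String[] {"cf1", "cf1", "cf2", "cf1"}) {
            String token = write(writerDict, value);
            System.out.println(token + " -> " + read(readerDict, token));
        }
    }
}
```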


- Liyin






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-13 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207460#comment-13207460
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5068
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java


Looks like there is a side effect to calling findEntry(), since it will put the 
data into the dictionary.



- Liyin






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208226#comment-13208226
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112
bq.  >
bq.  > I would expect different implementations to be instantiated based on the prefix of path.

I figured people would only use this on their local machine. I guess the path 
can actually point to HDFS. Got any examples of how to do this easily?


bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 116
bq.  >
bq.  > Why do we instantiate Configuration again (there is already one @ line 113) ?

Hmm. Good point. Waste of heap, but I wasn't really optimizing the command line 
tool. Fixed!


bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 71
bq.  >
bq.  > Should we verify that length is larger than pos ?

I don't think it makes a difference. 


bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 169
bq.  >
bq.  > Typo, should read 'to start reading from'.

fixed.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4585
---



[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208228#comment-13208228
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
---

(Updated 2012-02-15 04:57:45.411924)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
---

Fixed as per Ted Yu's review.


Summary
---

HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.


This addresses bug HBase-4608.
https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

Diff: https://reviews.apache.org/r/2740/diff


Testing
---


Thanks,

Li



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 
> 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208234#comment-13208234
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5113
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java


FileSystem has the following methods:

  /** Returns the configured filesystem implementation.*/
  public static FileSystem get(Configuration conf) throws IOException {

  public static FileSystem get(URI uri, Configuration conf) throws 
IOException {

I think the second get() should allow you to read HLog on hdfs


- Ted






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-14 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208236#comment-13208236
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-02-15 05:23:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 112
bq.  > 
bq.  >
bq.  > FileSystem has the following methods:
bq.  > 
bq.  >   /** Returns the configured filesystem implementation.*/
bq.  >   public static FileSystem get(Configuration conf) throws 
IOException {
bq.  > 
bq.  >   public static FileSystem get(URI uri, Configuration conf) throws 
IOException {
bq.  > 
bq.  > I think the second get() should allow you to read HLog on hdfs

See my earlier comment on this review: path.getFileSystem(conf) is what you 
want to use.


- Todd


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5113
---






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-15 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208358#comment-13208358
 ] 

Hadoop QA commented on HBASE-4608:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514600/4608v13.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/966//console

This message is automatically generated.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-15 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208763#comment-13208763
 ] 

Zhihong Yu commented on HBASE-4608:
---

@Li:
Do you have time to address Todd's and Liyin's comments?

Thanks





[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-15 Thread Li Pi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209015#comment-13209015
 ] 

Li Pi commented on HBASE-4608:
--

Doing so right now. Will be done before the weekend.





[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-17 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210601#comment-13210601
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, 
line 37
bq.  > 
bq.  >
bq.  > I'd rename this class to KeyValueCompression or even KVCompression. 
Then rename readFields to just "read" -- since this is just utility functions, 
not actually an instance of a compressed keyvalue.

fixed. legacy name. <3 eclipse.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 207
bq.  > 
bq.  >
bq.  > *un*compressed value, right?

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, 
line 28
bq.  > 
bq.  >
bq.  > Since this is so simple, I'd move it to be a static inner class of 
KVCompression above

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 152
bq.  > 
bq.  >
bq.  > why is this split into two if/elses? looks like the top clauses can 
be combined, as can the bottom clauses

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 174
bq.  > 
bq.  >
bq.  > switch order of "in" and "offset" here.
bq.  > 
bq.  > Perhaps clearer to name this as "uncompressIntoArray"?

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 44
bq.  > 
bq.  >
bq.  > I think we can merge this with the other class that just has static 
methods as well.

Compressor contains static methods for general purpose compression. 
KeyValueCompression.java contains static methods for compressing the KeyValue 
type. Should I merge them?


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 185
bq.  > 
bq.  >
bq.  > worth a comment here to explain that the "status" byte actually has 
the high-order byte of the dictionary entry in the case that it's in the 
dictionary

done
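
For readers following along, here is a minimal sketch of the status-byte layout discussed above. It is not taken from the patch: `StatusByteSketch`, its nested append-only `Dict`, the 4-byte length prefix, and the use of `ByteBuffer` are all assumptions made for illustration. The idea is that a dictionary index is a non-negative short, so its high-order byte can never be -1, and -1 can therefore double as the "not in dictionary" marker.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StatusByteSketch {
  static final byte NOT_IN_DICTIONARY = -1;

  // Hypothetical append-only dictionary (no eviction), limited to 32768 entries
  // so that every index is a non-negative short.
  static final class Dict {
    private final Map<ByteBuffer, Short> index = new HashMap<>();
    private final List<byte[]> entries = new ArrayList<>();

    // Returns the index on a hit; on a miss, inserts the data and returns -1.
    short findEntry(byte[] data) {
      Short i = index.get(ByteBuffer.wrap(data));
      if (i != null) return i;
      add(data);
      return NOT_IN_DICTIONARY;
    }

    void add(byte[] data) {
      byte[] copy = data.clone();
      index.put(ByteBuffer.wrap(copy), (short) entries.size());
      entries.add(copy);
    }

    byte[] get(short i) { return entries.get(i); }
  }

  static void write(byte[] data, ByteBuffer out, Dict dict) {
    short idx = dict.findEntry(data);
    if (idx == NOT_IN_DICTIONARY) {
      out.put(NOT_IN_DICTIONARY);   // status byte: a literal follows
      out.putInt(data.length);
      out.put(data);
    } else {
      out.putShort(idx);            // big-endian: the high-order byte is written first
    }
  }

  static byte[] read(ByteBuffer in, Dict dict) {
    byte status = in.get();
    if (status == NOT_IN_DICTIONARY) {
      byte[] data = new byte[in.getInt()];
      in.get(data);
      dict.add(data);               // keep the reader's dictionary in step with the writer's
      return data;
    }
    // Here the "status" byte is really the high-order byte of the dictionary index.
    short idx = (short) ((status << 8) | (in.get() & 0xff));
    return dict.get(idx);
  }
}
```

In this sketch, writing the same two-byte value twice costs 7 bytes for the first (literal) occurrence and 2 bytes for the second (index) occurrence.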


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, 
line 96
bq.  > 
bq.  >
bq.  > rather than using keyVal.getRow(), keyVal.getFamily(), 
keyVal.getQualifier(), you should use the versions of those functions that just 
return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the 
writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're 
making needless copies/garbage here.

This is going to take a while, since I'm currently relying on the default 
array hashCode(). I will need to use Bytes.hashCode and a wrapper for insertion 
into the dictionary.
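
To illustrate why a wrapper is needed: a Java `byte[]` inherits identity-based `equals`/`hashCode` from `Object`, so two arrays with the same contents are different map keys. The sketch below is one possible wrapper over a `(buf, off, len)` range with content-based hashing. The class name `ByteRangeKeySketch` and its hash function are hypothetical and are not from the patch, which might use `Bytes.hashCode` instead.

```java
import java.util.HashMap;
import java.util.Map;

final class ByteRangeKeySketch {
  private final byte[] buf;
  private final int off;
  private final int len;

  // Wraps a range without copying; callers should copy before storing the key
  // long-term if the backing array may be reused.
  ByteRangeKeySketch(byte[] buf, int off, int len) {
    this.buf = buf;
    this.off = off;
    this.len = len;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ByteRangeKeySketch)) return false;
    ByteRangeKeySketch other = (ByteRangeKeySketch) o;
    if (len != other.len) return false;
    for (int i = 0; i < len; i++) {
      if (buf[off + i] != other.buf[other.off + i]) return false;
    }
    return true;
  }

  @Override
  public int hashCode() {
    // Content-based hash over the range only, so equal ranges hash equally.
    int h = 1;
    for (int i = off; i < off + len; i++) {
      h = 31 * h + buf[i];
    }
    return h;
  }

  // Small demonstration: look up a range of a larger buffer in a map keyed by content.
  static Short lookup(Map<ByteRangeKeySketch, Short> dict, byte[] buf, int off, int len) {
    return dict.get(new ByteRangeKeySketch(buf, off, len));
  }

  static Map<ByteRangeKeySketch, Short> newDictionary() {
    return new HashMap<>();
  }
}
```

With such a key, a lookup can point into the KeyValue's backing array at the row, family, or qualifier offset without first copying those bytes out.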


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 100
bq.  > 
bq.  >
bq.  > this function requires that the whole log data fit in RAM - not a 
great assumption

old one. will do eventually...


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4732
---



[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-17 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210655#comment-13210655
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-02-14 02:29:24, Liyin Tang wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
42
bq.  > 
bq.  >
bq.  > Looks like there is a side effect to calling findEntry(), since it will 
put the data into the dictionary.
bq.  >

This is intentional. When we look for an entry, that means we intend to 
compress with it. If we don't find it, then it's inserted into the dictionary.
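
A minimal sketch of that insert-on-miss behaviour follows. It is an illustration only and not the patch's LRUDictionary: the class name `LruDictionarySketch`, the `LinkedHashMap`-based eviction, and the index-reuse policy are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class LruDictionarySketch {
  static final short NOT_IN_DICTIONARY = -1;

  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<ByteBuffer, Short> indexByEntry =
      new LinkedHashMap<>(16, 0.75f, true);
  private final byte[][] entryByIndex;

  LruDictionarySketch(int capacity) {
    this.capacity = capacity;
    this.entryByIndex = new byte[capacity][];
  }

  // Returns the index on a hit. On a miss, inserts the data (the side effect
  // discussed above) and returns NOT_IN_DICTIONARY.
  short findEntry(byte[] data) {
    Short idx = indexByEntry.get(ByteBuffer.wrap(data)); // get() also refreshes recency
    if (idx != null) return idx;
    addEntry(data);
    return NOT_IN_DICTIONARY;
  }

  // Adds an entry, reusing the index of the least recently used entry when full.
  short addEntry(byte[] data) {
    short idx;
    if (indexByEntry.size() < capacity) {
      idx = (short) indexByEntry.size();
    } else {
      Iterator<Map.Entry<ByteBuffer, Short>> it = indexByEntry.entrySet().iterator();
      idx = it.next().getValue();
      it.remove();
    }
    byte[] copy = data.clone();
    entryByIndex[idx] = copy;
    indexByEntry.put(ByteBuffer.wrap(copy), idx);
    return idx;
  }

  byte[] getEntry(short idx) {
    return entryByIndex[idx];
  }
}
```

Because a miss always inserts, a writer that emits the literal bytes on a miss and a reader that calls `addEntry` on every literal it reads end up assigning the same indexes in the same order.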


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5068
---






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212245#comment-13212245
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-02-14 01:33:09, Liyin Tang wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 230
bq.  > 
bq.  >
bq.  > Should the data be added back to the dict in this case?
bq.  > dict.addEntry(data) ?

This is taken care of during findEntry().


bq.  On 2012-02-14 01:33:09, Liyin Tang wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 192
bq.  > 
bq.  >
bq.  > WritableUtils.getVIntSize could help you to decide how many bytes 
are needed for the entry, so you don't need to pass down sizeBytes in this 
function.

This is part of the way HBase stores data uncompressed. It doesn't use a vInt.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5066
---






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212812#comment-13212812
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-02-15 05:23:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 112
bq.  > 
bq.  >
bq.  > FileSystem has the following methods:
bq.  > 
bq.  >   /** Returns the configured filesystem implementation.*/
bq.  >   public static FileSystem get(Configuration conf) throws 
IOException {
bq.  > 
bq.  >   public static FileSystem get(URI uri, Configuration conf) throws 
IOException {
bq.  > 
bq.  > I think the second get() should allow you to read HLog on hdfs
bq.  
bq.  Todd Lipcon wrote:
bq.  see my earlier comment on this review: path.getFileSystem(conf) is 
what you want to use

fixed. hopefully this should work.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5113
---






[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212813#comment-13212813
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--



bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > I tried the compression tool on a log created by YCSB in "load" mode 
with the standard dataset. Since the values are fairly large here (100 bytes) 
it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). 
But still not bad. I looked at the resulting data using xxd and it looks like 
there are still a number of places where we could use variable-length integers 
instead of fixed-length ones. I wrote a quick C program to count the number 
of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual 
table data is all human-readable text in this case, all of the 0x00s should be 
able to be compressed away, I think.
bq.  > 
bq.  > I also tested on a YCSB workload where each row has 1000 columns of 4 
bytes each (similar to an indexing workload) and the compression ratio was 60% 
(64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be 
removed.

checked it out. looks like in YCSB workloads the 0x00 bytes are actually 
indexes pointing to the 0th entry of the dictionary.
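
For context on the variable-length integer suggestion, here is a generic base-128 varint sketch. It is an illustration only: `VarIntSketch` is a hypothetical name and the byte layout below is not Hadoop's `WritableUtils` vint format. A fixed 4-byte int holding the value 5 is stored as 00 00 00 05, which is where three of the 0x00 bytes would come from; the varint form needs a single byte.

```java
import java.io.ByteArrayOutputStream;

final class VarIntSketch {
  // Encodes a non-negative int using 7 data bits per byte, low-order group first.
  // The top bit of each byte is set when more bytes follow.
  static byte[] encode(int value) {
    if (value < 0) throw new IllegalArgumentException("non-negative values only");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (value >= 0x80) {
      out.write((value & 0x7f) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  // Decodes a value produced by encode().
  static int decode(byte[] bytes) {
    int value = 0;
    int shift = 0;
    for (byte b : bytes) {
      value |= (b & 0x7f) << shift;
      if ((b & 0x80) == 0) break; // last byte of this value
      shift += 7;
    }
    return value;
  }
}
```

Small values such as short lengths and low dictionary indexes then take one byte each, at the cost of up to five bytes for very large values.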


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 52
bq.  > 
bq.  >
bq.  > invert the order of these || clauses - otherwise you get an 
out-of-bounds just running the tool with no arguments

fixed.
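
The clause-order point comes down to `||` short-circuiting: once the left operand is true, the right operand is never evaluated. A small sketch, where the class name, the `--help` flag, and the two-argument requirement are assumptions for illustration rather than the tool's actual interface:

```java
final class ArgsCheckSketch {
  // Length check first: args[0] is only read when at least two arguments exist.
  // With the clauses reversed, an empty args array would throw
  // ArrayIndexOutOfBoundsException before the length was ever tested.
  static boolean shouldPrintHelp(String[] args) {
    return args.length < 2 || args[0].equals("--help");
  }
}
```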


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
lines 86-88
bq.  > 
bq.  >
bq.  > this code doesn't work properly. Here's what you want to do:
bq.  > 
bq.  >   Configuration conf = new Configuration();
bq.  >   FileSystem fs = path.getFileSystem(conf);
bq.  >

Fixed.


- Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
---


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  ---
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.  https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



> HLog Compression
> 
>
> Key: HBASE-4608
> URL: https://issues.apache.org/jira/browse/HBASE-4608
> Project: HBase
>  Issue Type: New Feature
>Reporter: Li Pi
>Assignee: Li Pi
> Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 
> 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write spe
