[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145984#comment-13145984 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------

bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > Cool stuff.
bq.  >
bq.  > I am probably just missing something... But when is the dictionary itself stored? Don't we need to read out the logs again?
bq.  >
bq.  > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.
bq.  > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?
bq.  >
bq.  > (As I said, I am probably missing something).
bq.  >
bq.  > See minor comments inline.
bq.
bq.  Li Pi wrote:
bq.      You aren't missing anything! That's exactly how it works.
bq.
bq.      Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.

Ok... What I cannot find, then, is the code that builds the dictionary during read :)

Also, as a general concern... We write these WAL logs (in part) for redundancy. Compression is the opposite of redundancy... So say we garble the beginning of a WAL file; then the entire file will be useless to us...
I don't think that is a big deal, though. As the WAL entries are variable length, this is mostly true even today.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157
bq.  > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line157>
bq.  >
bq.  >     Could we have a no-op compressor instead?
bq.
bq.  Li Pi wrote:
bq.      no-op compressor? as in one that does nothing?

Yep...
So compression will never be null, and we can save if-statements (and make the code more readable) :)


- Lars


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
-----------------------------------------------------------


On 2011-11-07 23:12:37, Li Pi wrote:
bq.
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.
bq.  (Updated 2011-11-07 23:12:37)
bq.
bq.
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.
bq.
bq.  Summary
bq.  -------
bq.
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.
bq.
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.
bq.
bq.  Diffs
bq.  -----
bq.
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
bq.
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.
bq.
bq.  Testing
bq.  -------
bq.
bq.
bq.  Thanks,
bq.
bq.  Li
bq.
bq.
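
For illustration, the no-op compressor suggested in the review could look roughly like the sketch below. It is not taken from the patch: the Compressor interface, NoOpCompressor, and EditWriterSketch are hypothetical names, and the byte[]-in, byte[]-out signature is an assumption made only to keep the example small. The idea is that callers always hold a non-null compressor, so they never need to branch on whether compression is enabled.

```java
/** Hypothetical interface; the patch under review does not define it. */
interface Compressor {
    byte[] compress(byte[] raw);
    byte[] decompress(byte[] stored);
}

/** Pass-through implementation used when compression is disabled. */
final class NoOpCompressor implements Compressor {
    static final NoOpCompressor INSTANCE = new NoOpCompressor();

    private NoOpCompressor() {}

    @Override
    public byte[] compress(byte[] raw) {
        return raw; // nothing to do: bytes are stored as-is
    }

    @Override
    public byte[] decompress(byte[] stored) {
        return stored; // nothing to undo
    }
}

/** Hypothetical caller: no "if (compressor != null)" checks on the write path. */
class EditWriterSketch {
    private final Compressor compressor;

    EditWriterSketch(Compressor compressor) {
        // Normalize once here, so the rest of the class never sees null.
        this.compressor = (compressor != null) ? compressor : NoOpCompressor.INSTANCE;
    }

    byte[] serialize(byte[] payload) {
        return compressor.compress(payload);
    }
}
```

With this shape, turning compression off means passing NoOpCompressor.INSTANCE (or null, which the constructor normalizes), and the serialization code stays the same either way.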
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends
> across different datanodes. We can speed up this process by compressing the
> HLog. Current plan involves using a dictionary to compress table name, region
> id, cf name, and possibly other bits of repeated data. Also, HLog format may
> be changed in other ways to produce a smaller HLog.
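
The dictionary scheme discussed in this thread (write a value in full the first time it is seen, write only its index afterwards, and rebuild the same dictionary while reading) can be sketched roughly as follows. This is a minimal illustration, not the SimpleDictionary or WALDictionary code from the patch: DictionarySketch, Token, encode, and decode are hypothetical names, and values are modelled as Strings rather than the byte arrays a real WAL would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-file dictionary; one fresh instance per WAL. */
class DictionarySketch {

    /** What gets written: a literal on first occurrence, otherwise an index. */
    static final class Token {
        final String literal; // non-null only for a first occurrence
        final int index;      // meaningful only when literal == null

        Token(String literal, int index) {
            this.literal = literal;
            this.index = index;
        }
    }

    // Entries in order of first appearance; the list position is the index.
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> lookup = new HashMap<>();

    /** Write path: emit the index if the value is known, else the literal. */
    Token encode(String value) {
        Integer idx = lookup.get(value);
        if (idx != null) {
            return new Token(null, idx);
        }
        add(value);
        return new Token(value, -1);
    }

    /** Read path: literals are added in the same order the writer added them. */
    String decode(Token token) {
        if (token.literal != null) {
            add(token.literal);
            return token.literal;
        }
        return entries.get(token.index);
    }

    private void add(String value) {
        lookup.put(value, entries.size());
        entries.add(value);
    }

    public static void main(String[] args) {
        DictionarySketch writer = new DictionarySketch();
        DictionarySketch reader = new DictionarySketch(); // starts empty, like the writer
        for (String s : new String[] {"table1", "cf", "table1", "cf"}) {
            Token t = writer.encode(s);
            System.out.println((t.literal != null ? "literal " : "index ") + reader.decode(t));
        }
    }
}
```

Because the reader sees literals in exactly the order the writer added them, both sides assign the same indices and no dictionary has to be stored separately. The same property underlies the concern raised above: if an early literal is lost, every later index that refers to it can no longer be resolved.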