[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219947#comment-13219947
 ] 

[email protected] commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 37
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line37>
bq.  >
bq.  >     Should this javadoc here in the class include the notes you made for 
Kannan where you describe how it all works?  If not here, where else will doc. 
on how the Compressor works go?
bq.  >     
bq.  >     Maybe you should purge all mention of WAL from this class -- e.g. 
WALDictionary -- because it seems like it could be easily generalized (I 
suppose we can do that later).

Included!


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 47
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line47>
bq.  >
bq.  >     The way the usage is written, -u and -c are optional.  You should 
fix that.  Looks like they are required going by fact that args.length needs to 
be 3.  Also, it looks like you take --help, the long form, or -u/-c the short 
forms.  Either take all short forms or take both long and short form to be 
consistent.

System.out.println("Exactly one of -u or -c must be specified"); should take 
care of the required thing.

Help now takes both short and long forms. Everything else just takes short 
forms.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 66
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line66>
bq.  >
bq.  >     Why is the tool called WALCompressor in the usage but the class I 
invoke is Compressor?

Probably should be called compressor.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 79
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line79>
bq.  >
bq.  >     This does not need to be an HBaseConfiguration?  There are no 
configs in hbase-site.xml that might effect whats going on here?

Not really. All that matters is whether compression is on or off.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 108
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line108>
bq.  >
bq.  >     Doc the '@return'

fixed.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 141
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line141>
bq.  >
bq.  >     Doc the return

fixed.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 
1671
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1671>
bq.  >
bq.  >     White space

fixed.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 
1675
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1675>
bq.  >
bq.  >     When is this called?  Post construction?  Should it be part of 
constructor?  What happens if its called part way through the writing of a WAL? 
 Will we start compressing a WAL in the middle?

Its called when an logwriter is created. We will start compression a log in the 
middle if we happen to call it at that time. But that shouldn't happen.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, 
line 270
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78624#file78624line270>
bq.  >
bq.  >     I don't follow whats going on here.  What happens when len >= 0?  
Why is it < 0?  Whats that mean?  Whats v2 of hlogkey?  What if keyContext is 
not null?

HLogKey has two different formats. If len < 0, that means we're reading the old 
version of the HLog.

Keycontext is the compression context that holds the dictionaries used in 
compression. If it isn't null, that means compression is enabled.

If len > 0, we're on version 1. We can't compress version 1, but the code for 
reading version 1 is still in there, for transitioning from earlier HLogs. 
Compression should never be enabled if we're reading in version 1 Hlogs, 
because there shouldn't be any version 1 hlogs.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java,
 line 119
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line119>
bq.  >
bq.  >     Class comment on what this is about?

Just a tuple class for holding the various dictionaries used in compression.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java,
 line 141
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line141>
bq.  >
bq.  >     Why do I clear this?  Why not just throw it away?  Does clearing 
make it so I can recycle this instance?

Correct. We clear it so we can recycle this instance instead of having to 
create a new dictionary. Not sure if this makes a huge difference in terms of 
performance.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
29
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line29>
bq.  >
bq.  >     Why would I ever let go of terms in the dictionary?  Should you 
explain why in class comment?

We let go of terms in the dictionary since we have only an finite amount of 
space, and ability to reference terms of the dictionary.

If we're using a 2 byte key, that limits our reference space to 65536. We could 
end up using vints for entries into the dictionary, but this could end up with 
it growing pretty huge.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
64
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line64>
bq.  >
bq.  >     Should this be static?  Does it need reference to outer class?

It doesn't need to reference the outer class. Made static.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 
168
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line168>
bq.  >
bq.  >     Class comment?  Should this be static?

made static.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java,
 line 176
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78627#file78627line176>
bq.  >
bq.  >     Why am I reading whether compression is on or off by looking at 
config?  Why am I not looking into head of the WAL file and figure its 
compressed and then decompressing?  Otherwise, if config is disabled but I'm 
fed a compressed file, do I just burp?  See the white space added here.

We just burp if compression is on and we get fed an uncompressed file. This 
should be easy to change though - on the read side.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 
28
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78629#file78629line28>
bq.  >
bq.  >     Should be just called Dictionary. Its in the wal package.  No need 
of the redundant prefix?

Sure. But we have WALActionsListener and a bunch of other things starting with 
WAL. I figured we can just have that as well.

Renamed to dictionary.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > 
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java,
 line 38
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78634#file78634line38>
bq.  >
bq.  >     This will run all the tests in TestWALReplay?  Nice.

Yup. thats exactly what it does.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5265
-----------------------------------------------------------


On 2012-02-22 03:46:12, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-22 03:46:12)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 35339b6 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
f067221 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java 
PRE-CREATION 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.    
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
23d27fd 
bq.    
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 
> 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to