[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497528#comment-13497528
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
-------------------------------------------

bq. Does CompressionContext class have to be public? Can it stay pkg private? 
You'll have to move your new class into wal package but that seems fine to me.
bq. My sense though is that this is a generally useful WAL reader? One that can 
do compressed or non-compressed WAL? One that can be used by replication but 
also by fellas who want to index hbase, etc.
bq. Can it be in the wal package? Then don't have to open up so much of HLog?

You're right that it can be a generally useful WAL reader, for users that need 
to be able to seek directly into newly open files without having to scan 
everything that comes before if compression is on. Right now its API is 
tailored to replication's need, we could make it more general but, unless we 
have another use case for it right now, I don't see the point.

So I'll move it to wal and rename the class/methods a bit.

bq. Does the base Reader interface have to know about a compression context? 
Can this not be internal to the implementation?

Until HDFS lets us tail a file under construction we need to pass the dict back 
when opening the file.

bq. Yeah, why can't an implementation of HLog.Reader manage the compression 
context internally? Why it have to be out here in this ReplicationHLogReader 
class? Afterall, isn't the dictionary reconstructed on read? You don't save it 
around?

It would be fine if we didn't have to:

 - seek into a file we never read before (after a region server died and we 
pick up the queue)
 - reopen files in order to tail them (when normally replicating)

We could augment HLog.Reader to support reopening of files, basically push down 
the stuff is doing ReplicationHLogReader even more. That way we could hide all 
the dirty details? I haven't thought about modifying that before.

bq. Missing a license

Oh thanks.

bq. Its unfortunate that you can't tell its a compressed wal from reading say 
some magic or metadata at the head of the file. It seems a bit broke consulting 
configuration.

Good point, it would simplify a lot of things. HLog compression was implemented 
at the HLog.Entry level though so technically it's not even the WAL itself 
that's compressed. My next comment shows what that means.

bq. Is this right:

Yes, if you keep the compression context in there it'll replicate the 
HLog.Entry[] compressed with a dictionary that the slave has no knowledge of. I 
had this comment in my first patch and I think I forgot to move it over:

bq. // Setting it to null prevents from sending compressed edits that the sink 
wouldn't parse
                
> Turn on WAL compression by default
> ----------------------------------
>
>                 Key: HBASE-5778
>                 URL: https://issues.apache.org/jira/browse/HBASE-5778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.96.0
>
>         Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
> HBASE-5778-0.94-v2.patch, HBASE-5778-0.94-v3.patch, HBASE-5778.patch
>
>
> I ran some tests to verify if WAL compression should be turned on by default.
> For a use case where it's not very useful (values two order of magnitude 
> bigger than the keys), the insert time wasn't different and the CPU usage 15% 
> higher (150% CPU usage VS 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement for the insert 
> run time and CPU usage was 33% higher (600% CPU usage VS 450%). I'm not sure 
> WAL compression accounts for all the additional CPU usage, it might just be 
> that we're able to insert faster and we spend more time in the MemStore per 
> second (because our MemStores are bad when they contain tens of thousands of 
> values).
> Those are two extremes, but it shows that for the price of some CPU we can 
> save a lot. My machines have 2 quads with HT, so I still had a lot of idle 
> CPUs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to