I found FAST_DIFF to be the fastest of the block encoders.
(Prefix tree is in 0.96+ only as far as I know.)

-- Lars



----- Original Message -----
From: Vladimir Rodionov <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl 
<[email protected]>
Cc: 
Sent: Saturday, October 19, 2013 9:08 PM
Subject: Re: Beware of PREFIX_TREE block encoding

*Now, which encoder did you test specifically? I've seen a 20-40% slowdown
when everything is in the blockcache (which is the worst case scenario
here), certainly not a 10x slowdown.*

I have 1.3M very small rows (48 bytes each) in the block cache, which I read
sequentially with encodings NONE and PREFIX_TREE, through StoreScanner and
StoreFileScanner (close to the metal: straight from the block cache :)

Time to read all 1.3M rows reported in ms.

encoding = NONE,        scanner = StoreScanner;     time = 300 ms
encoding = PREFIX_TREE, scanner = StoreScanner;     time = 860 ms
encoding = NONE,        scanner = StoreFileScanner; time =  52 ms
encoding = PREFIX_TREE, scanner = StoreFileScanner; time = 545 ms
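
To put the timings above in relative terms, here is a minimal sketch that
divides the PREFIX_TREE time by the NONE time for each scanner. The class
and method names are made up for illustration; only the four timings come
from the measurements above.

```java
import java.util.Locale;

public class SlowdownRatios {
    // Ratio of the encoded scan time to the unencoded scan time.
    static double slowdown(double encodedMs, double plainMs) {
        return encodedMs / plainMs;
    }

    public static void main(String[] args) {
        // Timings in ms for reading all 1.3M rows, taken from the list above.
        double viaStoreScanner = slowdown(860, 300);
        double viaStoreFileScanner = slowdown(545, 52);

        System.out.println(String.format(Locale.ROOT,
                "StoreScanner:     %.1fx slower", viaStoreScanner));
        System.out.println(String.format(Locale.ROOT,
                "StoreFileScanner: %.1fx slower", viaStoreFileScanner));
    }
}
```

Through StoreScanner the ratio comes out a little under 3x, and through
StoreFileScanner a little over 10x.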

-Vladimir




On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <[email protected]> wrote:

> That is (unfortunately) a known issue. The main problem is that HBase
> expects each KV to be backed by a contiguous byte[]. For any prefix
> encoding it is thus necessary to rematerialize the KV (i.e. copy all the
> partial bytes into a new location).
> That is inefficient. Nobody has taken on fixing this (we're 1/2 there with
> Cells in 0.96, though).
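
A minimal sketch of the rematerialization cost described here, assuming a
toy prefix encoding (shared-prefix length plus differing suffix). The
Encoded class and both methods are hypothetical and are not HBase's actual
encoder or on-disk format; they only show why a reader that needs one
contiguous byte[] per key has to allocate and copy on every step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRematerialize {
    // Hypothetical, simplified encoded entry: how many leading bytes are
    // shared with the previous key, plus the bytes that differ.
    static final class Encoded {
        final int sharedLen;
        final byte[] suffix;

        Encoded(int sharedLen, byte[] suffix) {
            this.sharedLen = sharedLen;
            this.suffix = suffix;
        }
    }

    // Encode cur relative to prev by dropping the common prefix.
    static Encoded encode(byte[] prev, byte[] cur) {
        int max = Math.min(prev.length, cur.length);
        int shared = 0;
        while (shared < max && prev[shared] == cur[shared]) {
            shared++;
        }
        return new Encoded(shared, Arrays.copyOfRange(cur, shared, cur.length));
    }

    // Rebuild a contiguous byte[]: one allocation and two copies per key.
    static byte[] rematerialize(byte[] prev, Encoded e) {
        byte[] full = new byte[e.sharedLen + e.suffix.length];
        System.arraycopy(prev, 0, full, 0, e.sharedLen);
        System.arraycopy(e.suffix, 0, full, e.sharedLen, e.suffix.length);
        return full;
    }

    public static void main(String[] args) {
        byte[] k1 = "row0001/cf:q".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "row0002/cf:q".getBytes(StandardCharsets.UTF_8);

        Encoded e = encode(k1, k2);
        byte[] back = rematerialize(k1, e);

        System.out.println("shared bytes: " + e.sharedLen);
        System.out.println("round trip ok: " + Arrays.equals(k2, back));
    }
}
```

With encoding NONE the key bytes are already contiguous in the block, so
this copy step does not arise.
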
>
> There are JIRAs out there to fix this, like HBASE-7320 and more recently
> HBASE-9794.
>
> Now, which encoder did you test specifically? I've seen a 20-40% slowdown
> when everything is in the blockcache (which is the worst case scenario
> here), certainly not a 10x slowdown.
>
> Note that with block encoding the blocks are stored encoded in the
> blockcache, so more data fits into the cache, and (obviously) there's less
> IO when the data is not in the cache. So the extra CPU cycles and
> memory bandwidth used are offset by that.
>
> There are other problems too. I just filed an issue (HBASE-9807) where with
> block encoders we make a copy of the key portion of the KV on each reseek,
> just to compare it to the current scan key.
>
> -- Lars
> ________________________________
> From: Vladimir Rodionov <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Saturday, October 19, 2013 7:34 PM
> Subject: RE: Beware of PREFIX_TREE block encoding
>
>
> What did I want to say by this? HBase still does not have a block encoding
> which is optimal for both scan and seek (re-seek).
> I do not think these goals are mutually exclusive.
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
>
> From: Vladimir Rodionov [[email protected]]
> Sent: Saturday, October 19, 2013 7:32 PM
> To: [email protected]
> Subject: Beware of PREFIX_TREE block encoding
>
> The scan performance is bad: 10x slower in my tests than for blocks with
> NONE encoding. I scan data directly from the block cache through
> StoreFileScanner (bypassing all the StoreScanner/KeyValueHeap stuff). It should
> be clearly stated that this encoding degrades overall performance
> significantly in favor of data size reduction and is suitable only for Gets,
> not for Scans.
>
> Best regards,
> -Vladimir Rodionov
>
