FAST_DIFF:

Time to read all 1.3M rows reported in ms.

encoding = NONE,        scanner = StoreScanner;      time = 300 ms
encoding = PREFIX_TREE, scanner = StoreScanner;      time = 860 ms
encoding = FAST_DIFF,   scanner = StoreScanner;      time = 460 ms
encoding = NONE,        scanner = StoreFileScanner;  time =  52 ms
encoding = PREFIX_TREE, scanner = StoreFileScanner;  time = 545 ms
encoding = FAST_DIFF,   scanner = StoreFileScanner;  time = 195 ms

-Vladimir

On Sun, Oct 20, 2013 at 4:06 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> Vladimir, any chance to run the same test with FAST_DIFF?
>
> J
>
> 2013/10/20 Vladimir Rodionov <[email protected]>
>
> > I wanted to try PREFIX_TREE because it is supposed to be fastest on
> > seek/reseek.
> >
> > On Sat, Oct 19, 2013 at 9:12 PM, lars hofhansl <[email protected]> wrote:
> >
> > > I found FAST_DIFF to be the fastest of the block encoders.
> > > (Prefix tree is in 0.96+ only as far as I know.)
> > >
> > > -- Lars
> > >
> > > ----- Original Message -----
> > > From: Vladimir Rodionov <[email protected]>
> > > To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> > > Cc:
> > > Sent: Saturday, October 19, 2013 9:08 PM
> > > Subject: Re: Beware of PREFIX_TREE block encoding
> > >
> > > *Now, which encoder did you test specifically? I have seen a 20-40%
> > > slowdown when everything is in the blockcache (which is the worst case
> > > scenario here), certainly not a 10x slowdown.*
> > >
> > > I have 1.3M rows (very small - 48 bytes) in a block cache which I read
> > > sequentially, using encoding NONE, PREFIX_TREE and
> > > StoreScanner/StoreFileScanner (close to metal - block cache :)
> > >
> > > Time to read all 1.3M rows reported in ms.
> > >
> > > encoding = NONE,        scanner = StoreScanner;      time = 300 ms
> > > encoding = PREFIX_TREE, scanner = StoreScanner;      time = 860 ms
> > > encoding = NONE,        scanner = StoreFileScanner;  time =  52 ms
> > > encoding = PREFIX_TREE, scanner = StoreFileScanner;  time = 545 ms
> > >
> > > -Vladimir
> > >
> > > On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <[email protected]> wrote:
> > >
> > > > That is (unfortunately) a known issue. The main problem is that HBase
> > > > expects each KV to be backed by a contiguous byte[]. For any prefix
> > > > encoding it is thus necessary to rematerialize the KV (i.e. copy all
> > > > the partial bytes into a new location).
> > > > That is inefficient. Nobody has taken on fixing this (we're halfway
> > > > there with Cells in 0.96, though).
> > > >
> > > > There are JIRAs out there to fix this, like HBASE-7320 and, more
> > > > recently, HBASE-9794.
> > > >
> > > > Now, which encoder did you test specifically? I have seen a 20-40%
> > > > slowdown when everything is in the blockcache (which is the worst
> > > > case scenario here), certainly not a 10x slowdown.
> > > >
> > > > Note that with block encoding the blocks are stored encoded in the
> > > > blockcache, so more data fits into the cache, and (obviously) there's
> > > > less IO when the data is not in the cache. So the extra CPU cycles
> > > > and memory bandwidth used are offset by that.
> > > >
> > > > There are other problems too. I just filed an issue (HBASE-9807)
> > > > where with block encoders we make a copy of the key portion of the KV
> > > > on each reseek, just to compare it to the current scan key.
> > > >
> > > > -- Lars
> > > >
> > > > ________________________________
> > > > From: Vladimir Rodionov <[email protected]>
> > > > To: "[email protected]" <[email protected]>
> > > > Sent: Saturday, October 19, 2013 7:34 PM
> > > > Subject: RE: Beware of PREFIX_TREE block encoding
> > > >
> > > > What did I want to say by this? HBase still does not have a block
> > > > encoding that is optimal for both scan and seek (re-seek).
> > > > I do not think these goals are mutually exclusive.
> > > >
> > > > Best regards,
> > > > Vladimir Rodionov
> > > > Principal Platform Engineer
> > > > Carrier IQ, www.carrieriq.com
> > > > e-mail: [email protected]
> > > >
> > > > ________________________________________
> > > > From: Vladimir Rodionov [[email protected]]
> > > > Sent: Saturday, October 19, 2013 7:32 PM
> > > > To: [email protected]
> > > > Subject: Beware of PREFIX_TREE block encoding
> > > >
> > > > The scan performance is bad: 10x slower in my tests than for blocks
> > > > with NONE encoding. I scan data directly from the block cache through
> > > > StoreFileScanner (bypassing all StoreScanner/KeyValueHeap stuff). It
> > > > should be clearly stated that this encoding degrades overall
> > > > performance significantly in favor of data size reduction and is
> > > > suitable only for Gets - not for Scans.
> > > >
> > > > Best regards,
> > > > -Vladimir Rodionov
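
To make Lars's point about rematerialization concrete, here is a minimal sketch of why a prefix-style encoding costs extra work on every scanned key when the consumer expects one contiguous byte[] per KV. This is not HBase code and not the actual PREFIX_TREE or FAST_DIFF format: the class name, the EncodedKey type, and the simple "shared prefix length + suffix" layout are all assumptions chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: a toy prefix encoding, NOT HBase's PREFIX_TREE or FAST_DIFF.
public class PrefixRematerializeSketch {

    /** Hypothetical encoded entry: bytes shared with the previous key, plus the differing suffix. */
    static final class EncodedKey {
        final int sharedPrefixLen;
        final byte[] suffix;

        EncodedKey(int sharedPrefixLen, byte[] suffix) {
            this.sharedPrefixLen = sharedPrefixLen;
            this.suffix = suffix;
        }
    }

    /** Encodes sorted keys by storing only what differs from the previous key. */
    static EncodedKey[] encode(byte[][] sortedKeys) {
        EncodedKey[] out = new EncodedKey[sortedKeys.length];
        byte[] prev = new byte[0];
        for (int i = 0; i < sortedKeys.length; i++) {
            byte[] cur = sortedKeys[i];
            int max = Math.min(prev.length, cur.length);
            int shared = 0;
            while (shared < max && prev[shared] == cur[shared]) {
                shared++;
            }
            out[i] = new EncodedKey(shared, Arrays.copyOfRange(cur, shared, cur.length));
            prev = cur;
        }
        return out;
    }

    /**
     * Rematerializes one key: allocates a fresh contiguous array and copies the
     * shared prefix and the suffix into it. This allocation plus two copies is
     * the per-key overhead that an unencoded block does not pay.
     */
    static byte[] rematerialize(byte[] previousKey, EncodedKey e) {
        byte[] full = new byte[e.sharedPrefixLen + e.suffix.length];
        System.arraycopy(previousKey, 0, full, 0, e.sharedPrefixLen);
        System.arraycopy(e.suffix, 0, full, e.sharedPrefixLen, e.suffix.length);
        return full;
    }

    /** Sequential "scan": every key must be rebuilt from its predecessor. */
    static byte[][] decodeAll(EncodedKey[] encoded) {
        byte[][] out = new byte[encoded.length][];
        byte[] prev = new byte[0];
        for (int i = 0; i < encoded.length; i++) {
            out[i] = rematerialize(prev, encoded[i]);
            prev = out[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] keys = {
            "row0001/cf:a".getBytes(StandardCharsets.UTF_8),
            "row0001/cf:b".getBytes(StandardCharsets.UTF_8),
            "row0002/cf:a".getBytes(StandardCharsets.UTF_8),
        };
        EncodedKey[] encoded = encode(keys);
        byte[][] decoded = decodeAll(encoded);
        for (int i = 0; i < encoded.length; i++) {
            System.out.println(encoded[i].sharedPrefixLen + " shared bytes, suffix '"
                + new String(encoded[i].suffix, StandardCharsets.UTF_8) + "' -> '"
                + new String(decoded[i], StandardCharsets.UTF_8) + "'");
        }
    }
}
```

With very small rows, as in the 48-byte test above, this fixed per-key cost is large relative to the useful payload, which is one plausible reason the gap shows up so strongly in a pure block-cache scan.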

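
For comparing the "10x" and "20-40%" figures discussed in this thread, the slowdown factors implied by the reported timings can be computed directly. The sketch below uses only the numbers posted above; the class and method names are made up for the example.

```java
// Computes slowdown factors from the timings reported in this thread.
public class ScanSlowdown {

    /** Ratio of an encoded scan time to the NONE-encoding baseline for the same scanner. */
    static double slowdown(double encodedMs, double baselineMs) {
        return encodedMs / baselineMs;
    }

    public static void main(String[] args) {
        // Baselines (encoding = NONE), in ms, as reported above.
        double storeScannerNone = 300;
        double storeFileScannerNone = 52;

        System.out.printf("StoreScanner      PREFIX_TREE: %.2fx%n", slowdown(860, storeScannerNone));
        System.out.printf("StoreScanner      FAST_DIFF:   %.2fx%n", slowdown(460, storeScannerNone));
        System.out.printf("StoreFileScanner  PREFIX_TREE: %.2fx%n", slowdown(545, storeFileScannerNone));
        System.out.printf("StoreFileScanner  FAST_DIFF:   %.2fx%n", slowdown(195, storeFileScannerNone));
    }
}
```

The roughly 10x figure only appears at the StoreFileScanner level; measured through StoreScanner, the same data shows PREFIX_TREE a little under 3x slower and FAST_DIFF about 1.5x slower than NONE.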