bq. let's say you compact a really old file with a new file

I think stripe compaction is supposed to handle the above scenario. Take a look at:
https://issues.apache.org/jira/browse/HBASE-7667

Please also refer to Sergey's talk @ HBaseCon.

Cheers
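
For concreteness, here is a minimal, self-contained sketch of the concern being quoted above: once a minor compaction merges an old file with a new one, the merged file's timestamp range is the union of the two, so "this file contains recent data" no longer implies that everything read from it is recent. This is plain illustrative Java, not HBase code; the TimeRange class is a hypothetical stand-in for the per-HFile min/max timestamps.

public class TimeRangeUnionSketch {

    // Hypothetical stand-in for the [minTimestamp, maxTimestamp] tracked per HFile.
    static final class TimeRange {
        final long min, max;
        TimeRange(long min, long max) { this.min = min; this.max = max; }
        TimeRange union(TimeRange other) {
            return new TimeRange(Math.min(min, other.min), Math.max(max, other.max));
        }
        public String toString() { return "[" + min + ", " + max + "]"; }
    }

    public static void main(String[] args) {
        TimeRange oldFile = new TimeRange(1_000L, 2_000L);   // old data only
        TimeRange newFile = new TimeRange(9_000L, 10_000L);  // recent data only
        TimeRange compacted = oldFile.union(newFile);
        // The compacted file now "looks recent" (max = 10_000) even though most
        // of its cells may come from the old portion of the merge.
        System.out.println("compacted time range: " + compacted);
    }
}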

On Mon, Jul 1, 2013 at 4:10 PM, Varun Sharma <va...@pinterest.com> wrote:

> Going back to leveldb vs hbase, I am not sure if we can come up with a clean
> way to identify HFiles containing more recent data in the wake of
> compactions.
>
> I do wonder, though, if this works with minor compactions. Let's say you
> compact a really old file with a new file. Now, since this file's most recent
> timestamp is very recent because of the new file, you look into this file,
> but then retrieve something from the "old" portion of this file. So you end
> up with older data.
>
> I guess one way would be to just order the files by time ranges. Non
> intersecting time range files can be ordered in reverse time order.
> Intersecting files can be seeked together.
>
> File1
> |-----------------|
> File2
>                       |---------------|
> File3
>                     |-----------------------------|
> File4
>
>                                                        |--------------------|
>
> So in this case, we seek
>
> [File1], [File2, File3], [File4]
>
> I think for random single key value lookups (row, col) -> key, this could lead
> to good savings for time ordered clients (which are quite common), unless
> File1 and File4 get compacted, in which case we always need to seek into
> both.
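
A minimal sketch of the grouping described above, assuming a hypothetical FileRange holding each file's [minTs, maxTs]: files with intersecting time ranges are merged into one group, and the groups are ordered newest-first, so a time-ordered client can stop at the first group that yields the key and only fall through to older groups on a miss. Illustrative plain Java, not HBase code.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SeekGroupSketch {

    // Hypothetical stand-in for a store file plus its time range.
    static final class FileRange {
        final String name;
        final long minTs, maxTs;
        FileRange(String name, long minTs, long maxTs) {
            this.name = name; this.minTs = minTs; this.maxTs = maxTs;
        }
    }

    // Sort by start timestamp, then sweep: a file overlapping the current group's
    // running max end-time joins that group, otherwise it starts a new group.
    static List<List<FileRange>> groupByTimeRange(List<FileRange> files) {
        List<FileRange> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(f -> f.minTs));
        List<List<FileRange>> groups = new ArrayList<>();
        long groupMax = Long.MIN_VALUE;
        for (FileRange f : sorted) {
            if (groups.isEmpty() || f.minTs > groupMax) {
                groups.add(new ArrayList<>());   // no overlap: new seek group
            }
            groups.get(groups.size() - 1).add(f);
            groupMax = Math.max(groupMax, f.maxTs);
        }
        // Newest group first, so time-ordered lookups touch recent files first.
        groups.sort((a, b) -> Long.compare(
            b.stream().mapToLong(f -> f.maxTs).max().getAsLong(),
            a.stream().mapToLong(f -> f.maxTs).max().getAsLong()));
        return groups;
    }

    public static void main(String[] args) {
        List<List<FileRange>> groups = groupByTimeRange(List.of(
            new FileRange("File1", 0, 10),
            new FileRange("File2", 12, 20),
            new FileRange("File3", 11, 28),
            new FileRange("File4", 30, 40)));
        for (List<FileRange> g : groups) {
            g.forEach(f -> System.out.print(f.name + " "));
            System.out.println();
        }
        // Prints File4, then File3 File2, then File1 -- i.e. [File4], [File2, File3], [File1].
    }
}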

>
> On Mon, Jul 1, 2013 at 12:10 PM, lars hofhansl <la...@apache.org> wrote:
>
> > Sorry. Hit enter too early.
> >
> > Some discussion here:
> > http://apache-hbase.679495.n3.nabble.com/keyvalue-cache-td3882628.html
> > but no actionable outcome.
> >
> > -- Lars
> >
> > ________________________________
> > From: lars hofhansl <la...@apache.org>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Sent: Monday, July 1, 2013 12:05 PM
> > Subject: Re: Poor HBase random read performance
> >
> > This came up a few times before.
> >
> > ________________________________
> > From: Vladimir Rodionov <vrodio...@carrieriq.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <la...@apache.org>
> > Sent: Monday, July 1, 2013 11:08 AM
> > Subject: RE: Poor HBase random read performance
> >
> > I would like to remind everyone that the original BigTable design has a scan
> > cache to take care of random reads, and this important feature is still
> > missing in HBase.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodio...@carrieriq.com
> >
> > ________________________________________
> > From: lars hofhansl [la...@apache.org]
> > Sent: Saturday, June 29, 2013 3:24 PM
> > To: dev@hbase.apache.org
> > Subject: Re: Poor HBase random read performance
> >
> > Should also say that random reads this way are somewhat of a worst case
> > scenario.
> >
> > If the working set is much larger than the block cache and the reads are
> > random, then each read will likely have to bring in an entirely new block
> > from the OS cache, even when the KVs are much smaller than a block.
> >
> > So in order to read a (say) 1k KV, HBase needs to bring 64k (the default
> > block size) from the OS cache. As long as the dataset fits into the block
> > cache this difference in size has no performance impact, but as soon as the
> > dataset does not fit, we have to bring much more data from the OS cache than
> > we're actually interested in.
> >
> > Indeed, in my test I found that HBase brings in about 60x the data size
> > from the OS cache (used PE with ~1k KVs). This can be improved with smaller
> > block sizes, and with a more efficient way to instantiate HFile blocks in
> > Java (which we need to work on).
> >
> > -- Lars
> >
> > ________________________________
> > From: lars hofhansl <la...@apache.org>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Sent: Saturday, June 29, 2013 3:09 PM
> > Subject: Re: Poor HBase random read performance
> >
> > I've seen the same bad performance behavior when I tested this on a real
> > cluster. (I think it was in 0.94.6.)
> >
> > Instead of en/disabling the block cache, I tested sequential and random
> > reads on a data set that does not fit into the (aggregate) block cache.
> > Sequential reads were drastically faster than random reads (7 vs 34
> > minutes), which can really only be explained by the fact that the next
> > get will, with high probability, hit an already cached block, whereas in
> > the random read case it likely will not.
> >
> > In the random read case I estimate that each RegionServer brings in between
> > 100 and 200mb/s from the OS cache. Even at 200mb/s this would be quite
> > slow. I understand that performance is bad when index/bloom blocks are not
> > cached, but bringing in data blocks from the OS cache should be faster than
> > it is.
> >
> > So this is something to debug.
> >
> > -- Lars
> >
> > ________________________________
> > From: Varun Sharma <va...@pinterest.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Sent: Saturday, June 29, 2013 12:13 PM
> > Subject: Poor HBase random read performance
> >
> > Hi,
> >
> > I was doing some tests on how good HBase random reads are. The setup
> > consists of a 1 node cluster with dfs replication set to 1. Short circuit
> > local reads and HBase checksums are enabled. The data set is small enough
> > to be largely cached in the filesystem cache - 10G on a 60G machine.
> >
> > The client sends out multi-get operations in batches of 10, and I try to
> > measure throughput.
> >
> > Test #1
> >
> > All data was cached in the block cache.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 12M
> > Throughput = 100K per second
> >
> > Test #2
> >
> > I disable the block cache, but now all the data is in the file system cache.
> > I verify this by making sure that IOPS on the disk drive are 0 during the
> > test. I run the same test with batched ops.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 0.6M
> > Throughput = 5K per second
> >
> > Test #3
> >
> > I saw that all the threads are now stuck in idLock.lockEntry(). So I now run
> > with the lock disabled and the block cache disabled.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 1.2M
> > Throughput = 10K per second
> >
> > Test #4
> >
> > I re-enable the block cache and this time hack HBase to only cache index and
> > bloom blocks, while data blocks come from the file system cache.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 1.6M
> > Throughput = 13K per second
> >
> > So I wonder how there is such a massive drop in throughput. I know that the
> > HDFS code adds tremendous overhead, but this seems pretty high to me. I use
> > 0.94.7 and cdh 4.2.0.
> >
> > Thanks
> > Varun
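
A rough sketch of the kind of test client described above: random multi-gets in batches of 10 against a single table, with throughput measured over 120 seconds, using the 0.94-era client API. The table name, row-key format, and key-space size are assumptions, and it is single-threaded for brevity; the actual test presumably ran many client threads in parallel.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadBench {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "TestTable");           // assumed table name
        Random rnd = new Random();
        long reads = 0;
        long deadline = System.currentTimeMillis() + 120_000L;  // 120 second test
        while (System.currentTimeMillis() < deadline) {
            List<Get> batch = new ArrayList<Get>(10);
            for (int i = 0; i < 10; i++) {
                // assumed key format: 10M rows keyed "row-0000000" .. "row-9999999"
                String key = String.format("row-%07d", rnd.nextInt(10_000_000));
                batch.add(new Get(Bytes.toBytes(key)));
            }
            Result[] results = table.get(batch);  // one multi-get of 10 rows
            reads += results.length;
        }
        System.out.println("throughput: " + (reads / 120) + " reads/sec");
        table.close();
    }
}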
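
Following up on Lars's point that the 64k-block-vs-1k-KV over-read "can be improved with smaller block sizes", here is a sketch of lowering BLOCKSIZE for one column family via the 0.94-era admin API. Table and family names are placeholders; the trade-off is a larger block index, and existing HFiles only pick up the new size once they are rewritten by compaction.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ShrinkBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Fetch the existing column family descriptor so only BLOCKSIZE changes.
        HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("TestTable"));
        HColumnDescriptor cf = htd.getFamily(Bytes.toBytes("info"));  // assumed family
        cf.setBlocksize(8 * 1024);        // down from the 64k default
        admin.disableTable("TestTable");
        admin.modifyColumn("TestTable", cf);
        admin.enableTable("TestTable");
        admin.close();
        // New HFiles are written with 8k blocks; a major compaction rewrites
        // existing files so they pick up the new block size as well.
    }
}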