bq. let's say you compact a really old file with a new file

I think stripe compaction is supposed to handle the above scenario. Take a look at:
https://issues.apache.org/jira/browse/HBASE-7667

Please also refer to Sergey's talk @ HBaseCon.

Cheers
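
For concreteness, here is a minimal, self-contained sketch of the concern being quoted above: once a minor compaction merges an old file with a new one, the merged file's timestamp range is the union of the two, so "this file contains recent data" no longer implies that everything read from it is recent. This is plain illustrative Java, not HBase code; the TimeRange class is a hypothetical stand-in for the per-HFile min/max timestamps.

public class TimeRangeUnionSketch {

    // Hypothetical stand-in for the [minTimestamp, maxTimestamp] tracked per HFile.
    static final class TimeRange {
        final long min, max;
        TimeRange(long min, long max) { this.min = min; this.max = max; }
        TimeRange union(TimeRange other) {
            return new TimeRange(Math.min(min, other.min), Math.max(max, other.max));
        }
        public String toString() { return "[" + min + ", " + max + "]"; }
    }

    public static void main(String[] args) {
        TimeRange oldFile = new TimeRange(1_000L, 2_000L);   // old data only
        TimeRange newFile = new TimeRange(9_000L, 10_000L);  // recent data only
        TimeRange compacted = oldFile.union(newFile);
        // The compacted file now "looks recent" (max = 10_000) even though most
        // of its cells may come from the old portion of the merge.
        System.out.println("compacted time range: " + compacted);
    }
}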

On Mon, Jul 1, 2013 at 4:10 PM, Varun Sharma <va...@pinterest.com> wrote:

> Going back to leveldb vs hbase, I am not sure if we can come up with a clean
> way to identify HFiles containing more recent data in the wake of
> compactions.
>
> I do wonder, though, if this works with minor compactions. Let's say you
> compact a really old file with a new file. Now, since this file's most recent
> timestamp is very recent because of the new file, you look into this file,
> but then retrieve something from the "old" portion of this file. So you end
> up with older data.
>
> I guess one way would be to just order the files by time ranges. Non
> intersecting time range files can be ordered in reverse time order.
> Intersecting files can be seeked together.
>
> File1
> |-----------------|
> File2
>                       |---------------|
> File3
>                     |-----------------------------|
> File4
>
>                                                        |--------------------|
>
> So in this case, we seek
>
> [File1], [File2, File3], [File4]
>
> I think for random single key value lookups (row, col) -> key, this could lead
> to good savings for time ordered clients (which are quite common), unless
> File1 and File4 get compacted, in which case we always need to seek into
> both.
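
A minimal sketch of the grouping described above, assuming a hypothetical FileRange holding each file's [minTs, maxTs]: files with intersecting time ranges are merged into one group, and the groups are ordered newest-first, so a time-ordered client can stop at the first group that yields the key and only fall through to older groups on a miss. Illustrative plain Java, not HBase code.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SeekGroupSketch {

    // Hypothetical stand-in for a store file plus its time range.
    static final class FileRange {
        final String name;
        final long minTs, maxTs;
        FileRange(String name, long minTs, long maxTs) {
            this.name = name; this.minTs = minTs; this.maxTs = maxTs;
        }
    }

    // Sort by start timestamp, then sweep: a file overlapping the current group's
    // running max end-time joins that group, otherwise it starts a new group.
    static List<List<FileRange>> groupByTimeRange(List<FileRange> files) {
        List<FileRange> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(f -> f.minTs));
        List<List<FileRange>> groups = new ArrayList<>();
        long groupMax = Long.MIN_VALUE;
        for (FileRange f : sorted) {
            if (groups.isEmpty() || f.minTs > groupMax) {
                groups.add(new ArrayList<>());   // no overlap: new seek group
            }
            groups.get(groups.size() - 1).add(f);
            groupMax = Math.max(groupMax, f.maxTs);
        }
        // Newest group first, so time-ordered lookups touch recent files first.
        groups.sort((a, b) -> Long.compare(
            b.stream().mapToLong(f -> f.maxTs).max().getAsLong(),
            a.stream().mapToLong(f -> f.maxTs).max().getAsLong()));
        return groups;
    }

    public static void main(String[] args) {
        List<List<FileRange>> groups = groupByTimeRange(List.of(
            new FileRange("File1", 0, 10),
            new FileRange("File2", 12, 20),
            new FileRange("File3", 11, 28),
            new FileRange("File4", 30, 40)));
        for (List<FileRange> g : groups) {
            g.forEach(f -> System.out.print(f.name + " "));
            System.out.println();
        }
        // Prints File4, then File3 File2, then File1 -- i.e. [File4], [File2, File3], [File1].
    }
}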

>
> On Mon, Jul 1, 2013 at 12:10 PM, lars hofhansl <la...@apache.org> wrote:
>
> > Sorry. Hit enter too early.
> >
> > Some discussion here:
> > http://apache-hbase.679495.n3.nabble.com/keyvalue-cache-td3882628.html
> > but no actionable outcome.
> >
> > -- Lars
> >
> > ________________________________
> > From: lars hofhansl <la...@apache.org>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Sent: Monday, July 1, 2013 12:05 PM
> > Subject: Re: Poor HBase random read performance
> >
> > This came up a few times before.
> >
> > ________________________________
> > From: Vladimir Rodionov <vrodio...@carrieriq.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <la...@apache.org>
> > Sent: Monday, July 1, 2013 11:08 AM
> > Subject: RE: Poor HBase random read performance
> >
> > I would like to remind everyone that the original BigTable design has a scan
> > cache to take care of random reads, and this important feature is still
> > missing in HBase.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodio...@carrieriq.com
> >
> > ________________________________________
> > From: lars hofhansl [la...@apache.org]
> > Sent: Saturday, June 29, 2013 3:24 PM
> > To: dev@hbase.apache.org
> > Subject: Re: Poor HBase random read performance
> >
> > Should also say that random reads this way are somewhat of a worst case
> > scenario.
> >
> > If the working set is much larger than the block cache and the reads are
> > random, then each read will likely have to bring in an entirely new block
> > from the OS cache, even when the KVs are much smaller than a block.
> >
> > So in order to read a (say) 1k KV, HBase needs to bring 64k (the default
> > block size) from the OS cache. As long as the dataset fits into the block
> > cache this difference in size has no performance impact, but as soon as the
> > dataset does not fit, we have to bring much more data from the OS cache than
> > we're actually interested in.
> >
> > Indeed, in my test I found that HBase brings in about 60x the data size
> > from the OS cache (used PE with ~1k KVs). This can be improved with smaller
> > block sizes, and with a more efficient way to instantiate HFile blocks in
> > Java (which we need to work on).
> >
> > -- Lars
> >
> > ________________________________
> > From: lars hofhansl <la...@apache.org>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Sent: Saturday, June 29, 2013 3:09 PM
> > Subject: Re: Poor HBase random read performance
> >
> > I've seen the same bad performance behavior when I tested this on a real
> > cluster. (I think it was in 0.94.6.)
> >
> > Instead of en/disabling the block cache, I tested sequential and random
> > reads on a data set that does not fit into the (aggregate) block cache.
> > Sequential reads were drastically faster than random reads (7 vs 34
> > minutes), which can really only be explained by the fact that the next
> > get will, with high probability, hit an already cached block, whereas in
> > the random read case it likely will not.
> >
> > In the random read case I estimate that each RegionServer brings in between
> > 100 and 200mb/s from the OS cache. Even at 200mb/s this would be quite
> > slow. I understand that performance is bad when index/bloom blocks are not
> > cached, but bringing in data blocks from the OS cache should be faster than
> > it is.
> >
> > So this is something to debug.
> >
> > -- Lars
> >
> > ________________________________
> > From: Varun Sharma <va...@pinterest.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > Sent: Saturday, June 29, 2013 12:13 PM
> > Subject: Poor HBase random read performance
> >
> > Hi,
> >
> > I was doing some tests on how good HBase random reads are. The setup
> > consists of a 1 node cluster with dfs replication set to 1. Short circuit
> > local reads and HBase checksums are enabled. The data set is small enough
> > to be largely cached in the filesystem cache - 10G on a 60G machine.
> >
> > The client sends out multi-get operations in batches of 10, and I try to
> > measure throughput.
> >
> > Test #1
> >
> > All data was cached in the block cache.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 12M
> > Throughput = 100K per second
> >
> > Test #2
> >
> > I disable the block cache, but now all the data is in the file system cache.
> > I verify this by making sure that IOPS on the disk drive are 0 during the
> > test. I run the same test with batched ops.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 0.6M
> > Throughput = 5K per second
> >
> > Test #3
> >
> > I saw that all the threads are now stuck in idLock.lockEntry(). So I now run
> > with the lock disabled and the block cache disabled.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 1.2M
> > Throughput = 10K per second
> >
> > Test #4
> >
> > I re-enable the block cache and this time hack HBase to only cache index and
> > bloom blocks, while data blocks come from the file system cache.
> >
> > Test Time = 120 seconds
> > Num Read Ops = 1.6M
> > Throughput = 13K per second
> >
> > So I wonder how there is such a massive drop in throughput. I know that the
> > HDFS code adds tremendous overhead, but this seems pretty high to me. I use
> > 0.94.7 and cdh 4.2.0.
> >
> > Thanks
> > Varun
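
A rough sketch of the kind of test client described above: random multi-gets in batches of 10 against a single table, with throughput measured over 120 seconds, using the 0.94-era client API. The table name, row-key format, and key-space size are assumptions, and it is single-threaded for brevity; the actual test presumably ran many client threads in parallel.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadBench {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "TestTable");           // assumed table name
        Random rnd = new Random();
        long reads = 0;
        long deadline = System.currentTimeMillis() + 120_000L;  // 120 second test
        while (System.currentTimeMillis() < deadline) {
            List<Get> batch = new ArrayList<Get>(10);
            for (int i = 0; i < 10; i++) {
                // assumed key format: 10M rows keyed "row-0000000" .. "row-9999999"
                String key = String.format("row-%07d", rnd.nextInt(10_000_000));
                batch.add(new Get(Bytes.toBytes(key)));
            }
            Result[] results = table.get(batch);  // one multi-get of 10 rows
            reads += results.length;
        }
        System.out.println("throughput: " + (reads / 120) + " reads/sec");
        table.close();
    }
}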
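
Following up on Lars's point that the 64k-block-vs-1k-KV over-read "can be improved with smaller block sizes", here is a sketch of lowering BLOCKSIZE for one column family via the 0.94-era admin API. Table and family names are placeholders; the trade-off is a larger block index, and existing HFiles only pick up the new size once they are rewritten by compaction.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ShrinkBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Fetch the existing column family descriptor so only BLOCKSIZE changes.
        HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("TestTable"));
        HColumnDescriptor cf = htd.getFamily(Bytes.toBytes("info"));  // assumed family
        cf.setBlocksize(8 * 1024);        // down from the 64k default
        admin.disableTable("TestTable");
        admin.modifyColumn("TestTable", cf);
        admin.enableTable("TestTable");
        admin.close();
        // New HFiles are written with 8k blocks; a major compaction rewrites
        // existing files so they pick up the new block size as well.
    }
}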