So, I just major compacted the table, which initially had 3 store files, and performance went up 3X, from 1.6M to 4M+ read ops.
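For reference, a major compaction can also be kicked off programmatically; a minimal sketch against the 0.94-era admin API (the table name is just a placeholder, not my actual table) looks roughly like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CompactTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // Requests a major compaction (asynchronous): each store's files
            // are rewritten into a single store file.
            admin.majorCompact("test_table");
        }
    }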
The tests I am running have 8 byte keys with ~80-100 byte values. Right now I am working with a 64K block size; I am going to make it 8K and see if that helps.

The one point though is the IdLock mechanism - that seems to add a huge amount of overhead (2x) - however, in that test I was not caching index blocks in the block cache, which means much higher contention on those blocks. I believe it is used so that we don't load the same block twice from disk. I am wondering, when IOPs are plentiful (SSDs for example), whether we should have an option to disable it, though I should probably reevaluate it with index blocks in the block cache.
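My mental model of the IdLock idea is just this pattern (a sketch only, not the actual HBase code - the class and method names below are made up): loads of the same block are serialized so that concurrent readers don't each hit the filesystem for it.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Sketch: one load per block offset; other readers wait on the same lock.
    class OnceOnlyBlockLoader {
        private final ConcurrentMap<Long, byte[]> cache = new ConcurrentHashMap<Long, byte[]>();
        private final ConcurrentMap<Long, Object> locks = new ConcurrentHashMap<Long, Object>();

        byte[] getBlock(long offset) {
            byte[] block = cache.get(offset);
            if (block != null) {
                return block;                       // cache hit, no locking needed
            }
            Object lock = new Object();
            Object prev = locks.putIfAbsent(offset, lock);
            if (prev != null) {
                lock = prev;                        // another thread is loading this block
            }
            synchronized (lock) {                   // this wait is where threads pile up
                block = cache.get(offset);
                if (block == null) {
                    block = loadFromDisk(offset);   // the expensive read happens once
                    cache.put(offset, block);
                }
            }
            locks.remove(offset, lock);
            return block;
        }

        private byte[] loadFromDisk(long offset) {
            return new byte[64 * 1024];             // stand-in for the real HFile block read
        }
    }

If index blocks sit in the block cache, most gets should take the cache-hit path and never touch the lock, which is why I want to rerun that test.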
On Sat, Jun 29, 2013 at 3:24 PM, lars hofhansl <[email protected]> wrote:
> Should also say that random reads this way are somewhat of a worst case
> scenario.
>
> If the working set is much larger than the block cache and the reads are
> random, then each read will likely have to bring in an entirely new block
> from the OS cache, even when the KVs are much smaller than a block.
>
> So in order to read a (say) 1k KV, HBase needs to bring 64k (the default
> block size) from the OS cache.
> As long as the dataset fits into the block cache this difference in size
> has no performance impact, but as soon as the dataset does not fit, we have
> to bring much more data from the OS cache than we're actually interested in.
>
> Indeed, in my test I found that HBase brings in about 60x the data size
> from the OS cache (used PE with ~1k KVs). This can be improved with smaller
> block sizes, and with a more efficient way to instantiate HFile blocks in
> Java (which we need to work on).
>
> -- Lars
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Saturday, June 29, 2013 3:09 PM
> Subject: Re: Poor HBase random read performance
>
> I've seen the same bad performance behavior when I tested this on a real
> cluster (I think it was on 0.94.6).
>
> Instead of en/disabling the block cache, I tested sequential and random
> reads on a data set that does not fit into the (aggregate) block cache.
> Sequential reads were drastically faster than random reads (7 vs 34
> minutes), which can really only be explained by the fact that the next
> get will with high probability hit an already cached block, whereas in the
> random read case it likely will not.
>
> In the random read case I estimate that each RegionServer brings in between
> 100 and 200 MB/s from the OS cache. Even at 200 MB/s this would be quite
> slow. I understand that performance is bad when index/bloom blocks are not
> cached, but bringing in data blocks from the OS cache should be faster than
> it is.
>
> So this is something to debug.
>
> -- Lars
>
> ________________________________
> From: Varun Sharma <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Saturday, June 29, 2013 12:13 PM
> Subject: Poor HBase random read performance
>
> Hi,
>
> I was doing some tests on how good HBase random reads are. The setup
> consists of a 1 node cluster with dfs replication set to 1. Short circuit
> local reads and HBase checksums are enabled. The data set is small enough
> to be largely cached in the filesystem cache - 10G on a 60G machine.
>
> The client sends out multi-get operations in batches of 10 and I try to
> measure throughput.
>
> Test #1
>
> All data was cached in the block cache.
>
> Test Time = 120 seconds
> Num Read Ops = 12M
> Throughput = 100K per second
>
> Test #2
>
> I disable the block cache. But now all the data is in the file system
> cache. I verify this by making sure that IOPs on the disk drive are 0
> during the test. I run the same test with batched ops.
>
> Test Time = 120 seconds
> Num Read Ops = 0.6M
> Throughput = 5K per second
>
> Test #3
>
> I saw that all the threads were stuck in idLock.lockEntry(). So I now run
> with the lock disabled and the block cache disabled.
>
> Test Time = 120 seconds
> Num Read Ops = 1.2M
> Throughput = 10K per second
>
> Test #4
>
> I re-enable the block cache and this time hack HBase to only cache index
> and bloom blocks, while data blocks come from the file system cache.
>
> Test Time = 120 seconds
> Num Read Ops = 1.6M
> Throughput = 13K per second
>
> So, I wonder why there is such a massive drop in throughput. I know that
> the HDFS code adds tremendous overhead, but this seems pretty high to me.
> I use 0.94.7 and CDH 4.2.0.
>
> Thanks
> Varun
>
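For completeness, a rough sketch of the kind of batched multi-get loop described in the original message above (placeholders only - table name and keys are illustrative, this is not the actual test code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetLoop {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test_table");    // placeholder table name
            Random rnd = new Random();
            long deadline = System.currentTimeMillis() + 120 * 1000L;  // 120 second run
            long ops = 0;
            while (System.currentTimeMillis() < deadline) {
                List<Get> batch = new ArrayList<Get>(10);
                for (int i = 0; i < 10; i++) {                 // batches of 10 gets
                    batch.add(new Get(Bytes.toBytes(rnd.nextLong())));  // 8 byte keys
                }
                Result[] results = table.get(batch);           // one batched multi-get
                ops += results.length;
            }
            System.out.println("read ops = " + ops + ", throughput = " + (ops / 120) + "/s");
            table.close();
        }
    }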
