Bryan Beaudreault created HBASE-27896:
-----------------------------------------
Summary: Disable hdfs readahead for pread reads
Key: HBASE-27896
URL: https://issues.apache.org/jira/browse/HBASE-27896
Project: HBase
Issue Type: Improvement
Reporter: Bryan Beaudreault
In https://issues.apache.org/jira/browse/HBASE-17914, a flag was introduced:
{{hbase.store.reader.no-readahead}}. The default is false, so readahead is
enabled. This flag is used when creating the default store reader (i.e. the one
used by PREAD reads). Stream readers don't use this flag; they always pass -1.
When that flag is true, we pass a readahead value of 0 to
FSDataInputStream.setReadahead, which disables readahead. When the flag is
false, we pass -1, which leaves the HDFS default behavior in place. The default
behavior is to use a readahead of 4 MB.
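A minimal, self-contained Java sketch of the flag-to-readahead mapping, written to match the flag's name and the stated default (readahead enabled when the flag is false): no-readahead=true maps to 0 (off), false maps to -1 (HDFS default), and STREAM readers always get -1. The names ReadaheadSketch, FakeStream, readaheadFor and applyReadahead are hypothetical and are not HBase's actual classes; FakeStream stands in for FSDataInputStream so the sketch runs without Hadoop. Treating -1 as "skip the setReadahead call" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadaheadSketch {
    // The two reader types discussed in the issue.
    enum ReaderType { PREAD, STREAM }

    // Hypothetical stand-in for FSDataInputStream: it only records the values
    // passed to setReadahead so the sketch runs without Hadoop on the classpath.
    static final class FakeStream {
        final List<Long> readaheadCalls = new ArrayList<>();

        void setReadahead(Long readahead) {
            readaheadCalls.add(readahead);
        }
    }

    // Readahead value chosen for a reader: PREAD readers consult the
    // no-readahead flag (true -> 0, i.e. off; false -> -1, i.e. HDFS default),
    // while STREAM readers always use -1.
    static long readaheadFor(ReaderType type, boolean noReadahead) {
        if (type == ReaderType.STREAM) {
            return -1L;
        }
        return noReadahead ? 0L : -1L;
    }

    // Assumption: -1 is a sentinel meaning "do not call setReadahead", so the
    // HDFS default readahead (4 MB) stays in effect.
    static void applyReadahead(FakeStream in, long readahead) {
        if (readahead >= 0) {
            in.setReadahead(readahead);
        }
    }

    public static void main(String[] args) {
        for (ReaderType type : ReaderType.values()) {
            for (boolean noReadahead : new boolean[] {false, true}) {
                FakeStream in = new FakeStream();
                applyReadahead(in, readaheadFor(type, noReadahead));
                System.out.println(type + " no-readahead=" + noReadahead
                        + " -> setReadahead calls " + in.readaheadCalls);
            }
        }
    }
}
```

In this sketch only PREAD with no-readahead=true results in a setReadahead(0) call; the other three combinations leave the stream at the HDFS default. Disabling readahead for PREAD reads, as the summary proposes, would correspond to making that case the default.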
It seems to me that we don't want readahead for PREAD reads, and especially not
such a large one. Our default block size is 64 KB, much smaller than the 4 MB
readahead. A PREAD read is unlikely to do sequential IO, so it is unlikely to
make use of the cached readahead buffer.
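To put the size gap in numbers, a small Java calculation, assuming both figures are binary units (4 MiB = 4,194,304 bytes for the HDFS default readahead, 64 KiB = 65,536 bytes for the HBase default block size). ReadaheadRatio and blocksPerReadahead are illustrative names only.

```java
public class ReadaheadRatio {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;

    // HDFS default readahead cited in the issue (4 MB, taken here as 4 MiB).
    static final long DEFAULT_READAHEAD_BYTES = 4 * MIB;
    // HBase default block size cited in the issue (64 KB, taken here as 64 KiB).
    static final long DEFAULT_BLOCK_BYTES = 64 * KIB;

    // How many default-sized blocks fit in one readahead window.
    static long blocksPerReadahead(long readaheadBytes, long blockBytes) {
        return readaheadBytes / blockBytes;
    }

    public static void main(String[] args) {
        long ratio = blocksPerReadahead(DEFAULT_READAHEAD_BYTES, DEFAULT_BLOCK_BYTES);
        System.out.println("4 MiB readahead / 64 KiB block = " + ratio); // prints 64
    }
}
```

One default readahead window therefore spans 64 default-sized blocks. For a random PREAD that needs a single block, most of that window is unlikely to be used by the next read.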
I set no-readahead to true in a few of our clusters and in each case saw a
massive reduction in disk IO and a corresponding increase in throughput. I load
tested this in a test cluster doing fully random reads of ~300-byte rows on a
dataset 20x larger than memory. With the flag set to true, the load test
achieved nearly double the throughput.
As a follow-on, we might consider tuning the readahead for STREAM reads. 4 MB
seems way too big for many common workloads.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)