Bryan Beaudreault created HBASE-27896:
-----------------------------------------

             Summary: Disable hdfs readahead for pread reads
                 Key: HBASE-27896
                 URL: https://issues.apache.org/jira/browse/HBASE-27896
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


In https://issues.apache.org/jira/browse/HBASE-17914, a flag was introduced: 
{{hbase.store.reader.no-readahead}}. The default is false, so readahead is 
enabled. This flag is used when creating the default store reader (i.e. the one 
used by PREAD reads). Stream readers don't use this flag; instead, they always 
pass -1.

When that flag is false, we pass a readahead value of -1, which triggers HDFS 
default behavior. When the flag is true, we pass 0 to 
FSDataInputStream.setReadahead, which disables readahead. The HDFS default 
behavior is to use a readahead of 4MB.
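The flag handling can be modeled as a small pure function. This is a simplified sketch, not HBase's actual code: the `ReadType` enum and the `readaheadFor` and `effectiveReadaheadBytes` helpers are hypothetical names. It assumes, consistent with the default being false and readahead therefore being enabled, that `false` maps to -1 (HDFS default) and `true` maps to 0, and that the 4MB HDFS default has not been overridden in cluster configuration.

```java
// Simplified model of how a store reader picks the readahead value it hands to HDFS.
// ReadType, readaheadFor and effectiveReadaheadBytes are hypothetical names for illustration.
public class ReadaheadModel {

    enum ReadType { PREAD, STREAM }

    // -1 means "don't override": HDFS falls back to its default readahead.
    static final long USE_HDFS_DEFAULT = -1L;
    // Assumed HDFS default readahead of 4MB (not overridden by configuration).
    static final long HDFS_DEFAULT_READAHEAD_BYTES = 4L * 1024 * 1024;

    // PREAD readers honor hbase.store.reader.no-readahead; STREAM readers always use -1.
    static long readaheadFor(ReadType type, boolean noReadahead) {
        if (type == ReadType.STREAM) {
            return USE_HDFS_DEFAULT;
        }
        return noReadahead ? 0L : USE_HDFS_DEFAULT;
    }

    // Translate the readahead value into the number of bytes HDFS would read ahead.
    static long effectiveReadaheadBytes(long readahead) {
        return readahead < 0 ? HDFS_DEFAULT_READAHEAD_BYTES : readahead;
    }

    public static void main(String[] args) {
        for (ReadType type : ReadType.values()) {
            for (boolean noReadahead : new boolean[] {false, true}) {
                long value = readaheadFor(type, noReadahead);
                System.out.printf("%s no-readahead=%b -> readahead=%d, ~%d bytes%n",
                        type, noReadahead, value, effectiveReadaheadBytes(value));
            }
        }
    }
}
```

Running `main` prints one line per combination; in this model only the PREAD reader with `no-readahead=true` ends up with a 0-byte readahead.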

It seems to me that we don't want readahead for PREAD reads, and especially not 
such a large readahead. Our default block size is 64KB, which is much smaller 
than that. A PREAD read is unlikely to do sequential IO, so it is unlikely to 
make use of the cached readahead buffer.
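To put the sizes in perspective, the back-of-envelope calculation below compares the 4MB default readahead with the 64KB default block size and with a ~300-byte row. It is a worst-case sketch that assumes the whole readahead window is read from disk and never used by a later read; the `amplification` helper is a hypothetical name.

```java
// Worst-case read amplification if a random pread triggers a full readahead window
// that no later read uses. Assumes 4MB readahead and 64KB blocks, as stated in the issue.
public class ReadaheadAmplification {

    static final long READAHEAD_BYTES = 4L * 1024 * 1024; // HDFS default readahead
    static final long BLOCK_BYTES = 64L * 1024;           // HBase default block size

    // Bytes read ahead per byte actually needed (integer ratio, rounded down).
    static long amplification(long readaheadBytes, long neededBytes) {
        return readaheadBytes / neededBytes;
    }

    public static void main(String[] args) {
        System.out.println("readahead / block = "
                + amplification(READAHEAD_BYTES, BLOCK_BYTES));
        System.out.println("readahead / 300-byte row = "
                + amplification(READAHEAD_BYTES, 300));
    }
}
```

The 64x ratio against a single block is a worst case rather than a measured cost: whether the extra bytes actually hit disk depends on the DataNode and the OS page cache.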

I set no-readahead to true in a few of our clusters, and in each case saw a 
massive reduction in disk IO and a corresponding increase in throughput. I load 
tested this in a test cluster that does fully random reads of ~300-byte rows on 
a dataset 20x larger than memory. The load test achieved nearly double the 
throughput.
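For a rough sense of why a workload like this is disk-bound, consider the best cache hit rate a uniformly random read pattern can reach when the dataset is 20x larger than memory. The sketch below is an estimate, not a measurement: it assumes uniform access and that all available memory caches dataset bytes, which makes the result an upper bound on hits. `maxHitRate` is a hypothetical helper name.

```java
// Back-of-envelope estimate: with uniformly random reads over a dataset N times larger
// than the memory available for caching, at most about 1/N of reads can be served from memory.
// Assumes uniform access and that all memory holds dataset blocks (an upper bound on hits).
public class RandomReadCacheEstimate {

    static double maxHitRate(double datasetToMemoryRatio) {
        return Math.min(1.0, 1.0 / datasetToMemoryRatio);
    }

    public static void main(String[] args) {
        double ratio = 20.0; // dataset is ~20x larger than memory
        double hit = maxHitRate(ratio);
        System.out.printf("max hit rate ~%.0f%%, disk reads >= ~%.0f%%%n",
                hit * 100, (1 - hit) * 100);
    }
}
```

Under these assumptions roughly 95% of reads must go to disk, so any extra bytes each disk read pulls in are paid on almost every request.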

As a follow-on, we might consider tuning the readahead for STREAM reads. 4MB 
seems way too big for many common workloads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
