Hello team,

I recently discovered "hbase.store.reader.no-readahead", which defaults to
false (so readahead is enabled). It only applies to PREAD reads; STREAM
reads always use readahead. When readahead is enabled, the default
readahead amount in the DFSClient is 4 MB. In my opinion this is far too
large for HBase's use case.
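
To put a rough number on that, here is a back-of-the-envelope sketch in
Java. It assumes a 64 KB HFile block size (the HBase default, but tables can
override it) and that, in the worst case, the whole readahead window is
fetched from disk while only the requested block is ever used. The class and
method names are made up for illustration. This is a crude upper bound, not
a model of what the DataNode actually does:

```java
public class ReadaheadAmplification {

    /** Worst-case bytes pulled from disk for one block read (crude upper bound). */
    static long worstCaseBytesRead(long blockBytes, long readaheadBytes) {
        // If the readahead window is smaller than the block, the block itself dominates.
        return Math.max(blockBytes, readaheadBytes);
    }

    /** Ratio of worst-case bytes read to bytes the caller actually asked for. */
    static double amplification(long blockBytes, long readaheadBytes) {
        return (double) worstCaseBytesRead(blockBytes, readaheadBytes) / blockBytes;
    }

    public static void main(String[] args) {
        long blockBytes = 64L * 1024;            // assumption: default HFile block size
        long readaheadBytes = 4L * 1024 * 1024;  // DFSClient default readahead, per the text above

        System.out.printf("worst-case read amplification: %.0fx%n",
            amplification(blockBytes, readaheadBytes));
    }
}
```

Under those assumptions a purely random PREAD workload could touch up to 64
times more data on disk than it needs. Real amplification depends on cache
hits and access locality.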

Further, HBase always reads a whole block at a time, and a block typically
holds more than one row, so block reads already give us a bit of readahead.
And lastly, readahead is mostly useful for sequential reads. It's unlikely
that someone would do sequential IO via PREAD; they would use Scans (thus
STREAM) instead. Even if someone does sequential IO via PREAD, they'd still
get some natural readahead from our block-at-a-time reads.

I disabled readahead on about 50 servers across various clusters in our
production environment, and saw a massive (10x or more) drop in disk IO for
random read and mixed read cases. Scan workloads were mostly unaffected,
since they do not use this setting. I also did a targeted load test of a cluster,
with and without readahead, and was able to get double the random read
throughput with it disabled.

I'd like to update the default for this config to "true", thus disabling
readahead for PREAD by default. I also think it's worth investigating
making readahead configurable for STREAM reads, perhaps based on the scan's
max result size or blockBytesScanned of the last next() call.
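
As a starting point for that investigation, here is one way such a heuristic
could look, sketched in Java. Everything in it is hypothetical: the class,
the method, and the 64 KB / 4 MB bounds are invented for illustration, and
nothing like this exists in HBase today. The idea is to size the readahead
from the block bytes scanned by the previous next() call, fall back to the
scan's max result size before the first call, and clamp the result:

```java
public class StreamReadaheadSketch {

    // Hypothetical bounds: one default-sized block up to the current DFSClient default.
    static final long MIN_READAHEAD = 64L * 1024;
    static final long MAX_READAHEAD = 4L * 1024 * 1024;

    /**
     * Picks a readahead size for the next STREAM read.
     *
     * @param lastBlockBytesScanned block bytes scanned by the previous next() call,
     *                              or 0 if there has been no call yet
     * @param maxResultSize         the scan's max result size, or a non-positive
     *                              value if unset
     */
    static long chooseReadahead(long lastBlockBytesScanned, long maxResultSize) {
        // Prefer what the scan actually consumed last time; otherwise use the size hint.
        long hint = lastBlockBytesScanned > 0 ? lastBlockBytesScanned : maxResultSize;
        // Clamp into [MIN_READAHEAD, MAX_READAHEAD]; unset hints fall to the minimum.
        return Math.max(MIN_READAHEAD, Math.min(MAX_READAHEAD, hint));
    }

    public static void main(String[] args) {
        System.out.println(chooseReadahead(0, 2L * 1024 * 1024));           // first call: size hint
        System.out.println(chooseReadahead(1024L * 1024, 2L * 1024 * 1024)); // later calls: observed bytes
    }
}
```

Whether the previous call's scanned bytes predict the next call well enough
is exactly what would need testing.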

Any objections to changing the default?

https://issues.apache.org/jira/browse/HBASE-27896
