Hello team, I recently discovered "hbase.store.reader.no-readahead", which defaults to false (so readahead is enabled). This only applies to PREAD reads, not STREAM reads, which always use readahead. When readahead is enabled, the default readahead amount in the DFSClient is 4 MB. In my opinion this is far too large for HBase's use case.
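To put a rough number on that, here is a back-of-the-envelope sketch. It assumes the default 64 KB HFile block size and a crude worst case in which each random PREAD causes the full readahead window to be fetched and none of the extra bytes are ever reused. The function name is purely illustrative and the model is not a measurement.

```python
# Rough worst-case read amplification for one random PREAD.
# Assumptions (illustrative, not measured):
#   - 64 KB HFile block size (HBase's default BLOCKSIZE)
#   - 4 MB readahead window, as mentioned above
#   - none of the bytes read ahead are used by a later request

KB = 1024
MB = 1024 * KB


def read_amplification(block_size_bytes: int, readahead_bytes: int) -> float:
    """Ratio of bytes fetched from disk to bytes the read actually needed."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # At minimum the block itself is read; with readahead, up to the
    # whole readahead window may be fetched instead.
    bytes_from_disk = max(block_size_bytes, readahead_bytes)
    return bytes_from_disk / block_size_bytes


if __name__ == "__main__":
    print(f"{read_amplification(64 * KB, 4 * MB):.0f}x")  # prints "64x"
```

This is only an upper bound under those assumptions; the real effect depends on block size, caching, and how random the access pattern is.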
Further, reads in HBase are always for a block at a time, and blocks typically contain more than one row, so we are already reading ahead a bit via block reads. Lastly, readahead is typically useful for sequential read scenarios. It's unlikely that someone would do sequential IO via PREAD; they would use Scans instead (and thus STREAM). If someone is doing sequential IO via PREAD, they'd still get some natural readahead because we read a block at a time.

I disabled readahead on about 50 servers across various clusters in our production environment and saw a massive (10x or more) drop in disk IO for random read and mixed read cases. Scan workloads were mostly unaffected because they do not use this setting. I also did a targeted load test of a cluster, with and without readahead, and was able to get double the random read throughput with it disabled.

I'd like to update the default for this config to "true", thus disabling readahead for PREAD by default. I also think it's worth investigating making readahead configurable for STREAM reads, perhaps based on the scan's max result size or the blockBytesScanned of the last next() call.

Any objections to changing the default?

https://issues.apache.org/jira/browse/HBASE-27896
