Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14635 )
Change subject: IMPALA-8525: preads should use hdfsPreadFully rather than hdfsPread
......................................................................

IMPALA-8525: preads should use hdfsPreadFully rather than hdfsPread

Modifies HdfsFileReader so that it calls hdfsPreadFully instead of
hdfsPread. hdfsPreadFully is a new libhdfs API introduced by HDFS-14564
(Add libhdfs APIs for readFully; add readFully to
ByteBufferPositionedReadable). hdfsPreadFully improves the performance
of preads, especially when reading data from S3. The major difference
between hdfsPread and hdfsPreadFully is that hdfsPreadFully is
guaranteed to read all of the requested bytes, whereas hdfsPread is only
guaranteed to read up to the number of requested bytes.

hdfsPreadFully reduces the number of JNI array allocations needed when
reading data from S3. When any read method in libhdfs is called, the
method allocates a Java array whose size is equal to the amount of data
requested. The issue is that Java's InputStream#read only guarantees
that it will read up to the amount of data requested. This can lead to
situations where a libhdfs read request allocates a large Java array
even though the read request only partially fills it.
PositionedReadable#readFully, on the other hand, guarantees that all
requested data will be read, preventing any unnecessary JNI array
allocations.

hdfsPreadFully improves the effectiveness of
fs.s3a.experimental.input.fadvise=RANDOM (HADOOP-13203). S3A recommends
setting fadvise=RANDOM when doing random reads, which is common in
Impala when reading Parquet or ORC files. fadvise=RANDOM causes the HTTP
GET request that reads the S3 data to request only the data bounded by
the parameters of the current read request (e.g. for
'read(long position, ..., int length)' it requests 'length' bytes). The
chunk-size optimization in HdfsFileReader hurts performance when
fadvise=RANDOM because each HTTP GET request will only fetch
'chunk-size' bytes at a time.
This is why this patch removes the chunk-size optimization as well.
hdfsPreadFully helps here because all of the data in the scan range is
requested by a single HTTP GET request.

Since hdfsPreadFully improves S3 read performance, this patch enables
preads for S3A files by default. Even if fadvise=SEQUENTIAL,
hdfsPreadFully still improves performance since it avoids unnecessary
JNI allocation overhead.

The chunk-size optimization (added in
https://gerrit.cloudera.org/#/c/63/) is no longer necessary after this
patch, since hdfsPreadFully prevents any unnecessary array allocations.
Furthermore, it is likely the chunk-size optimization was added to work
around overhead that was fixed by HDFS-14285.

Also fixes a bug from IMPALA-8884 where the
'impala-server.io-mgr.queue-$i.read-size' statistics were updated with
the chunk size passed to HdfsFileReader::ReadFromPosInternal, which is
not necessarily equal to the amount of data actually read.

Testing:
* Ran core tests
* Ran core tests on S3

Change-Id: I29ea34897096bc790abdeb98073a47f1c4c10feb
---
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/local-file-reader.cc
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
5 files changed, 13 insertions(+), 47 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/14635/1

--
To view, visit http://gerrit.cloudera.org:8080/14635
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I29ea34897096bc790abdeb98073a47f1c4c10feb
Gerrit-Change-Number: 14635
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>