Mingchen_Ma created ORC-1458:
--------------------------------

             Summary: reduce namenode getFileinfo rpc 
                 Key: ORC-1458
                 URL: https://issues.apache.org/jira/browse/ORC-1458
             Project: ORC
          Issue Type: Wish
          Components: Java, Reader
            Reporter: Mingchen_Ma


In the ReaderImpl.java code, there is the following logic:
if (maxFileLength == Long.MAX_VALUE) {
         FileStatus fileStatus = fs.getFileStatus(path);
         size = fileStatus.getLen();
         modificationTime = fileStatus.getModificationTime();
}
This logic obtains the length of the file so that the ORC footer can be read. 
As a result, when we read an ORC file on HDFS, each open issues an additional 
getFileInfo RPC to the NameNode by default (unless the file length is set 
through OrcFile.ReaderOptions.maxLength before the ORC file is opened).
Because ReaderImpl has already opened the file at this point, could we avoid 
the extra NameNode RPC in the following way? In a high-load cluster, this 
could significantly reduce the pressure on the NameNode:
if (maxFileLength == Long.MAX_VALUE) {
         FileStatus fileStatus = fs.getFileStatus(path);
         size = ((DFSInputStream) file.getWrappedStream()).getFileLength();
         modificationTime = fileStatus.getModificationTime();
}
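
To make the RPC accounting concrete, here is a minimal, self-contained sketch. 
It does not use the real Hadoop or ORC classes: FakeNameNode, FakeInputStream, 
open, sizeCurrent and sizeProposed are hypothetical stand-ins, and the model 
assumes that the reply to the open call already carries the file length, so 
the opened stream can report it without another round trip. It covers only 
the length lookup; the modification time, which the snippet above still reads 
from FileStatus, is left out.

```java
import java.util.HashMap;
import java.util.Map;

public class RpcCountSketch {

    /** Hypothetical stand-in for the NameNode: it stores lengths and counts calls. */
    static final class FakeNameNode {
        private final Map<String, Long> lengths = new HashMap<>();
        int getBlockLocationsCalls = 0;
        int getFileInfoCalls = 0;

        void addFile(String path, long length) {
            lengths.put(path, length);
        }

        /** Models the call made when a file is opened (assumed to return the length). */
        long getBlockLocations(String path) {
            getBlockLocationsCalls++;
            return lengths.get(path);
        }

        /** Models the call behind fs.getFileStatus(path). */
        long getFileInfo(String path) {
            getFileInfoCalls++;
            return lengths.get(path);
        }

        int totalCalls() {
            return getBlockLocationsCalls + getFileInfoCalls;
        }
    }

    /** Hypothetical stand-in for the opened stream; it keeps the length seen at open. */
    static final class FakeInputStream {
        private final long fileLength;

        FakeInputStream(long fileLength) {
            this.fileLength = fileLength;
        }

        long getFileLength() {
            return fileLength;
        }
    }

    /** Opening a file costs one call to the fake NameNode. */
    static FakeInputStream open(FakeNameNode nn, String path) {
        return new FakeInputStream(nn.getBlockLocations(path));
    }

    /** Current behaviour: a second call is made when no length was supplied. */
    static long sizeCurrent(FakeNameNode nn, String path, long maxFileLength) {
        open(nn, path);
        if (maxFileLength == Long.MAX_VALUE) {
            return nn.getFileInfo(path);
        }
        return maxFileLength;
    }

    /** Proposed behaviour: reuse the length the opened stream already knows. */
    static long sizeProposed(FakeNameNode nn, String path, long maxFileLength) {
        FakeInputStream file = open(nn, path);
        if (maxFileLength == Long.MAX_VALUE) {
            return file.getFileLength();
        }
        return maxFileLength;
    }

    public static void main(String[] args) {
        String path = "/tmp/example.orc"; // illustrative path only

        FakeNameNode current = new FakeNameNode();
        current.addFile(path, 4096L);
        sizeCurrent(current, path, Long.MAX_VALUE);
        System.out.println("current, no length supplied: " + current.totalCalls() + " calls");

        FakeNameNode supplied = new FakeNameNode();
        supplied.addFile(path, 4096L);
        sizeCurrent(supplied, path, 4096L);
        System.out.println("current, length supplied: " + supplied.totalCalls() + " calls");

        FakeNameNode proposed = new FakeNameNode();
        proposed.addFile(path, 4096L);
        sizeProposed(proposed, path, Long.MAX_VALUE);
        System.out.println("proposed, no length supplied: " + proposed.totalCalls() + " calls");
    }
}
```

Under these assumptions, the current path costs two NameNode calls per open 
when no length is supplied, while both the supplied-length path and the 
proposed path cost one.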



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
