[ https://issues.apache.org/jira/browse/HDFS-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer updated HDFS-17593:
-----------------------------------
Description:
The HDFS client seems to always get block locations from the namenode when opening a file:
https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L1099
This leads to unnecessary RPCs in Apache Impala when doing reads, as the block locations are cached globally and the executors already have a good guess about the block locations when opening a stream. Unless the cached block locations are stale, ideally no RPC should be made to the namenode.

was:
The HDFS client seems to always get block locations from the namenode when opening a file:
https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L1099
This leads to unnecessary RPCs in Apache Impala when doing remote reads, as the block locations are cached globally and the executors already have a good guess about the block locations when opening a stream. Unless the cached block locations are stale, ideally no RPC should be made to the namenode.

> Allow setting block locations when opening streams
> --------------------------------------------------
>
> Key: HDFS-17593
> URL: https://issues.apache.org/jira/browse/HDFS-17593
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Csaba Ringhofer
> Priority: Major
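The behaviour the issue asks for (reuse block locations the caller already holds and only contact the namenode when they are missing or stale) can be sketched as a small client-side cache. This is a hypothetical illustration only, written in Python for brevity: it does not use the Hadoop client API, `namenode_lookup` is a stand-in for the namenode RPC, and `Block`, `BlockLocationCache`, `locations_for_open` and `invalidate` are invented names, not part of HDFS or of any proposed patch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical, simplified stand-in for a located HDFS block."""
    offset: int
    length: int
    hosts: Tuple[str, ...]


class BlockLocationCache:
    """Hypothetical client-side cache: reuse known block locations when a
    stream is opened and only ask the namenode on a miss or after the entry
    has been marked stale."""

    def __init__(self, namenode_lookup: Callable[[str], List[Block]]):
        # Stand-in for the namenode RPC that returns block locations.
        self._lookup = namenode_lookup
        self._cache: Dict[str, List[Block]] = {}
        self.rpc_count = 0  # number of simulated namenode round trips

    def locations_for_open(self, path: str) -> List[Block]:
        cached = self._cache.get(path)
        if cached is not None:
            return cached  # cache hit: no namenode round trip
        self.rpc_count += 1
        locations = self._lookup(path)
        self._cache[path] = locations
        return locations

    def invalidate(self, path: str) -> None:
        # Called when a read shows the locations are stale (for example a
        # datanode no longer serves the block); the next open refetches.
        self._cache.pop(path, None)


def fake_namenode(path: str) -> List[Block]:
    # Hypothetical lookup returning one 128 MiB block on three datanodes.
    return [Block(0, 128 * 1024 * 1024, ("dn1", "dn2", "dn3"))]


cache = BlockLocationCache(fake_namenode)
cache.locations_for_open("/warehouse/t/part-0")  # miss: first RPC
cache.locations_for_open("/warehouse/t/part-0")  # hit: no RPC
cache.invalidate("/warehouse/t/part-0")          # staleness detected
cache.locations_for_open("/warehouse/t/part-0")  # refetch: second RPC
print(cache.rpc_count)  # 2
```

The design choice here is that staleness is signalled explicitly by the reader (via `invalidate`) rather than by a timer, which mirrors the issue's point that a cached guess is good enough until it is shown to be wrong.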