[ 
https://issues.apache.org/jira/browse/HDFS-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated HDFS-17593:
-----------------------------------
    Description: 
The HDFS client seems to always get block locations from the namenode when 
opening a file:

https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L1099

This leads to unnecessary RPCs in Apache Impala when doing reads, as the block 
locations are cached globally and the executors already have a good guess about 
the block locations when opening a stream. Unless the cached block locations 
are stale, ideally no RPC should be made to the namenode.

  was:
The HDFS client seems to always get block locations from the namenode when 
opening a file:

https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L1099

This leads to unnecessary RPCs in Apache Impala when doing remote reads, as the 
block locations are cached globally and the executors already have a good guess 
about the block locations when opening a stream. Unless the cached block 
locations are stale, ideally no RPC should be made to the namenode.


> Allow setting block locations when opening streams
> --------------------------------------------------
>
>                 Key: HDFS-17593
>                 URL: https://issues.apache.org/jira/browse/HDFS-17593
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> The HDFS client seems to always get block locations from the namenode when 
> opening a file:
> https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L1099
> This leads to unnecessary RPCs in Apache Impala when doing reads, as the 
> block locations are cached globally and the executors already have a good 
> guess about the block locations when opening a stream. Unless the cached 
> block locations are stale, ideally no RPC should be made to the namenode.
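
To make the request concrete, here is a minimal, self-contained sketch of the behaviour the description asks for. It does not use the Hadoop API: LocationSource, CountingNameNode, Opener and both open(...) overloads are hypothetical names invented for illustration, and the datanode strings are placeholders. The first overload models the behaviour the description reports (a lookup on every open); the second models an open that accepts caller-supplied locations and only falls back to the namenode when none are given, for example because the caller knows its cached entry is stale.

```java
import java.util.List;

public class CachedOpenSketch {

    /** Hypothetical stand-in for the namenode call that returns block locations. */
    interface LocationSource {
        List<String> fetchLocations(String path);
    }

    /** Simulated namenode that counts how often it is contacted. */
    static final class CountingNameNode implements LocationSource {
        int rpcCount = 0;

        @Override
        public List<String> fetchLocations(String path) {
            rpcCount++;
            // Placeholder datanode names, not real addresses.
            return List.of("datanode-a", "datanode-b");
        }
    }

    /** Hypothetical opener; not the real DFSClient. */
    static final class Opener {
        private final LocationSource nameNode;

        Opener(LocationSource nameNode) {
            this.nameNode = nameNode;
        }

        /** Models the reported behaviour: every open asks the namenode. */
        List<String> open(String path) {
            return nameNode.fetchLocations(path);
        }

        /**
         * Models the requested behaviour: use the caller's locations when
         * present, and contact the namenode only when no hint is supplied.
         */
        List<String> open(String path, List<String> cachedLocations) {
            if (cachedLocations != null && !cachedLocations.isEmpty()) {
                return cachedLocations;
            }
            return nameNode.fetchLocations(path);
        }
    }

    public static void main(String[] args) {
        CountingNameNode nameNode = new CountingNameNode();
        Opener opener = new Opener(nameNode);

        List<String> cached = opener.open("/warehouse/t/part-0");  // 1 lookup
        opener.open("/warehouse/t/part-0", cached);                // no lookup
        opener.open("/warehouse/t/part-0", null);                  // fallback lookup

        System.out.println("namenode lookups: " + nameNode.rpcCount);  // prints 2
    }
}
```

In this sketch the caller signals staleness simply by passing no hint. A real client would more likely detect stale locations when a read against a datanode fails and then refetch, which is outside what this example shows.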



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
