gagan taneja created SPARK-19705:
------------------------------------

             Summary: Preferred location supporting HDFS Cache for FileScanRDD
                 Key: SPARK-19705
                 URL: https://issues.apache.org/jira/browse/SPARK-19705
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: gagan taneja


Although NewHadoopRDD and HadoopRDD consider the HDFS cache when calculating 
preferredLocations, FileScanRDD does not take the HDFS cache into account when 
calculating preferredLocations.
The enhancement can be easily implemented for large files, where a FilePartition 
contains only a single HDFS file.
The enhancement would also result in a significant performance improvement for 
cached HDFS partitions.
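
To make the proposal concrete, here is a minimal, self-contained sketch of how 
preferred hosts could be ranked for a partition that covers a byte range of a 
single HDFS file. It is not Spark's implementation. The names Block, overlap and 
preferred_locations are hypothetical, and Block is only a stand-in modelled 
loosely on Hadoop's BlockLocation, which reports both the replica hosts and the 
hosts that hold the block in the HDFS cache. The ranking rule (cached bytes 
first, then on-disk bytes) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one HDFS block of a file."""
    offset: int                    # start of the block within the file
    length: int                    # block length in bytes
    hosts: Tuple[str, ...]         # hosts holding an on-disk replica
    cached_hosts: Tuple[str, ...]  # hosts holding the block in the HDFS cache


def overlap(block: Block, start: int, length: int) -> int:
    """Number of bytes of [start, start + length) that fall inside the block."""
    lo = max(block.offset, start)
    hi = min(block.offset + block.length, start + length)
    return max(0, hi - lo)


def preferred_locations(blocks: List[Block], start: int, length: int,
                        top_n: int = 3) -> List[str]:
    """Rank hosts for a byte range of one file, cached replicas first.

    Assumed rule: sort by bytes available from the HDFS cache, then by bytes
    available from disk, then by host name so the result is deterministic.
    """
    cached: Dict[str, int] = {}
    on_disk: Dict[str, int] = {}
    for block in blocks:
        n = overlap(block, start, length)
        if n == 0:
            continue  # block lies outside the requested range
        for host in block.cached_hosts:
            cached[host] = cached.get(host, 0) + n
        for host in block.hosts:
            on_disk[host] = on_disk.get(host, 0) + n
    candidates = set(cached) | set(on_disk)
    ranked = sorted(
        candidates,
        key=lambda h: (-cached.get(h, 0), -on_disk.get(h, 0), h),
    )
    return ranked[:top_n]


# Two 128-byte blocks; hosts "c" and "d" each cache one of them.
blocks = [
    Block(0, 128, ("a", "b", "c"), ("c",)),
    Block(128, 128, ("a", "b", "d"), ("d",)),
]
whole_file = preferred_locations(blocks, 0, 256)
first_block = preferred_locations(blocks, 0, 128)
```

With this rule, a host that caches part of the range is preferred over a host 
that only has on-disk replicas, even when the on-disk host covers more bytes. 
Whether that trade-off is right for FileScanRDD is a design question for the 
actual change.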



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
