gagan taneja created SPARK-19705:
------------------------------------

             Summary: Preferred location supporting HDFS Cache for FileScanRDD
                 Key: SPARK-19705
                 URL: https://issues.apache.org/jira/browse/SPARK-19705
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: gagan taneja

Although NewHadoopRDD and HadoopRDD consider the HDFS cache when calculating preferredLocations, FileScanRDD does not take the HDFS cache into account when calculating preferredLocations.

The enhancement can be implemented easily for large files, where a FilePartition contains only a single HDFS file.

The enhancement will also result in a significant performance improvement for cached HDFS partitions.
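
To illustrate the idea, here is a minimal, self-contained sketch of a cache-aware preferred-location calculation. It is not Spark code and not a proposed patch: `BlockInfo`, `FileSlice`, `ScanPartition` and `CacheAwareLocations` are hypothetical stand-ins for the real partition and block metadata, the `hdfs_cache_` prefix is assumed to match the tag Spark uses for in-memory HDFS locations, and the ranking by bytes with a limit of three hosts is an assumption made for the example.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark's classes.
final case class BlockInfo(length: Long, hosts: Seq[String], cachedHosts: Seq[String])
final case class FileSlice(path: String, blocks: Seq[BlockInfo])
final case class ScanPartition(files: Seq[FileSlice])

object CacheAwareLocations {
  // Assumed marker for a host that holds a block in the HDFS cache.
  val CachePrefix = "hdfs_cache_"

  def preferredLocations(partition: ScanPartition, maxHosts: Int = 3): Seq[String] = {
    val blocks = partition.files.flatMap(_.blocks)

    // Total bytes each host can serve, for the host list selected by `hostsOf`.
    def bytesPerHost(hostsOf: BlockInfo => Seq[String]): Map[String, Long] =
      blocks
        .flatMap(b => hostsOf(b).filter(_ != "localhost").map(h => h -> b.length))
        .groupBy(_._1)
        .map { case (host, pairs) => host -> pairs.map(_._2).sum }

    // Hosts ordered by descending byte count; ties are broken by host name.
    def ranked(bytes: Map[String, Long]): Seq[String] =
      bytes.toSeq.sortBy { case (host, n) => (-n, host) }.map(_._1)

    val cachedHosts = ranked(bytesPerHost(_.cachedHosts))
    // A host already listed as cached is not repeated as an on-disk location.
    val diskHosts = ranked(bytesPerHost(_.hosts)).filterNot(cachedHosts.contains)

    // Cached replicas come first, then on-disk replicas.
    (cachedHosts.map(CachePrefix + _) ++ diskHosts).take(maxHosts)
  }
}
```

For a partition holding a single file with one 100-byte block stored on hosts `a`, `b` and `c` and cached on `b`, this sketch puts the cached replica ahead of the on-disk ones. The single-file case mentioned above is the simplest one because every block in the partition belongs to the same file, so no weighting across files is needed.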