[ https://issues.apache.org/jira/browse/SPARK-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940897#comment-15940897 ]
gagan taneja commented on SPARK-19705:
--------------------------------------

Sandy,

Would you be able to help with this bug? I saw that you filed an earlier bug for HDFS cache support in RDDs, and this is a relatively minor change. We have been running this code in production and have seen a major performance boost.

Below is the reference to the earlier bug you filed:
https://issues.apache.org/jira/browse/SPARK-1767

> Preferred location supporting HDFS Cache for FileScanRDD
> --------------------------------------------------------
>
>                 Key: SPARK-19705
>                 URL: https://issues.apache.org/jira/browse/SPARK-19705
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: gagan taneja
>
> Although NewHadoopRDD and HadoopRDD consider the HDFS cache while calculating
> preferredLocations, FileScanRDD does not take the HDFS cache into account while
> calculating preferredLocations.
> The enhancement can be easily implemented for large files where a FilePartition
> contains only a single HDFS file.
> The enhancement will also result in a significant performance improvement for
> cached HDFS partitions.
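
To make the request concrete, here is a minimal Python sketch of a cache-aware preferred-location ranking. It is a simplified model, not Spark's code and not the patch running in production: `BlockReplica`, `preferred_locations`, and the `max_hosts` parameter are hypothetical names, and the rule "cached hosts first, then hosts ordered by bytes held" is an assumption about how such an enhancement could behave.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockReplica:
    """Hypothetical record: one replica of a partition's data on one host."""
    host: str
    length: int    # bytes of the partition's data stored on this host
    cached: bool   # True if this replica is held in the HDFS cache


def preferred_locations(replicas, max_hosts=3):
    """Rank hosts for a partition, listing HDFS-cached hosts first.

    Assumed rule (illustrative only): hosts with cached replicas come
    first, then hosts with on-disk replicas; within each group, hosts
    holding more bytes rank higher.
    """
    cached_bytes = defaultdict(int)
    disk_bytes = defaultdict(int)
    for r in replicas:
        if r.host == "localhost":
            # A placeholder host name carries no locality information.
            continue
        totals = cached_bytes if r.cached else disk_bytes
        totals[r.host] += r.length

    def ranked(totals):
        # Most bytes first; ties broken by host name for determinism.
        return sorted(totals, key=lambda h: (-totals[h], h))

    ordered = ranked(cached_bytes)
    ordered += [h for h in ranked(disk_bytes) if h not in cached_bytes]
    return ordered[:max_hosts]
```

In this model a host with a small cached replica still outranks a host with a larger on-disk replica, which is the behaviour the issue asks for when a FilePartition maps to a single cached HDFS file.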