[ 
https://issues.apache.org/jira/browse/SPARK-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940897#comment-15940897
 ] 

gagan taneja commented on SPARK-19705:
--------------------------------------

Sandy,
Would you be able to help with this bug?

I saw that you filed an earlier bug for HDFS cache support in RDDs, and this is a 
relatively minor change. We have been running this code in production and 
have seen a major performance boost.
Below is the reference to the earlier bug you filed:
https://issues.apache.org/jira/browse/SPARK-1767

> Preferred location supporting HDFS Cache for FileScanRDD
> --------------------------------------------------------
>
>                 Key: SPARK-19705
>                 URL: https://issues.apache.org/jira/browse/SPARK-19705
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: gagan taneja
>
> Although NewHadoopRDD and HadoopRDD consider the HDFS cache when calculating 
> preferredLocations, FileScanRDD does not take the HDFS cache into account 
> when calculating preferredLocations.
> The enhancement can be easily implemented for large files, where a FilePartition 
> contains only a single HDFS file.
> The enhancement would also result in a significant performance improvement for 
> cached HDFS partitions.
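
To illustrate the idea in the description above, here is a minimal sketch of cache-aware host ranking for a single file. It is not Spark's or the reporter's implementation: `Block`, `preferred_locations`, and `max_hosts` are hypothetical names, and the sketch assumes each block reports both the hosts holding an on-disk replica and the hosts holding the block in the HDFS cache. Hosts with more cached bytes rank first, and on-disk bytes break ties.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for one HDFS block of a file."""
    length: int                       # block size in bytes
    hosts: List[str]                  # datanodes holding an on-disk replica
    cached_hosts: List[str] = field(default_factory=list)  # datanodes caching the block


def preferred_locations(blocks: List[Block], max_hosts: int = 3) -> List[str]:
    """Rank hosts by cached bytes first, then by on-disk bytes (assumed policy)."""
    cached = Counter()
    on_disk = Counter()
    for block in blocks:
        for host in block.cached_hosts:
            cached[host] += block.length
        for host in block.hosts:
            on_disk[host] += block.length
    # Sort descending by cached bytes, then on-disk bytes; host name keeps ties stable.
    all_hosts = set(cached) | set(on_disk)
    ranked = sorted(all_hosts, key=lambda h: (-cached[h], -on_disk[h], h))
    return ranked[:max_hosts]


blocks = [
    Block(128, ["a", "b", "c"], cached_hosts=["c"]),
    Block(128, ["a", "b", "d"]),
]
print(preferred_locations(blocks))
```

In this example host `c` ranks first even though it holds fewer on-disk bytes than `a` or `b`, because it is the only host with a cached block. A partition spanning several small files would need a policy for combining per-file rankings, which this sketch does not attempt.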



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

