[jira] [Created] (SPARK-24088) only HadoopRDD leverage HDFS Cache as preferred location

Xiaoju Wu (JIRA) Wed, 25 Apr 2018 08:46:52 -0700

Xiaoju Wu created SPARK-24088:
---------------------------------

             Summary: only HadoopRDD leverage HDFS Cache as preferred location
                 Key: SPARK-24088
                 URL: https://issues.apache.org/jira/browse/SPARK-24088
             Project: Spark
          Issue Type: Improvement
          Components: Input/Output
    Affects Versions: 2.3.0
            Reporter: Xiaoju Wu



Only HadoopRDD implements convertSplitLocationInfo, which converts a split 
location to an HDFSCacheTaskLocation when the block is cached in DataNode 
memory, and to a HostTaskLocation otherwise. FileScanRDD does not do this: in 
FileScanRDD, all split location information is dropped, so it cannot use the 
HDFS cache as a preferred location. 

private[spark] def convertSplitLocationInfo(
    infos: Array[SplitLocationInfo]): Option[Seq[String]] = {
  Option(infos).map(_.flatMap { loc =>
    val locationStr = loc.getLocation
    if (locationStr != "localhost") {
      if (loc.isInMemory) {
        logDebug(s"Partition $locationStr is cached by Hadoop.")
        Some(HDFSCacheTaskLocation(locationStr).toString)
      } else {
        Some(HostTaskLocation(locationStr).toString)
      }
    } else {
      None
    }
  })
}
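
To illustrate the conversion outside of Spark, here is a minimal, self-contained sketch of the same logic. It is not Spark code: SplitLocation is a hypothetical stand-in for Hadoop's SplitLocationInfo, and CacheAwareLocations, toTaskLocations and preferCached are made-up names. The "hdfs_cache_" prefix is assumed to match the string form Spark gives an HDFSCacheTaskLocation. Something along these lines could be a starting point for making FileScanRDD cache-aware, though how it would plug into FileScanRDD is left open here.

```scala
// Hypothetical stand-in for org.apache.hadoop.mapred.SplitLocationInfo,
// so that this sketch runs without Hadoop on the classpath.
final case class SplitLocation(host: String, isInMemory: Boolean)

object CacheAwareLocations {
  // Assumed to match the prefix Spark uses for HDFS-cached task locations.
  val InMemoryTag = "hdfs_cache_"

  // Mirrors convertSplitLocationInfo: drop "localhost", tag cached hosts,
  // and return None when no location info is available (null input).
  def toTaskLocations(infos: Array[SplitLocation]): Option[Seq[String]] =
    Option(infos).map { locs =>
      locs.toSeq
        .filter(_.host != "localhost")
        .map(loc => if (loc.isInMemory) InMemoryTag + loc.host else loc.host)
    }

  // Hypothetical helper: list cached locations before on-disk ones,
  // keeping the relative order within each group.
  def preferCached(locations: Seq[String]): Seq[String] = {
    val (cached, onDisk) = locations.partition(_.startsWith(InMemoryTag))
    cached ++ onDisk
  }
}

object Demo extends App {
  val infos = Array(
    SplitLocation("host1", isInMemory = false),
    SplitLocation("host2", isInMemory = true),
    SplitLocation("localhost", isInMemory = false))

  val preferred =
    CacheAwareLocations.toTaskLocations(infos).map(CacheAwareLocations.preferCached)
  println(preferred)
}
```

Returning an Option keeps the "no location info" case (a null array) distinct from "location info present but empty", which is the same distinction the HadoopRDD helper makes.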
