Xiaoju Wu created SPARK-24088:
---------------------------------

             Summary: only HadoopRDD leverage HDFS Cache as preferred location
                 Key: SPARK-24088
                 URL: https://issues.apache.org/jira/browse/SPARK-24088
             Project: Spark
          Issue Type: Improvement
          Components: Input/Output
    Affects Versions: 2.3.0
            Reporter: Xiaoju Wu
Only HadoopRDD implements convertSplitLocationInfo, which converts each split location to an HDFSCacheTaskLocation if the block is cached in DataNode memory (and to a HostTaskLocation otherwise). FileScanRDD does not: in FileScanRDD, all split location information is dropped.

The conversion in HadoopRDD:

    private[spark] def convertSplitLocationInfo(
        infos: Array[SplitLocationInfo]): Option[Seq[String]] = {
      Option(infos).map(_.flatMap { loc =>
        val locationStr = loc.getLocation
        // "localhost" entries are skipped as a preferred location
        if (locationStr != "localhost") {
          if (loc.isInMemory) {
            // Replica is cached in DataNode memory: tag it as an HDFS cache location
            logDebug(s"Partition $locationStr is cached by Hadoop.")
            Some(HDFSCacheTaskLocation(locationStr).toString)
          } else {
            // Replica is on disk only: plain host preference
            Some(HostTaskLocation(locationStr).toString)
          }
        } else {
          None
        }
      })
    }
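To make the mapping concrete without a Spark or Hadoop dependency, the same logic can be reproduced with stand-in types. This is a minimal sketch, not Spark's code: FakeSplitLocationInfo is a hypothetical substitute for Hadoop's SplitLocationInfo (only getLocation and isInMemory are modeled), and the "hdfs_cache_" prefix is assumed to mirror the string form of Spark's HDFSCacheTaskLocation. It shows the information FileScanRDD currently loses: which replicas are cached in memory.

```scala
// Dependency-free sketch of the split-location conversion.
// FakeSplitLocationInfo is a hypothetical stand-in for Hadoop's SplitLocationInfo.
final class FakeSplitLocationInfo(location: String, inMemory: Boolean) {
  def getLocation: String = location
  def isInMemory: Boolean = inMemory
}

object LocationConversionSketch {
  // Assumed to match the prefix Spark uses when rendering HDFSCacheTaskLocation as a string.
  val InMemoryLocationTag = "hdfs_cache_"

  def convertSplitLocationInfo(infos: Array[FakeSplitLocationInfo]): Option[Seq[String]] =
    Option(infos).map { arr =>
      // Convert to Seq first so the result type does not rely on an implicit Array-to-Seq conversion.
      arr.toSeq.flatMap { loc =>
        val locationStr = loc.getLocation
        if (locationStr == "localhost") {
          None // skipped as a preferred location
        } else if (loc.isInMemory) {
          Some(InMemoryLocationTag + locationStr) // replica cached in DataNode memory
        } else {
          Some(locationStr) // plain host preference
        }
      }
    }

  def main(args: Array[String]): Unit = {
    val infos = Array(
      new FakeSplitLocationInfo("host1", inMemory = true),
      new FakeSplitLocationInfo("host2", inMemory = false),
      new FakeSplitLocationInfo("localhost", inMemory = false)
    )
    println(convertSplitLocationInfo(infos))
  }
}
```

Running main prints the converted preferences: host1 carries the in-memory tag, host2 is a plain host, and the "localhost" entry is filtered out. A null input yields None, matching the Option(infos) guard in the original.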