[ https://issues.apache.org/jira/browse/SPARK-24088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463653#comment-16463653 ]
Marco Gaido commented on SPARK-24088:
-------------------------------------

[~xiaojuwu] I don't understand which problem is being reported here. {{FileScanRDD}} uses as its preferred locations the hosts from which the highest number of bytes can be retrieved. What is the problem with this policy? Which issue are you experiencing?

> only HadoopRDD leverage HDFS Cache as preferred location
> --------------------------------------------------------
>
>                 Key: SPARK-24088
>                 URL: https://issues.apache.org/jira/browse/SPARK-24088
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.3.0
>            Reporter: Xiaoju Wu
>            Priority: Minor
>
> Only HadoopRDD implements convertSplitLocationInfo, which converts a
> location to an HDFSCacheTaskLocation when the block is cached in DataNode
> memory. FileScanRDD does not; in FileScanRDD, all split location
> information is dropped.
>   private[spark] def convertSplitLocationInfo(
>       infos: Array[SplitLocationInfo]): Option[Seq[String]] = {
>     Option(infos).map(_.flatMap { loc =>
>       val locationStr = loc.getLocation
>       if (locationStr != "localhost") {
>         if (loc.isInMemory) {
>           logDebug(s"Partition $locationStr is cached by Hadoop.")
>           Some(HDFSCacheTaskLocation(locationStr).toString)
>         } else {
>           Some(HostTaskLocation(locationStr).toString)
>         }
>       } else {
>         None
>       }
>     })
>   }
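
For illustration, the policy described in the comment (prefer the hosts from which the most bytes can be read) can be sketched as below. This is a minimal, self-contained sketch and not Spark's actual code: {{FileSlice}}, {{PreferredLocationSketch}}, {{preferredHosts}} and the {{topN}} parameter are hypothetical names introduced here, and the real {{FileScanRDD}} logic may differ in detail. Note that the sketch ranks hosts by byte count only and does not distinguish blocks cached in DataNode memory, which is the distinction the issue description is about.

```scala
// Hypothetical stand-in for one file slice of a partition and the hosts storing it.
final case class FileSlice(path: String, length: Long, hosts: Seq[String])

object PreferredLocationSketch {
  // Sum the bytes each host can serve locally, then keep the topN hosts
  // with the largest totals (ties broken by host name for determinism).
  def preferredHosts(slices: Seq[FileSlice], topN: Int = 3): Seq[String] = {
    val bytesPerHost = slices
      .flatMap(s => s.hosts.filter(_ != "localhost").map(h => h -> s.length))
      .groupBy(_._1)
      .map { case (host, pairs) => host -> pairs.map(_._2).sum }

    bytesPerHost.toSeq
      .sortBy { case (host, bytes) => (-bytes, host) }
      .take(topN)
      .map(_._1)
  }

  def main(args: Array[String]): Unit = {
    val slices = Seq(
      FileSlice("part-0", 128L, Seq("host-a", "host-b")),
      FileSlice("part-1", 64L, Seq("host-b", "host-c")),
      FileSlice("part-2", 32L, Seq("localhost"))
    )
    // host-b can serve 192 bytes, host-a 128, host-c 64.
    println(preferredHosts(slices, topN = 2).mkString(", "))
  }
}
```

A cache-aware variant would additionally need per-block information equivalent to {{loc.isInMemory}} from the quoted {{convertSplitLocationInfo}}, which this sketch deliberately leaves out.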