[ 
https://issues.apache.org/jira/browse/SPARK-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603576#comment-14603576
 ] 

Perinkulam I Ganesh commented on SPARK-3528:
--------------------------------------------

I have a question:

If the driver is on one node and the slave is on another, the file may be 
local to the driver node but not to the slave. In that case, is it proper 
to tag the file as NODE_LOCAL?

Thanks

- P. I. 

> Reading data from file:/// should be called NODE_LOCAL not PROCESS_LOCAL
> ------------------------------------------------------------------------
>
>                 Key: SPARK-3528
>                 URL: https://issues.apache.org/jira/browse/SPARK-3528
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Andrew Ash
>            Priority: Critical
>
> Note that reading from {{file:///.../pom.xml}} is called a PROCESS_LOCAL task
> {noformat}
> spark> sc.textFile("pom.xml").count
> ...
> 14/09/15 00:59:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1191 bytes)
> 14/09/15 00:59:13 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1191 bytes)
> 14/09/15 00:59:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 14/09/15 00:59:13 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 14/09/15 00:59:13 INFO HadoopRDD: Input split: file:/Users/aash/git/spark/pom.xml:20862+20863
> 14/09/15 00:59:13 INFO HadoopRDD: Input split: file:/Users/aash/git/spark/pom.xml:0+20862
> {noformat}
> There is an outstanding TODO in {{HadoopRDD.scala}} that may be related:
> {noformat}
>   override def getPreferredLocations(split: Partition): Seq[String] = {
>     // TODO: Filtering out "localhost" in case of file:// URLs
>     val hadoopSplit = split.asInstanceOf[HadoopPartition]
>     hadoopSplit.inputSplit.value.getLocations.filter(_ != "localhost")
>   }
> {noformat}
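
To illustrate the quoted filter, here is a minimal sketch that does not depend on Spark or Hadoop. It assumes that a split for a file:/// path reports only "localhost" as its location; the names LocalitySketch and FakeSplit, and the example paths and host names, are hypothetical and are not part of Spark. Under that assumption the filter leaves no preferred hosts for a local file, which may be related to the locality level shown in the log above.

```scala
object LocalitySketch {
  // Hypothetical stand-in for a Hadoop input split; only the host list matters here.
  final case class FakeSplit(path: String, locations: Seq[String])

  // Applies the same filter as the quoted getPreferredLocations body.
  def preferredLocations(split: FakeSplit): Seq[String] =
    split.locations.filter(_ != "localhost")

  def main(args: Array[String]): Unit = {
    // Assumption: a local-file split lists only "localhost".
    val localSplit = FakeSplit("file:/tmp/example.txt", Seq("localhost"))
    // Assumption: a distributed-file split lists real host names.
    val remoteSplit = FakeSplit("hdfs://namenode/example.txt", Seq("host1", "host2"))

    // No preferred hosts remain for the local file after filtering.
    println(preferredLocations(localSplit).isEmpty)
    // Real host names pass through the filter unchanged.
    println(preferredLocations(remoteSplit))
  }
}
```

With an empty list of preferred locations there is no host name left to compare against an executor's host, so this sketch says nothing yet about how a NODE_LOCAL label would be derived for file:/// input.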



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
