[ https://issues.apache.org/jira/browse/SPARK-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954983#comment-14954983 ]
Piotr Kołaczkowski commented on SPARK-6987:
-------------------------------------------

Probably just having the ability to list the host names that Spark knows of would be enough.

> Node Locality is determined with String Matching instead of Inet Comparison
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-6987
>                 URL: https://issues.apache.org/jira/browse/SPARK-6987
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Russell Alexander Spitzer
>
> When determining whether or not a task can be run NodeLocal, the TaskSetManager ends up using a direct string comparison between the preferredIp and the executor's bound interface:
> https://github.com/apache/spark/blob/c84d91692aa25c01882bcc3f9fd5de3cfa786195/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L878-L880
> https://github.com/apache/spark/blob/c84d91692aa25c01882bcc3f9fd5de3cfa786195/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L488-L490
> This means the preferredIp must be an exact string match of the IP the worker is bound to. APIs that gather data from other distributed sources must therefore build their own mapping between the interfaces bound (or exposed) by the external sources and the interface bound by the Spark executor, since these may differ.
> For example, Cassandra exposes a broadcast rpc address which does not have to match the address the service is actually bound to. When adding preferredLocation data we must therefore supply both the rpc address and the listen address to ensure we can get a string match (and of course we are out of luck if Spark has been bound to yet another interface).
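The mismatch described above can be sketched outside of Spark. The snippet below is a minimal illustration, not Spark's actual code: `cassandra-node-1` and `10.0.0.5` are hypothetical names for one interface, and `stringMatch`/`inetMatch` are made-up helpers contrasting the current string comparison with an InetAddress-based one.

```java
import java.net.InetAddress;

public class LocalityCheck {
    // Hypothetical helper mirroring the plain string comparison in TaskSetManager
    static boolean stringMatch(String preferred, String executorHost) {
        return preferred.equals(executorHost);
    }

    // Hypothetical Inet-based alternative: compare resolved address bytes instead
    static boolean inetMatch(InetAddress preferred, InetAddress executor) {
        return preferred.equals(executor); // InetAddress.equals compares IP bytes only
    }

    public static void main(String[] args) throws Exception {
        // One physical interface, referred to by hostname vs. literal IP.
        // getByAddress avoids any DNS lookup, keeping the example deterministic.
        InetAddress byName = InetAddress.getByAddress("cassandra-node-1",
                                                      new byte[]{10, 0, 0, 5});
        InetAddress byIp   = InetAddress.getByName("10.0.0.5");

        System.out.println(stringMatch("cassandra-node-1", "10.0.0.5")); // false: no locality
        System.out.println(inetMatch(byName, byIp));                     // true: same address
    }
}
```

A real fix in the scheduler would be more involved (reverse lookups can block, and multi-homed hosts have several addresses), but the contrast above is the essence of the report: the two names denote the same interface, yet the string comparison denies NodeLocal placement.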
-- This message was sent by Atlassian JIRA (v6.3.4#6332)