[jira] [Commented] (SPARK-6987) Node Locality is determined with String Matching instead of Inet Comparison

2015-10-13 Thread JIRA

[ https://issues.apache.org/jira/browse/SPARK-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954983#comment-14954983 ]

Piotr Kołaczkowski commented on SPARK-6987:
---

Probably just having the ability to list the host names that Spark knows of 
would be enough.
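
As a rough sketch of what that could look like (an illustration on my part, not an existing hook): the driver already tracks executors by a "host:port" string, so the host names Spark knows of can be approximated from {{SparkContext.getExecutorMemoryStatus}}, whose keys are exactly those strings. Assuming a live context {{sc}}:

{code}
// Sketch: list the host strings Spark currently knows about on the driver.
// getExecutorMemoryStatus keys are "host:port"; the host part is the exact
// string the scheduler will try to match preferred locations against.
val knownHosts: Set[String] =
  sc.getExecutorMemoryStatus.keys
    .map(_.split(":").head)   // drop the port (naive for IPv6 literals)
    .toSet

knownHosts.foreach(println)
{code}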

> Node Locality is determined with String Matching instead of Inet Comparison
> ---
>
> Key: SPARK-6987
> URL: https://issues.apache.org/jira/browse/SPARK-6987
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Russell Alexander Spitzer
>
> When determining whether or not a task can be run NodeLocal, the 
> TaskSetManager ends up using a direct string comparison between the 
> preferredIp and the executor's bound interface.
> https://github.com/apache/spark/blob/c84d91692aa25c01882bcc3f9fd5de3cfa786195/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L878-L880
> https://github.com/apache/spark/blob/c84d91692aa25c01882bcc3f9fd5de3cfa786195/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L488-L490
> This means that the preferredIp must be a direct string match of the IP the 
> worker is bound to. APIs that gather data from other distributed sources must 
> therefore develop their own mapping between the interfaces bound (or exposed) 
> by the external sources and the interface bound by the Spark executor, since 
> these may be different. 
> For example, Cassandra exposes a broadcast RPC address which doesn't have to 
> match the address the service is bound to. This means when adding 
> preferredLocation data we must add both the RPC and the listen address to 
> ensure that we can get a string match (and of course we are out of luck if 
> Spark has been bound to another interface). 
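
As an illustration of the workaround described above (my sketch, not code from any particular connector): an RDD over an external source ends up advertising every address form it knows for the node owning a partition, because only an exact string match against the executor's bound interface counts. The partition type and its address fields below are hypothetical:

{code}
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition carrying both address forms the external source
// reports for the node that owns this slice of data.
case class ExternalPartition(index: Int, rpcAddress: String, listenAddress: String)
  extends Partition

class ExternalSourceRDD(sc: SparkContext, parts: Seq[ExternalPartition])
  extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = parts.toArray[Partition]

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    Iterator.empty // data fetching elided; locality is the point of this sketch

  // Advertise both the broadcast RPC address and the listen address, in the
  // hope that one of them string-matches the interface the executor bound to.
  override protected def getPreferredLocations(split: Partition): Seq[String] = {
    val p = split.asInstanceOf[ExternalPartition]
    Seq(p.rpcAddress, p.listenAddress).distinct
  }
}
{code}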





[jira] [Commented] (SPARK-6987) Node Locality is determined with String Matching instead of Inet Comparison

2015-06-05 Thread Russell Alexander Spitzer (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574983#comment-14574983 ]

Russell Alexander Spitzer commented on SPARK-6987:
--

Or being able to specify an identifier for each Spark worker that isn't 
dependent on its IP?




[jira] [Commented] (SPARK-6987) Node Locality is determined with String Matching instead of Inet Comparison

2015-06-03 Thread Russell Alexander Spitzer (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572218#comment-14572218 ]

Russell Alexander Spitzer commented on SPARK-6987:
--

For Inet comparison, I think the best thing you can do is compare resolved 
hostnames, but even that isn't really great. I've been looking into other 
solutions but haven't found anything really satisfactory.

Having each worker/executor list all of its interfaces could be useful; that 
way any service, as long as it is bound to a real interface on the machine, 
could be matched properly.
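
To make the two ideas above concrete (purely an illustration, not a proposed patch): an "Inet comparison" could mean resolving both host strings and looking for a common address, and "list all interfaces" could mean checking a preferred host against every address actually configured on the local machine. Both are straightforward with plain JDK calls, at the cost of DNS lookups on the scheduling path:

{code}
import java.net.{InetAddress, NetworkInterface}
import scala.collection.JavaConverters._

object LocalityInet {
  // True if the two host strings resolve to at least one common address.
  // Sketch only: resolution is DNS-dependent and can be slow, which is part
  // of why the scheduler sticks to a plain string comparison today.
  def sameHost(a: String, b: String): Boolean = {
    val aAddrs = InetAddress.getAllByName(a).toSet
    val bAddrs = InetAddress.getAllByName(b).toSet
    (aAddrs intersect bAddrs).nonEmpty
  }

  // True if `preferredHost` resolves to an address bound to any interface on
  // this machine, i.e. the "list all interfaces" idea from the comment above.
  def boundLocally(preferredHost: String): Boolean = {
    val localAddrs = NetworkInterface.getNetworkInterfaces.asScala
      .flatMap(_.getInetAddresses.asScala)
      .toSet
    InetAddress.getAllByName(preferredHost).exists(localAddrs.contains)
  }
}
{code}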




[jira] [Commented] (SPARK-6987) Node Locality is determined with String Matching instead of Inet Comparison

2015-05-27 Thread holdenk (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562120#comment-14562120 ]

holdenk commented on SPARK-6987:


What do you mean by Inet comparison? If the problem is that the same host has 
multiple interfaces with different IPs, what would the fix be? (Do you want 
each worker to list all of its IPs?)
