[
https://issues.apache.org/jira/browse/SPARK-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967185#comment-13967185
]
Mridul Muralidharan commented on SPARK-542:
---
Spark uses only hostnames, not IPs.
Even for hostnames, it should ideally pick only the canonical hostname, not
the others.
This was done by design in 0.8: trying to determine whether multiple
hostnames/IPs all refer to the same physical host/container is fraught with
too many issues.
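
To illustrate the canonical-hostname idea, here is a minimal Python sketch. It is not Spark's implementation: `canonical_host`, `fake_resolve`, and the alias table are hypothetical names made up for this example, and the default resolver (`socket.getfqdn`) depends on how DNS is set up on the machine.

```python
import socket

def canonical_host(name, resolve=socket.getfqdn):
    """Reduce any alias of a machine to one canonical, lower-cased name.

    `resolve` is injectable so the idea can be shown without real DNS;
    by default it asks the system resolver for the fully qualified name.
    """
    return resolve(name).lower()

# Hypothetical alias table standing in for DNS in this example.
ALIASES = {"c001": "c001.cm.cluster"}

def fake_resolve(name):
    # Names with no known alias are returned unchanged.
    return ALIASES.get(name, name)

# Both names of the same machine reduce to the same key.
short = canonical_host("c001", fake_resolve)
full = canonical_host("C001.cm.cluster", fake_resolve)
print(short == full)  # True
```

If every component normalized hostnames this way before storing or comparing them, the two names would match without trying to guess which hostnames/IPs share a physical host.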
Cache Miss when machine have multiple hostname
--
Key: SPARK-542
URL: https://issues.apache.org/jira/browse/SPARK-542
Project: Spark
Issue Type: Bug
Reporter: frankvictor
Hi, I encountered weird PageRank runtimes over the last few days.
After debugging the job, I found it was caused by the DNS name.
The machines in my cluster have multiple hostnames; for example, slave 1 has
the names c001 and c001.cm.cluster.
When Spark adds a cache in cacheTracker, it gets c001 and registers the cache under that name.
But when scheduling a task in SimpleJob, the Mesos offer gives Spark
c001.cm.cluster,
so the task will never get its preferred location!
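
The mismatch can be reproduced in a few lines of Python. This is only a sketch of the behaviour described above, assuming the preferred-location check is an exact string comparison; `cache_locations` and `is_preferred` are hypothetical stand-ins, not Spark's cacheTracker or SimpleJob code.

```python
# Hypothetical stand-in for the cache tracker:
# partition id -> hostname under which the cached partition was registered.
cache_locations = {0: "c001"}

def is_preferred(partition, offered_host):
    """Return True if the offered host matches the cached location exactly."""
    return cache_locations.get(partition) == offered_host

print(is_preferred(0, "c001"))             # True
print(is_preferred(0, "c001.cm.cluster"))  # False: same machine, different name
```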
I think Spark should handle the multiple-hostname case (by using the IP instead
of the hostname, or by some other method).
Thanks!