GitHub user wulei-bj-cn opened a pull request:

    https://github.com/apache/spark/pull/8533

    localHostName() is meant to return the hostname of the local host.
    However, when "SPARK_LOCAL_HOSTNAME" is not set, i.e. customHostname
    is null, it returns the host's IP address instead, because it falls
    back to localIpAddress.getHostAddress. The returned IP address (e.g.
    1.2.3.4) does not match the hostname (e.g. host1) reported by DFS
    file systems, so the locality level is always 'ANY' and a lot of
    network I/O is introduced when input files are read from DFS file
    systems.

    To make the function return a real hostname when
    "SPARK_LOCAL_HOSTNAME" is not set, this patch replaces
    localIpAddress.getHostAddress with localIpAddress.getHostName.
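
The difference between the two calls can be illustrated with a small standalone sketch. This is not the Spark code: it is written in Java against java.net.InetAddress only, and the class name, method names, and the isNodeLocal check are hypothetical stand-ins for the fallback and the hostname comparison described above. The address is built with InetAddress.getByAddress so that no DNS lookup is involved. On a real host, getHostName performs a reverse lookup and can still return the textual IP address if that lookup fails.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

class LocalHostNameSketch {
    // Hypothetical model of the old fallback: use the IP address when no custom hostname is set.
    static String beforePatch(String customHostname, InetAddress localIpAddress) {
        return customHostname != null ? customHostname : localIpAddress.getHostAddress();
    }

    // Hypothetical model of the proposed fallback: use the hostname instead.
    static String afterPatch(String customHostname, InetAddress localIpAddress) {
        return customHostname != null ? customHostname : localIpAddress.getHostName();
    }

    // Simplified assumption: a host counts as local only if its name
    // exactly matches one of the names the DFS reports.
    static boolean isNodeLocal(String executorHost, List<String> dfsHosts) {
        return dfsHosts.contains(executorHost);
    }

    // Builds the example address (host1, 1.2.3.4) without any network lookup.
    static InetAddress sampleAddress() {
        try {
            return InetAddress.getByAddress("host1", new byte[] {1, 2, 3, 4});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress addr = sampleAddress();
        List<String> dfsHosts = Arrays.asList("host1");

        String before = beforePatch(null, addr);
        String after = afterPatch(null, addr);

        // prints "before: 1.2.3.4 nodeLocal=false"
        System.out.println("before: " + before + " nodeLocal=" + isNodeLocal(before, dfsHosts));
        // prints "after: host1 nodeLocal=true"
        System.out.println("after: " + after + " nodeLocal=" + isNodeLocal(after, dfsHosts));
    }
}
```

When a custom hostname is passed in, both variants return it unchanged, so only the unset case differs.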

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wulei-bj-cn/spark lei-branch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8533.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8533
    
----
commit 807524fdb89ac37bb84a73c09891cc1298f0ba84
Author: Lei Wu <wulei.bj...@gmail.com>
Date:   2015-08-31T07:02:34Z

    localHostName() is meant to return the hostname of the local host.
    However, when "SPARK_LOCAL_HOSTNAME" is not set, i.e. customHostname
    is null, it returns the host's IP address instead, because it falls
    back to localIpAddress.getHostAddress. The returned IP address (e.g.
    1.2.3.4) does not match the hostname (e.g. host1) reported by DFS
    file systems, so the locality level is always 'ANY' and a lot of
    network I/O is introduced when input files are read from DFS file
    systems. To make the function return a real hostname when
    "SPARK_LOCAL_HOSTNAME" is not set, localIpAddress.getHostAddress is
    replaced by localIpAddress.getHostName.

----


