[jira] [Commented] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode

Apache Spark (JIRA) Thu, 24 Jul 2014 09:17:50 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073338#comment-14073338
 ]


Apache Spark commented on SPARK-2669:
-------------------------------------

User 'redbaron' has created a pull request for this issue:
https://github.com/apache/spark/pull/1574

> Hadoop configuration is not localised when submitting job in yarn-cluster mode
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-2669
>                 URL: https://issues.apache.org/jira/browse/SPARK-2669
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Maxim Ivanov
>
> I'd like to propose a fix for a problem when Hadoop configuration is not 
> localized when job is submitted in yarn-cluster mode. Here is a description 
> from github pull request https://github.com/apache/spark/pull/1574
> This patch fixes a problem when Spark driver is run in the container
> managed by YARN ResourceManager it inherits configuration from a
> NodeManager process, which can be different from the Hadoop
> configuration present on the client (submitting machine). Problem is
> most vivid when fs.defaultFS property differs between these two.
> Hadoop MR solves it by serializing client's Hadoop configuration into
> job.xml in application staging directory and then making Application
> Master to use it. That guarantees that regardless of execution nodes
> configurations all application containers use same config identical to
> one on the client side.
> This patch uses similar approach. YARN ClientBase serializes
> configuration and adds it to ClientDistributedCacheManager under
> "job.xml" link name. ClientDistributedCacheManager is then utilizes
> Hadoop localizer to deliver it to whatever container is started by this
> application, including the one running Spark driver.
> YARN ClientBase also adds "SPARK_LOCAL_HADOOPCONF" env variable to AM
> container request which is then used by SparkHadoopUtil.newConfiguration
> to trigger new behavior when machine-wide hadoop configuration is merged
> with application specific job.xml (exactly how it is done in Hadoop MR).
> SparkContext is then follows same approach, adding
> SPARK_LOCAL_HADOOPCONF env to all spawned containers to make them use
> client-side Hadopo configuration.
> Also all the references to "new Configuration()" which might be executed
> on YARN cluster side are changed to use SparkHadoopUtil.get.conf
> Please note that it fixes only core Spark, the part which I am
> comfortable to test and verify the result. I didn't descend into
> steaming/shark directories, so things might need to be changed there too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode

Reply via email to