[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Shaik Idris Ali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548418#comment-14548418
 ] 

Shaik Idris Ali commented on SPARK-7706:


I think the cleaner way is for SparkSubmitArguments to support both class
(command-line) arguments and env variables; in fact, that is how most of the
variables are already handled. The Spark java command can then add these to the
classpath as appropriate. Maybe it uses ENV because these are generic variables.
{code}
executorMemory = Option(executorMemory)
  .orElse(sparkProperties.get("spark.executor.memory"))
  .orElse(env.get("SPARK_EXECUTOR_MEMORY"))
  .orNull
{code}
Meanwhile I will check if we have a simpler solution without fixing this in 
Spark. Thanks.
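A minimal sketch of that same precedence chain applied to the configuration directory. It assumes a hypothetical command-line value (the issue proposes something like -yarnconfdir; this is not an existing spark-submit option) that wins over YARN_CONF_DIR, which in turn wins over HADOOP_CONF_DIR. ConfDirResolution and resolveConfDir are illustrative names, not Spark code.

```scala
object ConfDirResolution {
  /** Resolves the configuration directory, highest priority first.
    *
    * @param cliValue value of a hypothetical conf-dir argument (not a real
    *                 spark-submit option), or null when it was not given
    * @param env      environment to consult, e.g. sys.env
    * @return the first value found, or None if nothing is set
    */
  def resolveConfDir(cliValue: String, env: Map[String, String]): Option[String] =
    Option(cliValue)                      // 1. explicit (hypothetical) argument wins
      .orElse(env.get("YARN_CONF_DIR"))   // 2. then YARN_CONF_DIR
      .orElse(env.get("HADOOP_CONF_DIR")) // 3. then HADOOP_CONF_DIR
}
```

Passing sys.env in real code and a literal Map in tests keeps the lookup order easy to check; wrapping the argument in Option(...) treats null as "not given", as in the executorMemory snippet above.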




> Allow setting YARN_CONF_DIR from spark argument
> ---
>
> Key: SPARK-7706
> URL: https://issues.apache.org/jira/browse/SPARK-7706
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 1.3.1
>Reporter: Shaik Idris Ali
>  Labels: oozie, yarn
>
> Currently, in SparkSubmitArguments.scala, when master is set to "yarn"
> (yarn-cluster mode)
> https://github.com/apache/spark/blob/b1f4ca82d170935d15f1fe6beb9af0743b4d81cd/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L236
> Spark checks whether YARN_CONF_DIR or HADOOP_CONF_DIR is set in the ENV.
> However, we should additionally allow passing YARN_CONF_DIR as a command-line
> argument. This is particularly handy when Spark is launched from
> schedulers like Oozie or Falcon.
> The reason is that the Oozie launcher app starts in one of the containers
> assigned by the YARN RM, and we do not want to set YARN_CONF_DIR in the ENV on
> every node in the cluster. Just passing an argument like -yarnconfdir with the
> conf dir (e.g. /etc/hadoop/conf) should avoid setting the ENV variable.
> This is blocking us from onboarding Spark from Oozie or Falcon. Thanks.
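The check described above behaves roughly like the sketch below. It is a simplified stand-in, not a copy of the code at the linked line: it assumes that any master beginning with "yarn" (e.g. yarn-cluster, yarn-client) counts as YARN mode, and the error text is illustrative. YarnConfCheck is a hypothetical name.

```scala
object YarnConfCheck {
  // The two variables the issue says Spark looks for in the environment.
  private val ConfVars = Seq("YARN_CONF_DIR", "HADOOP_CONF_DIR")

  /** Returns Left(error) when a YARN master is used and neither variable is set;
    * otherwise Right(()). Only the environment is consulted, which is exactly
    * why an in-process launcher such as Oozie cannot satisfy it via arguments.
    */
  def validate(master: String, env: Map[String, String]): Either[String, Unit] =
    if (master.startsWith("yarn") && !ConfVars.exists(env.contains))
      Left(s"When running with master '$master', either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.")
    else
      Right(())
}
```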



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548322#comment-14548322
 ] 

Sean Owen commented on SPARK-7706:
--

YARN_CONF_DIR is a YARN env variable, right? It is not Spark-specific or
app-specific. It should point to the cluster's YARN configuration, which is not
app-specific either. I think any gateway machine will have this set. Or is this
not valid for Oozie somehow?




[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Shaik Idris Ali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548352#comment-14548352
 ] 

Shaik Idris Ali commented on SPARK-7706:


YARN does not require YARN_CONF_DIR to be set on any machine (client or cluster
nodes); if I understand correctly, this variable was introduced by Spark.
As you said, the gateway machine will have this set, pointing to the right conf
directory. But this is not valid for Oozie, as Oozie might launch SparkMain
from any node in the YARN cluster.




[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548365#comment-14548365
 ] 

Sean Owen commented on SPARK-7706:
--

Hm, I'd swear I've used this outside Spark, but I am not sure. Can the Oozie job
shell out to invoke Spark? That might be safer in several regards than
relying on the classpath/env of Oozie. Spark isn't intended to be invoked
in-process like that, so I imagine you'll run into a number of things like this
if you go down that road anyway.
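If the Oozie action shells out instead of calling SparkSubmit.main in-process, the launching JVM can give the child process its own YARN_CONF_DIR without touching the node-wide environment. A minimal sketch using scala.sys.process, the JVM counterpart of a shell-level YARN_CONF_DIR=... prefix; ChildLauncher is a hypothetical name, and the approach assumes spark-submit is on the PATH of whichever node runs the launcher.

```scala
import scala.sys.process._

object ChildLauncher {
  /** Runs `command` in a child process whose environment additionally contains
    * YARN_CONF_DIR=confDir. The parent JVM's own environment (sys.env) is left
    * unchanged. Returns the child's stdout; `!!` throws if the exit code is non-zero.
    */
  def runWithConfDir(command: Seq[String], confDir: String): String =
    Process(command, None, "YARN_CONF_DIR" -> confDir).!!
}
```

For example, runWithConfDir(Seq("spark-submit", "--master", "yarn-cluster", "--class", "com.example.Main", "app.jar"), "/etc/hadoop/conf"), where the class and jar are placeholders. Because !! buffers all output until the child exits, a launcher for long-running jobs would more likely use ! with a ProcessLogger to stream it.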




[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Shaik Idris Ali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548306#comment-14548306
 ] 

Shaik Idris Ali commented on SPARK-7706:


We might have multiple types of applications running on the YARN cluster, and it
might not be a good idea to set up application-specific env variables on all the
nodes of a cluster.

I understand that for submitting ad-hoc jobs from the command line it makes sense
to have YARN_CONF_DIR set on the one machine that acts as the Spark client. We
can support both a Spark argument and an env variable for such configurations.




[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548242#comment-14548242
 ] 

Sean Owen commented on SPARK-7706:
--

I am not sure Spark is designed to be invoked this way. You may need to invoke
it in a shell. But why isn't the YARN_CONF_DIR in the environment on a Hadoop
cluster correct and sufficient? That's the idea.







[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Shaik Idris Ali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548203#comment-14548203
 ] 

Shaik Idris Ali commented on SPARK-7706:


Hi, [~srowen], 

Thanks for the quick response. Sorry, I did not follow. Basically, actions in
Oozie (or any other scheduler) are launched from a Java program, which takes
the main class and a bunch of arguments to that class.

Ex: org.apache.spark.deploy.SparkSubmit.main(args); we do not need to set
anything in the system ENV variables.
Link to Oozie code:
https://github.com/apache/oozie/blob/master/sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java#L104




[jira] [Commented] (SPARK-7706) Allow setting YARN_CONF_DIR from spark argument

2015-05-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548106#comment-14548106
 ] 

Sean Owen commented on SPARK-7706:
--

Can't you just use {{YARN_CONF_DIR=... command ...}}?
