> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > core/src/main/java/org/apache/oozie/service/SparkConfigurationService.java, 
> > line 85
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039661#file1039661line85>
> >
> >     Should we be doing this? 
> > http://spark.apache.org/docs/latest/running-on-yarn.html refers to an HDFS 
> > location or a local installation on the task node. Since it applies to 
> > other clients, should we retain that in Oozie as well, or are we saying 
> > that Oozie is only going to use Spark libraries via the sharelib? Or at 
> > least it should be configurable to retain this setting, to support users 
> > who have a local installation on task nodes.

In CDH 5.4, we were shipping the assembly jar in the sharelib, and this 
setting would override the sharelib jar.  The config is also loaded on the 
Oozie Server's host, where the path can differ from the one on the Launcher 
Job's host, which is where Spark is run.  We saw a problem when they didn't 
match.  With the dependency changes in this patch, the assembly jar isn't 
required in the sharelib (or at all), which is good because it's not 
published to Maven.  When testing the different modes, at least one of them 
(I forget which) also hit a weird serialization error because the assembly 
jar was from a different build than the sharelib jars.

To stay flexible in case someone wants to use it, I'll add a config to 
enable/disable removing it, but I think we should remove it by default.
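
As a rough sketch of what such a toggle could look like (this is not the 
code from the patch; the class name, the method, and the boolean flag are 
made up for illustration, and I'm assuming the setting in question is 
Spark's spark.yarn.jar):

```java
import java.util.Properties;

/** Hypothetical helper for illustration only; not the code from the patch. */
final class SparkDefaultsFilter {
    // Assumed to be the setting under discussion (assembly jar location).
    static final String SPARK_YARN_JAR = "spark.yarn.jar";

    /**
     * Returns a copy of the loaded Spark properties. When ignoreSparkYarnJar
     * is true, spark.yarn.jar is left out of the copy.
     */
    static Properties filter(Properties loaded, boolean ignoreSparkYarnJar) {
        Properties result = new Properties();
        for (String name : loaded.stringPropertyNames()) {
            if (ignoreSparkYarnJar && SPARK_YARN_JAR.equals(name)) {
                // The path was read on the Oozie Server's host and may not
                // match the Launcher Job's host, so drop it.
                continue;
            }
            result.setProperty(name, loaded.getProperty(name));
        }
        return result;
    }
}
```

With the flag defaulting to true, the setting is removed unless someone 
explicitly asks to keep it.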


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, 
> > lines 60-61
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line60>
> >
> >     Robert, 
> >        In our chat, you mentioned the ability to specify this 
> > alternatively as -master yarn -mode client and -master yarn -mode cluster. 
> > We will have to handle that as well.

Good point.  Also, I had the argument name wrong: it's --deploy-mode instead of 
--mode, but the behavior is as I described to you.
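
Roughly, handling both spellings could look like the sketch below.  
yarnClusterMode and yarnClientMode are the flags named in the review; the 
class name, the detect method, and the fallback to client when no deploy 
mode is given are my assumptions for illustration, not the patch itself.

```java
/** Hypothetical sketch of mode detection; not the code from the patch. */
final class SparkModes {
    final boolean yarnClusterMode;
    final boolean yarnClientMode;

    private SparkModes(boolean yarnClusterMode, boolean yarnClientMode) {
        this.yarnClusterMode = yarnClusterMode;
        this.yarnClientMode = yarnClientMode;
    }

    /**
     * @param master     value given for --master, e.g. "yarn-cluster" or "yarn"
     * @param deployMode value given for --deploy-mode, or null if absent
     */
    static SparkModes detect(String master, String deployMode) {
        boolean cluster = false;
        boolean client = false;
        if ("yarn-cluster".equals(master)) {
            cluster = true;
        } else if ("yarn-client".equals(master)) {
            client = true;
        } else if ("yarn".equals(master)) {
            if ("cluster".equals(deployMode)) {
                cluster = true;
            } else {
                // Assumption: no (or any other) deploy mode means client.
                client = true;
            }
        }
        // Any other master (e.g. local) leaves both flags false.
        return new SparkModes(cluster, client);
    }
}
```

So "--master yarn --deploy-mode cluster" and "--master yarn-cluster" end up 
setting the same flag, and the rest of the code only needs to look at the 
two booleans.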


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, 
> > line 78
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line78>
> >
> >     If local mode will ever be used in Oozie, then all the new code can go 
> > into an if (yarnClusterMode || yarnClientMode) block to be done only for 
> > non-local mode.

Local mode still requires setting --jars, so I'd have to duplicate that part 
of the new code in an else statement, which I think might be harder to 
follow/maintain.  Can we leave this as is?
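
To illustrate the shape I mean (a simplified sketch, not the actual 
SparkMain code; the class and method names are hypothetical, and the 
spark.yarn.dist.files line is only a stand-in for whatever is YARN-specific):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the argument building; not the code from the patch. */
final class SparkArgsSketch {
    static List<String> build(String master, List<String> jars, boolean yarnMode) {
        List<String> args = new ArrayList<>();
        args.add("--master");
        args.add(master);
        if (!jars.isEmpty()) {
            String joined = String.join(",", jars);
            // Needed for every master, including local, so it stays outside
            // the YARN-only check instead of being repeated in an else.
            args.add("--jars");
            args.add(joined);
            if (yarnMode) {
                // Stand-in for the YARN-only settings (assumption).
                args.add("--conf");
                args.add("spark.yarn.dist.files=" + joined);
            }
        }
        return args;
    }
}
```

Wrapping everything in a YARN-only block would force the --jars lines to 
appear a second time for local mode.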


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, 
> > line 113
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line113>
> >
> >     Can you also place local files in spark.yarn.dist.files, and will 
> > Spark take care of shipping them as it does with --jars? Asking because 
> > you are adding files from java.classpath to sparkJars. At least in Hadoop, 
> > mapreduce.cache.files has to contain HDFS paths.

I don't think spark.yarn.dist.files actually sends the jars anywhere.  I'm 
pretty sure the combination of master/modes and the many jar-related Spark 
configs that I'm using is the only way that will work for each master/mode 
type.  It took a lot of trial and error and checking with our Spark team, who 
weren't 100% sure on the necessary configs either.  (I don't know why they 
had to make this so complicated.)


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, 
> > line 123
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line123>
> >
> >     Can you just add a comment here saying this is redundant for 
> > yarnClientMode, as the driver is the launcher JVM and it is already 
> > launched?

Surprisingly, IIRC, this is actually required even though you'd think it would 
be able to use the JVM's classpath (I think they must do something funny with 
classloaders).  I'll double check and add a comment if it's not.


- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/#review95987
-----------------------------------------------------------


On Aug. 13, 2015, 11:42 p.m., Robert Kanter wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37452/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2015, 11:42 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-2277
>     https://issues.apache.org/jira/browse/OOZIE-2277
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/OOZIE-2277
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/SparkConfigurationService.java 1b7cf4a
>   core/src/test/java/org/apache/oozie/service/TestSparkConfigurationService.java b2c499d
>   sharelib/spark/pom.xml 6f7e74a
>   sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java b18a0b9
>   sharelib/spark/src/test/java/org/apache/oozie/action/hadoop/TestSparkActionExecutor.java f271abc
> 
> Diff: https://reviews.apache.org/r/37452/diff/
> 
> 
> Testing
> -------
> 
> - Ran unit tests with Hadoop 1 and Hadoop 2
> - Ran in a Hadoop 2 cluster with local, yarn-client, and yarn-cluster modes
> 
> 
> Thanks,
> 
> Robert Kanter
> 
>
