[
https://issues.apache.org/jira/browse/PIG-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734656#comment-14734656
]
Srikanth Sundarrajan commented on PIG-4667:
-------------------------------------------
[~xuefuz], jars that are added to the Spark context are available to the
executors, and in yarn-client mode the driver runs within the same JVM as
Pig, so we have issues with neither of these. The real issue is making all
the Spark libs available to the AM (which invokes ExecutorLauncher in
yarn-client mode). If we don't have the assembly, the Spark code simply ships
the spark-yarn jar (which is the jarOf(Client)), while spark-core and other
dependent libs such as scala, akka etc. don't get shipped. The ClientArguments
class in the spark-yarn module allows additional jars to be added to the
dist-cache and the AM classpath; however, when we create a SparkContext, there
doesn't seem to be any way to pass these jars. I tried adding them to --files;
though they are added to the dist-cache and are localized, they are not part
of the classpath.
Here are the options that I am currently considering.
1. Create a maven pom to build a shaded assembly jar and then use it
2. Try using ant tasks to re-create a shaded assembly similar to what the
spark-assembly module produces
3. Allow users to specify SPARK_HOME and then wire up bin/pig to use the
artifacts from SPARK_HOME; without it, the Spark version will work only in
local mode.
I am inclined to go with option #3, as it is clean and lets us keep in line
with changes that might happen in Spark dependencies/packaging.
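
As a rough illustration of option #3, here is a minimal sketch of how a
launcher could collect the jars under SPARK_HOME into a classpath string. It is
not the actual bin/pig change. The function name spark_classpath and the "lib"
and "jars" subdirectory names are assumptions made for the example, and the
real layout depends on the Spark distribution in use.

```python
import os
from pathlib import Path


def spark_classpath(env, subdirs=("lib", "jars")):
    """Build a classpath string from the jars found under SPARK_HOME.

    `env` is a mapping such as os.environ. The subdirectory names are an
    assumption for this sketch, not something the launcher is known to use.
    Returns "" when SPARK_HOME is unset, so the caller can fall back to
    local mode.
    """
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ""
    jars = []
    for sub in subdirs:
        directory = Path(spark_home) / sub
        if directory.is_dir():
            # Sort so the resulting classpath order is deterministic.
            jars.extend(sorted(str(p) for p in directory.glob("*.jar")))
    return os.pathsep.join(jars)
```

Calling spark_classpath(os.environ) would give a string that could be appended
to the classpath Pig is launched with. Resolving the jars at launch time rather
than at build time is what keeps this approach in step with whatever Spark
packaging the user has installed.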
Would like to hear your thoughts.
> Enable Pig on Spark to run on Yarn Client/Cluster mode
> ------------------------------------------------------
>
> Key: PIG-4667
> URL: https://issues.apache.org/jira/browse/PIG-4667
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Srikanth Sundarrajan
> Assignee: Srikanth Sundarrajan
> Fix For: spark-branch
>
> Attachments: PIG-4667-logs.tgz
>
>