[ 
https://issues.apache.org/jira/browse/PIG-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734656#comment-14734656
 ] 

Srikanth Sundarrajan commented on PIG-4667:
-------------------------------------------

[~xuefuz], Jars that are added to the SparkContext are available to the 
executors, and in yarn-client mode the driver runs within the same JVM as 
Pig, so we have no issues with either of these. The real issue is making all 
the Spark libs available to the AM (which invokes ExecutorLauncher in 
yarn-client mode). If we don't have the assembly, the Spark code simply ships 
the spark-yarn jar (which is the jarOf(Client)), while spark-core and other 
dependent libs such as scala, akka etc. don't get shipped. The ClientArguments 
class in the spark-yarn module allows additional jars to be added to the 
dist-cache and the AM classpath; however, when we create a SparkContext, there 
doesn't seem to be any way to pass these jars. I tried adding them via --files; 
although they are added to the dist-cache and localized, they are not part of 
the classpath.

Here are the options that I am currently considering. 

1. Create a maven pom to build a shaded assembly jar and then use it
2. Try using ant tasks to re-create a shaded assembly, similar to what the 
spark-assembly module does
3. Allow users to specify SPARK_HOME and then wire up bin/pig to use the 
artifacts from SPARK_HOME; without it, the Spark version will work only in 
local mode.
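
As a rough illustration of option 3, the lookup that bin/pig would need could 
be sketched as below. This is only a sketch under assumptions: the function 
name resolve_spark_jars is hypothetical, and it assumes the Spark jars live 
under $SPARK_HOME/lib, which may differ between Spark releases and packagings. 
An empty result stands for "no usable SPARK_HOME, so stay in local mode".

```python
import glob
import os


def resolve_spark_jars(env):
    """Return the sorted list of jars found under $SPARK_HOME/lib.

    `env` is any mapping of environment variables (e.g. os.environ).
    An empty list means SPARK_HOME is unset or holds no jars, in which
    case the caller would fall back to local mode.

    Assumption: jars sit directly in $SPARK_HOME/lib (hypothetical layout
    for this sketch; adjust for the Spark packaging actually in use).
    """
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return []
    # Only pick up *.jar files; anything else in lib/ is ignored.
    return sorted(glob.glob(os.path.join(spark_home, "lib", "*.jar")))


if __name__ == "__main__":
    jars = resolve_spark_jars(os.environ)
    if jars:
        # A launcher script could append this to its classpath.
        print(os.pathsep.join(jars))
    else:
        print("SPARK_HOME not usable; only local mode is available")
```

The point of resolving the jars at launch time is that Pig would not need to 
track how Spark packages its dependencies.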

I am inclined to go with option #3, as it is clean and lets us stay in 
line with changes that might happen in Spark dependencies/packaging.

Would like to hear your thoughts.

> Enable Pig on Spark to run on Yarn Client/Cluster mode
> ------------------------------------------------------
>
>                 Key: PIG-4667
>                 URL: https://issues.apache.org/jira/browse/PIG-4667
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>             Fix For: spark-branch
>
>         Attachments: PIG-4667-logs.tgz
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
