[ https://issues.apache.org/jira/browse/PIG-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734656#comment-14734656 ]
Srikanth Sundarrajan commented on PIG-4667:
-------------------------------------------

[~xuefuz], jars that are added to the SparkContext are available to the executors, and in yarn-client mode the driver runs within the same JVM as Pig, so we have issues with neither of these. The real issue is making all the Spark libs available to the AM (which invokes ExecutorLauncher in yarn-client mode). If we don't have the assembly, the Spark code simply ships the spark-yarn jar (which is the jarOf(Client)), while spark-core and other dependent libs such as Scala, Akka, etc. don't get shipped. The ClientArguments() class in the spark-yarn module allows additional jars to be added to the dist-cache and the AM classpath; however, when we create a SparkContext, there doesn't seem to be any way to pass these jars. I tried adding them to --files; though they are added to the dist-cache and are localized, they are not part of the classpath.

Here are the options that I am currently considering:

1. Create a Maven POM to build a shaded assembly jar and then use it.
2. Try using Ant tasks to re-create a shaded assembly, similar to what the spark-assembly module does.
3. Allow users to specify SPARK_HOME and then wire up bin/pig to use the artifacts from SPARK_HOME; without SPARK_HOME set, the Spark version will still work in local mode.

I am inclined to go with option #3, as it is clean and lets us stay in line with changes that might happen in Spark dependencies/packaging. Would like to hear your thoughts.

> Enable Pig on Spark to run on Yarn Client/Cluster mode
> ------------------------------------------------------
>
>                 Key: PIG-4667
>                 URL: https://issues.apache.org/jira/browse/PIG-4667
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>             Fix For: spark-branch
>
>         Attachments: PIG-4667-logs.tgz
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
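
For illustration only, the classpath wiring in option 3 could look roughly like the sketch below. It is written in Python for brevity rather than as an actual change to bin/pig. The helper names `spark_jars` and `build_classpath` are hypothetical, and the `lib/` directory under SPARK_HOME is an assumption about the Spark distribution layout, not something confirmed in this issue.

```python
import glob
import os


def spark_jars(env):
    """Return the Spark jars found under SPARK_HOME, or [] if it is unset.

    Assumption: the distribution keeps its jars in $SPARK_HOME/lib.
    """
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        # No SPARK_HOME: the caller stays on whatever it already bundles,
        # which per the comment above is enough for local mode.
        return []
    return sorted(glob.glob(os.path.join(spark_home, "lib", "*.jar")))


def build_classpath(env, base_classpath):
    """Append the SPARK_HOME jars (if any) to an existing classpath string."""
    parts = [p for p in [base_classpath] + spark_jars(env) if p]
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Hypothetical base classpath; a real launcher would compute this itself.
    print(build_classpath(os.environ, "pig.jar"))
```

The point of the design is that the launcher never hard-codes Spark artifact names, so a change in Spark packaging only affects what is found under SPARK_HOME.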