[ https://issues.apache.org/jira/browse/PIG-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322904#comment-15322904 ]
Rohini Palaniswamy commented on PIG-4903:
-----------------------------------------

bq. By default running pig -x spark will run this in local mode

You mean spark_local mode?

bq. Shouldn't we be doing this only for spark mode

Agree with Srikanth. Those should be mandatory only for spark, not for mapreduce, tez or even spark_local. The exectype should be parsed out and checked against: you can iterate through the args and check for the values following -x and -exectype. We do the same thing in our internal version of bin/pig to set up the environment to work with our deployment paths. It is in Perl, though, else I could have posted the code snippet here for you to copy and use (a rough bash equivalent is sketched at the end of this mail).

> Avoid adding all spark dependency jars to SPARK_YARN_DIST_FILES and SPARK_DIST_CLASSPATH
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-4903
>                 URL: https://issues.apache.org/jira/browse/PIG-4903
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>         Attachments: PIG-4903.patch, PIG-4903_1.patch, PIG-4903_2.patch
>
>
> There are some comments about bin/pig on https://reviews.apache.org/r/45667/#comment198955:
> {code}
> ################# ADDING SPARK DEPENDENCIES ##################
> # Spark typically works with a single assembly file. However, this
> # assembly isn't available as an artifact to pull in via ivy.
> # To work around this shortcoming, we add all the jars barring
> # spark-yarn to DIST through dist-files and then add them to the
> # classpath of the executors through an independent env variable. The
> # reason for excluding spark-yarn is that spark-yarn is already being
> # added by the spark-yarn-client via jarOf(Client.Class)
> for f in $PIG_HOME/lib/*.jar; do
>     if [[ $f == $PIG_HOME/lib/spark-assembly* ]]; then
>         # Exclude spark-assembly.jar from shipped jars, but retain it in the classpath
>         SPARK_JARS=${SPARK_JARS}:$f;
>     else
>         SPARK_JARS=${SPARK_JARS}:$f;
>         SPARK_YARN_DIST_FILES=${SPARK_YARN_DIST_FILES},file://$f;
>         SPARK_DIST_CLASSPATH=${SPARK_DIST_CLASSPATH}:\${PWD}/`basename $f`
>     fi
> done
> CLASSPATH=${CLASSPATH}:${SPARK_JARS}
> export SPARK_YARN_DIST_FILES=`echo ${SPARK_YARN_DIST_FILES} | sed 's/^,//g'`
> export SPARK_JARS=${SPARK_YARN_DIST_FILES}
> export SPARK_DIST_CLASSPATH
> {code}
> Here we first copy all the Spark dependency jars, such as spark-network-shuffle_2.10-1.6.1.jar, to the distributed cache (SPARK_YARN_DIST_FILES) and then add them to the executors' classpath (SPARK_DIST_CLASSPATH). Actually, we need not copy all these dependency jars into SPARK_DIST_CLASSPATH, because they are all already included in spark-assembly.jar, and spark-assembly.jar is uploaded with the Spark job anyway. (A sketch of the simplified loop follows below.)
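For illustration, here is one shape the loop could take under that reasoning. This is a minimal sketch of the direction only, an assumption on my part rather than the content of any attached patch:

{code}
################# ADDING SPARK DEPENDENCIES ##################
# Sketch only: keep every jar on the local client classpath, but ship
# nothing through dist-files -- the executors already get these
# dependencies from spark-assembly.jar, which is uploaded with the job.
for f in $PIG_HOME/lib/*.jar; do
    SPARK_JARS=${SPARK_JARS}:$f
done
CLASSPATH=${CLASSPATH}:${SPARK_JARS}
{code}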
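And here is a minimal bash sketch of the -x/-exectype arg scan Rohini describes above (illustrative only; the EXECTYPE variable name is a hypothetical choice, not taken from the internal Perl script or from any patch here):

{code}
# Sketch, not the internal Perl version: walk the command-line args and
# capture the value that follows -x or -exectype, so Spark-specific
# setup can be gated on the exectype.
EXECTYPE=""   # hypothetical name, not from any patch
prev=""
for arg in "$@"; do
    if [[ $prev == "-x" || $prev == "-exectype" ]]; then
        EXECTYPE=$arg
    fi
    prev=$arg
done

# Make the Spark jar shipping mandatory only for full spark mode, not
# for mapreduce, tez or spark_local.
if [[ $EXECTYPE == "spark" ]]; then
    : # Spark dependency setup from above would go here
fi
{code}

A production version would likely also want to normalize the case of the captured value before comparing.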