[ 
https://issues.apache.org/jira/browse/PIG-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321850#comment-15321850
 ] 

Srikanth Sundarrajan commented on PIG-4903:
-------------------------------------------

Looked at the latest patch; it still seems that SPARK_HOME (and additionally 
SPARK_JAR) is being checked for presence. Shouldn't we be doing this only 
for spark mode? I think some special handling is necessary here.

By default, running pig -x spark runs in local mode, which doesn't require 
any Spark cluster or HDFS to be present and lets new users try it quickly. 
My feeling is that always requiring and mandating these exports seems a bit 
unfriendly, but I won't hold this back for that reason.
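
For illustration only, a minimal sketch of the kind of guard I have in mind 
for bin/pig (assuming the execution type has already been parsed into a 
hypothetical exectype variable; the warning text is likewise illustrative):

{code}
# Only insist on the Spark environment when spark mode was requested;
# warn rather than fail so that local spark mode keeps working out of the box.
if [ "$exectype" == "spark" ]; then
    if [ -z "$SPARK_HOME" ]; then
        echo "Warning: SPARK_HOME is not set; using the Spark jars bundled with Pig."
    fi
    if [ -z "$SPARK_JAR" ]; then
        echo "Warning: SPARK_JAR is not set; spark-assembly will not be shipped to the cluster."
    fi
fi
{code}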

> Avoid add all spark dependency jars to  SPARK_YARN_DIST_FILES and 
> SPARK_DIST_CLASSPATH
> --------------------------------------------------------------------------------------
>
>                 Key: PIG-4903
>                 URL: https://issues.apache.org/jira/browse/PIG-4903
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>         Attachments: PIG-4903.patch, PIG-4903_1.patch, PIG-4903_2.patch
>
>
> There are some comments about bin/pig on 
> https://reviews.apache.org/r/45667/#comment198955.
> {code}
> ################# ADDING SPARK DEPENDENCIES ##################
> # Spark typically works with a single assembly file. However this
> # assembly isn't available as a artifact to pull in via ivy.
> # To work around this short coming, we add all the jars barring
> # spark-yarn to DIST through dist-files and then add them to classpath
> # of the executors through an independent env variable. The reason
> # for excluding spark-yarn is because spark-yarn is already being added
> # by the spark-yarn-client via jarOf(Client.Class)
> for f in $PIG_HOME/lib/*.jar; do
>     if [[ $f == $PIG_HOME/lib/spark-assembly* ]]; then
>         # Exclude spark-assembly.jar from shipped jars, but retain it in classpath
>         SPARK_JARS=${SPARK_JARS}:$f;
>     else
>         SPARK_JARS=${SPARK_JARS}:$f;
>         SPARK_YARN_DIST_FILES=${SPARK_YARN_DIST_FILES},file://$f;
>         SPARK_DIST_CLASSPATH=${SPARK_DIST_CLASSPATH}:\${PWD}/`basename $f`
>     fi
> done
> CLASSPATH=${CLASSPATH}:${SPARK_JARS}
> export SPARK_YARN_DIST_FILES=`echo ${SPARK_YARN_DIST_FILES} | sed 's/^,//g'`
> export SPARK_JARS=${SPARK_YARN_DIST_FILES}
> export SPARK_DIST_CLASSPATH
> {code}
> Here we first copy all the spark dependency jars, such as 
> spark-network-shuffle_2.10-1.6.1.jar, to the distributed cache 
> (SPARK_YARN_DIST_FILES) and then add them to the executor classpath 
> (SPARK_DIST_CLASSPATH). Actually we need not copy all these dependency jars 
> to SPARK_DIST_CLASSPATH because they are already included in 
> spark-assembly.jar, which is uploaded with the spark job.
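
A rough sketch of the simplification described above (this is not the 
attached patch; variable names follow the quoted snippet, and the premise 
that spark-assembly.jar alone covers the executor classpath is the 
assumption being made):

{code}
# Keep putting the Pig lib jars on the local client-side classpath...
for f in "$PIG_HOME"/lib/*.jar; do
    SPARK_JARS=${SPARK_JARS}:$f
done
CLASSPATH=${CLASSPATH}:${SPARK_JARS}
# ...but do not add the individual dependency jars to SPARK_YARN_DIST_FILES or
# SPARK_DIST_CLASSPATH: they are already bundled inside spark-assembly.jar,
# which is uploaded along with the spark job.
{code}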



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
