[ https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036459#comment-16036459 ]

liyunzhang_intel commented on PIG-5246:
---------------------------------------

[~rohini]: thanks for the suggestion. For spark1 and spark2, this can be done by 
checking for spark-assembly*.jar (or something similar) in the script, so the 
user need not specify the Spark version.
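For illustration, a minimal detection sketch (my assumption of how the layout check could look, not the final patch):
{code}
# Detect the Spark major version from the installation layout:
# spark1 ships a single assembly jar under $SPARK_HOME/lib,
# spark2 ships individual jars under $SPARK_HOME/jars.
if ls ${SPARK_HOME}/lib/spark-assembly*.jar >/dev/null 2>&1; then
    SPARK_MAJOR_VERSION=1
elif [ -d "${SPARK_HOME}/jars" ]; then
    SPARK_MAJOR_VERSION=2
else
    echo "Error: cannot determine the Spark version under ${SPARK_HOME}"
    exit 1
fi
{code}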
bq. For eg: In Spark JobMetricsListener will redirect to 
JobMetricsListenerSpark1 or JobMetricsListenerSpark2. But for users it makes it 
very simple as they can use same pig installation to run against any version.
That would be convenient for users, but I am not sure whether there would be 
conflicts if the jars of both spark1 and spark2 are on the pig classpath.
[~zjffdu]:
bq. Actually SPARK_ASSEMBLY_JAR is not a must-have thing for spark.
If SPARK_ASSEMBLY_JAR is not a must-have thing for spark1, how do we judge 
whether it is spark1 or spark2?
bq. IMO, pig don't need to specify that, it is supposed to be set in 
spark-defaults.conf which would apply to all spark apps.
Pig on Spark uses the spark installation and copies 
$SPARK_HOME/lib/spark-assembly*.jar (spark1) or $SPARK_HOME/jars/*.jar (spark2) 
to pig's classpath, but we don't read spark-defaults.conf. We parse 
pig.properties and save the spark configuration to 
[SparkContext|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L584].
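For example, the spark settings come from entries like the following in pig.properties (illustrative keys and values of my own, not a documented list):
{code}
# Illustrative pig.properties entries (assumed names/values for this example).
# Pig parses these and applies the spark settings to the SparkContext it
# builds, rather than reading spark-defaults.conf.
spark.master=yarn-client
spark.executor.memory=2g
spark.executor.cores=2
{code}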

 

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246.patch
>
>
> In bin/pig, we copy the assembly jar to pig's classpath for spark 1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate
> # $SPARK_HOME/lib/spark-assembly*.jar, we will add spark-assembly*.jar
> # to the classpath.
> if [ "$isSparkMode"  == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>        echo "Error: SPARK_HOME is not set!"
>        exit 1
>     fi
>     # Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar
>     # to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't
>     # need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>        echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs 
> location of spark-assembly*.jar. This allows YARN to cache 
> spark-assembly*.jar on nodes so that it doesn't need to be distributed each 
> time an application runs."
>        exit 1
>     fi
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After upgrading to spark 2.0, we may need to modify it, e.g. as sketched below:
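> A possible direction (a sketch only, with assumed checks; this is not the committed patch):
> {code}
> if [ "$isSparkMode" == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>        echo "Error: SPARK_HOME is not set!"
>        exit 1
>     fi
>     echo "Using Spark Home: " ${SPARK_HOME}
>     if ls ${SPARK_HOME}/lib/spark-assembly*.jar >/dev/null 2>&1; then
>         # spark1: a single assembly jar under $SPARK_HOME/lib
>         CLASSPATH=${CLASSPATH}:`ls ${SPARK_HOME}/lib/spark-assembly*.jar`
>     else
>         # spark2: individual jars under $SPARK_HOME/jars
>         for jar in ${SPARK_HOME}/jars/*.jar; do
>             CLASSPATH=${CLASSPATH}:${jar}
>         done
>     fi
> fi
> {code}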



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
