[ 
https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051474#comment-16051474
 ] 

liyunzhang_intel commented on PIG-5246:
---------------------------------------

[~nkollar]:
  bq. For Spark 2.x do we have to add all jar under $SPARK_HOME/jars?
Someone suggested adding all jars under $SPARK_HOME/jars in Hive on 
Spark ([HIVE-15302|https://issues.apache.org/jira/browse/HIVE-15302]), but it seems 
this was not accepted by [~vanzin]. However, the [Hive 
wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started] 
says that we need not append all jars under $SPARK_HOME/jars:
{noformat}
Configuring Hive
To add the Spark dependency to Hive:
Prior to Hive 2.2.0, link the spark-assembly jar to HIVE_HOME/lib.
Since Hive 2.2.0, Hive on Spark runs with Spark 2.0.0 and above, which doesn't 
have an assembly jar.
To run with YARN mode (either yarn-client or yarn-cluster), link the following 
jars to HIVE_HOME/lib.
scala-library
spark-core
spark-network-common
To run with LOCAL mode (for debugging only), link the following jars in 
addition to those above to HIVE_HOME/lib.
chill-java  chill  jackson-module-paranamer  jackson-module-scala  
jersey-container-servlet-core
jersey-server  json4s-ast  kryo-shaded  minlog  scala-xml  spark-launcher
spark-network-shuffle  spark-unsafe  xbean-asm5-shaded
{noformat}

I don't know whether there is a performance impact if we append all jars under 
$SPARK_HOME/jars to the Pig classpath.
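For reference, appending everything under $SPARK_HOME/jars could look like the sketch below. This is only an illustration, not the actual patch; the function name {{add_spark2_jars}} is hypothetical, and the variable names follow the bin/pig style.

```shell
# Hypothetical sketch: append every jar under $SPARK_HOME/jars (the Spark 2.x
# layout, which replaced the single spark-assembly*.jar of Spark 1.x) to the
# Pig classpath. Not the committed implementation.
add_spark2_jars() {
    local spark_home="$1"
    local classpath="$2"
    local jar
    for jar in "${spark_home}"/jars/*.jar; do
        [ -e "$jar" ] || continue   # skip if the glob matched nothing
        classpath="${classpath}:${jar}"
    done
    echo "$classpath"
}
```

A downside of this approach, as discussed above, is that every one of those jars ends up on the classpath whether Pig needs it or not.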
bq.Could we avoid creating temp files? Instead of creating spark.version, would 
something like this work?
Yes, this works, thanks for the suggestion.
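One way to avoid the spark.version temp file is to infer the Spark major version from the layout of SPARK_HOME itself: Spark 1.x ships lib/spark-assembly*.jar, while Spark 2.x ships a jars/ directory instead. A sketch of that idea (the function name is hypothetical, and this may differ from the final patch):

```shell
# Hypothetical sketch: detect the Spark major version from the SPARK_HOME
# layout instead of writing a spark.version temp file.
#   Spark 2.x: $SPARK_HOME/jars/ directory exists
#   Spark 1.x: $SPARK_HOME/lib/spark-assembly*.jar exists
detect_spark_major_version() {
    local spark_home="$1"
    if [ -d "${spark_home}/jars" ]; then
        echo 2
    elif ls "${spark_home}"/lib/spark-assembly*.jar >/dev/null 2>&1; then
        echo 1
    else
        echo "Error: cannot determine Spark version under ${spark_home}" >&2
        return 1
    fi
}
```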

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246_2.patch, 
> PIG-5246.patch
>
>
> In bin/pig,
> we copy the assembly jar to Pig's classpath for Spark 1.6:
> {code}
> # For Spark mode:
> # Please specify SPARK_HOME first so that we can locate
> # $SPARK_HOME/lib/spark-assembly*.jar; we will add spark-assembly*.jar
> # to the classpath.
> if [ "$isSparkMode" == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>         echo "Error: SPARK_HOME is not set!"
>         exit 1
>     fi
>     # Please specify SPARK_JAR, the HDFS path of spark-assembly*.jar, to
>     # allow YARN to cache spark-assembly*.jar on nodes so that it doesn't
>     # need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>         echo "Error: SPARK_JAR is not set. SPARK_JAR stands for the HDFS location of spark-assembly*.jar, which allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>         exit 1
>     fi
>     echo "Using Spark Home: ${SPARK_HOME}"
>     SPARK_ASSEMBLY_JAR=$(ls "${SPARK_HOME}"/lib/spark-assembly*)
>     CLASSPATH=${CLASSPATH}:${SPARK_ASSEMBLY_JAR}
> fi
> {code}
> After upgrading to Spark 2.0, we may need to modify this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
