[ https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051474#comment-16051474 ]
liyunzhang_intel commented on PIG-5246:
---------------------------------------

[~nkollar]:
bq. For Spark 2.x do we have to add all jar under $SPARK_HOME/jars?
Someone suggested adding all jars under $SPARK_HOME/jars for Hive on Spark ([HIVE-15302|https://issues.apache.org/jira/browse/HIVE-15302]), but this does not seem to have been accepted by [~vanzin]. The [Hive wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started] says that we need not append all jars under $SPARK_HOME/jars:
{noformat}
Configuring Hive
To add the Spark dependency to Hive:
Prior to Hive 2.2.0, link the spark-assembly jar to HIVE_HOME/lib.
Since Hive 2.2.0, Hive on Spark runs with Spark 2.0.0 and above, which doesn't have an assembly jar.
To run with YARN mode (either yarn-client or yarn-cluster), link the following jars to HIVE_HOME/lib.
  scala-library
  spark-core
  spark-network-common
To run with LOCAL mode (for debugging only), link the following jars in addition to those above to HIVE_HOME/lib.
  chill-java
  chill
  jackson-module-paranamer
  jackson-module-scala
  jersey-container-servlet-core
  jersey-server
  json4s-ast
  kryo-shaded
  minlog
  scala-xml
  spark-launcher
  spark-network-shuffle
  spark-unsafe
  xbean-asm5-shaded
{noformat}
I don't know whether there is a performance impact if we append all jars under $SPARK_HOME/jars to the Pig classpath.

bq. Could we avoid creating temp files? Instead of creating spark.version, would something like this work?
Yes, this works. Thanks for the suggestion.

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246_2.patch, PIG-5246.patch
>
>
> In bin/pig, we copy the assembly jar to Pig's classpath for Spark 1.6.
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate $SPARK_HOME/lib/spark-assembly*.jar,
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode" == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>         echo "Error: SPARK_HOME is not set!"
>         exit 1
>     fi
>     # Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>         echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>         exit 1
>     fi
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> after upgrade to spark2.0, we may modify it



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
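
For illustration only, here is a minimal sketch of how the classpath step in bin/pig could cope with both layouts: Spark 1.x, which ships $SPARK_HOME/lib/spark-assembly*.jar, and Spark 2.x, which has no assembly jar and ships individual jars under $SPARK_HOME/jars. The function name build_spark_classpath is hypothetical and is not taken from any of the attached patches. The sketch appends every jar under $SPARK_HOME/jars, which is exactly the open question above; a narrower list, as on the Hive wiki, may be preferable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not the committed PIG-5246 fix).
# Prints a colon-separated classpath fragment for the given Spark home.
build_spark_classpath() {
    local spark_home="$1"
    local cp=""
    local jar
    local found_assembly=false

    # Spark 1.x layout: a single assembly jar under lib/.
    for jar in "$spark_home"/lib/spark-assembly*.jar; do
        [ -f "$jar" ] || continue   # skip the literal pattern when nothing matches
        cp="${cp:+$cp:}$jar"
        found_assembly=true
    done

    # Spark 2.x layout: no assembly jar, so fall back to the jars under jars/.
    if [ "$found_assembly" = false ]; then
        for jar in "$spark_home"/jars/*.jar; do
            [ -f "$jar" ] || continue
            cp="${cp:+$cp:}$jar"
        done
    fi

    printf '%s\n' "$cp"
}
```

In bin/pig this could be used as CLASSPATH=${CLASSPATH}:$(build_spark_classpath "$SPARK_HOME") after the existing SPARK_HOME check. Looping over the glob instead of parsing the output of ls avoids an error message when no assembly jar exists.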