[
https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051474#comment-16051474
]
liyunzhang_intel commented on PIG-5246:
---------------------------------------
[~nkollar]:
bq. For Spark 2.x do we have to add all jar under $SPARK_HOME/jars?
Adding all jars under $SPARK_HOME/jars was suggested for Hive on
Spark ([HIVE-15302|https://issues.apache.org/jira/browse/HIVE-15302]), but it seems
[~vanzin] did not accept that approach. However, the [Hive
wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started]
says we need not append all jars under $SPARK_HOME/jars:
{noformat}
Configuring Hive
To add the Spark dependency to Hive:
Prior to Hive 2.2.0, link the spark-assembly jar to HIVE_HOME/lib.
Since Hive 2.2.0, Hive on Spark runs with Spark 2.0.0 and above, which doesn't
have an assembly jar.
To run with YARN mode (either yarn-client or yarn-cluster), link the following
jars to HIVE_HOME/lib.
scala-library
spark-core
spark-network-common
To run with LOCAL mode (for debugging only), link the following jars in
addition to those above to HIVE_HOME/lib.
chill-java chill jackson-module-paranamer jackson-module-scala
jersey-container-servlet-core
jersey-server json4s-ast kryo-shaded minlog scala-xml spark-launcher
spark-network-shuffle spark-unsafe xbean-asm5-shaded
{noformat}
I don't know whether there is a performance impact if we append all jars under
$SPARK_HOME/jars to the Pig classpath.
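For reference, appending every jar under $SPARK_HOME/jars could look like the
sketch below. This is only an illustration, not the actual PIG-5246 patch; the
function name build_spark_classpath is mine:

```shell
#!/bin/sh
# Hypothetical sketch (not the PIG-5246 patch): build a classpath string
# containing every jar under $SPARK_HOME/jars -- the Spark 2.x layout,
# which replaced the single lib/spark-assembly*.jar of Spark 1.x.
build_spark_classpath() {
    spark_home="$1"
    classpath="$2"
    # Append each jar in $SPARK_HOME/jars, colon-separated.
    for jar in "$spark_home"/jars/*.jar; do
        classpath="$classpath:$jar"
    done
    echo "$classpath"
}
```

In bin/pig this would be used as something like
CLASSPATH=$(build_spark_classpath "$SPARK_HOME" "$CLASSPATH").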
bq. Could we avoid creating temp files? Instead of creating spark.version, would
something like this work?
Yes, this works, thanks for the suggestion.
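One temp-file-free approach (an assumption on my part, since [~nkollar]'s exact
snippet is not quoted here) is to infer the Spark major version from the layout
of $SPARK_HOME itself:

```shell
#!/bin/sh
# Hypothetical sketch: detect the Spark major version without writing a
# spark.version temp file. Spark 1.x ships lib/spark-assembly*.jar,
# while Spark 2.x ships a jars/ directory instead of an assembly jar.
detect_spark_major_version() {
    spark_home="$1"
    if [ -d "$spark_home/jars" ]; then
        echo "2"
    elif ls "$spark_home"/lib/spark-assembly*.jar >/dev/null 2>&1; then
        echo "1"
    else
        echo "unknown"
    fi
}
```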
> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
> Key: PIG-5246
> URL: https://issues.apache.org/jira/browse/PIG-5246
> Project: Pig
> Issue Type: Bug
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246_2.patch,
> PIG-5246.patch
>
>
> In bin/pig, we copy the assembly jar to Pig's classpath for Spark 1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate
> # $SPARK_HOME/lib/spark-assembly*.jar;
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode" == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>         echo "Error: SPARK_HOME is not set!"
>         exit 1
>     fi
>
>     # Please specify SPARK_JAR, the hdfs path of spark-assembly*.jar, to
>     # allow YARN to cache spark-assembly*.jar on nodes so that it doesn't
>     # need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>         echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>         exit 1
>     fi
>
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After upgrading to Spark 2.0, we need to modify it.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)