Thanks for the reply.

Actually, I don't think excluding spark-hive from spark-submit --packages
is a good idea.

I don't want to rebuild the Spark assembly for my cluster every time a
new Spark release comes out.
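
For reference, the Hive-enabled rebuild would be something along these
lines, going by the Spark 1.4 building docs (-DskipTests just skips the
tests to speed the build up):

  # Build an assembly with Hive support baked in
  mvn -Phive -Phive-thriftserver -DskipTests clean package
  # or, with sbt:
  build/sbt -Phive -Phive-thriftserver assembly

and that per-release rebuild is exactly what I'd like to avoid.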

I prefer using the binary distribution of Spark and then adding some jars
at job submission, e.g. adding spark-hive for HiveContext usage.

FYI, spark-hive is just 1.2MB:
http://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10/1.4.0
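
As a workaround, I could download the jar myself and pass it with --jars
instead of --packages. A rough sketch (the jar URL just follows the
standard Maven Central layout, <application-jar> is a placeholder, and
note that spark-hive's transitive Hive dependencies may also need to be
supplied on the classpath):

  # Fetch the spark-hive jar directly from Maven Central
  wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.10/1.4.0/spark-hive_2.10-1.4.0.jar
  # Ship it with the job instead of resolving it via --packages
  ./bin/spark-submit \
      --jars spark-hive_2.10-1.4.0.jar \
      --class fr.leboncoin.etl.jobs.dwh.AdStateTraceDWHTransform \
      --master spark://localhost:7077 \
      <application-jar>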

On Wed, Jul 8, 2015 at 2:03 AM, Burak Yavuz <brk...@gmail.com> wrote:

> spark-hive is excluded when using --packages, because it can be included
> in the spark-assembly by adding -Phive during mvn package or sbt assembly.
>
> Best,
> Burak
>
> On Tue, Jul 7, 2015 at 8:06 AM, Hao Ren <inv...@gmail.com> wrote:
>
>> I want to add spark-hive as a dependency when submitting my job, but it
>> seems that spark-submit cannot resolve it.
>>
>> $ ./bin/spark-submit \
>>     --packages org.apache.spark:spark-hive_2.10:1.4.0,org.postgresql:postgresql:9.3-1103-jdbc3,joda-time:joda-time:2.8.1 \
>>     --class fr.leboncoin.etl.jobs.dwh.AdStateTraceDWHTransform \
>>     --master spark://localhost:7077 \
>>
>> Ivy Default Cache set to: /home/invkrh/.ivy2/cache
>> The jars for the packages stored in: /home/invkrh/.ivy2/jars
>> https://repository.jboss.org/nexus/content/repositories/releases/ added as a remote repository with the name: repo-1
>> :: loading settings :: url = jar:file:/home/invkrh/workspace/scala/spark/assembly/target/scala-2.10/spark-assembly-1.4.0-SNAPSHOT-hadoop2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
>> org.apache.spark#spark-hive_2.10 added as a dependency
>> org.postgresql#postgresql added as a dependency
>> joda-time#joda-time added as a dependency
>> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>>         confs: [default]
>>         found org.postgresql#postgresql;9.3-1103-jdbc3 in local-m2-cache
>>         found joda-time#joda-time;2.8.1 in central
>> :: resolution report :: resolve 139ms :: artifacts dl 3ms
>>         :: modules in use:
>>         joda-time#joda-time;2.8.1 from central in [default]
>>         org.postgresql#postgresql;9.3-1103-jdbc3 from local-m2-cache in [default]
>>
>>         ---------------------------------------------------------------------
>>         |                  |            modules            ||   artifacts   |
>>         |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>>         ---------------------------------------------------------------------
>>         |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
>>         ---------------------------------------------------------------------
>> :: retrieving :: org.apache.spark#spark-submit-parent
>>         confs: [default]
>>         0 artifacts copied, 2 already retrieved (0kB/6ms)
>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveContext
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:348)
>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:633)
>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveContext
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         ... 7 more
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 15/07/07 16:57:59 INFO Utils: Shutdown hook called
>>
>> Any help is appreciated. Thank you.
>>
>>
>>
>>
>>
>


-- 
Hao Ren

Data Engineer @ leboncoin

Paris, France
