Thanks Sean and Russell. Much appreciated.

Just to clarify: recently I had issues with conflicting versions of Google
Guava jar files when building an uber jar (trying to evict the unwanted
ones). This used to work a year and a half ago on Google Dataproc compute
engines (which come with Spark preloaded) and I could create an uber jar.

Unfortunately this has become problematic now, so I tried to use
spark-submit instead, as follows:

${SPARK_HOME}/bin/spark-submit \
                --master yarn \
                --deploy-mode client \
                --conf spark.executor.memoryOverhead=3000 \
                --class org.apache.spark.repl.Main \
                --name "Spark shell on Yarn" \
                --driver-class-path /home/hduser/jars/ddhybrid.jar \
                --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
                --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
                "$@"

Effectively a tailored spark-shell. However, I do not think there is a
mechanism to resolve jar conflicts this way, short of building an uber jar
through SBT?
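For what it's worth, Guava-style conflicts can often be resolved inside the uber jar itself by shading, rather than via spark-submit options. A minimal build.sbt sketch, assuming the sbt-assembly plugin is enabled in project/plugins.sbt and that the clash is in com.google.common (as with Guava):

```scala
// build.sbt fragment -- a sketch, not a drop-in config.
// Relocate Guava's packages so the uber jar's copy cannot clash
// with the Guava version that Spark/Dataproc already ships.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)

// Discard duplicate META-INF entries that would otherwise abort the merge.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", _ @ _*) => MergeStrategy.discard
  case _                            => MergeStrategy.first
}
```

With the rename rule in place, `sbt assembly` rewrites both the Guava classes and every reference to them in your own code, so the cluster's Guava and the bundled one can coexist.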

Cheers



On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <russell.spit...@gmail.com>
wrote:

> --jars adds only that jar
> --packages adds the jar and its dependencies listed in Maven
>
> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a scenario that I use in Spark submit as follows:
>>
>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,
>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>
>> As you can see the jar files needed are added.
>>
>>
>> This comes back with the following error message:
>>
>>
>> Creating model test.weights_MODEL
>>
>> java.lang.NoClassDefFoundError:
>> com/google/api/client/http/HttpRequestInitializer
>>
>>   at
>> com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>
>>   at
>> com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>
>>   at
>> com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>
>>   ... 76 elided
>>
>> Caused by: java.lang.ClassNotFoundException:
>> com.google.api.client.http.HttpRequestInitializer
>>
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>
>>
>>
>> So there is an issue with finding the class, although the jar file used
>>
>>
>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>
>> has it.
>>
>>
>> Now if I remove the above jar file and replace it with the same version,
>> but as a package, it works!
>>
>>
>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
>> --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>
>>
>> I have read the write-ups about packages searching the Maven
>> repositories etc. I am still not convinced why using the package should
>> make the difference between failure and success. In other words, when
>> should one use a package rather than a jar?
>>
>>
>> Any ideas will be appreciated.
>>
>>
>> Thanks
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>
