Once you have the jars that --packages downloaded into the ~/.ivy2 folder, you
can then add that list to --jars. That way there are no missing dependencies.
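A minimal sketch of that idea, assuming Spark left the downloaded jars in the
default cache location ~/.ivy2/jars (the demo directory name here is made up):

```shell
#!/bin/sh
# Build a comma-separated list of every jar that a previous
# "spark-submit --packages ..." run left in the Ivy cache
# (default location ~/.ivy2/jars), for reuse with --jars.
JARS=$(find "$HOME/.ivy2/jars" -name '*.jar' 2>/dev/null | paste -sd, -)
echo "--jars $JARS"
```

The resulting comma-separated list can be passed straight to spark-submit
--jars, so every transitive dependency that --packages resolved gets shipped
explicitly.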


ayan guha <guha.a...@gmail.com> writes:

> Hi
>
> One way to think of this: --packages is better when you have third-party
> dependencies, and --jars is better when you have custom in-house built jars.
>
> On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks Sean and Russell. Much appreciated.
>>
>> Just to clarify: I recently had issues with different versions of the Google
>> Guava jar files when building an Uber jar file (to evict the unwanted ones).
>> This used to work a year and a half ago using Google Dataproc compute
>> engines (which come with Spark preloaded), and I could create an Uber jar file.
>>
>> Unfortunately this has become problematic now, so I tried to use spark-submit
>> instead as follows:
>>
>> ${SPARK_HOME}/bin/spark-submit \
>>                 --master yarn \
>>                 --deploy-mode client \
>>                 --conf spark.executor.memoryOverhead=3000 \
>>                 --class org.apache.spark.repl.Main \
>>                 --name "Spark shell on Yarn" \
>>                 --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>                 --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
>>                 --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
>>                 "$@"
>>
>> Effectively a tailored spark-shell. However, I do not think there is a
>> mechanism to resolve jar conflicts without building an Uber jar file
>> through SBT?
>>
>> Cheers
>>
>>
>>
>> On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>>> --jars adds only that jar
>>> --packages adds the jar and its dependencies as listed in Maven
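To make that distinction concrete, here is an illustrative sketch (paths and
Maven coordinates are the ones from later in this thread; the trailing "..."
stands for the rest of the command line, so these are not runnable as-is):

```shell
# --jars ships exactly the files listed; nothing else is fetched, so any
# transitive dependency (e.g. google-api-client) must already be on the list:
spark-submit --jars /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar ...

# --packages resolves the artifact from a Maven repository AND pulls in its
# transitive dependencies automatically:
spark-submit --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 ...
```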
>>>
>>> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a scenario that I use in Spark submit as follows:
>>>>
>>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,
>>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>>>
>>>> As you can see, the required jar files are added.
>>>>
>>>>
>>>> This comes back with the error message below:
>>>>
>>>>
>>>> Creating model test.weights_MODEL
>>>>
>>>> java.lang.NoClassDefFoundError:
>>>> com/google/api/client/http/HttpRequestInitializer
>>>>
>>>>   at
>>>> com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
>>>>
>>>>   at
>>>> com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
>>>>
>>>>   at
>>>> com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
>>>>
>>>>   ... 76 elided
>>>>
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> com.google.api.client.http.HttpRequestInitializer
>>>>
>>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>>
>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>
>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>
>>>>
>>>>
>>>> So there is an issue finding the class, even though the jar file used,
>>>>
>>>>
>>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>>>
>>>> contains it.
>>>>
>>>>
>>>> Now if I remove the above jar file and replace it with the same
>>>> version as a package, it works!
>>>>
>>>>
>>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
>>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
>>>> --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>>
>>>>
>>>> I have read the write-ups about --packages searching the Maven
>>>> repositories etc. I am not convinced why using a package should make so
>>>> much difference between failure and success. In other words, when should
>>>> one use a package rather than a jar?
>>>>
>>>>
>>>> Any ideas will be appreciated.
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>> --
> Best Regards,
> Ayan Guha


-- 
nicolas paris

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org