We re-model Spark's dependencies and ours together and put them under the /jars path. There are other ways to do it, but we want the cluster classpath to stay as close to the development classpath as possible.
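Concretely, the deployment looks something like this (the paths and jar names here are hypothetical, just to sketch the idea):

    # Copy our dependency jars next to Spark's own, so the cluster
    # classpath matches what we compile against in development.
    cp /path/to/app-deps/*.jar "$SPARK_HOME"/jars/

    # The job jar itself stays thin -- no bundled dependencies.
    bin/spark-submit --class com.example.MyJob /path/to/my-job-thin.jar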
-------
Regards,
Andy

On Fri, Dec 23, 2016 at 6:00 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Andy, thanks for the reply.
>
> If we download all the dependencies to a separate location and link them
> with the Spark job jar on the Spark cluster, is that the best way to
> execute a Spark job?
>
> Thanks.
>
> On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang <nam...@gmail.com> wrote:
>
>> I used to use an uber jar in Spark 1.x because of classpath issues (we
>> couldn't re-model our dependencies based on our code, and thus the
>> cluster's runtime dependencies could be very different from running Spark
>> directly in the IDE). We had to use the userClasspathFirst "hack" to work
>> around this.
>>
>> With Spark 2, it's easier to replace dependencies (say, Guava) than
>> before. We moved away from deploying a superjar and just pass the
>> libraries as part of the Spark jars (we still can't use Guava v19 or
>> later because Spark uses a deprecated method that's no longer available
>> there, but that's not a big issue for us).
>>
>> -------
>> Regards,
>> Andy
>>
>> On Fri, Dec 23, 2016 at 6:44 AM, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hello Spark Community,
>>>
>>> For Spark job creation I use sbt-assembly to build an uber ("super")
>>> jar and then submit it with spark-submit.
>>>
>>> Example:
>>>
>>> bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
>>> /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
>>>
>>> But other folks have argued for an uber-less (thin) jar. Guys, can you
>>> please explain the industry-standard best practice for this?
>>>
>>> Thanks,
>>>
>>> Chetan Khatri.
>>
>>
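To make the thin-jar option discussed above concrete, one way to run it is a spark-submit along these lines. The dependency and thin-jar names are made up for illustration; spark.driver.userClassPathFirst and spark.executor.userClassPathFirst are the current names of the "userClasspathFirst" workaround mentioned in the thread:

    # Submit a thin jar and pass the dependencies explicitly;
    # the userClassPathFirst settings make the application's jars
    # win over Spark's own copies when versions conflict.
    bin/spark-submit \
      --class hbase.spark.chetan.com.SparkHbaseJob \
      --jars /home/chetan/libs/dep-a.jar,/home/chetan/libs/dep-b.jar \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      /home/chetan/hbase-spark/SparkMSAPoc-thin-1.0.jar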