Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Chetan Khatri
Correct, so there are two options: the approach you suggested and the Uber Jar approach. I think the Uber Jar approach is the best practice, because environment migration becomes easy, and performance-wise the Uber Jar approach should also be better optimised than the non-uber approach. Thanks.
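
A minimal sketch of how such an uber jar is typically kept portable: mark the Spark artifacts themselves as "provided" in build.sbt, so only your own dependencies are bundled and the cluster supplies Spark at runtime (the version and module list below are illustrative assumptions, not from this thread):

    // build.sbt -- keep Spark itself out of the uber jar; the cluster provides it
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided"
    )

Built this way, the same assembly jar can be promoted across environments that run a compatible Spark build.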

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
We remodel the Spark dependencies and ours together and chuck them under the /jars path. There are other ways to do it, but we want the classpath to be strictly as close to development as possible. --- Regards, Andy
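
One possible way to reproduce this layout is a small custom sbt task that stages every runtime dependency jar in one directory, ready to be dropped under the cluster's /jars path. This is only a sketch; the task name and staging directory are made up for illustration:

    // build.sbt -- gather runtime dependency jars into target/staged-jars
    lazy val stageJars = taskKey[Unit]("Copy runtime dependency jars to a staging directory")

    stageJars := {
      val dest = target.value / "staged-jars"
      IO.createDirectory(dest)
      (dependencyClasspath in Runtime).value.files
        .filter(_.isFile)
        .foreach(f => IO.copyFile(f, dest / f.getName))
    }

Running `sbt stageJars` then leaves the full runtime classpath in one place, which keeps the deployed classpath close to what the IDE sees in development.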

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Chetan Khatri
Andy, thanks for the reply. If we download all the dependencies to a separate location and link them with the Spark job jar on the Spark cluster, is that the best way to execute a Spark job? Thanks.
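
One standard way to link such pre-downloaded dependencies is spark-submit's --jars flag, which takes a comma-separated list of jars to put on the driver and executor classpaths. A sketch along the lines of the submit command later in this thread (the dependency paths and the thin jar name are hypothetical):

    bin/spark-submit \
      --class hbase.spark.chetan.com.SparkHbaseJob \
      --jars /opt/spark-deps/hbase-client.jar,/opt/spark-deps/hbase-common.jar \
      /home/chetan/hbase-spark/SparkMSAPoc-1.0.jar

Whether this beats an uber jar largely comes down to how reliably the dependency directory is kept in sync across environments.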

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
I used to use an uber jar in Spark 1.x because of classpath issues (we couldn't remodel our dependencies based on our code, and thus the cluster's runtime dependencies could be very different from running Spark directly in the IDE). We had to use the userClassPathFirst "hack" to work around this. With Spark 2, we moved away from the uber jar approach (see my other reply in this thread).
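
For reference, the "hack" maps to Spark's user-classpath-first settings, which are passed at submit time and are documented as experimental; a sketch reusing the submit command from the original post below:

    # Prefer the user's jars over Spark's own when resolving classes
    bin/spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --class hbase.spark.chetan.com.SparkHbaseJob \
      /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar

Their experimental status is part of why this reads as a workaround rather than a fix.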

Best Practice for Spark Job Jar Generation

2016-12-22 Thread Chetan Khatri
Hello Spark Community, For Spark job creation I use SBT Assembly to build an Uber ("Super") Jar and then submit it via spark-submit. Example: bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar But other folks have debated whether the Uber Jar approach is really the best practice.
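
For completeness, a minimal sbt-assembly setup that produces such an uber jar; the plugin version and merge rules below are illustrative assumptions, not taken from this thread:

    // project/plugins.sbt -- add the sbt-assembly plugin
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

    // build.sbt -- resolve files that clash when many jars are merged.
    // Discarding META-INF wholesale can drop service-registry files that
    // some libraries need, so adjust these rules per project.
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }

Running `sbt assembly` then emits the SparkMSAPoc-assembly-1.0.jar passed to spark-submit above.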