Thanks for the enlightening solution! On Wed, Jul 1, 2015 at 12:03 AM Burak Yavuz <brk...@gmail.com> wrote:
> Hi,
> In your build.sbt file, list all the dependencies you have (hopefully
> there aren't too many; it's just that they have a lot of transitive
> dependencies), for example:
> ```
> libraryDependencies += "org.apache.hbase" % "hbase" % "1.1.1"
>
> libraryDependencies += "junit" % "junit" % "x"
>
> resolvers += "Some other repo" at "http://some.other.repo"
>
> resolvers += "Some other repo2" at "http://some.other.repo2"
> ```
>
> Call `sbt package`, and then run spark-submit as:
>
> $ bin/spark-submit --packages org.apache.hbase:hbase:1.1.1,junit:junit:x \
>     --repositories http://some.other.repo,http://some.other.repo2 $YOUR_JAR
>
> Best,
> Burak
>
> On Mon, Jun 29, 2015 at 11:33 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>
>> Hi Burak,
>>
>> Is the `--packages` flag only available for Maven, with no sbt support?
>>
>> On Tue, Jun 30, 2015 at 2:26 PM Burak Yavuz <brk...@gmail.com> wrote:
>>
>>> You can pass `--packages your:comma-separated:maven-dependencies` to
>>> spark-submit if you have Spark 1.3 or greater.
>>>
>>> Best regards,
>>> Burak
>>>
>>> On Mon, Jun 29, 2015 at 10:46 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>>>
>>>> Hey Spark Users,
>>>>
>>>> I'm writing a demo with Spark and HBase. What I've done is package a
>>>> **fat jar**: declare the dependencies in `build.sbt`, then use `sbt assembly`
>>>> to bundle **all dependencies** into one big jar. The remaining work is to
>>>> copy the fat jar to the Spark master node and launch it with `spark-submit`.
>>>>
>>>> The drawback of the "fat jar" approach is obvious: all dependencies are
>>>> packed in, yielding a huge jar file. Even worse, in my case a large number
>>>> of conflicting package files in `~/.ivy2/cache` failed to merge, and I had
>>>> to manually set the `MergeStrategy` to `rename` for all conflicting files
>>>> to bypass the issue.
>>>>
>>>> Then I thought there should exist an easier way: submit a "thin jar"
>>>> along with a build.sbt-like file specifying its dependencies, and have
>>>> the dependencies resolved automatically across the cluster before the
>>>> actual job is launched. I googled, but found nothing related. Is this
>>>> feasible, or is there a better way to achieve the same goal?
>>>>
>>>> BEST REGARDS,
>>>> Todd Leo
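For readers who hit the same merge failures, the `MergeStrategy` workaround described in the thread can be sketched in `build.sbt` roughly as follows. This is a minimal sketch, assuming the sbt-assembly plugin is already enabled in `project/plugins.sbt`; the `META-INF` case is an illustrative common addition, not something stated in the thread:

```scala
// build.sbt -- hypothetical sketch of the "rename conflicting files" workaround.
// Assumes sbt-assembly is enabled in project/plugins.sbt.
assemblyMergeStrategy in assembly := {
  // Manifests and signature files usually cannot be renamed meaningfully; drop them.
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // Rename every other conflicting file so the merge no longer fails.
  case _                             => MergeStrategy.rename
}
```

Note that `MergeStrategy.rename` keeps all conflicting copies under renamed paths, which works around the build failure but can hide genuine duplicates; the `--packages`/`--repositories` approach above avoids the problem entirely by resolving dependencies at submit time.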