I'm going to have to disagree here.  If you are building a release
distribution or integrating with legacy systems, then Maven is probably the
correct choice.  However, most of the core developers that I know use sbt,
and I think it's a better choice for exploration and development overall.
That said, this probably falls into the category of a religious argument,
so you might want to look at both options and decide for yourself.

In my experience the SBT build is significantly faster with less effort (I
think sbt is still faster even if you go through the extra effort of
installing zinc) and its build files are easier to read.  The console mode
of sbt (just run sbt/sbt and a long-running console session starts that
will accept further commands) is great for building individual subprojects
or running single test suites.  In addition to being faster because it is a
long-running JVM, it has a lot of nice features like tab-completion for
test case names.
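
As a rough sketch (this is standard sbt usage rather than anything
Spark-specific, so treat it as illustrative), you can start the console
once and then issue commands against individual subprojects.  Here
sql/compile recompiles only the sql subproject, sql/test runs all of its
test suites, and the ~ prefix (~sql/compile) re-runs a command
automatically whenever a source file changes:

$ sbt/sbt
> sql/compile
> sql/test
> ~sql/compile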

For example, if you want to see which test suites are available in the SQL
subproject, you can do the following:

[marmbrus@michaels-mbp spark (tpcds)]$ sbt/sbt
[info] Loading project definition from
/Users/marmbrus/workspace/spark/project/project
[info] Loading project definition from
/Users/marmbrus/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
[info] Set current project to spark-parent (in build
file:/Users/marmbrus/workspace/spark/)
> sql/test-only *<tab>*
--
org.apache.spark.sql.CachedTableSuite
org.apache.spark.sql.DataTypeSuite
org.apache.spark.sql.DslQuerySuite
org.apache.spark.sql.InsertIntoSuite
...
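
From there you can run just one suite by completing its name.  Taking
CachedTableSuite from the listing above as an example (globs like
*CachedTableSuite also work; I've omitted the test output here):

> sql/test-only org.apache.spark.sql.CachedTableSuite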

Another very useful feature is the development console, which starts an
interactive REPL with the most recent version of the code and a lot of
useful imports for some subprojects.  For example, in the hive subproject
it automatically sets up a temporary database with a bunch of test data
pre-loaded:

$ sbt/sbt hive/console
> hive/console
...
import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive.test.TestHive._
import org.apache.spark.sql.parquet.ParquetTestData
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("SELECT * FROM src").take(2)
res0: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86])
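
You can then experiment directly against the pre-loaded test tables.  A
rough sketch (the query and the small_src table name below are just
illustrative, and the exact API may differ slightly depending on your
checkout):

scala> val smallSrc = sql("SELECT key, value FROM src WHERE key < 100")
scala> smallSrc.registerTempTable("small_src")
scala> sql("SELECT count(*) FROM small_src").collect()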

Michael

On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody <
dineshjweerakk...@gmail.com> wrote:

> Hi Stephen and Sean,
>
> Thanks for the correction.
>
> On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen <so...@cloudera.com> wrote:
>
> > No, the Maven build is the main one.  I would use it unless you have a
> > need to use the SBT build in particular.
> > On Nov 16, 2014 2:58 AM, "Dinesh J. Weerakkody" <
> > dineshjweerakk...@gmail.com> wrote:
> >
> >> Hi Yiming,
> >>
> >> I believe that both SBT and MVN are supported in Spark, but SBT is
> >> preferred (I'm not 100% sure about this :) ). When I used MVN I got
> >> some build failures. After that I used SBT and it worked fine.
> >>
> >> You can go through these discussions regarding SBT vs MVN and learn pros
> >> and cons of both [1] [2].
> >>
> >> [1]
> >> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html
> >>
> >> [2]
> >> https://groups.google.com/forum/#!msg/spark-developers/OxL268v0-Qs/fBeBY8zmh3oJ
> >>
> >> Thanks,
> >>
> >> On Sun, Nov 16, 2014 at 7:11 AM, Yiming (John) Zhang <sdi...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> >
> >> >
> >> > I am new to developing Spark and my current focus is the
> >> > co-scheduling of Spark tasks. However, I am confused by the build
> >> > tools: sometimes the documentation uses mvn and sometimes it uses sbt.
> >> >
> >> > So, my question is: which one is the preferred tool of the Spark
> >> > community? And what is the technical difference between them? Thank you!
> >> >
> >> >
> >> >
> >> > Cheers,
> >> >
> >> > Yiming
> >> >
> >> >
> >>
> >>
> >> --
> >> Thanks & Best Regards,
> >>
> >> *Dinesh J. Weerakkody*
> >>
> >
>
>
> --
> Thanks & Best Regards,
>
> *Dinesh J. Weerakkody*
>
