Hello,

I have written an sbt plugin called spark-deployer, which is able to
deploy a standalone Spark cluster on AWS EC2 and submit jobs to it.
https://github.com/pishen/spark-deployer

Compared to the current spark-ec2 script, this design may have several
benefits (features):
1. All the code is written in Scala.
2. Just add one line to your project/plugins.sbt and you are ready to
go. (You don't have to download the Python code and store it
somewhere.)
3. The whole development flow (writing the code for a Spark job,
compiling it, launching the cluster, assembling and submitting the job
to the master, and terminating the cluster when the job is finished)
can be done in sbt.
4. Supports parallel deployment of the worker machines using Scala's Futures.
5. Allows dynamically adding worker machines to, or removing them from, the current cluster.
6. All the configurations are stored in a Typesafe Config file. You
don't need to store them elsewhere and map the settings onto
spark-ec2's command-line arguments.
7. The core library is separated from the sbt plugin, so it's possible
to run the deployment from an environment without sbt (only a JVM is
required).
8. Supports an adjustable EC2 root disk size, custom security groups,
custom AMIs (it can run on the default Amazon AMI), custom Spark
tarballs, and VPC. (Most of these are also supported by spark-ec2 in a
slightly different form; I mention them anyway.)

Since this project is still in its early stage, it lacks some features
of spark-ec2, such as self-installed HDFS (we use S3 directly), a
stoppable cluster, Ganglia, and the copy script.
However, it's already usable for our company, and we are trying to
move our production Spark projects from spark-ec2 to spark-deployer.

Any suggestions, testing help, or pull requests are highly appreciated.

On top of that, I would like to contribute this project to Spark,
perhaps as another choice (a suggested link) alongside spark-ec2 in
Spark's official documentation.
Of course, before that, I have to make this project stable enough
(strange errors still happen in the AWS API from time to time).
I'm wondering whether this kind of contribution is possible, and
whether there are any rules to follow or anyone to contact.
(Maybe the source code will not be merged into Spark's main
repository, since I've noticed that spark-ec2 is also planning to move
out.)

Regards,
Pishen Tsai
