Re: Introduce a sbt plugin to deploy and submit jobs to a spark cluster on ec2

2015-08-26 Thread rake
This looks promising. I'm trying to use spark-ec2 to launch a cluster with
Spark 1.5.0-SNAPSHOT and failing.

Where should we ask questions, report problems?

A couple of questions I have after looking through the project:

- Where does the configuration file spark-deployer.conf go (what folder)?
- Should spark-deployer work with the nightly/latest builds available at
the three links listed here:
 Nightly Builds
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-NightlyBuilds
  

Thanks,
  Randy Kerber






Re: Introduce a sbt plugin to deploy and submit jobs to a spark cluster on ec2

2015-08-26 Thread pishen tsai
Please ask questions on the Gitter channel for now.
https://gitter.im/pishen/spark-deployer

- spark-deployer.conf should be placed in your project's root directory
(beside build.sbt).
- To use the nightly builds, replace the value of spark-tgz-url in
spark-deployer.conf with the URL of the tgz you want to test. Also remember
to change the Spark version in build.sbt. The SNAPSHOT versions are not
tested; if you run into any problems, please report them on Gitter.
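
For concreteness, here is a minimal sketch of the layout and the nightly-build
tweak described above. Only the spark-tgz-url key is taken from the notes
above; the URL, the build.sbt line, and everything else are illustrative
placeholders, so check the project README for the exact schema:

  my-project/
    build.sbt            <- Spark version is set here
    spark-deployer.conf  <- sits beside build.sbt in the project root
    project/
      plugins.sbt        <- enables the spark-deployer sbt plugin

  # in spark-deployer.conf: point spark-tgz-url at the nightly tarball to test
  spark-tgz-url = "https://<nightly-build-host>/spark-1.5.0-SNAPSHOT-bin-hadoop2.x.tgz"

  // in build.sbt: keep the Spark dependency in sync with that tarball
  // ("provided" is the usual scope when building an assembly jar)
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0-SNAPSHOT" % "provided"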

Thanks,
pishen






Re: Introduce a sbt plugin to deploy and submit jobs to a spark cluster on ec2

2015-08-25 Thread pishen
Thank you for the suggestions. Actually, this project has already been on
spark-packages for one or two months, so I think what I need now is some
promotion :P

2015-08-25 23:51 GMT+08:00 saurfang [via Apache Spark Developers List] 
ml-node+s1001551n1380...@n3.nabble.com:

 This is very cool. I also have an sbt plugin that automates some aspects of
 spark-submit, but with a slightly different goal:
 https://github.com/saurfang/sbt-spark-submit

 The hope there is to address the problem that a single jar can contain many
 Spark main functions, and development often involves: change the code, run
 sbt assembly, scp the jar to the cluster, and run spark-submit with the fully
 qualified class name and additional application arguments.
 With my plugin, I'm able to capture all these steps in single, customizable
 sbt tasks that are easy to remember (and auto-complete in the sbt console),
 so you can have multiple sbt tasks corresponding to different main functions,
 sub-projects, and/or default arguments, making the build/deploy/submit cycle
 go straight through.

 Currently this works great for YARN, because YARN takes care of the jar
 upload and master URL discovery. I have long wanted to make my plugin work
 with spark-ec2 so I can upload the jar and infer the master URL
 programmatically.

 Thanks for sharing, and like Akhil said, it would be nice to have it on
 spark-packages for discoverability.
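
As a rough illustration of that idea (this is not sbt-spark-submit's actual
API; the task name, main class, and master URL below are placeholders), such a
task could be wired up in build.sbt along these lines, assuming sbt-assembly
provides the fat jar:

  import scala.sys.process._

  // One task per main function / set of default arguments; names are illustrative.
  lazy val submitWordCount = taskKey[Unit]("Assemble and spark-submit the WordCount job")

  submitWordCount := {
    val jar = assembly.value  // fat jar produced by the sbt-assembly plugin
    val exitCode = Seq(
      "spark-submit",
      "--master", "yarn-cluster",
      "--class", "com.example.WordCount",
      jar.getAbsolutePath
    ).!
    if (exitCode != 0) sys.error(s"spark-submit failed with exit code $exitCode")
  }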







Re: Introduce a sbt plugin to deploy and submit jobs to a spark cluster on ec2

2015-08-25 Thread Akhil Das
You can add it to Spark Packages, I guess: http://spark-packages.org/

Thanks
Best Regards





Re: Introduce a sbt plugin to deploy and submit jobs to a spark cluster on ec2

2015-08-14 Thread pishen tsai
Sorry for the line-breaking format of the previous mail; I'm resending it.

I have written an sbt plugin called spark-deployer, which is able to deploy
a standalone Spark cluster on AWS EC2 and submit jobs to it.
https://github.com/pishen/spark-deployer

Compared to the current spark-ec2 script, this design may have several
benefits (features):
1. All the code is written in Scala.
2. Just add one line to your project/plugins.sbt and you are ready to go; see
the sketch after this list. (You don't have to download the Python code and
store it somewhere.)
3. The whole development flow (write code for the Spark job, compile the code,
launch the cluster, assemble and submit the job to the master, terminate the
cluster when the job is finished) can be done in sbt.
4. Supports parallel deployment of the worker machines using Scala's Futures.
5. Allows dynamically adding or removing worker machines to/from the current
cluster.
6. All the configuration is stored in a Typesafe Config file. You don't need
to store it elsewhere or map the settings onto spark-ec2's command-line
arguments.
7. The core library is separated from the sbt plugin, so it's possible to run
the deployment from an environment without sbt (only a JVM is required).
8. Supports an adjustable EC2 root disk size, custom security groups, custom
AMIs (it can run on the default Amazon AMI), a custom Spark tarball, and VPC.
(Most of these are also supported by spark-ec2 in a slightly different form;
I just mention them anyway.)
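
As a rough illustration of points 2 and 3, the single line and the sbt flow
would look something like the following. The plugin coordinates, version, and
task names are placeholders (not necessarily the published ones); the project
README has the real values:

  // project/plugins.sbt -- the one line that pulls in the plugin
  addSbtPlugin("net.pishen" % "spark-deployer-sbt" % "x.y.z")

  // then, from the sbt console (hypothetical task names):
  //   > sparkCreateCluster     // launch master and workers on EC2
  //   > sparkSubmitJob         // assemble the jar and submit it to the master
  //   > sparkDestroyCluster    // terminate the cluster when the job is done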

Since this project is still in its early stage, it lacks some features of
spark-ec2, such as self-installed HDFS (we use S3 directly), a stoppable
cluster, Ganglia, and the copy script.
However, it's already usable for our company, and we are trying to move our
production Spark projects from spark-ec2 to spark-deployer.

Any suggestions, testing help, or pull requests are highly appreciated.

On top of that, I would like to contribute this project to Spark, maybe as
another choice (a suggested link) alongside spark-ec2 in Spark's official
documentation.
Of course, before that, I have to make this project stable enough (strange
errors happen with the AWS API from time to time).
I'm wondering whether this kind of contribution is possible, and whether there
is any rule to follow or anyone to contact.
(Maybe the source code will not be merged into Spark's main repository,
since I've noticed that spark-ec2 is also planning to move out.)

Regards,
Pishen Tsai