Re: Notes on writing complex spark applications

2014-11-24 Thread Evan R. Sparks
Thanks Patrick,

You raise a good point: for this to be useful, it's imperative that it stays
up to date with new versions of Spark.

My thought with putting it on the wiki was that it's lower friction for
community members to edit, but it likely won't have the same level of
quality control as the existing documentation.

At a higher level, some of these tips are best practices for writing
applications that depend on Spark. I'm wondering if a new document is in
order for topics like "this is how you set up a project skeleton to link
against Spark" and "this is how you handle external libraries." In the
past I've run into stumbling blocks such as getting classpaths correct and
linking against a different version of Akka; guidance on those would be
useful in such a document, in addition to some of the application
architecture suggestions we propose in *this* document.

- Evan

On Sun, Nov 23, 2014 at 9:02 PM, Patrick Wendell  wrote:

> Hey Evan,
>
> It might be nice to merge this into existing documentation. In
> particular, a lot of this could serve to update the current tuning
> section and programming guides.
>
> It could also work to paste this wholesale as a reference for Spark
> users, but in that case it's less likely to get updated when other
> things change, or be found by users reading through the Spark docs.
>
> - Patrick
>
> On Sun, Nov 23, 2014 at 8:27 PM, Inkyu Lee  wrote:
> > Very helpful!!
> >
> > Thank you very much!
> >
> > 2014-11-24 2:17 GMT+09:00 Sam Bessalah :
> >
> >> Thanks Evan, this is great.
> >> On Nov 23, 2014 5:58 PM, "Evan R. Sparks" wrote:
> >>
> >> > Hi all,
> >> >
> >> > Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
> >> > working on a short document about writing high performance Spark
> >> > applications based on our experience developing MLlib, GraphX, ml-matrix,
> >> > pipelines, etc. It may be a useful document both for users and new Spark
> >> > developers - perhaps it should go on the wiki?
> >> >
> >> > The document itself is here:
> >> > https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
> >> > and I've created SPARK-4565 to track this.
> >> >
> >> > - Evan
> >> >
> >>
>


Re: Notes on writing complex spark applications

2014-11-23 Thread Patrick Wendell
Hey Evan,

It might be nice to merge this into existing documentation. In
particular, a lot of this could serve to update the current tuning
section and programming guides.

It could also work to paste this wholesale as a reference for Spark
users, but in that case it's less likely to get updated when other
things change, or be found by users reading through the Spark docs.

- Patrick

On Sun, Nov 23, 2014 at 8:27 PM, Inkyu Lee  wrote:
> Very helpful!!
>
> Thank you very much!
>
> 2014-11-24 2:17 GMT+09:00 Sam Bessalah :
>
>> Thanks Evan, this is great.
>> On Nov 23, 2014 5:58 PM, "Evan R. Sparks"  wrote:
>>
>> > Hi all,
>> >
>> > Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
>> > working on a short document about writing high performance Spark
>> > applications based on our experience developing MLlib, GraphX, ml-matrix,
>> > pipelines, etc. It may be a useful document both for users and new Spark
>> > developers - perhaps it should go on the wiki?
>> >
>> > The document itself is here:
>> > https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
>> > and I've created SPARK-4565 to track this.
>> >
>> > - Evan
>> >
>>




Re: Notes on writing complex spark applications

2014-11-23 Thread Inkyu Lee
Very helpful!!

Thank you very much!

2014-11-24 2:17 GMT+09:00 Sam Bessalah :

> Thanks Evan, this is great.
> On Nov 23, 2014 5:58 PM, "Evan R. Sparks"  wrote:
>
> > Hi all,
> >
> > Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
> > working on a short document about writing high performance Spark
> > applications based on our experience developing MLlib, GraphX, ml-matrix,
> > pipelines, etc. It may be a useful document both for users and new Spark
> > developers - perhaps it should go on the wiki?
> >
> > The document itself is here:
> > https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
> > and I've created SPARK-4565 to track this.
> >
> > - Evan
> >
>


Re: Notes on writing complex spark applications

2014-11-23 Thread Sam Bessalah
Thanks Evan, this is great.
On Nov 23, 2014 5:58 PM, "Evan R. Sparks"  wrote:

> Hi all,
>
> Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
> working on a short document about writing high performance Spark
> applications based on our experience developing MLlib, GraphX, ml-matrix,
> pipelines, etc. It may be a useful document both for users and new Spark
> developers - perhaps it should go on the wiki?
>
> The document itself is here:
> https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
> and I've created SPARK-4565 to track this.
>
> - Evan
>


Re: Notes on writing complex spark applications

2014-11-23 Thread andy petrella
Cool!

On Sun Nov 23 2014 at 5:58:03 PM Evan R. Sparks wrote:

> Hi all,
>
> Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
> working on a short document about writing high performance Spark
> applications based on our experience developing MLlib, GraphX, ml-matrix,
> pipelines, etc. It may be a useful document both for users and new Spark
> developers - perhaps it should go on the wiki?
>
> The document itself is here:
> https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
> and I've created SPARK-4565 to track this.
>
> - Evan
>


Notes on writing complex spark applications

2014-11-23 Thread Evan R. Sparks
Hi all,

Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
working on a short document about writing high performance Spark
applications based on our experience developing MLlib, GraphX, ml-matrix,
pipelines, etc. It may be a useful document both for users and new Spark
developers - perhaps it should go on the wiki?

The document itself is here:
https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
and I've created SPARK-4565 to track this.

- Evan