Re: how can I write a language "wrapper"?

2015-06-29 Thread Justin Uang
My guess is that if you are just wrapping the Spark SQL APIs, you can get
away with not reimplementing a lot of the complexity in PySpark, like
storing everything in RDDs as pickled byte arrays, pipelining RDDs, and
doing aggregations and joins in the Python interpreters.

Since the canonical representation of objects in Spark SQL lives on the
Scala/JVM side, you're effectively just proxying calls to the Java side. The
only tricky part is UDFs, which naturally need to run in an interpreter for
the wrapper language. I'm currently thinking of redesigning UDFs to be sent
in a language-agnostic data format like protobuf or msgpack, so that every
language wrapper only needs to implement a simple protocol: read rows in
that format, transform them, and write them back out in the same format.
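That protocol can be sketched in a few lines. The framing below (a 4-byte
length prefix with a JSON payload standing in for msgpack/protobuf) is an
assumption made for the sketch, not Spark's actual wire format:

```python
import json
import struct

def read_row(stream):
    """Read one length-prefixed, JSON-encoded row; None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)
    return json.loads(stream.read(length).decode("utf-8"))

def write_row(stream, row):
    """Write one row with a 4-byte big-endian length prefix."""
    payload = json.dumps(row).encode("utf-8")
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def serve_udf(udf, instream, outstream):
    # The entire wrapper-side contract: read a row, apply the UDF, write back.
    while (row := read_row(instream)) is not None:
        write_row(outstream, [udf(*row)])
```

A wrapper in any language would only need equivalents of these three
functions, plus the chosen serializer itself.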



Re: how can I write a language "wrapper"?

2015-06-29 Thread Daniel Darabos
Hi Vasili,
It so happens that the entire SparkR code was merged into Apache Spark in a
single pull request, so you can see all the required changes at once in
https://github.com/apache/spark/pull/5096. It's 12,043 lines and, as I
understand it, took more than 20 people about a year to write.



Re: how can I write a language "wrapper"?

2015-06-29 Thread Vasili I. Galchin
Shivaram,

Vis-a-vis Haskell support, I am reading DataFrame.R,
SparkRBackend*, context.R, et al. Am I headed in the correct
direction? Either way, please give more guidance. Thank you.

Kind regards,

Vasili




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: how can I write a language "wrapper"?

2015-06-24 Thread Shivaram Venkataraman
The SparkR code is in the `R` directory i.e.
https://github.com/apache/spark/tree/master/R

Shivaram



Re: how can I write a language "wrapper"?

2015-06-24 Thread Vasili I. Galchin
Matei,

 Last night I downloaded the Spark bundle.
To save time, can you give me the name of the SparkR example and where
it is in the Spark tree?

Thanks,

Bill



Re: how can I write a language "wrapper"?

2015-06-23 Thread Matei Zaharia
Just FYI, it would be easiest to follow SparkR's example and add the DataFrame
API first. Other APIs will be designed to work on DataFrames (most notably
machine learning pipelines), and the surface of this API is much smaller than
that of the RDD API. This API will also give you great performance as we
continue to optimize Spark SQL.
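To see what "the surface is much smaller" means in practice: a wrapper-side
DataFrame can be little more than a handle plus method forwarding. The
backend object and its `call` method below are hypothetical stand-ins for
the real JVM bridge, not an actual Spark interface:

```python
class DataFrame:
    """Thin wrapper: holds a JVM-side handle and relays every operation."""

    def __init__(self, backend, handle):
        self._backend = backend
        self._handle = handle

    def _relay(self, method, *args):
        # Each operation becomes one remote call that returns a new handle.
        new_handle = self._backend.call(self._handle, method, *args)
        return DataFrame(self._backend, new_handle)

    def select(self, *cols):
        return self._relay("select", *cols)

    def filter(self, condition):
        return self._relay("filter", condition)
```

Adding another DataFrame method is then one forwarding line, which is why
this surface is so much cheaper to cover than the RDD API.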

Matei




Re: how can I write a language "wrapper"?

2015-06-23 Thread Shivaram Venkataraman
Every language has its own quirks / features -- so I don't think there
exists a document on how to go about doing this for a new language. The
most relevant write-up I know of is the wiki page on PySpark internals,
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals, written
by Josh Rosen -- it covers some of the issues, like closure capture,
serialization, and JVM communication, that you'll need to handle for a new
language.
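Of those issues, closure capture is worth a small illustration: a UDF such
as `lambda x: x + n` drags the enclosing `n` along with it, and the wrapper
has to ship that environment to the executors. A minimal Python sketch of
extracting that environment (PySpark itself serializes the whole function
with cloudpickle, since the standard pickle module cannot handle plain
lambdas):

```python
import pickle

def capture_closure(func):
    """Return the free variables a function closes over, as a dict."""
    names = func.__code__.co_freevars
    cells = func.__closure__ or ()
    return dict(zip(names, (c.cell_contents for c in cells)))

def make_udf(n):
    # `n` is captured from the enclosing scope -- exactly the environment
    # a wrapper must ship to the executors along with the function body.
    return lambda x: x + n

udf = make_udf(10)
env = capture_closure(udf)   # {'n': 10}
payload = pickle.dumps(env)  # the captured environment is serializable
```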

Thanks
Shivaram



how can I write a language "wrapper"?

2015-06-23 Thread Vasili I. Galchin
Hello,

  I want to add language support for another language (other than Scala,
Java, et al.). Where is the documentation that explains how to provide
support for a new language?

Thank you,

Vasili