Just FYI, it would be easiest to follow SparkR's example and add the DataFrame 
API first. Other APIs will be designed to work on DataFrames (most notably 
machine learning pipelines), and the surface of this API is much smaller than 
that of the RDD API. This API will also give you great performance as we continue to 
optimize Spark SQL.

Matei
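
To make this concrete, here is a minimal sketch of the DataFrame-first approach 
Matei describes, written in Python as a stand-in for the new language. This is 
not Spark's actual internal API -- the class and method names are illustrative. 
The idea is that the guest-language DataFrame is just a thin handle around a 
JVM-side DataFrame, so each transformation is a single remote call and all 
query planning and Catalyst optimization stay in Spark SQL:

    class DataFrame(object):
        """Thin guest-language handle over a JVM DataFrame reference."""

        def __init__(self, jdf, backend):
            self._jdf = jdf          # opaque reference to the JVM DataFrame
            self._backend = backend  # JVM connection (e.g. a Py4J gateway)

        def select(self, *cols):
            # Forward the column names; the logical plan grows JVM-side.
            return DataFrame(self._jdf.select(*cols), self._backend)

        def filter(self, condition):
            return DataFrame(self._jdf.filter(condition), self._backend)

        def count(self):
            # Only actions move real data across the language boundary.
            return self._jdf.count()

Because every method is a forwarding call, the binding surface stays small, and 
new Spark SQL optimizations benefit the guest language with no changes on its 
side.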

> On Jun 23, 2015, at 1:46 PM, Shivaram Venkataraman 
> <shiva...@eecs.berkeley.edu> wrote:
> 
> Every language has its own quirks / features -- so I don't think there exists 
> a document on how to go about doing this for a new language. The most closely 
> related write-up I know of is the wiki page on PySpark internals, 
> https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals, written 
> by Josh Rosen -- it covers some of the issues, such as closure capture, 
> serialization, and JVM communication, that you'll need to handle for a new 
> language. 
> 
> Thanks
> Shivaram
> 
> On Tue, Jun 23, 2015 at 1:35 PM, Vasili I. Galchin <vigalc...@gmail.com> wrote:
> Hello,
> 
>       I want to add support for another language (other than Scala, Java, 
> et al.). Where is the documentation that explains how to provide support for 
> a new language?
> 
> Thank you,
> 
> Vasili
> 
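
For the JVM-communication piece Shivaram mentions, PySpark uses Py4J: the 
Python process talks to a GatewayServer running inside the Spark JVM over a 
local socket. A minimal standalone sketch of that pattern, assuming a JVM with 
a py4j GatewayServer is already listening on the default port (PySpark launches 
and wires up that JVM itself):

    from py4j.java_gateway import JavaGateway

    # Connect to a JVM that is already running a py4j GatewayServer.
    gateway = JavaGateway()

    # Any JVM class is reachable through its fully qualified name; arguments
    # and return values are converted or proxied over the socket.
    jlist = gateway.jvm.java.util.ArrayList()
    jlist.add("hello from the guest language")
    print(jlist.size())  # prints 1

Closure capture and serialization are a separate problem: PySpark, for example, 
pickles Python closures and ships them to Python worker processes for execution 
rather than translating them to JVM code, and a new binding has to make a 
similar choice.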
