Re: Question about Data Sources API

2015-03-24 Thread Ashish Mukherjee

Hello Michael,

Thanks for your quick reply.

My question wrt Java/Scala was related to extending the classes to support
new custom data sources, so I was wondering if those could be written in
Java, since our company is a Java shop.

The additional push downs I am looking for are aggregations with grouping
and sorting.

Essentially, I am trying to evaluate if this API can give me much of what
is possible with the Apache MetaModel project.

Regards,
Ashish

On Tue, Mar 24, 2015 at 1:57 PM, Michael Armbrust mich...@databricks.com
wrote:

 On Tue, Mar 24, 2015 at 12:57 AM, Ashish Mukherjee 
 ashish.mukher...@gmail.com wrote:

 1. Is the Data Source API stable as of Spark 1.3.0?


 It is marked DeveloperApi, but in general we do not plan to change even
 these APIs unless there is a very compelling reason to.


 2. The Data Source API seems to be available only in Scala. Is there any
 plan to make it available for Java too?


 We tried to make all the suggested interfaces (other than CatalystScan
 which exposes internals and is only for experimentation) usable from Java.
 Is there something in particular you are having trouble with?


 3.  Are only filters and projections pushed down to the data source and
 all the data pulled into Spark for other processing?


 For now, this is all that is provided by the public stable API.  We left a
 hook for more powerful push downs
 (sqlContext.experimental.extraStrategies), and would be interested in
 feedback on other operations we should push down as we expand the API.
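
To make that last point concrete: in the 1.3 API, the projection and filter
pushdown described above reaches a source through the PrunedFilteredScan
interface in org.apache.spark.sql.sources. Below is only a rough Java sketch of
what a source sees through that interface; the class name and the scanSource
helper are invented for illustration.

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.BaseRelation;
import org.apache.spark.sql.sources.EqualTo;
import org.apache.spark.sql.sources.Filter;
import org.apache.spark.sql.sources.PrunedFilteredScan;

// Hypothetical relation; schema() and sqlContext() are left abstract here
// because only the buildScan signature matters for this sketch.
abstract class FilteringRelation extends BaseRelation implements PrunedFilteredScan {

  // Invented helper: actually reads the source with the given projection and
  // whatever filters it managed to translate.
  protected abstract RDD<Row> scanSource(String[] columns, Filter[] handledFilters);

  @Override
  public RDD<Row> buildScan(String[] requiredColumns, Filter[] filters) {
    // requiredColumns is the projection Spark needs; filters are the
    // predicates it could express (EqualTo, GreaterThan, In, ...).
    List<Filter> handled = new ArrayList<Filter>();
    for (Filter f : filters) {
      if (f instanceof EqualTo) {
        EqualTo eq = (EqualTo) f;
        handled.add(eq);  // e.g. becomes "WHERE <attribute> = <value>" at the source
      }
      // Filters the source ignores are re-evaluated by Spark after the scan,
      // so skipping one affects performance, not correctness.
    }
    return scanSource(requiredColumns, handled.toArray(new Filter[0]));
  }
}

Nothing about grouping, aggregation or sort order appears in this interface,
which is exactly the gap raised earlier in the thread.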



Re: Question about Data Sources API

2015-03-24 Thread Michael Armbrust

 My question wrt Java/Scala was related to extending the classes to support
 new custom data sources, so I was wondering if those could be written in
 Java, since our company is a Java shop.


Yes, you should be able to extend the required interfaces using Java.
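
As a rough illustration (not an official example), a minimal source written
entirely in Java against the Spark 1.3 org.apache.spark.sql.sources interfaces
could look like the sketch below; the package, class names and sample rows are
all made up.

package com.example;

import java.util.Arrays;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.sources.BaseRelation;
import org.apache.spark.sql.sources.RelationProvider;
import org.apache.spark.sql.sources.TableScan;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical provider; Spark instantiates it by class name when the source
// is referenced via sqlContext.load(...) or CREATE TEMPORARY TABLE ... USING.
public class DummyRelationProvider implements RelationProvider {
  @Override
  public BaseRelation createRelation(
      SQLContext sqlContext,
      scala.collection.immutable.Map<String, String> parameters) {
    // Options arrive as a Scala Map; values come back as scala.Option,
    // e.g. parameters.get("path").
    return new DummyRelation(sqlContext);
  }
}

// A relation with a fixed schema and a full table scan (TableScan); the
// pruned/filtered scan variants add pushdown on top of this.
class DummyRelation extends BaseRelation implements TableScan {
  private final SQLContext sqlContext;

  DummyRelation(SQLContext sqlContext) {
    this.sqlContext = sqlContext;
  }

  @Override
  public SQLContext sqlContext() {
    return sqlContext;
  }

  @Override
  public StructType schema() {
    return DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("id", DataTypes.IntegerType, false),
        DataTypes.createStructField("name", DataTypes.StringType, true)));
  }

  @Override
  public RDD<Row> buildScan() {
    // Build the rows as a JavaRDD and unwrap it into the Scala RDD the
    // interface expects.
    JavaSparkContext jsc = new JavaSparkContext(sqlContext.sparkContext());
    return jsc.parallelize(Arrays.asList(
        RowFactory.create(1, "a"),
        RowFactory.create(2, "b"))).rdd();
  }
}

The main friction from Java is that createRelation receives its options as a
scala.collection.immutable.Map and buildScan must return a Scala RDD, which the
JavaSparkContext wrapper's rdd() call provides.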

 The additional push downs I am looking for are aggregations with grouping
 and sorting.
 Essentially, I am trying to evaluate if this API can give me much of what
 is possible with the Apache MetaModel project.


We don't push those down today, as our initial focus is on getting data
into Spark so that you can join with other sources and then do such
processing there.  It's possible we will extend the pushdown API in the
future, though.
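
So for now the pattern is roughly: load the custom source into a DataFrame and
let Spark handle the grouping, aggregation and sorting. A hypothetical Java
usage sketch, reusing the invented com.example.DummyRelationProvider from the
sketch above (the "path" option and the column name are likewise illustrative):

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class CustomSourceExample {

  // Loads the hypothetical provider from the earlier sketch and does the
  // grouping/sorting inside Spark, since those steps are not pushed down.
  public static DataFrame countsByName(SQLContext sqlContext) {
    Map<String, String> options = new HashMap<String, String>();
    options.put("path", "/data/orders");  // illustrative option name/value

    DataFrame df = sqlContext.load("com.example.DummyRelationProvider", options);

    // Only the scan (plus any filter/projection pushdown) reaches the source;
    // groupBy/count/orderBy all execute in Spark.
    return df.groupBy("name").count().orderBy("name");
  }
}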

