Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
Hi,

I am evaluating Spark for an analytic component where we do batch
processing of data using SQL.

So, I am particularly interested in Spark SQL and in creating a SchemaRDD
from an existing API [1].

This API exposes elements in a database as data sources. Using the methods
provided by these data sources, we can access and edit the data.

I want to create a custom SchemaRDD backed by the methods and provisions of
this API. I went through the Spark documentation and the Javadocs, but
unfortunately I could not reach a definite conclusion on whether this is
actually possible.
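
To make it concrete, the rough shape of what I am after is something like
the sketch below. The fetchRecords() call is just a stand-in for the data
source API in [1]; the Spark side (SQLContext.applySchema, StructType, Row)
is the 1.2 API as I understand it.

import org.apache.spark.SparkContext
import org.apache.spark.sql._

object SchemaRddSketch {
  // Stand-in for records fetched through the external data source API.
  def fetchRecords(): Seq[(Int, String)] = Seq((1, "a"), (2, "b"))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "sketch")
    val sqlContext = new SQLContext(sc)

    val schema = StructType(
      StructField("id", IntegerType, nullable = false) ::
      StructField("value", StringType, nullable = true) :: Nil)

    // Wrap the externally fetched records as an RDD[Row] and attach the
    // schema, which yields a SchemaRDD that can be queried with SQL.
    val rowRDD = sc.parallelize(fetchRecords()).map { case (id, v) => Row(id, v) }
    val schemaRDD = sqlContext.applySchema(rowRDD, schema)
    schemaRDD.registerTempTable("records")
    sqlContext.sql("SELECT value FROM records WHERE id = 1").collect().foreach(println)
  }
}

What I am unsure about is whether there is a cleaner extension point than
manually converting everything to an RDD[Row] like this.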

I would like to ask the Spark devs:
1. As of the current Spark release, can we create a custom SchemaRDD?
2. What is the extension point for a custom SchemaRDD? Are there particular
interfaces to implement?
3. Could you please point me to the specific docs on this matter?

Your help in this regard is highly appreciated.

Cheers

[1]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 <https://twitter.com/N1R44>


Re: Creating a SchemaRDD from an existing API

2014-11-28 Thread Michael Armbrust
You probably don't need to create a new kind of SchemaRDD.  Instead I'd
suggest taking a look at the data sources API that we are adding in Spark
1.2.  There is not a ton of documentation, but the test cases show how to
implement the various interfaces
<https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>,
and there is an example library for reading Avro data
<https://github.com/databricks/spark-avro>.
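
As a rough sketch against the 1.2 interfaces (the class names, the dummy
schema and the "rows" option below are purely illustrative), a minimal
relation plus its provider looks something like this:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}

// A relation that exposes a fixed schema and produces rows on demand.
// A real implementation would read from your data source instead of
// generating dummy records.
case class ExampleScan(numRows: Int)(@transient val sqlContext: SQLContext)
  extends TableScan {

  override def schema: StructType =
    StructType(
      StructField("id", IntegerType, nullable = false) ::
      StructField("value", StringType, nullable = true) :: Nil)

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(1 to numRows).map(i => Row(i, s"record-$i"))
}

// The entry point Spark SQL instantiates when the source is referenced by
// name; the OPTIONS map from the DDL is passed in as `parameters`.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    ExampleScan(parameters.getOrElse("rows", "10").toInt)(sqlContext)
}

The PrunedScan and PrunedFilteredScan variants work the same way but also
let you push column pruning and filters down into the source.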



Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Niranda Perera
Hi Michael,

Regarding this new data sources API, what types of data sources will it
support? Does the source necessarily have to be an RDBMS?

Cheers



-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 <https://twitter.com/N1R44>


Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Michael Armbrust
No, it should support any data source that has a schema and can produce
rows.
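
For example, once a provider like the sketch earlier in the thread is on
the classpath, you can register and query it with plain SQL (the package
name and option below are made up):

// Assuming an existing SparkContext `sc` and that DefaultSource lives in
// the com.example.datasource package.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// CREATE TEMPORARY TABLE ... USING ... is the 1.2 entry point for external
// sources; the OPTIONS map is handed to RelationProvider.createRelation.
sqlContext.sql(
  """CREATE TEMPORARY TABLE example_records
    |USING com.example.datasource
    |OPTIONS (rows "20")""".stripMargin)

sqlContext.sql("SELECT id, value FROM example_records WHERE id < 5")
  .collect()
  .foreach(println)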
