It depends on what the other side is doing. You can create your own RDD implementation by subclassing RDD, or, if the data can be read partition by partition, it may be enough to use sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the data and return an iterator */ ), where n is the number of partitions.
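A minimal sketch of the parallelize + mapPartitionsWithIndex approach. Here `fetchPartition` is a hypothetical stand-in for whatever your custom datasources API exposes for reading one partition; everything else is the standard Spark API:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Hypothetical per-partition reader for your datasource: returns the
// records of partition `idx`, each record as a sequence of field values.
def fetchPartition(idx: Int): Seq[Seq[Any]] = ???

def buildRowRDD(sc: SparkContext, numPartitions: Int): RDD[Row] =
  // One element per partition, so each task knows which slice to read.
  sc.parallelize(1 to numPartitions, numPartitions)
    .mapPartitionsWithIndex { (idx, _) =>
      // Each task pulls its own slice from the source and emits Rows.
      fetchPartition(idx).iterator.map(fields => Row.fromSeq(fields))
    }
```

The dummy range exists only to force one task per partition; the actual data never goes through the driver.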
On Tue, Jan 13, 2015 at 12:51 AM, Niranda Perera <niranda.per...@gmail.com> wrote:
> Hi,
>
> We have a custom datasources API, which connects to various data sources
> and exposes them out as a common API. We are now trying to implement the
> Spark datasources API released in 1.2.0 to connect Spark for analytics.
>
> Looking at the sources API, we figured out that we should extend a scan
> class (table scan etc). While doing so, we would have to implement the
> 'schema' and 'buildScan' methods.
>
> say, we can infer the schema of the underlying data and take data out as
> Row elements. Is there any way we could create RDD[Row] (needed in the
> buildScan method) using these Row elements?
>
> Cheers
> --
> Niranda
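For reference, the scan-class shape the question describes might look like the sketch below. The class name and schema are placeholders, and the import paths are shown as in later Spark releases (in 1.2 the type classes lived under org.apache.spark.sql); buildScan is where either RDD-building technique would plug in:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Sketch of a relation for the Spark sources API: implement `schema`
// and `buildScan`, as the question outlines. Names here are illustrative.
class CustomRelation(override val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // Declare (or infer from the underlying source) the schema;
  // a single string column is used here as a placeholder.
  override def schema: StructType =
    StructType(Seq(StructField("value", StringType, nullable = true)))

  // Produce the RDD[Row] for a full table scan, e.g. by subclassing RDD
  // or via the parallelize/mapPartitionsWithIndex technique above.
  override def buildScan(): RDD[Row] = ???
}
```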