If it is a small collection of them on the driver, you can just use sc.parallelize to create an RDD.
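For example, a minimal sketch against the Spark 1.2 API, assuming a live SparkContext named sc is in scope (the rows and their values here are made up):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row  // in 1.2 an alias for the catalyst Row

// A small collection of Row objects already materialized on the driver.
val rows: Seq[Row] = Seq(Row(1, "alice"), Row(2, "bob"))

// parallelize distributes the driver-side collection as an RDD[Row].
val rowRDD: RDD[Row] = sc.parallelize(rows)

This only makes sense for data that comfortably fits in driver memory, since the whole collection is shipped out from the driver.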
On Tue, Jan 13, 2015 at 7:56 AM, Malith Dhanushka <mmali...@gmail.com> wrote:

> Hi Reynold,
>
> Thanks for the response. I am just wondering: say we have a set of Row
> objects. Isn't there a straightforward way of creating an RDD[Row] out of
> them without writing a custom RDD, i.e. a utility method?
>
> Thanks,
> Malith
>
> On Tue, Jan 13, 2015 at 2:29 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> It depends on what the other side is doing. You can create your own RDD
>> implementation by subclassing RDD, or it might work to use
>> sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the
>> data and return an iterator */ ), where n is the number of partitions.
>>
>> On Tue, Jan 13, 2015 at 12:51 AM, Niranda Perera
>> <niranda.per...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a custom datasources API which connects to various data sources
>>> and exposes them through a common API. We are now trying to implement
>>> the Spark data sources API released in 1.2.0 to connect Spark for
>>> analytics.
>>>
>>> Looking at the sources API, we figured out that we should extend a scan
>>> class (table scan etc.). In doing so, we have to implement the 'schema'
>>> and 'buildScan' methods.
>>>
>>> Say we can infer the schema of the underlying data and read the data
>>> out as Row elements. Is there any way to create an RDD[Row] (needed by
>>> the buildScan method) from these Row elements?
>>>
>>> Cheers
>>> --
>>> Niranda
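For reference, a rough sketch of Niranda's buildScan case against the 1.2.0 sources API, combining both suggestions above. MyDataSourceClient is a hypothetical stand-in for the custom datasources API, and the schema and partition count are made up:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.sources.TableScan

// Hypothetical client for the custom datasources API; not a real library.
class MyDataSourceClient(url: String) {
  def partitionCount: Int = 4
  def readPartition(i: Int): Iterator[Row] =
    Iterator(Row(i, "record-from-partition-" + i))
}

case class MyRelation(url: String)(@transient val sqlContext: SQLContext)
  extends TableScan {

  // Schema inferred from the underlying source (hard-coded here).
  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("value", StringType, nullable = true)))

  // One Spark partition per source partition. Each task opens its own
  // client so nothing non-serializable is captured from the driver.
  override def buildScan(): RDD[Row] = {
    val u = url
    val n = new MyDataSourceClient(u).partitionCount
    sqlContext.sparkContext.parallelize(1 to n, n).mapPartitionsWithIndex {
      (index, _) => new MyDataSourceClient(u).readPartition(index)
    }
  }
}

With this shape the per-partition reads happen on the executors, so the Row objects never have to be collected on the driver.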