It depends on what the other side is doing. You can create your own RDD implementation by subclassing RDD, or, if the data can be read partition by partition, it may be enough to use sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the data and return an iterator */ ), where n is the number of partitions.
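A minimal sketch of the parallelize + mapPartitionsWithIndex approach. Here `fetchPartition` is a hypothetical stand-in for whatever your custom datasources API exposes for reading one partition; everything else is the standard Spark API:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Hypothetical per-partition reader for your datasource: returns the
// records of partition `idx`, each record as a sequence of field values.
def fetchPartition(idx: Int): Seq[Seq[Any]] = ???

def buildRowRDD(sc: SparkContext, numPartitions: Int): RDD[Row] =
  // One element per partition, so each task knows which slice to read.
  sc.parallelize(1 to numPartitions, numPartitions)
    .mapPartitionsWithIndex { (idx, _) =>
      // Each task pulls its own slice from the source and emits Rows.
      fetchPartition(idx).iterator.map(fields => Row.fromSeq(fields))
    }
```

The dummy range exists only to force one task per partition; the actual data never goes through the driver.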
On Tue, Jan 13, 2015 at 12:51 AM, Niranda Perera <niranda.per...@gmail.com> wrote:
> Hi,
>
> We have a custom datasources API, which connects to various data sources
> and exposes them out as a common API. We are now trying to implement the
> Spark datasources API released in 1.2.0 to connect Spark for analytics.
>
> Looking at the sources API, we figured out that we should extend a scan
> class (table scan etc). While doing so, we would have to implement the
> 'schema' and 'buildScan' methods.
>
> say, we can infer the schema of the underlying data and take data out as
> Row elements. Is there any way we could create RDD[Row] (needed in the
> buildScan method) using these Row elements?
>
> Cheers
> --
> Niranda
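For reference, the scan-class shape the question describes might look like the sketch below. The class name and schema are placeholders, and the import paths are shown as in later Spark releases (in 1.2 the type classes lived under org.apache.spark.sql); buildScan is where either RDD-building technique would plug in:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Sketch of a relation for the Spark sources API: implement `schema`
// and `buildScan`, as the question outlines. Names here are illustrative.
class CustomRelation(override val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // Declare (or infer from the underlying source) the schema;
  // a single string column is used here as a placeholder.
  override def schema: StructType =
    StructType(Seq(StructField("value", StringType, nullable = true)))

  // Produce the RDD[Row] for a full table scan, e.g. by subclassing RDD
  // or via the parallelize/mapPartitionsWithIndex technique above.
  override def buildScan(): RDD[Row] = ???
}
```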