Re: Apache Spark - Custom structured streaming data source

2018-01-26 Thread M Singh
Thanks TD.  When will 2.3 scheduled for release ?   

On Thursday, January 25, 2018 11:32 PM, Tathagata Das  
wrote:
 

 Hello Mans,
The streaming DataSource APIs are still evolving and are not public yet. Hence 
there is no official documentation. In fact, there is a new DataSourceV2 API 
(in Spark 2.3) that we are migrating towards. So at this point of time, it's 
hard to make any concrete suggestion. You can take a look at the classes 
DataSourceV2, DataReader, MicroBatchDataReader in the spark source code, along 
with their implementations.
Hope this helps. 
TD

On Jan 25, 2018 8:36 PM, "M Singh"  wrote:

Hi:
I am trying to create a custom structured streaming source and would like to 
know if there is any example or documentation on the steps involved.
I've looked at the some methods available in the SparkSession but these are 
internal to the sql package:
  private[sql] def internalCreateDataFrame(      catalystRows: 
RDD[InternalRow],      schema: StructType,      isStreaming: Boolean = false): 
DataFrame = {    // TODO: use MutableProjection when rowRDD is another 
DataFrame and the applied    // schema differs from the existing schema on any 
field data type.    val logicalPlan = LogicalRDD(      schema.toAttributes,     
 catalystRows,      isStreaming = isStreaming)(self)    Dataset.ofRows(self, 
logicalPlan)  } 
Please let me know where I can find the appropriate API or documentation.
Thanks
Mans



   

Re: Apache Spark - Custom structured streaming data source

2018-01-25 Thread Tathagata Das
Hello Mans,

The streaming DataSource APIs are still evolving and are not public yet.
Hence there is no official documentation. In fact, there is a new
DataSourceV2 API (in Spark 2.3) that we are migrating towards. So at this
point of time, it's hard to make any concrete suggestion. You can take a
look at the classes DataSourceV2, DataReader, MicroBatchDataReader in the
spark source code, along with their implementations.

Hope this helps.

TD

On Jan 25, 2018 8:36 PM, "M Singh"  wrote:

Hi:

I am trying to create a custom structured streaming source and would like
to know if there is any example or documentation on the steps involved.

I've looked at the some methods available in the SparkSession but these are
internal to the sql package:

  *private**[sql]* def internalCreateDataFrame(
  catalystRows: RDD[InternalRow],
  schema: StructType,
  isStreaming: Boolean = false): DataFrame = {
// TODO: use MutableProjection when rowRDD is another DataFrame and the
applied
// schema differs from the existing schema on any field data type.
val logicalPlan = LogicalRDD(
  schema.toAttributes,
  catalystRows,
  isStreaming = isStreaming)(self)
Dataset.ofRows(self, logicalPlan)
  }

Please let me know where I can find the appropriate API or documentation.

Thanks

Mans