Hi,

Thanks for starting this discussion on adding Spark Streaming support.

1. Please try to reuse the current code (Structured Streaming) rather than adding separate logic for Spark Streaming.
2. I suggest that Structured Streaming be the default; please consider how to provide a configuration option for enabling/switching to Spark Streaming.
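The switch in point 2 could be as simple as a carbon property. A minimal sketch of what such an option might look like (the property name and values below are purely illustrative, not an existing CarbonData setting):

```properties
# Hypothetical property in carbon.properties; the actual key and values
# would be decided during the design discussion.
# structured = Spark Structured Streaming (default), dstream = Spark Streaming
carbon.streaming.engine=structured
```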
Regards,
Liang

xm_zzc wrote:
> Hi dev:
> Currently CarbonData 1.3 (to be released soon) only supports integration
> with Spark Structured Streaming, which requires Kafka version >= 0.10. I
> think there are still many users integrating Spark Streaming with Kafka
> 0.8 (at least our cluster is), and the cost of upgrading Kafka is too
> high. So should CarbonData integrate with Spark Streaming too?
>
> I see two ways to integrate with Spark Streaming:
>
> 1). CarbonData batch data loading + auto compaction
> Use CarbonSession.createDataFrame to convert each RDD to a DataFrame in
> InputDStream.foreachRDD, and then save the data into a CarbonData table
> with auto compaction enabled. This way also supports creating
> pre-aggregate tables on the main table (a streaming table does not
> support pre-aggregate tables).
>
> I can test this approach in our QA env and add an example to CarbonData.
>
> 2). The same as the integration with Structured Streaming
> With this approach, Structured Streaming appends every mini-batch into a
> stream segment in row format; when the size of the stream segment exceeds
> 'carbon.streaming.segment.max.size', the stream segment is automatically
> converted to a batch segment (column format) at the beginning of each
> batch, and a new stream segment is created to append data into.
> However, I have no idea how to integrate this with Spark Streaming yet;
> *any suggestion for this*?
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
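For reference, approach 1) could be sketched roughly as below. This is only an illustration of the idea, not tested code: it assumes a running CarbonSession named `carbon`, the Kafka 0.8 connector (spark-streaming-kafka-0-8), and a pre-created table `main_table` with auto load merge enabled; the table name, topic, and record parsing are all hypothetical.

```scala
// Sketch of approach 1): each DStream mini-batch is written as a normal
// CarbonData load, relying on auto compaction to merge the small segments.
import org.apache.spark.sql.SaveMode
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

case class Event(id: Long, value: String) // illustrative record shape

val ssc = new StreamingContext(carbon.sparkContext, Seconds(10))

// Kafka 0.8 receiver-based stream; zkQuorum/group/topic are placeholders.
val stream = KafkaUtils.createStream(
  ssc, "zk-host:2181", "carbon-group", Map("events" -> 1))

stream.foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    // Convert the mini-batch to a DataFrame via the CarbonSession, then
    // append it to the table; with carbon.enable.auto.load.merge=true the
    // resulting small segments get compacted automatically.
    val events = rdd.map { case (_, line) =>
      val parts = line.split(",")
      Event(parts(0).toLong, parts(1))
    }
    carbon.createDataFrame(events)
      .write
      .format("carbondata")
      .option("tableName", "main_table")
      .mode(SaveMode.Append)
      .save()
  }
}

ssc.start()
ssc.awaitTermination()
```

Since this path uses plain batch loads, it sidesteps the streaming-segment handoff entirely, at the cost of one load (and eventually one compaction) per mini-batch interval.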