Great to hear two different viewpoints, and thanks a lot for your input,
Michael. For now, our application performs an ETL process: it reads
data from Kafka, stores it in HBase, then performs basic enhancement
and pushes the data out on a Kafka topic.
We have a conflict of opinion here, as a few
At this point I recommend that new applications be built using Structured
Streaming. The engine went GA as of Spark 2.2, and I know of several very
large (trillions of records) production jobs that are running in Structured
Streaming. All of our production pipelines at Databricks are written in
Structured Streaming.
Here are my two cents; experts, please correct me if I'm wrong.
It's important to understand why you would choose one over the other, and for
what kind of use case. At some point in the future the low-level APIs may be
abstracted away and become legacy, but for now the RDD API is Spark's core,
low-level API; all higher-level APIs are built on top of it.