Re: Prefer Structured Streaming over Spark Streaming (DStreams)?

2018-02-02 Thread Biplob Biswas
Great to hear two different viewpoints, and thanks a lot for your input, Michael. For now, our application performs an ETL process: it reads data from Kafka, stores it in HBase, performs basic enhancement, and pushes the data out to a Kafka topic. We have a conflict of opinion here as few

Re: Prefer Structured Streaming over Spark Streaming (DStreams)?

2018-01-31 Thread Michael Armbrust
At this point I recommend that new applications be built using Structured Streaming. The engine was GA-ed as of Spark 2.2, and I know of several very large (trillions of records) production jobs that are running in Structured Streaming. All of our production pipelines at Databricks are written

Re: Prefer Structured Streaming over Spark Streaming (DStreams)?

2018-01-31 Thread vijay.bvp
Here are my two cents; experts, please correct me if I am wrong. It's important to understand why to choose one over the other, and for what kind of use case. There might be some time in the future when the low-level APIs are abstracted away and become legacy, but for now the RDD API is Spark's core, low-level API; all higher

Prefer Structured Streaming over Spark Streaming (DStreams)?

2018-01-31 Thread Biplob Biswas
Hi, I read an article which recommended using DataFrames instead of RDD primitives. Now I have read about the differences between DStreams and Structured Streaming, and Structured Streaming adds a lot of improvements like checkpointing, windowing, sessioning, fault tolerance, etc. What I am