Great to hear two different viewpoints, and thanks a lot for your input,
Michael. For now, our application performs an ETL process: it reads
data from Kafka, stores it in HBase, then performs basic enhancement
and pushes the data out on a Kafka topic.
We have a conflict of opinion here, as a few
At this point I recommend that new applications be built using Structured
Streaming. The engine went GA as of Spark 2.2, and I know of several very
large (trillions of records) production jobs that are running in Structured
Streaming. All of our production pipelines at Databricks are written in
Structured Streaming.
Here are my two cents; experts, please correct me if I'm wrong.
It's important to understand why you would choose one over the other, and for
what kind of use case. At some point in the future the low-level APIs may be
abstracted away and become legacy, but for now the RDD API is Spark's core,
low-level API; all higher-level APIs are built on top of it.