I have streams of data coming in from various applications through Kafka.
These streams are converted into dataframes in Spark.  I would like to join
these dataframes on a common ID they all contain.

Since joining streaming dataframes is currently not supported, what is the
recommended way to join two dataframes in a streaming context?

Is it recommended to keep writing the streaming dataframes to some sink,
converting them into static dataframes that can then be joined? Would this
preserve end-to-end exactly-once and fault-tolerance guarantees?

Priyank
