I have streams of data coming in from various applications through Kafka. These streams are converted into DataFrames in Spark, and I would like to join these DataFrames on a common ID they all contain.
Since joining streaming DataFrames is currently not supported, what is the recommended way to join two DataFrames in a streaming context? Is the recommended approach to keep writing the streaming DataFrames out to some sink, so they can be read back as static DataFrames and joined? Would that still provide end-to-end exactly-once and fault-tolerance guarantees?

Priyank
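To be concrete, this is the kind of workaround I'm asking about: a minimal PySpark sketch where the broker address, topic names, sink paths, and the join key `id` are all made-up placeholders, and each incoming stream is persisted to a Parquet sink before being read back as a static DataFrame.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-via-sink").getOrCreate()

def kafka_to_parquet(topic, path, checkpoint):
    """Continuously persist one Kafka topic to a Parquet sink."""
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
              .option("subscribe", topic)
              .load())
    # Kafka keys/values arrive as binary; here the key is treated as the
    # shared ID for illustration (a real job would parse the payload).
    return (stream.selectExpr("CAST(key AS STRING) AS id",
                              "CAST(value AS STRING) AS value")
            .writeStream
            .format("parquet")
            .option("path", path)
            .option("checkpointLocation", checkpoint)
            .start())

kafka_to_parquet("app_a_events", "/sinks/app_a", "/checkpoints/app_a")
kafka_to_parquet("app_b_events", "/sinks/app_b", "/checkpoints/app_b")

# Later (e.g. in a separate batch job), read the sinks back as static
# DataFrames and join them on the shared ID.
df_a = spark.read.parquet("/sinks/app_a")
df_b = spark.read.parquet("/sinks/app_b")
joined = df_a.join(df_b, on="id")
```

My worry is about the window between the streaming write and the batch read: whether the Parquet sink's checkpointing is enough to keep the whole pipeline exactly-once, or whether the batch read can observe partial results.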