Hi,
Info (using):
- Spark Streaming Kafka 0.8 package
- Spark 2.2.1
- Kafka 1.0.1

As of now, I am feeding paragraphs into the Kafka console producer, and my Spark job, acting as a receiver, prints the flattened words; this is a pure RDD operation.

My goal is to continuously read two tables (which are constantly being updated) as two distinct Kafka topics, load them as two Spark DataFrames, join them on a key, and produce the output. (I come from a Spark-SQL background, so pardon my Spark-SQL-ish writing.)

It may happen that the first topic receives new data 15 minutes before the second topic does. How should I proceed in that scenario? I should not lose any data.

For now, just for R&D, I simply want to pass paragraphs, read them as RDDs, convert them to DataFrames, and then join them to get the common keys as the output. I started using Spark Streaming and Kafka only today. I have appended sketches of what I have and what I am attempting below my signature.

Please help!

Thanks,
Aakash.
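P.S. What I have now, roughly (a minimal sketch; the broker address "localhost:9092", the topic name "paragraphs", and the 10-second batch interval are placeholders for my actual setup):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object FlattenWords {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlattenWords")
    val ssc = new StreamingContext(conf, Seconds(10))

    // placeholder broker and topic; each Kafka record arrives as a (key, value) pair
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("paragraphs"))

    // take the message value, split each paragraph into words, and print them
    lines.map(_._2).flatMap(_.split("\\s+")).print()

    ssc.start()
    ssc.awaitTermination()
  }
}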
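And what I am attempting (again only a sketch, assuming each message is a "key,value" CSV line; the topic names "table1" and "table2" and the broker address are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object TwoTopicJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TwoTopicJoin")
    val ssc = new StreamingContext(conf, Seconds(10))
    // reuses the SparkContext already created by the StreamingContext
    val spark = SparkSession.builder.config(conf).getOrCreate()
    import spark.implicits._

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    def topicStream(topic: String) =
      KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set(topic)).map(_._2)

    val left = topicStream("table1")
    val right = topicStream("table2")

    // per micro-batch: parse "key,value" into DataFrames and inner-join on the key
    left.transformWith(right, (l: RDD[String], r: RDD[String]) => {
      val df1 = l.map(_.split(",", 2)).map(a => (a(0), a(1))).toDF("key", "v1")
      val df2 = r.map(_.split(",", 2)).map(a => (a(0), a(1))).toDF("key", "v2")
      df1.join(df2, "key").rdd
    }).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

As far as I can tell, this only joins records that land in the same micro-batch, which is exactly why the 15-minute lag between the two topics worries me.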