Hi Priyank, I have a similar setup, although I am reading from Kafka and sinking to multiple MySQL tables. My input stream has multiple message types, each headed for a different MySQL table.
I've been looking for a solution for a few months and have come up with only two alternatives:

1. Since I'm already using a ForeachSink (there is no native MySQL sink), I could write each batch to the different tables in one sink. But having a single Spark job do all the sinking seems confusing, and the sink itself will be fairly complex.

2. The same as your second option: have one job sort through the stream and persist the sorted streams to HDFS, then read each sorted stream in its own job and sink it into the appropriate table.

I haven't implemented it yet, but it seems to me that the code for option 2 will be simpler, and operationally things will be clearer: if a job fails, I have a better understanding of what state it is in.

Reading Manning's Big Data book by Nathan Marz and James Warren has been influencing how I structure Spark jobs recently. They don't shy away from persisting intermediate data sets, and I am embracing that in my own thinking right now.

Cheers!
Dave
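P.S. In case it helps, here is a minimal sketch of the per-type routing idea behind option 1, written as plain Python and independent of Spark: each micro-batch gets split by message type so every group can be written to its own MySQL table. The type names, table names, and the `TABLE_FOR_TYPE` mapping are all hypothetical stand-ins, not anything from my actual job.

```python
from collections import defaultdict

# Hypothetical mapping from message type to destination MySQL table.
TABLE_FOR_TYPE = {
    "order": "orders",
    "click": "clicks",
}

def route_batch(batch):
    """Split one micro-batch (a list of dicts) into per-table groups.

    Unknown message types are collected under the key None so they can
    be logged or dead-lettered instead of being silently dropped.
    """
    groups = defaultdict(list)
    for msg in batch:
        table = TABLE_FOR_TYPE.get(msg.get("type"))
        groups[table].append(msg)
    return dict(groups)

batch = [
    {"type": "order", "id": 1},
    {"type": "click", "id": 2},
    {"type": "order", "id": 3},
]
routed = route_batch(batch)
# routed["orders"] now holds both order messages, routed["clicks"] the click.
```

Inside a ForeachSink you would then loop over the groups and issue one JDBC write per table; doing that in one sink per batch is exactly what makes option 1 feel complex compared with persisting the sorted streams first.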