Hi,
I have a time-critical Spark application that takes sensor data from a
Kafka stream, stores it in a case class, applies transformations, and then
writes it to a Cassandra schema. The data must be stored in FIFO order.
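Here is a minimal sketch of the ordering problem and one way to recover FIFO order. It uses plain Python threads, not Spark, and assumes each record keeps its Kafka partition and offset. Kafka only guarantees order within a partition, so (partition, offset) is the key that defines FIFO here. The names `SensorReading`, `transform`, `process_in_parallel` and `restore_fifo` are hypothetical. The sort happens in memory over one finished batch, so it stands in for whatever Spark-side ordering step would actually be used.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    # Hypothetical record shape standing in for the case class;
    # partition/offset are assumed to be carried over from the Kafka record.
    partition: int
    offset: int
    value: float


def transform(r: SensorReading) -> SensorReading:
    # Simulate variable per-record processing time, like tasks of uneven duration.
    time.sleep(random.uniform(0, 0.01))
    return SensorReading(r.partition, r.offset, r.value * 2)


def process_in_parallel(records):
    # Results are collected in completion order, which need not match input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(transform, r) for r in records]
        return [f.result() for f in as_completed(futures)]


def restore_fifo(results):
    # Order is only meaningful within a partition, so sort by (partition, offset).
    return sorted(results, key=lambda r: (r.partition, r.offset))


records = [SensorReading(0, i, float(i)) for i in range(20)]
out = process_in_parallel(records)
fifo = restore_fifo(out)
print([r.offset for r in fifo])
```

The order of `out` can differ from run to run. `fifo` is always back in offset order because the sort depends only on the carried-over offsets, not on when each task finished.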
The order is maintained in the Kafka queue, but I am observing that
Spark Streaming consumes and processes data in parallel, so the order of
the output depends not only on the order of the input but also on the
time each task takes to process. Operations at the Spark level, such as
repartitions, sorts, and shuffles, will also affect ordering,
so the