Hi, I have a situation where updates are coming from 2 different data sources, and at times this data arrives in the same batch, as defined by the streaming context's batch interval of 500 ms (the minimum recommended in the Spark documentation).
Now that in itself is not the problem. The problem is that when the data is partitioned across different executors, it is not processed in the order in which it originally arrived. I know this because the event that arrives last should be the one used for the updated state, yet the resulting state is inconsistent; in effect, a race condition. Has anyone an idea how to fix this? Has anyone faced this kind of issue before, and if so, how did you resolve it?

Thanks & Regards
Biplob Biswas
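To illustrate the behaviour I am after: since partition ordering is not guaranteed, one workaround people suggest is to attach an event timestamp at the source and merge with a commutative, associative "last timestamp wins" reduction (e.g. via reduceByKey), so the outcome no longer depends on arrival order. Below is a minimal sketch of that merge logic in plain Python; the key names, timestamps, and values are hypothetical, not from my actual pipeline.

```python
from functools import reduce

# Hypothetical events for one key: (key, event_timestamp, value).
# Note they are listed out of timestamp order, mimicking how
# executors may process records in any order within a batch.
events = [
    ("device-1", 1000, "state-A"),
    ("device-1", 1003, "state-C"),  # latest event, should win
    ("device-1", 1001, "state-B"),
]

def latest(a, b):
    # Commutative and associative: the result does not depend on
    # the order in which the events are delivered or combined.
    return a if a[1] >= b[1] else b

final = reduce(latest, events)
print(final)  # ('device-1', 1003, 'state-C')
```

Because `latest` is order-insensitive, the same function could be used per key in a `reduceByKey` within each batch before updating state, instead of relying on processing order.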