Hi,

I have a situation where updates are coming from 2 different data sources,
and this data sometimes arrives in the same batch, as defined by the
streaming context's batch interval of 500 ms (the value recommended in the
Spark documentation).

Now that is not the problem. The problem is that when the data is
partitioned across different executors, it is not processed in the order in
which it originally arrived. I know this because the event that arrives
last should determine the updated state, but the resulting state is
inconsistent. So a race condition of this kind exists.
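One common workaround is to stop depending on arrival order altogether: attach an event timestamp to each record and, when merging records for a key, always keep the one with the latest event time. Below is a minimal sketch of that merge logic in plain Python (the field names `key`, `event_ts`, and `value` are hypothetical); the same commutative function could be passed to a Spark `reduceByKey` or used inside `updateStateByKey`, so the outcome no longer depends on which executor processes which record first.

```python
from functools import reduce

def latest_event(a, b):
    # Keep whichever event has the larger event-time timestamp,
    # regardless of the order in which the two were processed.
    return a if a["event_ts"] >= b["event_ts"] else b

# Hypothetical batch: three events for the same key, arriving out of order.
events = [
    {"key": "user1", "event_ts": 100, "value": "old"},
    {"key": "user1", "event_ts": 300, "value": "new"},
    {"key": "user1", "event_ts": 200, "value": "mid"},
]

# The merge is commutative and associative, so any processing order
# (and any partitioning across executors) yields the same final state.
state = reduce(latest_event, events)
print(state["value"])
```

Because `latest_event` is commutative and associative, the result is the same no matter how the batch is split or interleaved, which removes the race condition described above.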

Does anyone have an idea how to fix this? Has anyone faced this kind of
issue before and found a solution?

Thanks & Regards
Biplob Biswas
