Re: ordered ingestion not guaranteed

2018-05-12 Thread ravidspark
Jorn, Thanks for the response. My downstream database is Kudu. 1. Yes, as you have suggested, I have been using a central caching mechanism that caches the RDD results and compares them with the next batch, keeping the latest timestamps and ignoring the old ones. But, I see ...
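A minimal sketch of the kind of per-key timestamp cache described above, assuming a simple in-memory map on one process; the names (Record, latestSeen, filterStale) are illustrative, and a real cluster would need a shared store rather than a driver-local map:

    // Drop stale records before writing a batch downstream by remembering the
    // latest timestamp seen per key.
    import scala.collection.mutable

    case class Record(key: String, ts: Long, payload: String)

    object TimestampCache {
      private val latestSeen = mutable.Map.empty[String, Long]

      // Returns only the records newer than anything previously seen for their
      // key, and advances the per-key high-water mark.
      def filterStale(batch: Seq[Record]): Seq[Record] = synchronized {
        batch.filter { r =>
          val isNewer = latestSeen.get(r.key).forall(r.ts > _)
          if (isNewer) latestSeen(r.key) = r.ts
          isNewer
        }
      }
    }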

Re: ordered ingestion not guaranteed

2018-05-11 Thread Jörn Franke
What DB do you have? You have some options, such as: 1) use a key-value store (they can be accessed very efficiently) to see if a newer value for the key has already been processed - if yes, then ignore the value; if no, then insert it into the database; 2) redesign the key to include the timestamp and find out the ...
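A hedged sketch of option 1 above: before upserting, look up the last timestamp written for the key and skip the write if a newer version already exists. KVStore is a hypothetical interface standing in for whatever key-value store is used, not a specific client API:

    trait KVStore {
      def getTimestamp(key: String): Option[Long]
      def put(key: String, ts: Long, value: String): Unit
    }

    object ConditionalUpsert {
      // Returns true if the record was written, false if a newer (or equal)
      // version had already been processed and the value was ignored.
      def upsertIfNewer(store: KVStore, key: String, ts: Long, value: String): Boolean =
        store.getTimestamp(key) match {
          case Some(existing) if existing >= ts => false
          case _ =>
            store.put(key, ts, value)
            true
        }
    }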

ordered ingestion not guaranteed

2018-05-11 Thread ravidspark
Hi All, I am using Spark 2.2.0 and I have the below use case: *Reading from Kafka using Spark Streaming and updating (not just inserting) the records into a downstream database*. I understand that the way Spark reads messages from Kafka will not be in order of the timestamps as stored in the Kafka partitions ...
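Since the records for a key can arrive within a micro-batch in arbitrary timestamp order, one common mitigation is to collapse each batch to the latest record per key before upserting downstream. A minimal sketch under that assumption; Event, dedupe, and the surrounding pipeline are illustrative, not the poster's code:

    import org.apache.spark.rdd.RDD

    case class Event(key: String, ts: Long, value: String)

    object LatestPerKey {
      // Collapse a batch to one row per key: the row with the greatest timestamp.
      def dedupe(batch: RDD[Event]): RDD[Event] =
        batch
          .map(e => (e.key, e))
          .reduceByKey((a, b) => if (a.ts >= b.ts) a else b)
          .values
    }

This only orders records within a single batch; late data arriving in a later batch still needs a check against what was already written, as discussed in the replies above.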