I have a case where my Flink job needs to consume from multiple sources.  I 
have a Kafka topic where the order of consumption is important.  Because 
storage on S3 costs much less than storage on Kafka, we have a job that sinks 
the topic's data to S3.  The Kafka topic can then retain just 3 days' worth of 
data.  My job needs to first consume everything from the existing S3 file(s) 
and only then start consuming from the Kafka topic.  When using a union 
operator in Flink, the records from both sources arrive interleaved.  Is there 
any way that I can control the ordering so that it first reads S3, then Kafka, 
all in the same job?

Kurt
