Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Theodor Wübker
Hey Hector, thanks for your reply. Your assumption is entirely correct: I already have a few million records on the topic to test a streaming use case. I plan to test it with a variety of settings, but the problems occur with every cluster configuration. For example Parallelism 1 with

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Hector Rios
Hi Theo, in your initial email you mentioned that you have "a bit of Data on it" when referring to your topic with ten partitions. Correct me if I'm wrong, but that sounds like the data in your topic is bounded and you are trying to test a streaming use case. What kind of parallelism do you have

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey, one more thing: the query looks like this: SELECT window_start, window_end, a, b, c, count(*) AS x FROM TABLE(TUMBLE(TABLE data.v1, DESCRIPTOR(timeStampData), INTERVAL '1' HOUR)) GROUP BY window_start, window_end, a, b, c. When the non-determinism occurs, the topic is not keyed at all.
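One way to tell whether a run lost rows is to compute the expected result of this query offline. The following is a minimal Python sketch, not Flink code, assuming each row is a hypothetical tuple `(timeStampData in epoch milliseconds, a, b, c)` and that the hourly tumbling windows are aligned to the epoch with no offset. It produces the counts the query should emit if no row is discarded as late.

```python
from collections import Counter

HOUR_MS = 60 * 60 * 1000  # INTERVAL '1' HOUR in milliseconds


def tumble_start(ts_ms: int, size_ms: int = HOUR_MS) -> int:
    """Start of the epoch-aligned tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def tumble_count(rows, size_ms: int = HOUR_MS) -> Counter:
    """Offline equivalent of the TUMBLE ... GROUP BY window_start, window_end, a, b, c query.

    rows: iterable of (ts_ms, a, b, c) tuples (hypothetical layout).
    Returns a Counter keyed by (window_start, window_end, a, b, c); the value is count(*) AS x.
    """
    counts = Counter()
    for ts_ms, a, b, c in rows:
        start = tumble_start(ts_ms, size_ms)
        # window_end is exclusive: the window covers [start, start + size_ms).
        counts[(start, start + size_ms, a, b, c)] += 1
    return counts
```

Because the counter ignores arrival order, this reference is deterministic by construction. If a Flink run reports lower counts than this for some windows, that would point towards rows being discarded as late rather than towards a problem in the aggregation itself.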

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey Yuxia, thanks for your response. I figured, too, that the events arrive in a (somewhat) random order and thus cause non-determinism. I used a watermark like this: "timeStampData - INTERVAL '10' SECOND". Increasing the watermark interval does not solve the problem, though: the results are
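To see how arrival order alone can change the result under a watermark like `timeStampData - INTERVAL '10' SECOND`, here is a simplified Python model. It is an assumption, not Flink's implementation: a single stream-wide watermark equal to the largest timestamp seen so far minus 10 seconds, updated after every row, with a window treated as closed once the watermark reaches its last millisecond. In Flink the watermark is emitted periodically rather than per row, which adds further timing dependence. The partition contents are hypothetical.

```python
HOUR_MS = 60 * 60 * 1000  # INTERVAL '1' HOUR
DELAY_MS = 10 * 1000      # timeStampData - INTERVAL '10' SECOND


def window_end(ts_ms: int, size_ms: int = HOUR_MS) -> int:
    """Exclusive end of the epoch-aligned tumbling window containing ts_ms."""
    return ts_ms - (ts_ms % size_ms) + size_ms


def count_dropped(arrivals, delay_ms: int = DELAY_MS, size_ms: int = HOUR_MS) -> int:
    """Count rows whose window is already closed when they arrive (simplified model)."""
    watermark = float("-inf")
    dropped = 0
    for ts in arrivals:
        if window_end(ts, size_ms) - 1 <= watermark:
            dropped += 1  # window already fired: the row is late and discarded
        # Single stream-wide watermark: max timestamp seen so far minus the delay.
        watermark = max(watermark, ts - delay_ms)
    return dropped


# Hypothetical data: partition 0 holds hour 0, partition 1 holds hour 1.
p0 = [0, 60_000, 120_000]
p1 = [HOUR_MS, HOUR_MS + 60_000]

print(count_dropped(sorted(p0 + p1)))  # rows arrive in timestamp order → 0
print(count_dropped(p1 + p0))          # partition 1 consumed first → 3
```

In this model a larger delay only helps once it exceeds the actual skew between partitions: `count_dropped(p1 + p0, delay_ms=60_000)` still drops all three rows, because partition 1 is a full hour ahead of partition 0.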

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread yuxia
Hi, Theo. I'm wondering what the event-time windowed query you are using looks like. For example, how do you define the watermark? Since you read records from 10 partitions, it may well be that the records arrive at the window operator out of order. Is it possible that the
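The Flink Kafka connector documentation describes per-partition watermarks: each partition tracks its own watermark, and the source forwards the minimum across partitions, the same way operators merge watermarks from several inputs. The Python sketch below models that merge, assuming a watermark of "max timestamp minus 10 seconds" per partition updated after every row, epoch-aligned hourly windows, and hypothetical data. Whether the setup in this thread actually uses per-partition watermarks is not stated.

```python
from typing import Iterable, List, Tuple

HOUR_MS = 60 * 60 * 1000  # INTERVAL '1' HOUR
DELAY_MS = 10 * 1000      # timeStampData - INTERVAL '10' SECOND


def window_end(ts_ms: int, size_ms: int = HOUR_MS) -> int:
    """Exclusive end of the epoch-aligned tumbling window containing ts_ms."""
    return ts_ms - (ts_ms % size_ms) + size_ms


def replay_per_partition(
    arrivals: List[Tuple[int, int]],
    partitions: Iterable[int],
    delay_ms: int = DELAY_MS,
    size_ms: int = HOUR_MS,
) -> Tuple[int, float]:
    """Replay (partition, ts_ms) rows in arrival order.

    Each partition keeps its own watermark (max timestamp minus delay); the
    effective watermark is the minimum over all partitions. Returns the number
    of rows that arrive after their window closed, and the final watermark.
    """
    per_partition = {p: float("-inf") for p in partitions}
    dropped = 0
    for partition, ts in arrivals:
        effective = min(per_partition.values())
        if window_end(ts, size_ms) - 1 <= effective:
            dropped += 1  # late even relative to the slowest partition
        per_partition[partition] = max(per_partition[partition], ts - delay_ms)
    return dropped, min(per_partition.values())


# Hypothetical data: partition 1 (hour 1) is read entirely before partition 0 (hour 0).
arrivals = [(1, HOUR_MS), (1, HOUR_MS + 60_000), (0, 0), (0, 60_000), (0, 120_000)]
print(replay_per_partition(arrivals, partitions=[0, 1]))     # → (0, 110000)
print(replay_per_partition(arrivals, partitions=[0, 1, 2]))  # → (0, -inf)
```

Under this model, skew between partitions no longer causes drops; only disorder within a single partition larger than the delay does. The second call shows the flip side: a partition that never receives data keeps the effective watermark at its initial value, so no window would ever close. Flink provides the `table.exec.source.idle-timeout` option for marking such idle sources so they stop holding the watermark back.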

Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey everyone, I am currently experiencing non-determinism in my Table API program, and (as a relatively inexperienced Flink and Kafka user) I can't really explain why it happens. So, I have a topic with 10 partitions and a bit of Data on it. Now I run a simple SELECT * query on this,