[ 
https://issues.apache.org/jira/browse/KAFKA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman reopened KAFKA-8940:
----------------------------------------
      Assignee:     (was: Guozhang Wang)

This failed again with
{code:java}
min fail: key=7-1006 actual=580 expected=7
{code}
The actual min output was
{code:java}
minEvents = 
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 7, 
CreateTime = 1602115190769, key = 7-1006, value = 7) 
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 17, 
CreateTime = 1602115193896, key = 7-1006, value = 7) 
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 24, 
CreateTime = 1602115199973, key = 7-1006, value = 7) 
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 33, 
CreateTime = 1602115204173, key = 7-1006, value = 580) 
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 43, 
CreateTime = 1602115209606, key = 7-1006, value = 580)
{code}
So the verifier expects to keep seeing 7, but the output jumps up to 580 at 
some point. Why? The relevant inputs are
{code:java}
CreateTime = 1602115186447, key = 7-1006, value = 7
...
CreateTime = 1602115200806, key = 7-1006, value = 580
{code}
Let's say we convert these timestamps to days. The "7" record was created at 
18,542.9999 days past the epoch, while the "580" record was created at 18,543.0 
days.
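The day arithmetic is easy to sanity-check. A minimal sketch (the class and 
method names here are mine, not from the SmokeTest code) that computes which 
1-day tumbling window an epoch-millisecond timestamp falls into:
{code:java}
public class WindowMath {
    static final long MS_PER_DAY = 24L * 60 * 60 * 1000; // 86,400,000

    // Index of the 1-day tumbling window containing this timestamp.
    static long windowIndex(long timestampMs) {
        return timestampMs / MS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(WindowMath.windowIndex(1602115186447L)); // 18542 (the "7" input)
        System.out.println(WindowMath.windowIndex(1602115200806L)); // 18543 (the "580" input)
    }
}
{code}
So the two inputs land in windows 18542 and 18543 respectively.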

So the "580" record was technically from the day after the "7" record. That 
definitely seems like a clue...

And sure enough, the SmokeTestClient's "min" aggregation is actually a windowed 
aggregation with tumbling windows of 1 day. And it strips out the start-time 
part of the windowed key to flatten the output back into the original keyspace, 
so the output verifier has no way of knowing that these values actually refer 
to different windows.
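To see the information loss concretely, here's a toy stand-in (plain Java, no 
Streams dependency; the "@windowStart" key encoding is just for illustration) 
for a windowed store holding one min per (key, windowStart). Stripping the 
window-start part makes both windows emit under the same plain key:
{code:java}
import java.util.*;

public class FlattenDemo {
    static final long MS_PER_DAY = 86_400_000L;

    // Strip "@windowStart" from each windowed key, collecting the mins
    // that now collide under the same plain key.
    static Map<String, List<Integer>> flatten(Map<String, Integer> windowedMin) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        windowedMin.forEach((windowedKey, min) ->
            out.computeIfAbsent(windowedKey.split("@")[0], k -> new ArrayList<>())
               .add(min));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> windowedMin = new LinkedHashMap<>();
        windowedMin.put("7-1006@" + 18542 * MS_PER_DAY, 7);   // day of the "7" inputs
        windowedMin.put("7-1006@" + 18543 * MS_PER_DAY, 580); // the next day
        System.out.println(flatten(windowedMin)); // {7-1006=[7, 580]}
    }
}
{code}
From the verifier's flattened view, it just looks like the min for 7-1006 
went up.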

I haven't personally checked the timestamps of every other reported failure, 
but I'd be willing to bet there's a pattern of input records spanning the 
24-hour mark and falling into different windows. The good news is this doesn't 
reflect any bug in Streams, but it's definitely a bug in the SmokeTest.

We could try to manipulate the input data to avoid this, but the better fix is 
to account for the potentially varying time windows when verifying the 
output.
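A window-aware check could look roughly like this sketch (names and structure 
are mine, not the actual SmokeTestDriver code): bucket the received records by 
(key, day-window) using the record's CreateTime, and only flag a failure if 
the min increases *within* one window:
{code:java}
import java.util.*;

public class WindowedMinVerifier {
    static final long MS_PER_DAY = 86_400_000L;

    // Returns true iff, within each 1-day window, the emitted min values
    // for this key never increase. timesMs[i] is record i's CreateTime.
    static boolean verify(long[] timesMs, int[] values) {
        Map<Long, Integer> lastPerWindow = new HashMap<>();
        for (int i = 0; i < timesMs.length; i++) {
            long window = timesMs[i] / MS_PER_DAY;
            Integer prev = lastPerWindow.get(window);
            if (prev != null && values[i] > prev) {
                return false; // min went up inside one window: real failure
            }
            lastPerWindow.put(window, values[i]);
        }
        return true;
    }

    public static void main(String[] args) {
        // The output sequence from the failure above: 7s on day 18542,
        // then 580s on day 18543.
        long[] times = {1602115190769L, 1602115193896L, 1602115199973L,
                        1602115204173L, 1602115209606L};
        int[] values = {7, 7, 7, 580, 580};
        System.out.println(WindowedMinVerifier.verify(times, values)); // true
    }
}
{code}
Under this check the output above passes, since the jump to 580 coincides 
exactly with the window boundary.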

> Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance
> -----------------------------------------------------------------
>
>                 Key: KAFKA-8940
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8940
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams, unit tests
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: flaky-test
>
> I lost the screen shot unfortunately... it reports the set of expected 
> records does not match the received records.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)