[ 
https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891188#comment-15891188
 ] 

Amit Sela commented on BEAM-1582:
---------------------------------

Looks like the flake happens when the entire input is re-read.
We inject 4 elements to Kafka before the first run, and 2 more before the 
second. When all is well, printing the number of elements read by 
SparkUnboundedSource's {{readUnboundedStream}} JavaDStream says 4 (sometimes 1 
followed by 3) in the first run, and 2 in the second, but in failures, it reads 
6 in the second.
This would happen if the checkpoint of the readers are not persisted for some 
reason causing the KafkaIO to use the default "earliest" and so read everything.
This happens even though checkpoint interval is batch interval. I will check if 
there's a way to guarantee/block on checkpointing.

> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-1582
>                 URL: https://issues.apache.org/jira/browse/BEAM-1582
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>
> See: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging in it appears that a second firing occurs (though only one 
> is expected) but it doesn't come from a stale state (state is empty before it 
> fires).
> Might be a retry happening for some reason, which is OK in terms of 
> fault-tolerance guarantees (at-least-once), but not so much in terms of flaky 
> tests. 
> I'm looking into this hoping to fix this ASAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to