[ https://issues.apache.org/jira/browse/BEAM-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041248#comment-17041248 ]
Kyle Weaver commented on BEAM-9225: ----------------------------------- The Flink uber jar job server has two streams, one for state and one for messages. They share history [1] and don't yield a message if it is a duplicate according to that history [2]. So if the message stream reads the terminal state first, the state history will never yield it, and the job will never register as complete. I avoided this in the Spark implementation by deferring de-duplication to later, so all messages are yielded regardless of whether they are duplicates or not. [1] https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/abstract_job_service.py#L252 [2] https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L205 > Flink uber jar job server hangs > ------------------------------- > > Key: BEAM-9225 > URL: https://issues.apache.org/jira/browse/BEAM-9225 > Project: Beam > Issue Type: Bug > Components: runner-flink > Reporter: Kyle Weaver > Assignee: Kyle Weaver > Priority: Major > Labels: portability-flink > Fix For: 2.20.0 > > > This was observed on Kubernetes. I suspect this behavior might also be the > reason beam_PostCommit_PortableJar_Flink is timing out. -- This message was sent by Atlassian Jira (v8.3.4#803005)