[ 
https://issues.apache.org/jira/browse/BEAM-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041248#comment-17041248
 ] 

Kyle Weaver commented on BEAM-9225:
-----------------------------------

The Flink uber jar job server has two streams, one for state and one for 
messages. They share a single history [1], and neither stream yields an item 
that is already recorded in that history [2]. So if the message stream reads 
the terminal state first, the state stream will never yield it, and the job 
will never register as complete.
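
To make the failure mode concrete, here is a minimal sketch of the race. It is 
a simplified model, not the actual Beam code: the class SharedHistoryJob, its 
methods, and the list of events are all hypothetical stand-ins for the job 
server and the states it polls.

```python
class SharedHistoryJob:
    """Hypothetical model of a job whose two streams share one history."""

    def __init__(self, events):
        self._events = list(events)  # states the backend would report
        self._history = []           # shared by BOTH streams

    def _poll(self):
        # Skip anything already recorded in the shared history.
        for event in self._events:
            if event not in self._history:
                self._history.append(event)
                yield event

    def get_message_stream(self):
        return self._poll()

    def get_state_stream(self):
        return self._poll()


job = SharedHistoryJob(["RUNNING", "DONE"])

# The message stream happens to read everything first...
seen_by_messages = list(job.get_message_stream())
# ...so the state stream finds every event already in the history.
seen_by_states = list(job.get_state_stream())

print(seen_by_messages)
print(seen_by_states)
```

In this model the state stream comes back empty, so a client that waits on it 
for a terminal state would wait indefinitely.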

I avoided this in the Spark implementation by deferring de-duplication to a 
later step, so every message is yielded whether or not it is a duplicate.
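
A minimal sketch of the deferred approach, under the assumption that each 
consumer filters duplicates using its own private history. The helper names 
poll_all and dedupe are hypothetical and are not taken from the Spark job 
server; this only illustrates the idea, not how that code is written.

```python
def poll_all(events):
    """Yield every event as reported, duplicates included."""
    yield from events


def dedupe(stream):
    """Drop repeats using a history private to this one consumer."""
    seen = set()
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item


events = ["RUNNING", "RUNNING", "DONE"]

# Each stream applies its own de-duplication, so neither can hide
# the terminal state from the other.
message_view = list(dedupe(poll_all(events)))
state_view = list(dedupe(poll_all(events)))

print(message_view)
print(state_view)
```

Because no history is shared between the two consumers, the order in which 
they read no longer matters: both see the terminal state.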

[1] 
https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/abstract_job_service.py#L252
[2] 
https://github.com/apache/beam/blob/57ce5b966cfc4f549082a47f50c29d5f9caa2909/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L205

> Flink uber jar job server hangs
> -------------------------------
>
>                 Key: BEAM-9225
>                 URL: https://issues.apache.org/jira/browse/BEAM-9225
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Kyle Weaver
>            Assignee: Kyle Weaver
>            Priority: Major
>              Labels: portability-flink
>             Fix For: 2.20.0
>
>
> This was observed on Kubernetes. I suspect this behavior might also be the 
> reason beam_PostCommit_PortableJar_Flink is timing out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
