[ 
https://issues.apache.org/jira/browse/FLINK-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626190#comment-16626190
 ] 

Ufuk Celebi commented on FLINK-10292:
-------------------------------------

I understand that non-determinism may be an issue when generating the 
{{JobGraph}}, but do we have some data about how common that is for 
applications? Would it be possible to keep a fixed JobGraph in the image 
instead of persisting one in the {{SubmittedJobGraphStore}}?

I like our current approach, because it keeps the source of truth for the job 
in the image instead of the {{SubmittedJobGraphStore}}. I'm wondering about the 
following scenario:
 * A user creates a job cluster with high availability enabled (cluster ID for 
the logical application, e.g. myapp)
 ** This will persist the job with a fixed ID (after FLINK-10291) on first 
submission
 * The user kills the application *without* cancelling
 ** This will leave all data in the high availability store(s) such as job 
graphs or checkpoints
 * The user updates the image with a modified application and keeps the high 
availability configuration (e.g. cluster ID stays myapp)
 ** This will result in the job in the image being ignored since we already 
have a job graph with the same (fixed) ID

I think in such a scenario it can be desirable to still have the checkpoints 
available, but it might be problematic if the job graph is recovered from the 
{{SubmittedJobGraphStore}} instead of using the job that is part of the image. 
What do you think about this scenario? Is it the responsibility of the user to 
handle this? If so, I think that the approach outlined in this ticket makes 
sense. If not, we may want to consider alternatives or ignore potential 
non-determinism.
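
To make the scenario concrete, here is a minimal sketch of the recovery behaviour I have in mind. It is not Flink code: the map stands in for the {{SubmittedJobGraphStore}}, the strings stand in for job graphs, and the class name, the {{startCluster}} method and the job ID value are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class FixedJobIdRecoverySketch {

    // Hypothetical stand-in for the SubmittedJobGraphStore: job ID -> persisted job graph.
    static final Map<String, String> haStore = new HashMap<>();

    // Fixed job ID as assumed after FLINK-10291; the concrete value is invented.
    static final String FIXED_JOB_ID = "00000000000000000000000000000000";

    // Assumed "generate only once" behaviour: a stored job graph wins over the image.
    static String startCluster(String jobGraphFromImage) {
        String recovered = haStore.get(FIXED_JOB_ID);
        if (recovered != null) {
            // A graph with the fixed ID already exists, so the image's job is ignored.
            return recovered;
        }
        // First submission: persist the graph generated from the image.
        haStore.put(FIXED_JOB_ID, jobGraphFromImage);
        return jobGraphFromImage;
    }

    public static void main(String[] args) {
        // First start with the original image.
        String first = startCluster("myapp-v1");
        // The application is killed without cancelling, so haStore is not cleaned up.
        // Second start with an updated image and the same cluster ID.
        String second = startCluster("myapp-v2");
        System.out.println(first);   // myapp-v1
        System.out.println(second);  // myapp-v1, the updated job is not picked up
    }
}
```

Under these assumptions the second start runs the old job even though the image changed, which is the behaviour I am asking about.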

> Generate JobGraph in StandaloneJobClusterEntrypoint only once
> -------------------------------------------------------------
>
>                 Key: FLINK-10292
>                 URL: https://issues.apache.org/jira/browse/FLINK-10292
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Assignee: vinoyang
>            Priority: Major
>             Fix For: 1.7.0, 1.6.2
>
>
> Currently the {{StandaloneJobClusterEntrypoint}} generates the {{JobGraph}} 
> from the given user code every time it starts/is restarted. This can be 
> problematic if the {{JobGraph}} generation has side effects. Therefore, 
> it would be better to generate the {{JobGraph}} only once, store it in HA 
> storage, and retrieve it from there on subsequent starts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
