Jeesmon Jacob created FLINK-28187:
-------------------------------------

             Summary: Duplicate job submission for FlinkSessionJob
                 Key: FLINK-28187
                 URL: https://issues.apache.org/jira/browse/FLINK-28187
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.0.0
            Reporter: Jeesmon Jacob
         Attachments: flink-operator-log.txt

If a deployment error (e.g. concurrent.TimeoutException) is hit during a 
session job submission, the operator submits the job again. However, the first 
submission may already have succeeded on the JobManager side, in which case the 
second submission results in a duplicate job. Operator log attached.

Per [~gyfora]:

The problem is that in case a deployment error was hit, the SessionJobObserver 
will not be able to tell whether it has submitted the job or not. So it will 
simply try to submit it again. We have to find a mechanism to correlate Jobs on 
the cluster with the SessionJob CR itself. Maybe we could override the job name 
itself for this purpose or something like that.
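
To illustrate the correlation idea, here is a minimal, self-contained sketch 
in Python. It does not use any Flink or operator API: FakeSessionCluster, 
SubmissionTimeout, correlated_name and both reconcile functions are 
hypothetical names invented for this example, and the "sessionjob-<uid>" 
naming scheme is only an assumption about how a job name could encode the CR. 
The fake cluster accepts the first job but loses the acknowledgement, which 
mimics the timeout described above.

```python
import uuid


class SubmissionTimeout(Exception):
    """Stands in for the concurrent.TimeoutException from the report."""


class FakeSessionCluster:
    """Hypothetical in-memory stand-in for a session cluster's JobManager."""

    def __init__(self):
        self.jobs = {}  # job_id -> job_name
        self._drop_next_ack = True

    def submit(self, job_name):
        # The job is accepted first; only the acknowledgement may be lost.
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = job_name
        if self._drop_next_ack:
            self._drop_next_ack = False
            raise SubmissionTimeout("ack lost after the job was accepted")
        return job_id

    def list_jobs(self):
        return dict(self.jobs)


def correlated_name(cr_uid):
    # Assumed scheme: encode the SessionJob CR's UID in the job name.
    return f"sessionjob-{cr_uid}"


def find_job(cluster, cr_uid):
    """Return the id of a job already running for this CR, or None."""
    wanted = correlated_name(cr_uid)
    for job_id, name in cluster.list_jobs().items():
        if name == wanted:
            return job_id
    return None


def naive_reconcile(cluster, cr_uid, attempts=2):
    """Resubmit blindly after an error, as described in the report."""
    for _ in range(attempts):
        try:
            return cluster.submit(correlated_name(cr_uid))
        except SubmissionTimeout:
            continue
    return None


def correlating_reconcile(cluster, cr_uid, attempts=2):
    """Look for an existing job for this CR before every submission."""
    for _ in range(attempts):
        existing = find_job(cluster, cr_uid)
        if existing is not None:
            return existing
        try:
            return cluster.submit(correlated_name(cr_uid))
        except SubmissionTimeout:
            continue
    return find_job(cluster, cr_uid)
```

With the lost acknowledgement, naive_reconcile leaves two jobs on the fake 
cluster, while correlating_reconcile finds the first job on its second pass 
and leaves one. A real fix would also need to handle user-supplied job names 
and jobs that have already finished, which this sketch ignores.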



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
