Zheren Yu created FLINK-24240:
---------------------------------

             Summary: HA JobGraph deserialization problem when migrating from 1.12.4 to 1.13.2
                 Key: FLINK-24240
                 URL: https://issues.apache.org/jira/browse/FLINK-24240
             Project: Flink
          Issue Type: Bug
          Components: Runtime / State Backends
    Affects Versions: 1.13.2
            Reporter: Zheren Yu


We are using HA with Flink on Kubernetes, which creates a ConfigMap named like 
`xxx-dispatcher-leader` and stores the JobGraph inside it. After we upgraded 
from 1.12.4 to 1.13.2 without stopping the job, the JobGraph created by the old 
version is deserialized without the jobType field, which causes the problem 
below:

```
Caused by: java.lang.NullPointerException
        at org.apache.flink.runtime.deployment.TaskDeploymentDescriptorFactory$PartitionLocationConstraint.fromJobType(TaskDeploymentDescriptorFactory.java:282) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:347) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_302]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_302]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_302]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_302]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
        at java.lang.Thread.run(Thread.java:748)
```
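
For context, here is a minimal sketch of the failure mode as I understand it (these are hypothetical stand-in classes, not Flink's real code): when an older serialized form of a class does not contain a field that exists in the newer class, Java deserialization leaves that field null, and switching on a null enum throws a NullPointerException, which looks like what `PartitionLocationConstraint.fromJobType` hits at line 282.

```
import java.io.Serializable;

// Hypothetical stand-ins for JobType / JobGraph, only to illustrate the failure mode.
enum FakeJobType { STREAMING, BATCH }

class FakeJobGraph implements Serializable {
    private static final long serialVersionUID = 1L;
    // Field added in the "new" version; an old serialized blob carries no value
    // for it, so after deserialization it stays null.
    FakeJobType jobType;
}

public class NullEnumSwitchDemo {
    // Analogous to a fromJobType(...) helper that switches on the enum value.
    static String fromJobType(FakeJobType jobType) {
        switch (jobType) { // throws NullPointerException when jobType == null
            case BATCH:
                return "batch-constraint";
            case STREAMING:
                return "streaming-constraint";
            default:
                throw new IllegalStateException("unknown job type " + jobType);
        }
    }

    public static void main(String[] args) {
        FakeJobGraph graph = new FakeJobGraph(); // jobType stays null, mimicking a
                                                 // JobGraph deserialized from 1.12.x
        fromJobType(graph.jobType);              // -> java.lang.NullPointerException
    }
}
```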

I'm just wondering: is there any workaround for this?
(Although I know that manually stopping the job may work.)



