Zheren Yu created FLINK-24240: --------------------------------- Summary: HA JobGraph deserialization problem when migrate 1.12.4 to 1.13.2 Key: FLINK-24240 URL: https://issues.apache.org/jira/browse/FLINK-24240 Project: Flink Issue Type: Bug Components: Runtime / State Backends Affects Versions: 1.13.2 Reporter: Zheren Yu
We are using HA with flink on k8s, which will create the configmap like `xxx-dispatcher-leader`, and put jobGraph inside it, once we update version from 1.12.4 to 1.13.2 without stopping the job, the jobGraph create from old version will be deserialized and lacking of the filed of jobType, which cause the below problem ``` Caused by: java.lang.NullPointerException at org.apache.flink.runtime.deployment.TaskDeploymentDescriptorFactory$PartitionLocationConstraint.fromJobType(TaskDeploymentDescriptorFactory.java:282) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:347) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2] at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_302] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_302] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_302] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_302] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302] at java.lang.Thread.run(Thread.java:748) ``` I just wandering do we have any workaround with this? (although I know manually stopping the job may work) -- This message was sent by Atlassian Jira (v8.3.4#803005)