[ https://issues.apache.org/jira/browse/FLINK-24240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yun Tang updated FLINK-24240:
-----------------------------
    Component/s:     (was: Runtime / State Backends)
                 Runtime / Coordination

> HA JobGraph deserialization problem when migrate 1.12.4 to 1.13.2
> -----------------------------------------------------------------
>
>                 Key: FLINK-24240
>                 URL: https://issues.apache.org/jira/browse/FLINK-24240
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.2
>            Reporter: Zheren Yu
>            Priority: Major
>
> We are using HA with Flink on Kubernetes, which creates a ConfigMap like
> `xxx-dispatcher-leader` and puts the JobGraph inside it. When we upgraded
> from 1.12.4 to 1.13.2 without stopping the job, the JobGraph created by the
> old version was deserialized without the jobType field, which caused the
> following problem:
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.runtime.deployment.TaskDeploymentDescriptorFactory$PartitionLocationConstraint.fromJobType(TaskDeploymentDescriptorFactory.java:282) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:347) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_302]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_302]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_302]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_302]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> I'm just wondering whether there is any workaround for this
> (although I know manually stopping the job may work).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
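The NullPointerException at the top of the trace is consistent with how Java serialization treats a field that is missing from older bytes. Assuming the JobGraph is restored through standard Java serialization and its serialVersionUID is unchanged (which is what lets the 1.12.4 bytes be read at all), a field that exists in the local 1.13.2 class but not in the stream is set to its type's default value, `null` for an enum reference, and field initializers are not run. Switching on a `null` enum reference then throws a NullPointerException. The sketch below only illustrates that mechanism: `JobType` and `PartitionLocationConstraint` here are simplified hypothetical stand-ins, not Flink's actual classes, and the assumption that `fromJobType` switches on the job type is inferred from the stack trace rather than taken from the Flink source.

```java
public class JobTypeNpeDemo {

    // Hypothetical stand-in for the job type stored in a 1.13.x JobGraph.
    enum JobType { BATCH, STREAMING }

    // Hypothetical stand-in for PartitionLocationConstraint; the mapping is illustrative.
    enum PartitionLocationConstraint {
        MUST_BE_KNOWN,
        CAN_BE_UNKNOWN;

        static PartitionLocationConstraint fromJobType(JobType jobType) {
            // A switch on an enum dereferences the reference (via ordinal()),
            // so a null jobType throws NullPointerException before any case is matched.
            switch (jobType) {
                case BATCH:
                    return CAN_BE_UNKNOWN;
                case STREAMING:
                    return MUST_BE_KNOWN;
                default:
                    throw new IllegalStateException("Unknown job type: " + jobType);
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a JobGraph written by 1.12.4: the stream has no jobType entry,
        // so after deserialization the field keeps its default value, null.
        JobType restoredFromOldJobGraph = null;
        try {
            PartitionLocationConstraint.fromJobType(restoredFromOldJobGraph);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException"); // prints "NullPointerException"
        }
    }
}
```

Because the missing field is filled in by the deserializer rather than by a constructor, a default assigned in the field declaration would not help here; only an explicit null check, or not restoring the old JobGraph at all, avoids the exception.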