[ 
https://issues.apache.org/jira/browse/FLINK-24240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-24240:
-----------------------------
    Component/s:     (was: Runtime / State Backends)
                 Runtime / Coordination

> HA JobGraph deserialization problem when migrate 1.12.4 to 1.13.2
> -----------------------------------------------------------------
>
>                 Key: FLINK-24240
>                 URL: https://issues.apache.org/jira/browse/FLINK-24240
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.2
>            Reporter: Zheren Yu
>            Priority: Major
>
> We are using HA with Flink on Kubernetes, which creates a ConfigMap like 
> `xxx-dispatcher-leader` and stores the JobGraph inside it. Once we upgraded 
> from 1.12.4 to 1.13.2 without stopping the job, the JobGraph created by the 
> old version was deserialized without the jobType field, which causes the 
> problem below:
> {code:java}
> Caused by: java.lang.NullPointerException
>       at org.apache.flink.runtime.deployment.TaskDeploymentDescriptorFactory$PartitionLocationConstraint.fromJobType(TaskDeploymentDescriptorFactory.java:282) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:347) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>       at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_302]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_302]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302]
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_302]
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_302]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
>       at java.lang.Thread.run(Thread.java:748)
> {code}
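> For illustration, the NPE is consistent with plain Java serialization behavior: a field added in the newer class version is left at null when deserializing bytes written by the old version, and switching on a null enum reference throws a NullPointerException. Below is a minimal, hypothetical sketch of that failure mode; the class, enum values, and return strings are made up for illustration and are not Flink's actual code.
> {code:java}
> // Hypothetical, self-contained sketch of the failure mode described above.
> public class JobTypeNpeSketch {
>
>     // Stand-in for an enum-typed field that only exists in the newer class version.
>     enum JobType { STREAMING, BATCH }
>
>     // Stand-in for a fromJobType-style lookup that switches on the enum.
>     static String fromJobType(JobType jobType) {
>         // If jobType is null (as when the field was absent in the old
>         // serialized form), this switch throws a NullPointerException.
>         switch (jobType) {
>             case STREAMING:
>                 return "constraint-for-streaming";
>             case BATCH:
>                 return "constraint-for-batch";
>             default:
>                 throw new IllegalArgumentException("Unknown JobType: " + jobType);
>         }
>     }
>
>     public static void main(String[] args) {
>         System.out.println(fromJobType(JobType.STREAMING)); // fine
>         System.out.println(fromJobType(null));              // NullPointerException
>     }
> }
> {code}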
> I'm just wondering whether there is any workaround for this?
> (Although I know that manually stopping the job may work.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
