Re: NPE when restoring from savepoint in Flink 1.13.1 application

2021-06-11 Thread 陳昌倬
On Thu, Jun 10, 2021 at 07:10:45PM +0200, Roman Khachatryan wrote:
> Hi ChangZhuo,
> 
> Thanks for reporting, it looks like a bug.
> I've opened a ticket for that [1].
> 
> [1]
> https://issues.apache.org/jira/browse/FLINK-22966

Thanks for the help.


-- 
ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790  8793 CC65 B0CD EC27 5D5B




Re: NPE when restoring from savepoint in Flink 1.13.1 application

2021-06-10 Thread Roman Khachatryan
Hi ChangZhuo,

Thanks for reporting, it looks like a bug.
I've opened a ticket for that [1].

[1]
https://issues.apache.org/jira/browse/FLINK-22966

Regards,
Roman

On Wed, Jun 9, 2021 at 4:07 PM ChangZhuo Chen (陳昌倬)  wrote:
>
> Hi,
>
> We get a NullPointerException when trying to restore from a savepoint,
> whether with the same jar, a different jar, or a different parallelism. We
> have worked around the issue by changing the UIDs of almost all operators.
> We would like to know whether there is any way to avoid this problem so that
> it does not cause service maintenance problems. Thanks.

NPE when restoring from savepoint in Flink 1.13.1 application

2021-06-09 Thread 陳昌倬
Hi,

We get a NullPointerException when trying to restore from a savepoint,
whether with the same jar, a different jar, or a different parallelism. We
have worked around the issue by changing the UIDs of almost all operators.
We would like to know whether there is any way to avoid this problem so that
it does not cause service maintenance problems. Thanks.
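
For readers unfamiliar with the workaround: Flink maps the state stored in a
savepoint to operators by operator ID, which is derived from the UID set on
each operator. The toy model below is only a sketch of that matching and is
not Flink code; the class UidRestoreSketch, its restore method, and the sample
UIDs and state values are hypothetical names chosen for illustration. It
assumes that changing the UIDs, as described above, means the old state no
longer matches any operator, so the affected operators start empty and the
old state has to be dropped explicitly (in Flink, via the
--allowNonRestoredState option of "flink run").

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are part of the Flink API.
class UidRestoreSketch {

    /**
     * Assigns savepoint state to the operators of a new job by UID.
     * An operator whose UID is absent from the savepoint gets null (empty state).
     * Throws if some savepoint state matches no operator and dropping it
     * has not been allowed.
     */
    public static Map<String, String> restore(Map<String, String> savepoint,
                                              List<String> jobUids,
                                              boolean allowNonRestoredState) {
        Map<String, String> assigned = new LinkedHashMap<>();
        for (String uid : jobUids) {
            assigned.put(uid, savepoint.get(uid));
        }

        // Collect savepoint entries that no operator in the new job claims.
        List<String> unmatched = new ArrayList<>();
        for (String uid : savepoint.keySet()) {
            if (!jobUids.contains(uid)) {
                unmatched.add(uid);
            }
        }
        if (!unmatched.isEmpty() && !allowNonRestoredState) {
            throw new IllegalStateException(
                    "Cannot map savepoint state to any operator: " + unmatched);
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Hypothetical savepoint contents keyed by operator UID.
        Map<String, String> savepoint = new LinkedHashMap<>();
        savepoint.put("source-v1", "state-A");
        savepoint.put("window-v1", "state-B");

        // Unchanged UIDs: every operator gets its state back.
        System.out.println(restore(savepoint, List.of("source-v1", "window-v1"), false));

        // All UIDs changed: nothing matches, so the old state must be
        // dropped explicitly and every operator starts with empty state.
        System.out.println(restore(savepoint, List.of("source-v2", "window-v2"), true));
    }
}
```

In this model the workaround trades the restore failure for a loss of operator
state, which is why it is only a stopgap for stateful jobs.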


The following is the redacted stack trace we can provide for now:

2021-06-09 13:08:59,849 WARN  org.apache.flink.client.deployment.application.DetachedApplicationRunner [] - Could not execute application:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Failed to execute job ''.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$0(JarRunHandler.java:102) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) [?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.apache.flink.util.FlinkException: Failed to execute job ''.
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1970) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1834) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:801) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at  ~[?:?]
    at  ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    ... 12 more
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
    at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:316) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) ~[?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) ~[?:?]
    ... 1 more
Caused by: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
    at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
    at