[jira] [Comment Edited] (FLINK-32212) Job restarting indefinitely after an IllegalStateException from BlobLibraryCacheManager

David Christle (Jira) Sat, 16 Dec 2023 19:28:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-32212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797866#comment-17797866
 ]


David Christle edited comment on FLINK-32212 at 12/17/23 3:27 AM:
------------------------------------------------------------------

We also see this issue. In one case, the logs appear to show k8s scaling down 
the node containing the JobManager. When it restarts, it tries to redeploy the 
Flink application, but retries endlessly with the error "The library 
registration references a different set of library BLOBs". We are running 
version 1.7 of the Flink Kubernetes Operator on Flink 1.17.2, with the jar 
contained in the container image under the `lib` directory.


was (Author: dchristle):
We also see this issue. In one case, the logs appear to show k8s scaling down 
the node containing the JobManager. When it restarts, it tries to redeploy the 
Flink application, but endlessly retries with "The library registration 
references a different set of library BLOBs" error. We are running version 1.7 
of the Flink Kubernetes Operator.

> Job restarting indefinitely after an IllegalStateException from 
> BlobLibraryCacheManager
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-32212
>                 URL: https://issues.apache.org/jira/browse/FLINK-32212
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.16.1
>         Environment: Apache Flink Kubernetes Operator 1.4
>            Reporter: Matheus Felisberto
>            Priority: Major
>
> After running for a few hours, the job starts to throw IllegalStateException 
> and I can't figure out why. To restore the job, I need to manually delete the 
> FlinkDeployment so that it is recreated, and then redeploy everything.
> The jar is built into the Docker image, so it is defined according to 
> the Operator's documentation:
> {code:java}
> // jarURI: local:///opt/flink/usrlib/my-job.jar {code}
> I've tried moving it to /opt/flink/lib/my-job.jar, but that didn't work 
> either. 
>  
> {code:java}
> // Source: my-topic (1/2)#30587 
> (b82d2c7f9696449a2d9f4dc298c0a008_bc764cd8ddf7a0cff126f51c16239658_0_30587) 
> switched from DEPLOYING to FAILED with failure cause: 
> java.lang.IllegalStateException: The library registration references a 
> different set of library BLOBs than previous registrations for this job:
> old:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396]
> new:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2]
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:419)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:359)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:235)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:202)
>     at 
> org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:336)
>     at 
> org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1024)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:612)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>     at java.base/java.lang.Thread.run(Unknown Source) {code}
> If there is any other information that can help to identify the problem, 
> please let me know.
>  
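
For anyone comparing the two keys in the trace above: they share the same middle component and differ only in the trailing one. Assuming the key string has the form <type>-<content hash>-<random suffix> (an assumption about the key format, not something confirmed in this thread), that would suggest the same jar content was registered again under a fresh key. The sketch below is only an illustration of that comparison and of a set-equality check like the one the error message describes; the class and method names are hypothetical and this is not Flink's actual code.

```java
import java.util.Set;

public class BlobSetCheck {

    /** Splits a key of the assumed form "<type>-<hash>-<suffix>" into its three parts. */
    static String[] parts(String key) {
        String[] p = key.split("-", 3);
        if (p.length != 3) {
            throw new IllegalArgumentException("Unexpected key format: " + key);
        }
        return p;
    }

    /** True when both keys carry the same middle (assumed content hash) component. */
    static boolean sameHash(String a, String b) {
        return parts(a)[1].equals(parts(b)[1]);
    }

    /** Simplified stand-in for a check that rejects a changed set of library keys. */
    static void verify(Set<String> previous, Set<String> requested) {
        if (!previous.equals(requested)) {
            throw new IllegalStateException(
                    "different set of library BLOBs:\nold:" + previous + "\nnew:" + requested);
        }
    }

    public static void main(String[] args) {
        // Keys copied from the stack trace in the issue description.
        String oldKey = "p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396";
        String newKey = "p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2";

        System.out.println("same hash component: " + sameHash(oldKey, newKey));
        try {
            verify(Set.of(oldKey), Set.of(newKey));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If that reading of the key format holds, a whole-key comparison would reject the new registration even though the jar bytes are unchanged, which would be consistent with the endless retries reported here.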



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
