[jira] [Updated] (FLINK-32212) Job restarting indefinitely after an IllegalStateException from BlobLibraryCacheManager

2023-05-29 Thread Matheus Felisberto (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matheus Felisberto updated FLINK-32212:
---
Environment: Apache Flink Kubernetes Operator 1.4  (was: I'm running my 
workload on Kubernetes using Operator v1.4 and Flink 1.16)

> Job restarting indefinitely after an IllegalStateException from 
> BlobLibraryCacheManager
> ---
>
> Key: FLINK-32212
> URL: https://issues.apache.org/jira/browse/FLINK-32212
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.16.1
> Environment: Apache Flink Kubernetes Operator 1.4
>Reporter: Matheus Felisberto
>Priority: Major
>
> After running for a few hours, the job starts throwing an IllegalStateException 
> and I can't figure out why. The jar is built into the Docker image, so it 
> is referenced as described in the Operator's documentation:
>  
> {code:java}
> // jarURI: local:///opt/flink/usrlib/my-job.jar
> {code}
>  
> I also tried moving it to /opt/flink/lib/my-job.jar, but that didn't help 
> either.
>  
> {code:java}
> // Source: my-topic (1/2)#30587 (b82d2c7f9696449a2d9f4dc298c0a008_bc764cd8ddf7a0cff126f51c16239658_0_30587) switched from DEPLOYING to FAILED with failure cause:
> java.lang.IllegalStateException: The library registration references a different set of library BLOBs than previous registrations for this job:
> old:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396]
> new:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2]
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:419)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:359)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:235)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:202)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:336)
>     at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1024)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:612)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>     at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> If there is any other information that can help to identify the problem, 
> please let me know.
>  





[jira] [Updated] (FLINK-32212) Job restarting indefinitely after an IllegalStateException from BlobLibraryCacheManager

2023-05-29 Thread Matheus Felisberto (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matheus Felisberto updated FLINK-32212:
---
Description: 
After running for a few hours, the job starts throwing an IllegalStateException and 
I can't figure out why. To restore the job, I have to manually delete the 
FlinkDeployment so that it is recreated, and then re-deploy everything.
The jar is built into the Docker image, so it is referenced as described in 
the Operator's documentation:
{code:java}
// jarURI: local:///opt/flink/usrlib/my-job.jar
{code}
I also tried moving it to /opt/flink/lib/my-job.jar, but that didn't help either.

 
{code:java}
// Source: my-topic (1/2)#30587 (b82d2c7f9696449a2d9f4dc298c0a008_bc764cd8ddf7a0cff126f51c16239658_0_30587) switched from DEPLOYING to FAILED with failure cause:
java.lang.IllegalStateException: The library registration references a different set of library BLOBs than previous registrations for this job:
old:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396]
new:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:419)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:359)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:235)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:202)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:336)
    at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1024)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:612)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
    at java.base/java.lang.Thread.run(Unknown Source)
{code}
If there is any other information that can help to identify the problem, please 
let me know.
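
For readers unfamiliar with the error, the message suggests a consistency check: the first registration for a job records a set of library BLOB keys, and a later registration naming a different set is rejected. The sketch below is a simplified, hypothetical model of such a check, not Flink's actual code; the class name `LibraryRegistrationSketch` and the method `register` are invented for illustration, and only the wording of the exception is taken from the trace above.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model of a per-job library registration check (not Flink's code). */
final class LibraryRegistrationSketch {
    // Keys recorded by the first registration; null until then.
    private Set<String> registeredBlobKeys;

    /** Accepts the first registration and any later one that names exactly the same keys. */
    void register(Set<String> blobKeys) {
        if (registeredBlobKeys == null) {
            registeredBlobKeys = new TreeSet<>(blobKeys);
            return;
        }
        if (!registeredBlobKeys.equals(blobKeys)) {
            throw new IllegalStateException(
                    "The library registration references a different set of library BLOBs "
                            + "than previous registrations for this job:\nold:"
                            + registeredBlobKeys + "\nnew:" + new TreeSet<>(blobKeys));
        }
    }

    public static void main(String[] args) {
        LibraryRegistrationSketch job = new LibraryRegistrationSketch();
        String oldKey = "p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396";
        String newKey = "p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2";

        job.register(Set.of(oldKey)); // first registration is recorded
        job.register(Set.of(oldKey)); // same key set again is accepted
        try {
            job.register(Set.of(newKey)); // different key set is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, once a mismatching key set is submitted, every retry with that set fails the same way until the recorded state is discarded, which would be consistent with the indefinite restarts and with deleting the FlinkDeployment clearing the problem.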

 



[jira] [Updated] (FLINK-32212) Job restarting indefinitely after an IllegalStateException from BlobLibraryCacheManager

2023-05-29 Thread Matheus Felisberto (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matheus Felisberto updated FLINK-32212:
---
Description: 
After running for a few hours, the job starts throwing an IllegalStateException and 
I can't figure out why. To restore the job, I have to manually delete the 
FlinkDeployment so that it is recreated, and then redeploy everything.
The jar is built into the Docker image, so it is referenced as described in 
the Operator's documentation:
{code:java}
// jarURI: local:///opt/flink/usrlib/my-job.jar
{code}
I also tried moving it to /opt/flink/lib/my-job.jar, but that didn't help either.

 
{code:java}
// Source: my-topic (1/2)#30587 (b82d2c7f9696449a2d9f4dc298c0a008_bc764cd8ddf7a0cff126f51c16239658_0_30587) switched from DEPLOYING to FAILED with failure cause:
java.lang.IllegalStateException: The library registration references a different set of library BLOBs than previous registrations for this job:
old:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396]
new:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2]
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:419)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:359)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:235)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:202)
    at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:336)
    at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1024)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:612)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
    at java.base/java.lang.Thread.run(Unknown Source)
{code}
If there is any other information that can help to identify the problem, please 
let me know.
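
One detail that may help with triage: the old and new keys in the trace share their middle segment and differ only in the last one. The sketch below just compares the two strings segment by segment. Reading the middle segment as a content hash (it is 40 hex characters long) and the last segment as a per-upload component is an assumption, not something confirmed in this ticket; the class `BlobKeyDiffSketch` and its methods are hypothetical helpers.

```java
/** Hypothetical helper comparing two key strings of the form "p-<middle>-<last>". */
final class BlobKeyDiffSketch {

    /** Splits a key into its three dash-separated segments. */
    static String[] parts(String key) {
        String[] p = key.split("-");
        if (p.length != 3) {
            throw new IllegalArgumentException("unexpected key format: " + key);
        }
        return p;
    }

    /** True when both keys have the same middle segment but a different last segment. */
    static boolean sameMiddleDifferentLast(String a, String b) {
        String[] pa = parts(a);
        String[] pb = parts(b);
        return pa[1].equals(pb[1]) && !pa[2].equals(pb[2]);
    }

    public static void main(String[] args) {
        String oldKey = "p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396";
        String newKey = "p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2";
        System.out.println(sameMiddleDifferentLast(oldKey, newKey)); // prints "true"
    }
}
```

If that reading of the segments holds, the jar contents did not change between the two registrations; the same artifact would simply have been registered a second time under a new key.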

 

