adnanhemani commented on PR #1765: URL: https://github.com/apache/polaris/pull/1765#issuecomment-2942620712
@adutra - I've reproduced the issue on a branch in my fork: https://github.com/adnanhemani/polaris/tree/ahemani/show_failure_1765 You can read the full diff [here](https://github.com/apache/polaris/compare/main...adnanhemani:polaris:ahemani/show_failure_1765), but I made a really simple case here that creates a task when you create a catalog. The task only tries to get the `BasePersistence` object - which is where the call blows up due to the poisoned cache. Feel free to stick a debugger in there and you'll be able to see, it is because of the lazy loading of the `JdbcBasePersistenceImpl` class and that the cache poisoning happened due to the creation of the TokenBroker (RequestScoped) bean. *Steps to reproduce the error using the code linked above:* 1. [This can only be reproduced using JDBC or EclipseLink.] Create a Persistence instance and set `application.properties` to the right set of configurations. 6. Run: `./polaris --client-id <CLIENT_ID> --client-secret <CLIENT_SECRET> catalogs create polaris1 --storage-type FILE --default-base-location "/var/tmp/polaris1/" ` (you must try this 7. Wait for the Task to execute. This will fail and retry - until it runs out of retries altogether and then will print out to logs that the task cannot be successfully completed. A call trace will also be able to be seen here. You can then apply this PR on top of that code and retry these steps and see that you will no longer see this issue. More on how the TokenBroker creates the poisoned cache: 1. `tokenBrokerFactory.apply(realmContext)`: https://github.com/apache/polaris/blob/main/quarkus/service/src/main/java/org/apache/polaris/service/quarkus/config/QuarkusProducers.java#L289. Note this is a `RequestScoped` bean - and so is `realmContext`. 2. `createTokenBroker(realmContext)`: https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L53 3. `metaStoreManagerFactory.getOrCreateMetaStoreManager(realmContext)`: https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L65-L66 4. `initializeForRealm(realmContext, null, false);`: https://github.com/apache/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L177 And that call is where the `sessionSupplierMap` stores the poisoned lambda that creates `JdbcBasePersistenceImpl`. At no point in this call trace was realmContext reset with a materialized version of the realmIdentifier - which is why it remains as a RequestScoped bean that made its way into the `sessionSupplierMap`. Again, your suggestion above to change this behavior by materializing the `realmContext` (perhaps from the `tokenBroker` itself) _will_ solve this issue. But I have no idea how to make a test that ensures something like that cannot happen again. If you have an idea, then glad to change the approach to that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
