adnanhemani commented on PR #1765:
URL: https://github.com/apache/polaris/pull/1765#issuecomment-2942620712

   @adutra - I've reproduced the issue on a branch in my fork: 
https://github.com/adnanhemani/polaris/tree/ahemani/show_failure_1765
   
   You can read the full diff 
[here](https://github.com/apache/polaris/compare/main...adnanhemani:polaris:ahemani/show_failure_1765),
 but I made a really simple case here that creates a task when you create a 
catalog. The task only tries to get the `BasePersistence` object - which is 
where the call blows up due to the poisoned cache. Feel free to stick a 
debugger in there and you'll be able to see, it is because of the lazy loading 
of the `JdbcBasePersistenceImpl` class and that the cache poisoning happened 
due to the creation of the TokenBroker (RequestScoped) bean.
   
   *Steps to reproduce the error using the code linked above:*
   
   1. [This can only be reproduced using JDBC or EclipseLink.] Create a 
Persistence instance and set `application.properties` to the right set of 
configurations.
   6. Run: `./polaris --client-id <CLIENT_ID> --client-secret <CLIENT_SECRET> 
catalogs create polaris1 --storage-type FILE --default-base-location 
"/var/tmp/polaris1/" ` (you must try this 
   7. Wait for the Task to execute. This will fail and retry - until it runs 
out of retries altogether and then will print out to logs that the task cannot 
be successfully completed. A call trace will also be able to be seen here.
   
   You can then apply this PR on top of that code and retry these steps and see 
that you will no longer see this issue.
   
   More on how the TokenBroker creates the poisoned cache:
   1. `tokenBrokerFactory.apply(realmContext)`: 
https://github.com/apache/polaris/blob/main/quarkus/service/src/main/java/org/apache/polaris/service/quarkus/config/QuarkusProducers.java#L289.
 Note this is a `RequestScoped` bean - and so is `realmContext`.
   2. `createTokenBroker(realmContext)`: 
https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L53
   3. `metaStoreManagerFactory.getOrCreateMetaStoreManager(realmContext)`: 
https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L65-L66
   4. `initializeForRealm(realmContext, null, false);`: 
https://github.com/apache/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L177
   
   And that call is where the `sessionSupplierMap` stores the poisoned lambda 
that creates `JdbcBasePersistenceImpl`. At no point in this call trace was 
realmContext reset with a materialized version of the realmIdentifier - which 
is why it remains as a RequestScoped bean that made its way into the 
`sessionSupplierMap`.
   
   Again, your suggestion above to change this behavior by materializing the 
`realmContext` (perhaps from the `tokenBroker` itself) _will_ solve this issue. 
But I have no idea how to make a test that ensures something like that cannot 
happen again. If you have an idea, then glad to change the approach to that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to