Re: [DISCUSS] Injecting RealmContext
Hi Dmitri, Thanks a lot for the clarification! I didn't realize PolarisMetaStoreManager is a request-scoped bean. I have to say that the whole implementation is kind of confusing. Although I do agree that those *MetastoreManagerImpl is not needed after the request ends, i am afraid things may not be that simple, I see the EntityManager is also created in a similar way, which is created per realm, and cached in a map. Code reference: https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/config/RealmEntityManagerFactory.java#L56 The EntityManager has a pointer to MetastoreManager, entityCache and credentialCache, so you might also need to turn EntityManager into a pure request scope also, and I am not sure if there are other places that also have similar situations. It seems that if we want to turn the Metastore Manager into a request scope, things could turn out to be more complicated than what we thought, and we probably need to be careful about this. Best Regards, Yun On Mon, Jun 9, 2025 at 3:32 PM Dmitri Bourlatchkov wrote: > Thanks for stating your concerns, Yun! > > Re: MetastoreManager, I believe the current state of that code is actually > the source of confusion and I'm trying to clarify that. > > For example, TransactionalMetaStoreManagerImpl is currently created per > realm. However, this class does not carry any state. It is an empty shell > that works off of the data passed as PolarisCallContext parameters. The > same holds for AtomicOperationMetaStoreManager. > > As such, PolarisMetaStoreManager can be request-scoped beans. I do not see > any reason to keep those objects after the request is done. > > WDYT? > > Thanks, > Dmitri. > > On Mon, Jun 9, 2025 at 5:57 PM yun zou wrote: > > > Hi Dmitri, > > > > Thanks for bringing that up! > > > > I am not an expert with CDI, based my recent experience with CDI, while > CDI > > does help making development > > simpler under many cases, it does also make things more complicated under > > some scenarios. Especially when > > injecting a request scoped object into application scoped class, it does > > make things harder to reason about when > > debugging, and it also seems we currently lack efficient ways of testing > to > > catch potential regressions. For those > > cases, i think explicitly passing the objects as parameters is actually > > more clear and robust. > > > > Furthermore, I think this could make things easier for new developers for > > OSS polaris. > > > > Also, regarding the metastore object, i don't think i fully understand > the > > proposal, are we proposing to make the > > MetastoreManager now request scoped? Currently, the metastore manager is > > created per realm, not per > > RealmContext, which means two requests with the same realmId can use the > > same metastore manager. If we are turning > > this class into a request scope, I am not sure if that would be a good > > idea, especially when we are dealing with Entity Caching > > optimizations, where caching per realm instead of per request probably > > makes more sense. > > > > Best Regards, > > Yun > > > > > > > > On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu wrote: > > > > > Agreed with Alex. I think passing a realm context is a cleaner solution > > > than injecting a request scope realm context. CDI provides a lot of > > > benefits, but it doesn't mean that we should alway use it. > > > > > > In general, injecting a request-scoped bean into an application-scoped > > bean > > > is perfectly valid in CDI, but it does come with a few gotchas: > > > > > >1. Lifecycle surprises. An @ApplicationScoped method can run in > > contexts > > >where no request scope is active (e.g., async tasks, scheduled > jobs). > > In > > >those cases the injected proxy can’t resolve a real instance, > leading > > to > > >ContextNotActiveException or, worse, stale data. > > >2. Debugging pain, When failures depend on the presence or absence > of > > a > > >request context, stack traces alone rarely tell the full story. > > Tracking > > >down intermittent bugs becomes tricky. > > >3. Unnecessary complexity, which is a contradiction of using > > injection. > > >Injection is supposed to make a simpler and cleaner code base. If > you > > >really only need a bit of request data, passing it as a method > > parameter > > >keeps the code simpler, avoids proxy lookups, and makes unit tests > > > cleaner. > > > > > > As a rule of thumb, we may inject a narrower scope only when we must. > > > > > > Yufei > > > > > > > > > On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov > > > wrote: > > > > > > > Thanks, Alex! Very good points! > > > > > > > > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc` > > > > to QuarkusValueExpressionResolver and it appears to work. That > > > > is, QuarkusValueExpressionResolver can get the realm ID from the > > injected > > > > RealmContext as opposed to the API parameter. > > > >
Re: Polaris Event Persistence Schema
Hi Yufei, Thanks for the comments, I’ve addressed all of them. The second half of the document briefly outlines how we will transform the Polaris internal persistence events into Iceberg Events to return back as part of the Iceberg Event Endpoint. Best, Adnan Hemani > On Jun 8, 2025, at 4:02 PM, Yufei Gu wrote: > > Thanks Adnan for driving this. Left comments in the google doc. > > The current schema proposal is good if this is only for Polaris internal > persistence. It'd also be a good idea to make sure it is > compatible with the Iceberg Event Endpoints [1][2]. > > [1] > https://www.google.com/url?q=https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli%3D1%26tab%3Dt.0%23heading%3Dh.7bty2c61dvd7&source=gmail-imap&ust=175002856900&usg=AOvVaw10E0e7Djbd1WYMY8dGRnNO > [2] > https://www.google.com/url?q=https://github.com/apache/iceberg/pull/12584/files&source=gmail-imap&ust=175002856900&usg=AOvVaw0kMs3bVaBXRtZCTvStT3JI > > Yufei > > > On Tue, Jun 3, 2025 at 10:19 PM Adnan Hemani > wrote: > >> Hi all, >> >> Wanted to start this thread on getting community feedback for the schema >> of a new `events` table, as described by my Iceberg Events API >> implementation thread < >> https://www.google.com/url?q=https://lists.apache.org/thread/nfp40hmwryybznxtgffvy4c5l5kk19ho&source=gmail-imap&ust=175002856900&usg=AOvVaw2zcBEWMfKLXrNZhXbETGB4>. >> >> Schema can be found on this Google Doc - please feel free to comment >> directly on the document and we can discuss in the comments section as >> well: >> https://www.google.com/url?q=https://docs.google.com/document/d/1AxaXy-DXE2g6rmFktA2foDI7k2zIjFzTKmI5mZnUjOs/edit?usp%3Dsharing&source=gmail-imap&ust=175002856900&usg=AOvVaw38W8LTWpVceP8bFhLVV_La >> >> Best, >> Adnan Hemani
Re: [DISCUSS] Injecting RealmContext
Hi Dmitri, Thanks for bringing that up! I am not an expert with CDI, based my recent experience with CDI, while CDI does help making development simpler under many cases, it does also make things more complicated under some scenarios. Especially when injecting a request scoped object into application scoped class, it does make things harder to reason about when debugging, and it also seems we currently lack efficient ways of testing to catch potential regressions. For those cases, i think explicitly passing the objects as parameters is actually more clear and robust. Furthermore, I think this could make things easier for new developers for OSS polaris. Also, regarding the metastore object, i don't think i fully understand the proposal, are we proposing to make the MetastoreManager now request scoped? Currently, the metastore manager is created per realm, not per RealmContext, which means two requests with the same realmId can use the same metastore manager. If we are turning this class into a request scope, I am not sure if that would be a good idea, especially when we are dealing with Entity Caching optimizations, where caching per realm instead of per request probably makes more sense. Best Regards, Yun On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu wrote: > Agreed with Alex. I think passing a realm context is a cleaner solution > than injecting a request scope realm context. CDI provides a lot of > benefits, but it doesn't mean that we should alway use it. > > In general, injecting a request-scoped bean into an application-scoped bean > is perfectly valid in CDI, but it does come with a few gotchas: > >1. Lifecycle surprises. An @ApplicationScoped method can run in contexts >where no request scope is active (e.g., async tasks, scheduled jobs). In >those cases the injected proxy can’t resolve a real instance, leading to >ContextNotActiveException or, worse, stale data. >2. Debugging pain, When failures depend on the presence or absence of a >request context, stack traces alone rarely tell the full story. Tracking >down intermittent bugs becomes tricky. >3. Unnecessary complexity, which is a contradiction of using injection. >Injection is supposed to make a simpler and cleaner code base. If you >really only need a bit of request data, passing it as a method parameter >keeps the code simpler, avoids proxy lookups, and makes unit tests > cleaner. > > As a rule of thumb, we may inject a narrower scope only when we must. > > Yufei > > > On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov > wrote: > > > Thanks, Alex! Very good points! > > > > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc` > > to QuarkusValueExpressionResolver and it appears to work. That > > is, QuarkusValueExpressionResolver can get the realm ID from the injected > > RealmContext as opposed to the API parameter. > > > > With that, we could probably change the interpretation of the > > "realmIdentifier" expression to be the injected RealmContext, remove > > explicit realm parameters from the generated API classes and move the > > `@MeterTag(key="realm_id",expression="realmIdentifier")` annotation to > the > > method level. > > > > WDYT? > > > > Thanks, > > Dmitri. > > > > On Fri, Jun 6, 2025 at 12:41 PM Alex Dutra > > > wrote: > > > > > Hi Dmitri, > > > > > > Thanks for bringing up this important topic. > > > > > > I generally agree with your refactoring proposal, but with 2 caveats: > > > > > > 1) For an application-scoped bean exposing methods to retrieve > > realm-scoped > > > components (e.g., the "factory" beans that expose methods like > > > "getOrCreateXYZ") I think it's actually OK to receive the RealmContext > > as a > > > method parameter, as opposed to having the RealmContext injected. This > is > > > because the injection, while possible, would only work if the request > > > context is active at the moment where a method on that bean is invoked. > > > This is hard to enforce at compile time and also hard to reason about > > imho. > > > IOW I prefer to see this: > > > > > > @ApplicationScoped public class XYZFactory { > > > public XYZ getOrCreateXYZ(RealmContext ctx) { return > > > doSomeStuff(ctx.getRealmIdentifier()); } > > > } > > > > > > Rather than this: > > > > > > @ApplicationScoped public class XYZFactory { > > > @Inject RealmContext ctx; > > > public XYZ getOrCreateXYZ() { return > > > doSomeStuff(ctx.getRealmIdentifier()); } > > > } > > > > > > 2) We do need a RealmContext parameter in the API classes generated > from > > > Mustache templates [1], because the RealmContext parameter is annotated > > as > > > follows: > > > > > > @Context @MeterTag(key="realm_id",expression="realmIdentifier") > > > RealmContext realmContext > > > > > > This is so that we can produce "realm_id" tags for API metrics. Thus we > > > cannot remove it (unless there is another way to produce such tags, but > > I'm > > > not aware of any). > > > > > > But we could remov
Re: [DISCUSS] Injecting RealmContext
I'd prefer to avoid having the APIs reflect the scope of the implementation beans under the hood. On the other hand, I don't like forcing methods at the bottom of the call stack requiring parameters because they're necessary later on. I liked passing CallContext around because it didn't really matter if the bean was ApplicationScoped or RequestScoped - CallContext was just there. Passing the CallContext to the PolarisMetaStoreManager meant that the underlying implementation could use an ApplicationScoped bean, like the PolarisConfigurationStore and the current realm could be determined from the method parameter. It's true, we could make everything RequestScoped and then nothing needs to be passed in, but that feels inefficient - e.g., should the PolarisConfigurationStore re-parse the configuration map for every request? A downside is that CallContext becomes the "everything bean" where you just add properties because it's the easiest way to get a new parameter passed to some bean 3 layers down, but I honestly think that's less of a concern when the CDI context already allows for an easier way of injecting dependencies. Mike On Mon, Jun 9, 2025 at 3:54 PM Dmitri Bourlatchkov wrote: > Thanks for stating your concerns, Yun! > > Re: MetastoreManager, I believe the current state of that code is actually > the source of confusion and I'm trying to clarify that. > > For example, TransactionalMetaStoreManagerImpl is currently created per > realm. However, this class does not carry any state. It is an empty shell > that works off of the data passed as PolarisCallContext parameters. The > same holds for AtomicOperationMetaStoreManager. > > As such, PolarisMetaStoreManager can be request-scoped beans. I do not see > any reason to keep those objects after the request is done. > > WDYT? > > Thanks, > Dmitri. > > On Mon, Jun 9, 2025 at 5:57 PM yun zou wrote: > > > Hi Dmitri, > > > > Thanks for bringing that up! > > > > I am not an expert with CDI, based my recent experience with CDI, while > CDI > > does help making development > > simpler under many cases, it does also make things more complicated under > > some scenarios. Especially when > > injecting a request scoped object into application scoped class, it does > > make things harder to reason about when > > debugging, and it also seems we currently lack efficient ways of testing > to > > catch potential regressions. For those > > cases, i think explicitly passing the objects as parameters is actually > > more clear and robust. > > > > Furthermore, I think this could make things easier for new developers for > > OSS polaris. > > > > Also, regarding the metastore object, i don't think i fully understand > the > > proposal, are we proposing to make the > > MetastoreManager now request scoped? Currently, the metastore manager is > > created per realm, not per > > RealmContext, which means two requests with the same realmId can use the > > same metastore manager. If we are turning > > this class into a request scope, I am not sure if that would be a good > > idea, especially when we are dealing with Entity Caching > > optimizations, where caching per realm instead of per request probably > > makes more sense. > > > > Best Regards, > > Yun > > > > > > > > On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu wrote: > > > > > Agreed with Alex. I think passing a realm context is a cleaner solution > > > than injecting a request scope realm context. CDI provides a lot of > > > benefits, but it doesn't mean that we should alway use it. > > > > > > In general, injecting a request-scoped bean into an application-scoped > > bean > > > is perfectly valid in CDI, but it does come with a few gotchas: > > > > > >1. Lifecycle surprises. An @ApplicationScoped method can run in > > contexts > > >where no request scope is active (e.g., async tasks, scheduled > jobs). > > In > > >those cases the injected proxy can’t resolve a real instance, > leading > > to > > >ContextNotActiveException or, worse, stale data. > > >2. Debugging pain, When failures depend on the presence or absence > of > > a > > >request context, stack traces alone rarely tell the full story. > > Tracking > > >down intermittent bugs becomes tricky. > > >3. Unnecessary complexity, which is a contradiction of using > > injection. > > >Injection is supposed to make a simpler and cleaner code base. If > you > > >really only need a bit of request data, passing it as a method > > parameter > > >keeps the code simpler, avoids proxy lookups, and makes unit tests > > > cleaner. > > > > > > As a rule of thumb, we may inject a narrower scope only when we must. > > > > > > Yufei > > > > > > > > > On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov > > > wrote: > > > > > > > Thanks, Alex! Very good points! > > > > > > > > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc` > > > > to QuarkusValueExpressionResolver and it appears to work. That > > > > is, QuarkusValueExpressionReso
Re: [DISCUSS] Injecting RealmContext
Thanks for stating your concerns, Yun! Re: MetastoreManager, I believe the current state of that code is actually the source of confusion and I'm trying to clarify that. For example, TransactionalMetaStoreManagerImpl is currently created per realm. However, this class does not carry any state. It is an empty shell that works off of the data passed as PolarisCallContext parameters. The same holds for AtomicOperationMetaStoreManager. As such, PolarisMetaStoreManager can be request-scoped beans. I do not see any reason to keep those objects after the request is done. WDYT? Thanks, Dmitri. On Mon, Jun 9, 2025 at 5:57 PM yun zou wrote: > Hi Dmitri, > > Thanks for bringing that up! > > I am not an expert with CDI, based my recent experience with CDI, while CDI > does help making development > simpler under many cases, it does also make things more complicated under > some scenarios. Especially when > injecting a request scoped object into application scoped class, it does > make things harder to reason about when > debugging, and it also seems we currently lack efficient ways of testing to > catch potential regressions. For those > cases, i think explicitly passing the objects as parameters is actually > more clear and robust. > > Furthermore, I think this could make things easier for new developers for > OSS polaris. > > Also, regarding the metastore object, i don't think i fully understand the > proposal, are we proposing to make the > MetastoreManager now request scoped? Currently, the metastore manager is > created per realm, not per > RealmContext, which means two requests with the same realmId can use the > same metastore manager. If we are turning > this class into a request scope, I am not sure if that would be a good > idea, especially when we are dealing with Entity Caching > optimizations, where caching per realm instead of per request probably > makes more sense. > > Best Regards, > Yun > > > > On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu wrote: > > > Agreed with Alex. I think passing a realm context is a cleaner solution > > than injecting a request scope realm context. CDI provides a lot of > > benefits, but it doesn't mean that we should alway use it. > > > > In general, injecting a request-scoped bean into an application-scoped > bean > > is perfectly valid in CDI, but it does come with a few gotchas: > > > >1. Lifecycle surprises. An @ApplicationScoped method can run in > contexts > >where no request scope is active (e.g., async tasks, scheduled jobs). > In > >those cases the injected proxy can’t resolve a real instance, leading > to > >ContextNotActiveException or, worse, stale data. > >2. Debugging pain, When failures depend on the presence or absence of > a > >request context, stack traces alone rarely tell the full story. > Tracking > >down intermittent bugs becomes tricky. > >3. Unnecessary complexity, which is a contradiction of using > injection. > >Injection is supposed to make a simpler and cleaner code base. If you > >really only need a bit of request data, passing it as a method > parameter > >keeps the code simpler, avoids proxy lookups, and makes unit tests > > cleaner. > > > > As a rule of thumb, we may inject a narrower scope only when we must. > > > > Yufei > > > > > > On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov > > wrote: > > > > > Thanks, Alex! Very good points! > > > > > > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc` > > > to QuarkusValueExpressionResolver and it appears to work. That > > > is, QuarkusValueExpressionResolver can get the realm ID from the > injected > > > RealmContext as opposed to the API parameter. > > > > > > With that, we could probably change the interpretation of the > > > "realmIdentifier" expression to be the injected RealmContext, remove > > > explicit realm parameters from the generated API classes and move the > > > `@MeterTag(key="realm_id",expression="realmIdentifier")` annotation to > > the > > > method level. > > > > > > WDYT? > > > > > > Thanks, > > > Dmitri. > > > > > > On Fri, Jun 6, 2025 at 12:41 PM Alex Dutra > > > > > > wrote: > > > > > > > Hi Dmitri, > > > > > > > > Thanks for bringing up this important topic. > > > > > > > > I generally agree with your refactoring proposal, but with 2 caveats: > > > > > > > > 1) For an application-scoped bean exposing methods to retrieve > > > realm-scoped > > > > components (e.g., the "factory" beans that expose methods like > > > > "getOrCreateXYZ") I think it's actually OK to receive the > RealmContext > > > as a > > > > method parameter, as opposed to having the RealmContext injected. > This > > is > > > > because the injection, while possible, would only work if the request > > > > context is active at the moment where a method on that bean is > invoked. > > > > This is hard to enforce at compile time and also hard to reason about > > > imho. > > > > IOW I prefer to see this: > > > > > > > > @ApplicationScoped public clas