Re: [DISCUSS] Injecting RealmContext

2025-06-09 Thread yun zou
Hi Dmitri,

Thanks a lot for the clarification! I didn't realize
PolarisMetaStoreManager is a request-scoped bean.

I have to say that the whole implementation is kind of confusing. Although
I do agree that those
*MetastoreManagerImpl is not needed after the request ends, i am afraid
things may not be that simple, I see
the EntityManager is also created in a similar way, which is created per
realm, and cached in a map.
Code reference:
https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/config/RealmEntityManagerFactory.java#L56

The EntityManager has a pointer to MetastoreManager, entityCache and
credentialCache, so you might also need to turn
EntityManager into a pure request scope also, and I am not sure if there
are other places that also have similar situations.

It seems that if we want to turn the Metastore Manager into a request
scope, things could turn out to be more complicated than
what we thought, and we probably need to be careful about this.

Best Regards,
Yun

On Mon, Jun 9, 2025 at 3:32 PM Dmitri Bourlatchkov  wrote:

> Thanks for stating your concerns, Yun!
>
> Re: MetastoreManager, I believe the current state of that code is actually
> the source of confusion and I'm trying to clarify that.
>
> For example, TransactionalMetaStoreManagerImpl is currently created per
> realm. However, this class does not carry any state. It is an empty shell
> that works off of the data passed as PolarisCallContext parameters. The
> same holds for AtomicOperationMetaStoreManager.
>
> As such, PolarisMetaStoreManager can be request-scoped beans. I do not see
> any reason to keep those objects after the request is done.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Mon, Jun 9, 2025 at 5:57 PM yun zou  wrote:
>
> > Hi Dmitri,
> >
> > Thanks for bringing that up!
> >
> > I am not an expert with CDI, based my recent experience with CDI, while
> CDI
> > does help making development
> > simpler under many cases, it does also make things more complicated under
> > some scenarios. Especially when
> > injecting a request scoped object into application scoped class, it does
> > make things harder to reason about when
> > debugging, and it also seems we currently lack efficient ways of testing
> to
> > catch potential regressions. For those
> > cases, i think explicitly passing the objects as parameters is actually
> > more clear and robust.
> >
> > Furthermore, I think this could make things easier for new developers for
> > OSS polaris.
> >
> > Also, regarding the metastore object, i don't think i fully understand
> the
> > proposal, are we proposing to make the
> > MetastoreManager now request scoped? Currently, the metastore manager is
> > created per realm, not per
> > RealmContext, which means two requests with the same realmId can use the
> > same metastore manager. If we are turning
> > this class into a request scope, I am not sure if that would be a good
> > idea, especially when we are dealing with Entity Caching
> > optimizations, where caching per realm instead of per request probably
> > makes more sense.
> >
> > Best Regards,
> > Yun
> >
> >
> >
> > On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu  wrote:
> >
> > > Agreed with Alex. I think passing a realm context is a cleaner solution
> > > than injecting a request scope realm context. CDI provides a lot of
> > > benefits, but it doesn't mean that we should alway use it.
> > >
> > > In general, injecting a request-scoped bean into an application-scoped
> > bean
> > > is perfectly valid in CDI, but it does come with a few gotchas:
> > >
> > >1. Lifecycle surprises. An @ApplicationScoped method can run in
> > contexts
> > >where no request scope is active (e.g., async tasks, scheduled
> jobs).
> > In
> > >those cases the injected proxy can’t resolve a real instance,
> leading
> > to
> > >ContextNotActiveException or, worse, stale data.
> > >2. Debugging pain, When failures depend on the presence or absence
> of
> > a
> > >request context, stack traces alone rarely tell the full story.
> > Tracking
> > >down intermittent bugs becomes tricky.
> > >3. Unnecessary complexity, which is a contradiction of using
> > injection.
> > >Injection is supposed to make a simpler and cleaner code base. If
> you
> > >really only need a bit of request data, passing it as a method
> > parameter
> > >keeps the code simpler, avoids proxy lookups, and makes unit tests
> > > cleaner.
> > >
> > > As a rule of thumb, we may inject a narrower scope only when we must.
> > >
> > > Yufei
> > >
> > >
> > > On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov 
> > > wrote:
> > >
> > > > Thanks, Alex! Very good points!
> > > >
> > > > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc`
> > > > to QuarkusValueExpressionResolver and it appears to work. That
> > > > is, QuarkusValueExpressionResolver can get the realm ID from the
> > injected
> > > > RealmContext as opposed to the API parameter.
> > > >

Re: Polaris Event Persistence Schema

2025-06-09 Thread Adnan Hemani
Hi Yufei,

Thanks for the comments, I’ve addressed all of them. The second half of the 
document briefly outlines how we will transform the Polaris internal 
persistence events into Iceberg Events to return back as part of the Iceberg 
Event Endpoint.

Best,
Adnan Hemani

> On Jun 8, 2025, at 4:02 PM, Yufei Gu  wrote:
> 
> Thanks Adnan for driving this. Left comments in the google doc.
> 
> The current schema proposal is good if this is only for Polaris internal
> persistence. It'd also be a good idea to make sure it is
> compatible with the Iceberg Event Endpoints [1][2].
> 
> [1]
> https://www.google.com/url?q=https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli%3D1%26tab%3Dt.0%23heading%3Dh.7bty2c61dvd7&source=gmail-imap&ust=175002856900&usg=AOvVaw10E0e7Djbd1WYMY8dGRnNO
> [2] 
> https://www.google.com/url?q=https://github.com/apache/iceberg/pull/12584/files&source=gmail-imap&ust=175002856900&usg=AOvVaw0kMs3bVaBXRtZCTvStT3JI
> 
> Yufei
> 
> 
> On Tue, Jun 3, 2025 at 10:19 PM Adnan Hemani
>  wrote:
> 
>> Hi all,
>> 
>> Wanted to start this thread on getting community feedback for the schema
>> of a new `events` table, as described by my Iceberg Events API
>> implementation thread <
>> https://www.google.com/url?q=https://lists.apache.org/thread/nfp40hmwryybznxtgffvy4c5l5kk19ho&source=gmail-imap&ust=175002856900&usg=AOvVaw2zcBEWMfKLXrNZhXbETGB4>.
>> 
>> Schema can be found on this Google Doc - please feel free to comment
>> directly on the document and we can discuss in the comments section as
>> well:
>> https://www.google.com/url?q=https://docs.google.com/document/d/1AxaXy-DXE2g6rmFktA2foDI7k2zIjFzTKmI5mZnUjOs/edit?usp%3Dsharing&source=gmail-imap&ust=175002856900&usg=AOvVaw38W8LTWpVceP8bFhLVV_La
>> 
>> Best,
>> Adnan Hemani



Re: [DISCUSS] Injecting RealmContext

2025-06-09 Thread yun zou
Hi Dmitri,

Thanks for bringing that up!

I am not an expert with CDI, based my recent experience with CDI, while CDI
does help making development
simpler under many cases, it does also make things more complicated under
some scenarios. Especially when
injecting a request scoped object into application scoped class, it does
make things harder to reason about when
debugging, and it also seems we currently lack efficient ways of testing to
catch potential regressions. For those
cases, i think explicitly passing the objects as parameters is actually
more clear and robust.

Furthermore, I think this could make things easier for new developers for
OSS polaris.

Also, regarding the metastore object, i don't think i fully understand the
proposal, are we proposing to make the
MetastoreManager now request scoped? Currently, the metastore manager is
created per realm, not per
RealmContext, which means two requests with the same realmId can use the
same metastore manager. If we are turning
this class into a request scope, I am not sure if that would be a good
idea, especially when we are dealing with Entity Caching
optimizations, where caching per realm instead of per request probably
makes more sense.

Best Regards,
Yun



On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu  wrote:

> Agreed with Alex. I think passing a realm context is a cleaner solution
> than injecting a request scope realm context. CDI provides a lot of
> benefits, but it doesn't mean that we should alway use it.
>
> In general, injecting a request-scoped bean into an application-scoped bean
> is perfectly valid in CDI, but it does come with a few gotchas:
>
>1. Lifecycle surprises. An @ApplicationScoped method can run in contexts
>where no request scope is active (e.g., async tasks, scheduled jobs). In
>those cases the injected proxy can’t resolve a real instance, leading to
>ContextNotActiveException or, worse, stale data.
>2. Debugging pain, When failures depend on the presence or absence of a
>request context, stack traces alone rarely tell the full story. Tracking
>down intermittent bugs becomes tricky.
>3. Unnecessary complexity, which is a contradiction of using injection.
>Injection is supposed to make a simpler and cleaner code base. If you
>really only need a bit of request data, passing it as a method parameter
>keeps the code simpler, avoids proxy lookups, and makes unit tests
> cleaner.
>
> As a rule of thumb, we may inject a narrower scope only when we must.
>
> Yufei
>
>
> On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov 
> wrote:
>
> > Thanks, Alex! Very good points!
> >
> > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc`
> > to QuarkusValueExpressionResolver and it appears to work. That
> > is, QuarkusValueExpressionResolver can get the realm ID from the injected
> > RealmContext as opposed to the API parameter.
> >
> > With that, we could probably change the interpretation of the
> > "realmIdentifier" expression to be the injected RealmContext, remove
> > explicit realm parameters from the generated API classes and move the
> > `@MeterTag(key="realm_id",expression="realmIdentifier")` annotation to
> the
> > method level.
> >
> > WDYT?
> >
> > Thanks,
> > Dmitri.
> >
> > On Fri, Jun 6, 2025 at 12:41 PM Alex Dutra  >
> > wrote:
> >
> > > Hi Dmitri,
> > >
> > > Thanks for bringing up this important topic.
> > >
> > > I generally agree with your refactoring proposal, but with 2 caveats:
> > >
> > > 1) For an application-scoped bean exposing methods to retrieve
> > realm-scoped
> > > components (e.g., the "factory" beans that expose methods like
> > > "getOrCreateXYZ") I think it's actually OK to receive the RealmContext
> > as a
> > > method parameter, as opposed to having the RealmContext injected. This
> is
> > > because the injection, while possible, would only work if the request
> > > context is active at the moment where a method on that bean is invoked.
> > > This is hard to enforce at compile time and also hard to reason about
> > imho.
> > > IOW I prefer to see this:
> > >
> > > @ApplicationScoped public class XYZFactory {
> > > public XYZ getOrCreateXYZ(RealmContext ctx) { return
> > > doSomeStuff(ctx.getRealmIdentifier()); }
> > > }
> > >
> > > Rather than this:
> > >
> > > @ApplicationScoped public class XYZFactory {
> > > @Inject RealmContext ctx;
> > > public XYZ getOrCreateXYZ() { return
> > > doSomeStuff(ctx.getRealmIdentifier()); }
> > > }
> > >
> > > 2) We do need a RealmContext parameter in the API classes generated
> from
> > > Mustache templates [1], because the RealmContext parameter is annotated
> > as
> > > follows:
> > >
> > > @Context @MeterTag(key="realm_id",expression="realmIdentifier")
> > > RealmContext realmContext
> > >
> > > This is so that we can produce "realm_id" tags for API metrics. Thus we
> > > cannot remove it (unless there is another way to produce such tags, but
> > I'm
> > > not aware of any).
> > >
> > > But we could remov

Re: [DISCUSS] Injecting RealmContext

2025-06-09 Thread Michael Collado
I'd prefer to avoid having the APIs reflect the scope of the implementation
beans under the hood. On the other hand, I don't like forcing methods at
the bottom of the call stack requiring parameters because they're necessary
later on. I liked passing CallContext around because it didn't really
matter if the bean was ApplicationScoped or RequestScoped - CallContext was
just there. Passing the CallContext to the PolarisMetaStoreManager meant
that the underlying implementation could use an ApplicationScoped bean,
like the PolarisConfigurationStore and the current realm could be
determined from the method parameter. It's true, we could make everything
RequestScoped and then nothing needs to be passed in, but that feels
inefficient - e.g., should the PolarisConfigurationStore re-parse the
configuration map for every request?

A downside is that CallContext becomes the "everything bean" where you just
add properties because it's the easiest way to get a new parameter passed
to some bean 3 layers down, but I honestly think that's less of a concern
when the CDI context already allows for an easier way of injecting
dependencies.

Mike

On Mon, Jun 9, 2025 at 3:54 PM Dmitri Bourlatchkov  wrote:

> Thanks for stating your concerns, Yun!
>
> Re: MetastoreManager, I believe the current state of that code is actually
> the source of confusion and I'm trying to clarify that.
>
> For example, TransactionalMetaStoreManagerImpl is currently created per
> realm. However, this class does not carry any state. It is an empty shell
> that works off of the data passed as PolarisCallContext parameters. The
> same holds for AtomicOperationMetaStoreManager.
>
> As such, PolarisMetaStoreManager can be request-scoped beans. I do not see
> any reason to keep those objects after the request is done.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Mon, Jun 9, 2025 at 5:57 PM yun zou  wrote:
>
> > Hi Dmitri,
> >
> > Thanks for bringing that up!
> >
> > I am not an expert with CDI, based my recent experience with CDI, while
> CDI
> > does help making development
> > simpler under many cases, it does also make things more complicated under
> > some scenarios. Especially when
> > injecting a request scoped object into application scoped class, it does
> > make things harder to reason about when
> > debugging, and it also seems we currently lack efficient ways of testing
> to
> > catch potential regressions. For those
> > cases, i think explicitly passing the objects as parameters is actually
> > more clear and robust.
> >
> > Furthermore, I think this could make things easier for new developers for
> > OSS polaris.
> >
> > Also, regarding the metastore object, i don't think i fully understand
> the
> > proposal, are we proposing to make the
> > MetastoreManager now request scoped? Currently, the metastore manager is
> > created per realm, not per
> > RealmContext, which means two requests with the same realmId can use the
> > same metastore manager. If we are turning
> > this class into a request scope, I am not sure if that would be a good
> > idea, especially when we are dealing with Entity Caching
> > optimizations, where caching per realm instead of per request probably
> > makes more sense.
> >
> > Best Regards,
> > Yun
> >
> >
> >
> > On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu  wrote:
> >
> > > Agreed with Alex. I think passing a realm context is a cleaner solution
> > > than injecting a request scope realm context. CDI provides a lot of
> > > benefits, but it doesn't mean that we should alway use it.
> > >
> > > In general, injecting a request-scoped bean into an application-scoped
> > bean
> > > is perfectly valid in CDI, but it does come with a few gotchas:
> > >
> > >1. Lifecycle surprises. An @ApplicationScoped method can run in
> > contexts
> > >where no request scope is active (e.g., async tasks, scheduled
> jobs).
> > In
> > >those cases the injected proxy can’t resolve a real instance,
> leading
> > to
> > >ContextNotActiveException or, worse, stale data.
> > >2. Debugging pain, When failures depend on the presence or absence
> of
> > a
> > >request context, stack traces alone rarely tell the full story.
> > Tracking
> > >down intermittent bugs becomes tricky.
> > >3. Unnecessary complexity, which is a contradiction of using
> > injection.
> > >Injection is supposed to make a simpler and cleaner code base. If
> you
> > >really only need a bit of request data, passing it as a method
> > parameter
> > >keeps the code simpler, avoids proxy lookups, and makes unit tests
> > > cleaner.
> > >
> > > As a rule of thumb, we may inject a narrower scope only when we must.
> > >
> > > Yufei
> > >
> > >
> > > On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov 
> > > wrote:
> > >
> > > > Thanks, Alex! Very good points!
> > > >
> > > > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc`
> > > > to QuarkusValueExpressionResolver and it appears to work. That
> > > > is, QuarkusValueExpressionReso

Re: [DISCUSS] Injecting RealmContext

2025-06-09 Thread Dmitri Bourlatchkov
Thanks for stating your concerns, Yun!

Re: MetastoreManager, I believe the current state of that code is actually
the source of confusion and I'm trying to clarify that.

For example, TransactionalMetaStoreManagerImpl is currently created per
realm. However, this class does not carry any state. It is an empty shell
that works off of the data passed as PolarisCallContext parameters. The
same holds for AtomicOperationMetaStoreManager.

As such, PolarisMetaStoreManager can be request-scoped beans. I do not see
any reason to keep those objects after the request is done.

WDYT?

Thanks,
Dmitri.

On Mon, Jun 9, 2025 at 5:57 PM yun zou  wrote:

> Hi Dmitri,
>
> Thanks for bringing that up!
>
> I am not an expert with CDI, based my recent experience with CDI, while CDI
> does help making development
> simpler under many cases, it does also make things more complicated under
> some scenarios. Especially when
> injecting a request scoped object into application scoped class, it does
> make things harder to reason about when
> debugging, and it also seems we currently lack efficient ways of testing to
> catch potential regressions. For those
> cases, i think explicitly passing the objects as parameters is actually
> more clear and robust.
>
> Furthermore, I think this could make things easier for new developers for
> OSS polaris.
>
> Also, regarding the metastore object, i don't think i fully understand the
> proposal, are we proposing to make the
> MetastoreManager now request scoped? Currently, the metastore manager is
> created per realm, not per
> RealmContext, which means two requests with the same realmId can use the
> same metastore manager. If we are turning
> this class into a request scope, I am not sure if that would be a good
> idea, especially when we are dealing with Entity Caching
> optimizations, where caching per realm instead of per request probably
> makes more sense.
>
> Best Regards,
> Yun
>
>
>
> On Fri, Jun 6, 2025 at 2:47 PM Yufei Gu  wrote:
>
> > Agreed with Alex. I think passing a realm context is a cleaner solution
> > than injecting a request scope realm context. CDI provides a lot of
> > benefits, but it doesn't mean that we should alway use it.
> >
> > In general, injecting a request-scoped bean into an application-scoped
> bean
> > is perfectly valid in CDI, but it does come with a few gotchas:
> >
> >1. Lifecycle surprises. An @ApplicationScoped method can run in
> contexts
> >where no request scope is active (e.g., async tasks, scheduled jobs).
> In
> >those cases the injected proxy can’t resolve a real instance, leading
> to
> >ContextNotActiveException or, worse, stale data.
> >2. Debugging pain, When failures depend on the presence or absence of
> a
> >request context, stack traces alone rarely tell the full story.
> Tracking
> >down intermittent bugs becomes tricky.
> >3. Unnecessary complexity, which is a contradiction of using
> injection.
> >Injection is supposed to make a simpler and cleaner code base. If you
> >really only need a bit of request data, passing it as a method
> parameter
> >keeps the code simpler, avoids proxy lookups, and makes unit tests
> > cleaner.
> >
> > As a rule of thumb, we may inject a narrower scope only when we must.
> >
> > Yufei
> >
> >
> > On Fri, Jun 6, 2025 at 11:15 AM Dmitri Bourlatchkov 
> > wrote:
> >
> > > Thanks, Alex! Very good points!
> > >
> > > Re: metrics, I did a quick POC by adding `@Inject RealmContext rc`
> > > to QuarkusValueExpressionResolver and it appears to work. That
> > > is, QuarkusValueExpressionResolver can get the realm ID from the
> injected
> > > RealmContext as opposed to the API parameter.
> > >
> > > With that, we could probably change the interpretation of the
> > > "realmIdentifier" expression to be the injected RealmContext, remove
> > > explicit realm parameters from the generated API classes and move the
> > > `@MeterTag(key="realm_id",expression="realmIdentifier")` annotation to
> > the
> > > method level.
> > >
> > > WDYT?
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Fri, Jun 6, 2025 at 12:41 PM Alex Dutra
>  > >
> > > wrote:
> > >
> > > > Hi Dmitri,
> > > >
> > > > Thanks for bringing up this important topic.
> > > >
> > > > I generally agree with your refactoring proposal, but with 2 caveats:
> > > >
> > > > 1) For an application-scoped bean exposing methods to retrieve
> > > realm-scoped
> > > > components (e.g., the "factory" beans that expose methods like
> > > > "getOrCreateXYZ") I think it's actually OK to receive the
> RealmContext
> > > as a
> > > > method parameter, as opposed to having the RealmContext injected.
> This
> > is
> > > > because the injection, while possible, would only work if the request
> > > > context is active at the moment where a method on that bean is
> invoked.
> > > > This is hard to enforce at compile time and also hard to reason about
> > > imho.
> > > > IOW I prefer to see this:
> > > >
> > > > @ApplicationScoped public clas