Hello! I have created a PR https://github.com/apache/polaris/pull/3293 to address this proposal.
Thanks,
Oleg

On Tue, Nov 25, 2025 at 2:00 PM Oleg Soloviov <[email protected]> wrote:

> Hi Adnan,
>
> As I see no further objections, I would like to start working on it.
>
> Regarding the attributes, the AttributeKey approach looks like a
> reasonable compromise between flexibility and type-safety, but I need to
> think it over.
>
> Thanks,
> Oleg
>
> On Mon, Nov 17, 2025 at 9:24 PM Adnan Hemani
> <[email protected]> wrote:
>
>> Hi Alex,
>>
>> > I'm actually leaning towards an AttributeKey approach, similar to Netty
>>
>> I'm not sure this helps address the dangers around using a free-form
>> string as the attribute key. But as you said, this is more of an
>> implementation detail - we can work through it together on any
>> potential PR :)
>>
>> I think Alex and I are aligned - happy to hear any other community
>> opinions on this topic, but I think we might be ready to start work on
>> this within the next few days if there are no further opinions @Oleg.
>>
>> Best,
>> Adnan Hemani
>>
>> On Mon, Nov 17, 2025 at 5:37 AM Alexandre Dutra <[email protected]>
>> wrote:
>>
>> > Hi all,
>> >
>> > > I propose the following (building on Alex's proposal) to move this
>> > > conversation forward: the new method signature would be
>> > > `Map<PolarisEvent.EventPropertyType, Object> attributes()`
>> >
>> > I agree about the potential benefit of strongly-typed attribute keys.
>> > While I initially suggested String for simplicity, I'm actually
>> > leaning towards an AttributeKey approach, similar to Netty [1]. The
>> > concern with using an enum is that it might restrict users from
>> > defining their own custom attributes. But that's more an
>> > implementation detail.
>> >
>> > > All other events that only generate an "after" metadata object
>> > > should store their metadata in "metadataAfter" and leave
>> > > "metadataBefore" as unset, just like any other unused property.
>> >
>> > I have no issues with that logic.
>> >
>> > (But I am surprised by the current design where "before" state
>> > information is included in "after" events, and "after" state
>> > information is included in "before" events. Given the substantial size
>> > of objects like TableMetadata, this dual inclusion looks redundant. It
>> > should be possible instead to correlate the before event with its
>> > after counterpart and build a before/after diff of the change, if
>> > desired. But that's a different topic.)
>> >
>> > Thanks,
>> > Alex
>> >
>> > [1]:
>> > https://github.com/netty/netty/blob/fc0d763ca983c8290d087ed2887f112963d812d2/common/src/main/java/io/netty/util/AttributeKey.java#L25
>> >
>> > On Fri, Nov 14, 2025 at 6:18 PM Adnan Hemani
>> > <[email protected]> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > Very sorry for the late reply - this week has been busy. I was (still
>> > > somewhat am) in favor of strongly-typed events. I had earlier formed
>> > > my opinion on this given other systems which do use their events
>> > > later within their execution. It seems we do not have this use case
>> > > yet - and not on the near horizon yet either, as Dmitri has noted.
>> > >
>> > > However, my one remaining concern with keeping PolarisEvents as a
>> > > flattened "bag of properties" is, unless we have comprehensive
>> > > per-event testing (which defeats the whole point of removing the
>> > > strongly-typed events structure), we may be vulnerable to typos and
>> > > inconsistent naming, which could effectively render the unified
>> > > filtering/pruning mechanisms useless. As a result, I propose the
>> > > following (building on Alex's proposal) to move this conversation
>> > > forward: the new method signature would be
>> > > `Map<PolarisEvent.EventPropertyType, Object> attributes()` where
>> > > EventPropertyType is an enum defined within PolarisEvent and contains
>> > > all the different types of properties an event could have.
>> > >
>> > > Edge case call-out: There will be special care needed for events such
>> > > as (Before/After)CommitTableEvent, which have metadata objects for
>> > > before AND after - but these can be modeled using two separate
>> > > EventPropertyType objects: one for metadataBefore and one for
>> > > metadataAfter. All other events that only generate an "after"
>> > > metadata object should store their metadata in "metadataAfter" and
>> > > leave "metadataBefore" as unset, just like any other unused property.
>> > > This may slightly complicate the unified filtering/pruning logic -
>> > > but this, IMO, is an acceptable balance.
>> > >
>> > > WDYT?
>> > >
>> > > Best,
>> > > Adnan Hemani
>> > >
>> > > On Fri, Nov 14, 2025 at 1:48 AM Oleg Soloviov <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > It looks like we have a lazy consensus on this proposal. If that's
>> > > > the case and there are no further objections, I would like to work
>> > > > on this one.
>> > > >
>> > > > Thanks,
>> > > > Oleg
>> > > >
>> > > > On Sat, Nov 8, 2025 at 12:13 AM Dmitri Bourlatchkov
>> > > > <[email protected]> wrote:
>> > > >
>> > > > > Hi Alex,
>> > > > >
>> > > > > I agree that using a flat (single class?) type hierarchy for
>> > > > > events on the server side is reasonable. Polaris Server itself
>> > > > > does not appear to "read" the events it produces, so maintaining
>> > > > > the multitude of getters does seem like an unnecessary overhead.
>> > > > > At the same time producing well-structured payloads for
>> > > > > delivering events to external systems (including persistence in
>> > > > > the Polaris database) can be achieved without a verbose type
>> > > > > hierarchy.
>> > > > >
>> > > > > Cheers,
>> > > > > Dmitri.
>> > > > >
>> > > > > On Fri, Nov 7, 2025 at 11:30 AM Alexandre Dutra
>> > > > > <[email protected]> wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I'm writing to express my concerns about the current state of
>> > > > > > the PolarisEvent API and to propose a solution.
>> > > > > >
>> > > > > > Current challenges:
>> > > > > >
>> > > > > > 1) Excessive complexity: the PolarisEvent interface currently
>> > > > > > has over 150 concrete subtypes, with a corresponding number of
>> > > > > > methods in the PolarisEventListener interface. This forces each
>> > > > > > concrete listener to implement all 150+ methods, even when the
>> > > > > > logic is similar or identical, leading to significant
>> > > > > > boilerplate (see example [1] from a recent PR).
>> > > > > >
>> > > > > > 2) Manual processes: afaik the current plan for event pruning
>> > > > > > (e.g., removing sensitive or large data) is to implement this
>> > > > > > event by event. This has been a slow process so far. We only
>> > > > > > have 2-3 events implemented, we still have 147 more to go.
>> > > > > >
>> > > > > > While I generally advocate for strongly typed APIs, I believe
>> > > > > > that in this specific context, the PolarisEvent hierarchy is
>> > > > > > slowing down the development of event-related features.
>> > > > > >
>> > > > > > Do we need so many subtypes? Events are very short-lived
>> > > > > > objects; they are created, immediately passed to a listener,
>> > > > > > and then garbage-collected. Besides, most listeners will likely
>> > > > > > apply the same logic to all events (basically: serialize and
>> > > > > > dispatch). This hints at a type hierarchy that isn't being
>> > > > > > useful to its main consumers.
>> > > > > >
>> > > > > > My proposal is to completely flatten the PolarisEvent hierarchy.
>> > > > > > Instead of numerous concrete types, we would have a single
>> > > > > > implementation.
>> > > > > > This implementation would expose the methods I'm adding in [2],
>> > > > > > including type() which allows distinguishing events by type ID.
>> > > > > >
>> > > > > > It would also expose a new method: Map<String, Object>
>> > > > > > attributes().
>> > > > > >
>> > > > > > An event factory would be responsible for creating events and
>> > > > > > populating these attributes using a common set of well-defined,
>> > > > > > typed attribute keys such as "catalog_name",
>> > > > > > "table_identifier", "table_metadata", etc.
>> > > > > >
>> > > > > > This creates a schemaless-ish view of the event, which is ideal
>> > > > > > for pruning and serialization. It would enable us to apply
>> > > > > > common rules more efficiently. For example:
>> > > > > >
>> > > > > > 1) All events containing the "table_metadata" attribute could
>> > > > > > automatically apply a pruning logic to reduce its size.
>> > > > > >
>> > > > > > 2) All events containing a specific attribute could
>> > > > > > automatically have sensitive data removed from its value.
>> > > > > >
>> > > > > > I'm curious to hear what the community thinks of this proposal.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Alex
>> > > > > >
>> > > > > > [1]:
>> > > > > > https://github.com/vchag/polaris/blob/4c0aef587e63d5e60d657561a0a53701417f324b/runtime/service/src/main/java/org/apache/polaris/service/events/listeners/AllEventsForwardingListener.java
>> > > > > > [2]: https://github.com/apache/polaris/pull/2998
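
For anyone reading the thread without opening the PR, here is a minimal sketch of the shape discussed above: one flat event type, typed attribute keys (loosely in the spirit of the Netty AttributeKey mentioned by Alex), and a single pruning rule keyed on an attribute. It is illustrative only and is not the code in the PR. The names `AttributeKey`, `Event`, `prune`, `CATALOG_NAME` and `TABLE_METADATA` are hypothetical, and table metadata is modelled as a plain String so the example stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class FlatEventSketch {

  /** Hypothetical typed key: the name identifies the attribute, the type guards its value. */
  record AttributeKey<T>(String name, Class<T> type) {
    T cast(Object value) {
      return type.cast(value);
    }
  }

  // Assumed well-known keys; the attribute names are taken from the thread.
  static final AttributeKey<String> CATALOG_NAME =
      new AttributeKey<>("catalog_name", String.class);
  static final AttributeKey<String> TABLE_METADATA =
      new AttributeKey<>("table_metadata", String.class);

  /** Hypothetical single event class standing in for the many concrete subtypes. */
  static final class Event {
    private final String type;
    private final Map<AttributeKey<?>, Object> attributes = new LinkedHashMap<>();

    Event(String type) {
      this.type = type;
    }

    String type() {
      return type;
    }

    /** The compiler rejects a value whose type does not match the key. */
    <T> Event with(AttributeKey<T> key, T value) {
      attributes.put(key, value);
      return this;
    }

    /** Unset attributes are simply absent, as suggested for "metadataBefore". */
    <T> Optional<T> get(AttributeKey<T> key) {
      return Optional.ofNullable(attributes.get(key)).map(key::cast);
    }
  }

  /** One rule for every event that carries table metadata: truncate it if too long. */
  static Event prune(Event event, int maxLength) {
    event.get(TABLE_METADATA)
        .filter(metadata -> metadata.length() > maxLength)
        .ifPresent(metadata -> event.with(TABLE_METADATA, metadata.substring(0, maxLength)));
    return event;
  }

  public static void main(String[] args) {
    Event event = new Event("AFTER_COMMIT_TABLE")
        .with(CATALOG_NAME, "prod")
        .with(TABLE_METADATA, "x".repeat(1_000));
    prune(event, 100);
    System.out.println(event.type() + " metadata length: "
        + event.get(TABLE_METADATA).map(String::length).orElse(0));
  }
}
```

Compared with a free-form String key, a typed key keeps custom attributes possible (anyone can declare a new key) while still catching a mismatched value type at compile time. Whether that trade-off is the right one is exactly what the thread leaves open.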
