Thank you for starting the thread, Dmitri! Thank you, Pierre, for the response; I certainly missed this section of the design document.
I believe I was expecting a design doc explaining why we want to selectively retain entities that are not the current version. If the NoSQL implementation cares about this, is there any design for it? Secondly, as proposed in the doc, we should just be cleaning up all entities that are not current, so I am unsure why we want an "age >= 30 days" kind of retention. If we want to retain selectively, we need a design doc that explains the use cases and lets us agree on user-facing constructs and so on. For example, one possible interpretation is: can I go back to the state of the catalog as of 30 days ago? I don't think Polaris supports undrop or time travel, and I don't think JDBC will be able to support it, so I believe NoSQL's *default* behaviour should be to delete everything that's not current.

I can see the admin tool mentioned, but what I can't see in the presentation is this whole module: the design trade-offs of sync vs. async maintenance, and the user-facing constructs (for example, the retention expression) and why they are required. My take is that those things warrant a design of their own. That being said, I totally understand that NoSQL requires maintenance; what I fail to understand is why NoSQL requires retention expressions. Why can't everything that's not current be marked as a GC candidate? If the issue is that we need this for debugging, then we should just have a simple config saying "keep the latest X commits". To me it feels like we are opening the door to cases such as time travel and undrop without broader agreement with the community. If we want to do these additional things and expose these extra constructs, which I think are good to do, they can't be part of the polaris repo, but they would make a good tool for polaris goodies.
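To make the difference between the two retention styles discussed above concrete, here is a minimal Python sketch. It assumes a hypothetical commit log in which each commit has only an ID and a creation timestamp; `Commit`, `retain_latest`, `retain_younger_than` and `gc_candidates` are illustrative names, not part of Polaris, its NoSQL persistence, or any CEL library, and the sketch ignores the real NoSQL data model entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for one entry in a NoSQL commit log."""
    commit_id: str
    created_at: datetime


def retain_latest(commits: list[Commit], keep: int) -> set[str]:
    """Count-based policy: retain the `keep` newest commits."""
    newest_first = sorted(commits, key=lambda c: c.created_at, reverse=True)
    return {c.commit_id for c in newest_first[:keep]}


def retain_younger_than(
    commits: list[Commit], max_age: timedelta, now: datetime
) -> set[str]:
    """Age-based policy: the effect of an 'age <= 30 days' style rule."""
    return {c.commit_id for c in commits if now - c.created_at <= max_age}


def gc_candidates(commits: list[Commit], retained: set[str], current_id: str) -> set[str]:
    """Everything that is neither retained nor current may be garbage-collected."""
    return {c.commit_id for c in commits} - retained - {current_id}


if __name__ == "__main__":
    now = datetime(2026, 1, 15, tzinfo=timezone.utc)
    # c0 is the current commit; each following commit is 10 days older.
    commits = [Commit(f"c{i}", now - timedelta(days=10 * i)) for i in range(5)]
    # "Keep the latest 2 commits"
    print(sorted(gc_candidates(commits, retain_latest(commits, 2), "c0")))  # ['c2', 'c3', 'c4']
    # "Retain anything with age <= 30 days"
    young = retain_younger_than(commits, timedelta(days=30), now)
    print(sorted(gc_candidates(commits, young, "c0")))  # ['c4']
    # Default: delete everything that's not current
    print(sorted(gc_candidates(commits, set(), "c0")))  # ['c1', 'c2', 'c3', 'c4']
```

The count-based policy needs nothing beyond a single integer setting, and an empty retained set gives the "delete everything that's not current" default. The age-based rule is the kind of policy a retention expression language would let admins state in a more general form.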
Hence the request to open the discussion in this thread, as well as to debate where this tool should live. The Admin tool presently has just bootstrap and purge, which are supported by both persistence implementations, whereas maintenance is NoSQL-specific and has no JDBC counterpart. IMHO it would be very confusing for an end user to find that they can retain catalog state as of 30 days ago with NoSQL but not with JDBC, so leaking this into the Admin tool is, IMHO, not a good idea. That said, I am open to hearing from others on why it is and how this concern would be handled!

Regarding the introduction of an expression language (I humbly disagree that we need one): I went back eight pages in projectnessie/cel-java [1], and it has only had dependency updates, whereas google/cel-java is actively worked on by Google developers. CEL is Google's spec, so I would rather use google/cel-java than take a third-party dependency on another implementation of a spec that Google owns. That being said, I am open to hearing from others as to why such constructs should be present in NoSQL, especially retaining stuff with age <= 30 days.

On an orthogonal note: it would have been better to have these discussions before we merged the PR.

Thank you again, Dmitri, for starting this conversation; I really appreciate it!

[1] https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215

Best,
Prashant Singh

On Thu, Jan 15, 2026 at 2:24 AM Pierre Laporte wrote:

> Hi Dmitri, thanks for the comprehensive recap.
>
> For "the newly added Maintenance module was not exposed in previous docs
> related to NoSQL", I wonder whether this is just a misunderstanding. As
> Prashant noted, in the NoSQL presentation that was run a couple of times by
> Adam [1], there is a mention of "A maintenance task in the Admin CLI
> Tool". And the original design doc [2] also contains an explanation as to
> why this is necessary for NoSQL in the "Handling no longer needed objects"
> section.
> Am I missing something?
>
> Regarding the repository choice, I would like to emphasize the potential
> overhead in release management. Today, we have a manual release process
> that only spans the `apache/polaris` repository. And we have a
> semi-automated release process that is tightly coupled with the
> `apache/polaris` repository. It is tightly coupled because it is
> implemented as GitHub workflows within that repository. Let's consider the
> potential impacts on release process and cadence.
>
> [1]
> https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
> [2]
> https://docs.google.com/document/d/1POUWe0xMZOBoaJ6Rgiw35ziEoc6OEYCiW7Zk6bR9H6M/edit?tab=t.0#heading=h.ccj3ewbhhhhy
> --
>
> Pierre
>
> On Wed, Jan 14, 2026 at 11:18 PM Dmitri Bourlatchkov wrote:
>
> > Hi All,
> >
> > As Prashant mentioned in GH [1], the newly added Maintenance module was
> > not exposed in previous docs related to NoSQL. Let's use this email thread
> > to discuss it and any concerns people may have. Below, I'm providing the
> > rationale for the topics I am aware of. Please feel free to start new
> > threads dedicated to other concerns. Let's keep this discussion focused on
> > the NoSQL maintenance functionality, though.
> >
> > * Why is this code necessary?
> >
> > NoSQL persistence is not transactional. Even normal commits leave some
> > amount of historical data in the database. Failed commits may leave
> > remnants of preparatory data in the database too.
> >
> > If not cleaned up, this will lead to virtually indefinite growth of
> > persisted data over time.
> >
> > Therefore, some periodic async cleanup is necessary. The maintenance code
> > in PR [3268] provides the fundamental code for performing this cleanup.
> >
> > * Why does it have to be in the main repo?
> >
> > The code in PR [3268] has to align tightly with the actual NoSQL
> > Persistence implementation.
> > It has to evolve in sync with the data model of
> > stored data.
> >
> > Therefore, it is logical to keep it in the same repo as the mainstream
> > NoSQL Persistence code.
> >
> > * Why is CEL required?
> >
> > CEL was chosen based on prior work when the NoSQL Persistence was
> > developed in private. It provides an efficient and expressive medium for
> > admin users to define NoSQL maintenance policies.
> >
> > * Why is the Nessie CEL java impl. used?
> >
> > The Nessie CEL java impl. predates the Google impl. and has been used in
> > production for years under various projects (including Nessie itself).
> > The developers of the NoSQL persistence are more certain of the runtime
> > behavior of the Nessie CEL impl. than of Google's. Switching to Google's
> > CEL java requires additional work.
> >
> > * Can we express maintenance policies in some other, non-CEL way?
> >
> > Generally yes. However, this requires extra work and analysis of UX
> > impact. If anyone has a concrete proposal for non-CEL maintenance
> > policies, ideas / PRs are welcome for discussion, of course.
> >
> > * Why does the Admin Tool have to have maintenance commands [3395]?
> >
> > This is to allow users of Apache Polaris binary distributions to perform
> > maintenance should they choose NoSQL Persistence. The Admin Tool is a
> > natural home for the maintenance CLI because it is in fact intended to
> > perform direct manipulation of the Polaris database, such as creating the
> > schema and bootstrapping realms (existing functionality).
> >
> > * Can the maintenance command [3395] live in the polaris-tools repo?
> >
> > This would effectively require the Admin Tool to live in polaris-tools,
> > which seems to be against the recent move to unify Admin and Service
> > binaries [3340].
> >
> > * Can the maintenance code be invoked in some other way (non-Admin-CLI)?
> >
> > Yes.
> > For example, it is possible to build docker images dedicated to
> > running the maintenance tasks without using the Admin CLI. This is not
> > implemented in Apache Polaris yet. The Admin CLI appears to offer the best
> > UX for admin users with minimal developer effort.
> >
> > [1]
> > https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215
> >
> > [3340] https://github.com/apache/polaris/pull/3340
> >
> > [3268] https://github.com/apache/polaris/pull/3268
> >
> > [3395] https://github.com/apache/polaris/pull/3395
> >
> > Thoughts? Comments?
> >
> > Cheers,
> > Dmitri.
