Hi Dmitri,

Thanks for the comprehensive recap. For "the newly added Maintenance module was not exposed in previous docs related to NoSQL", I wonder whether this is just a misunderstanding. As Prashant noted, in the NoSQL presentation that was run a couple of times by Adam [1], there is a mention of "A maintenance task in the Admin CLI Tool". And the original design doc [2] also contains an explanation as to why this is necessary for NoSQL in the "Handling no longer needed objects" section. Am I missing something?

Regarding the repository choice, I would like to emphasize the potential overhead in release management. Today, we have a manual release process that only spans the `apache/polaris` repository. We also have a semi-automated release process that is tightly coupled with the `apache/polaris` repository, because it is implemented as GitHub workflows within that repository. Let's consider the potential impact on the release process and cadence.

[1] https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
[2] https://docs.google.com/document/d/1POUWe0xMZOBoaJ6Rgiw35ziEoc6OEYCiW7Zk6bR9H6M/edit?tab=t.0#heading=h.ccj3ewbhhhhy

--
Pierre

On Wed, Jan 14, 2026 at 11:18 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi All,
>
> As Prashant mentioned in GH [1], the newly added Maintenance module was not
> exposed in previous docs related to NoSQL. Let's use this email thread to
> discuss it and possible concerns people may have. Below, I'm providing
> rationale for the topics I am aware of. Please feel free to start new
> threads dedicated to other concerns. Let's keep this discussion focused on
> the NoSQL maintenance functionality, though.
>
> * Why is this code necessary?
>
> NoSQL persistence is not transactional. Even normal commits leave some
> amount of historical data in the database. Failed commits may leave
> remnants of preparatory data in the database too.
>
> If not cleaned up, this will lead to virtually indefinite growth of
> persisted data over time.
>
> Therefore, some periodic async cleanup is necessary. The maintenance code
> in PR [3268] provides fundamental code for performing this cleanup.
>
> * Why does it have to be in the main repo?
>
> The code in PR [3268] has to align tightly with the actual NoSQL
> Persistence implementation. It has to evolve in sync with the data model of
> stored data.
>
> Therefore, it is logical to keep it in the same repo as the mainstream
> NoSQL Persistence code.
>
> * Why is CEL required?
>
> CEL was chosen based on prior work when the NoSQL Persistence was developed
> in private. It provides an efficient and expressive medium for admin users
> to define NoSQL maintenance policies.
>
> * Why is the Nessie CEL Java impl. used?
>
> The Nessie CEL Java impl. predates the Google impl. and has been used in
> production for years under various projects (including Nessie itself). The
> developers of the NoSQL persistence are more certain of the runtime
> behavior of the Nessie CEL impl. than of Google's. Switching to Google's
> CEL Java requires additional work.
>
> * Can we express maintenance policies in some other, non-CEL way?
>
> Generally yes. However, this requires extra work and analysis of UX impact.
> If anyone has a concrete proposal for non-CEL maintenance policies, ideas /
> PRs are welcome for discussion, of course.
>
> * Why does the Admin Tool have to have maintenance commands [3395]?
>
> This is to allow users of Apache Polaris binary distributions to perform
> maintenance should they choose NoSQL Persistence. The Admin Tool is a
> natural home for the maintenance CLI because it is in fact intended to
> perform direct manipulation of the Polaris database, such as creating the
> schema and bootstrapping realms (existing functionality).
>
> * Can the maintenance command [3395] live in the polaris-tools repo?
>
> This would effectively require the Admin Tool to live in polaris-tools,
> which seems to be against the recent move to unify Admin and Service
> binaries [3340].
>
> * Can the maintenance code be invoked in some other way (non-Admin-CLI)?
>
> Yes. For example, it is possible to build docker images dedicated to
> running the maintenance tasks without using the Admin CLI. This is not
> implemented in Apache Polaris yet. The Admin CLI appears to offer the best
> UX for admin users with minimal developer effort.
>
> [1]
> https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215
>
> [3340] https://github.com/apache/polaris/pull/3340
>
> [3268] https://github.com/apache/polaris/pull/3268
>
> [3395] https://github.com/apache/polaris/pull/3395
>
> Thoughts? Comments?
>
> Cheers,
> Dmitri.
>
