Hi All,

As Prashant mentioned in GH [1], the newly added Maintenance module was not
exposed in previous docs related to NoSQL. Let's use this email thread to
discuss it and possible concerns people may have. Below, I'm providing the
rationale for the topics I am aware of. Please feel free to start new
threads dedicated to other concerns. Let's keep this discussion focused on
the NoSQL maintenance functionality, though.

* Why is this code necessary?

NoSQL persistence is not transactional. Even normal commits leave some
amount of historical data in the database. Failed commits may leave
remnants of preparatory data in the database too.

If not cleaned up, this leftover data accumulates, and the persisted data
grows without bound over time.

Therefore, some periodic async cleanup is necessary. The maintenance code
in PR [3268] provides fundamental code for performing this cleanup.

* Why does it have to be in the main repo?

The code in PR [3268] has to align tightly with the actual NoSQL
Persistence implementation. It has to evolve in sync with the data model of
stored data.

Therefore, it is logical to keep it in the same repo as the mainstream
NoSQL Persistence code.

* Why is CEL required?

CEL was chosen based on prior work when the NoSQL Persistence was developed
in private. It provides an efficient and expressive medium for admin users
to define NoSQL maintenance policies.

* Why is the Nessie CEL Java impl. used?

The Nessie CEL Java impl. predates the Google impl. and has been used in
production for years under various projects (including Nessie itself). The
developers of the NoSQL persistence are more certain of the runtime
behavior of the Nessie CEL impl. than of Google's. Switching to Google's
CEL Java impl. would require additional work.

* Can we express maintenance policies in some other, non-CEL way?

Generally, yes. However, this requires extra work and analysis of UX impact.
If anyone has a concrete proposal for non-CEL maintenance policies, ideas /
PRs are welcome for discussion, of course.

* Why does the Admin Tool have to have maintenance commands [3395]?

This is to allow users of Apache Polaris binary distributions to perform
maintenance should they choose NoSQL Persistence. The Admin Tool is a
natural home for the maintenance CLI because it is in fact intended to
perform direct manipulation of the Polaris database, such as creating the
schema and bootstrapping realms (existing functionality).

* Can the maintenance command [3395] live in the polaris-tools repo?

This would effectively require the Admin Tool to live in polaris-tools,
which seems to be against the recent move to unify Admin and Service
binaries [3340].

* Can the maintenance code be invoked in some other way (non-Admin-CLI)?

Yes. For example, it is possible to build docker images dedicated to
running the maintenance tasks without using the Admin CLI. This is not
implemented in Apache Polaris yet. The Admin CLI appears to offer the best
UX for admin users with minimal developer effort.

[1] https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215

[3340] https://github.com/apache/polaris/pull/3340

[3268] https://github.com/apache/polaris/pull/3268

[3395] https://github.com/apache/polaris/pull/3395

Thoughts? Comments?

Cheers,
Dmitri.
