Hi all,

I'm fine with either donating the library to Polaris or reusing the original Nessie one, although I have a slight preference for option 1 (accept the donation), as I believe this would provide a faster time-to-market if we ever need to bring changes or new features to the library.

Thanks,
Alex

On Tue, Jun 2, 2026 at 10:56 AM Robert Stupp <[email protected]> wrote:

> Hi Yufei,
>
> I think there are two separate things here.
>
> The immediate use case is test infrastructure for Polaris tests that must
> go through the real SDK/FileIO HTTP paths while still letting the test
> control the object store's behavior.
>
> For the object-storage-ops / purge-table tests, this means things like
> generated objects, synthetic listings, metadata, conditional responses,
> intercepted writes/deletes, targeted failures, and using roughly the same
> fixture setup for S3/GCS/ADLS.
>
> That does not mean the mock should become a full object-store emulator.
> I would keep it test-only and only add protocol behavior when a concrete
> Polaris test needs it.
>
> The other question is ownership.
>
> If Polaris only wants to reuse the current behavior, and people are fine
> with a Nessie test dependency, then using the Nessie test artifacts
> directly is a reasonable option.
>
> If we expect these tests to drive Polaris-specific behavior over time, then
> I think having the code in Polaris is cleaner.
> The test utility would live next to the tests that need it, and future
> additions would need concrete Polaris test cases.
>
> So I still think the choices are:
>
> 1. accept `object-storage-mock` into Polaris as test-only infrastructure;
> 2. use the Nessie test artifacts directly;
> 3. name another concrete library, or combination of libraries, and check it
>    against the requirements above; or
> 4. defer the object-storage-ops / purge-table work that depends on this
>    level of testing.
>
> If there is another use case or tradeoff for this test-infra decision,
> please spell it out.
> Otherwise I think we should pick one of these paths and keep
> the object-storage-ops API discussion separate.
>
> Robert
>
> On Tue, Jun 2, 2026 at 3:02 AM Yufei Gu <[email protected]> wrote:
>
> > Thanks Robert and Dmitri for raising this.
> >
> > One thing I'm still trying to understand better is the use case. Could you
> > share a bit more about the use cases you have in mind for consuming the
> > Nessie test artifacts directly?
> >
> > In particular, I'm interested in whether the expectation is simply to reuse
> > them as stable test infrastructure, or use alternatives, or whether Polaris
> > would potentially need to influence, extend, or evolve the behavior of
> > those test utilities independently over time. Understanding the anticipated
> > use cases would help evaluate the tradeoffs.
> >
> > Yufei
> >
> > On Mon, Jun 1, 2026 at 8:33 AM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > > Hi Robert,
> > >
> > > Thanks for the recap of previous emails!
> > >
> > > My personal preference would be to reuse the Nessie Object Storage Mock
> > > jars in the "test" scope in Polaris (as dependencies). I believe this
> > > approach requires less work.
> > >
> > > However, your proposal for copying that code to Polaris also sounds good
> > > to me.
> > >
> > > In general, I second your point that these testing tools are distinct
> > > from Adobe S3Mock in that they provide emulation/validation/assertion
> > > capabilities more natural to the JUnit context.
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Mon, Jun 1, 2026 at 6:33 AM Robert Stupp <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I’d like to restart the object-storage-mock discussion.
> > > > The PR discussion has gone in a few directions, and I think we should
> > > > decide the test-infra question explicitly.
> > > >
> > > > A quick recap of where we are:
> > > >
> > > > - The earlier `[DISCUSS] Object store functionality` [1] thread was
> > > >   about the broader object-storage-ops and purge-table work.
> > > > - In review of that broader work [3], there was concern about
> > > >   depending on Nessie test artifacts directly.
> > > > - So the test utilities were split out into a separate PR [4].
> > > > - Review of that split-out PR then raised the other question: should
> > > >   Polaris accept and maintain that copied code, or should we use
> > > >   existing libraries such as Adobe S3Mock instead?
> > > > - The current `object-storage-mock` PR [2] is narrower than both
> > > >   earlier PRs. It is only about the object-storage mock test utility.
> > > >
> > > > So the question here is not whether to approve the full
> > > > object-storage-ops work.
> > > > The question is what test infrastructure Polaris wants for object-store
> > > > behavior.
> > > >
> > > > For the object-storage-ops and purge-table work, we need tests that go
> > > > through real SDK/FileIO HTTP interactions,
> > > > but where the test can still control and check object-store behavior
> > > > precisely.
> > > > For example: generated objects, synthetic listings, metadata,
> > > > conditional responses, intercepted writes/deletes, and targeted
> > > > failures.
> > > >
> > > > A filesystem fixture, a Map-backed fixture, or a normal local S3
> > > > emulator are all useful for other tests, but they do not give that
> > > > level of operation-level control.
> > > >
> > > > Adobe S3Mock is useful when a test needs a local S3-compatible service.
> > > > The object-storage-mock is different: it exposes selected
> > > > S3/GCS/ADLS/STS protocol surfaces while letting the test define bucket
> > > > behavior per operation.
> > > > That is what lets the current object-storage-ops and purge-table tests
> > > > validate real client interactions without depending on cloud services.
> > > >
> > > > Across the reviews, two reasonable concerns came up:
> > > >
> > > > - avoiding a Nessie test dependency;
> > > > - avoiding unnecessary copied code.
> > > >
> > > > However, we need to choose a path, because the object-storage-ops and
> > > > purge-table work depend on this level of testing.
> > > >
> > > > I see at least these options:
> > > >
> > > > 1. accept `object-storage-mock` into Polaris as test-only
> > > >    infrastructure, subject to the normal ASF provenance/license checks
> > > > 2. use the Nessie test artifacts directly
> > > > 3. identify existing libraries that satisfy the same requirements
> > > > 4. defer the object-storage-ops / purge-table work that depends on this
> > > >    testing until the test-infra question is resolved.
> > > >
> > > > My preference is option 1: keep it test-only, limit it to protocol
> > > > behavior needed by Polaris tests, and require future protocol additions
> > > > to come with concrete Polaris test cases.
> > > >
> > > > If option 1 or 2 is not acceptable, then option 3 needs to name the
> > > > specific library or combination of libraries and check it against the
> > > > requirements above.
> > > > If there is another path, I would like to understand it.
> > > > Otherwise we are effectively choosing option 4 for the work that
> > > > depends on these tests.
> > > >
> > > > Robert
> > > >
> > > > [1] https://lists.apache.org/thread/0z8nb3w58zb9s617gsoyhzlnz53rt9zx
> > > >     ([DISCUSS] Object store functionality)
> > > > [2] https://github.com/apache/polaris/pull/4570
> > > >     (Add object-storage-mock test utility)
> > > > [3] https://github.com/apache/polaris/pull/3256
> > > >     (Object store functionality)
> > > > [4] https://github.com/apache/polaris/pull/3513
> > > >     (Test libraries for storage operations, closed)
