As it was easy enough to set up, I installed rustfs and pointed the s3a test suites against it. These tests try to see how well a store can act like a real filesystem, how closely its divergences match those of AWS S3, and whether all the corner cases found in production (and fixed!) have stayed away. They invariably find multiple regressions per AWS SDK release, and even differences between the two AWS S3 implementations (example: https://github.com/aws/aws-sdk-java-v2/issues/6459).

The tests took 6 minutes and cost $0, which makes them good for local work and for contributors without corporate funding.
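For anyone wanting to reproduce this, here's a minimal sketch of pointing an S3A client at a local S3-compatible endpoint. The endpoint URL, bucket name and credentials are placeholders rather than RustFS defaults, and the hadoop-aws test suites pick up the same fs.s3a.* options from their XML test configuration rather than from code:

  // Minimal sketch: point the s3a connector at a local S3-compatible store.
  // Endpoint, bucket and credentials below are placeholders/assumptions.
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LocalStoreProbe {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.s3a.endpoint", "http://localhost:9000");   // assumed local endpoint
      conf.setBoolean("fs.s3a.path.style.access", true);      // third-party stores usually need path-style access
      conf.set("fs.s3a.access.key", "ACCESS_KEY");            // placeholder credentials
      conf.set("fs.s3a.secret.key", "SECRET_KEY");
      conf.setBoolean("fs.s3a.connection.ssl.enabled", false);

      try (FileSystem fs = FileSystem.get(new URI("s3a://rustybucket/"), conf)) {
        // Simple round trip: create, stat, delete.
        Path p = new Path("/probe/hello.txt");
        fs.create(p, true).close();
        System.out.println(fs.getFileStatus(p));
        fs.delete(p, false);
      }
    }
  }

Path-style access matters here, as most third-party stores don't support virtual-host-style bucket addressing.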
*1. Eventually consistent path listing on delete.*

Looking into the docs, https://deepwiki.com/rustfs/rustfs/5.5-concurrency-management-and-caching#cache-coherence-and-invalidation comes up with "Invalidation is performed asynchronously to avoid blocking the write path." This is not as bad as AWS S3 used to be, where a 404 was cached after a failing HEAD request and listings took time to even find a newly created object. Here it's just DELETE followed by LIST.

*2. List multipart upload operation may not work.*

* The ListMultipartUploads operation always returns an empty list, even when uploads are in progress.
* There are some other multipart-related test stack traces.

*3. Filesystem is case insensitive, at least on a mac.*

[ERROR] ITestS3AFileSystemContract>FileSystemContractBaseTest.testFilesystemIsCaseSensitive:643 File exists s3a://rustybucket/job-00-fork-0003/test/testfilesystemiscasesensitive ==> expected: <false> but was: <true>

*4. getBucketMetadata operation unsupported; returns 501.*

org.apache.hadoop.fs.s3a.AWSUnsupportedFeatureException: getBucketMetadata() on rustybucket: software.amazon.awssdk.services.s3.model.S3Exception: Not Implemented (Service: S3, Status Code: 501, Request ID: null):null: Not Implemented (Service: S3, Status Code: 501, Request ID: null)

This one is trivial, and not unusual with third-party stores. Oh, and you can't use "rustfs" as a bucket name.

Overall, I would avoid using it anywhere you require case sensitivity or list consistency, and would want to spend more time exploring the multipart test failures before being confident you can upload objects many GB in size.

That listing inconsistency brings back bad memories of the old S3, when Hive and Spark queries could miss data. However, Iceberg is designed for inconsistent stores, isn't it? And because I'm only seeing delete/listing inconsistencies, not 404 caching on object entries (which could trigger a file-not-found when trying to read a freshly created object), tests which read manifests or data immediately after creation will not fail.

*For iceberg testing:*

1. Tests to verify bulk deletion through deleteFiles() are likely to fail if they use a list operation to probe for paths rather than HEAD requests. As they use BaseS3File.exists()/getObjectMetadata(), HEAD is used, which is why the inconsistencies didn't surface (there's a sketch of the HEAD vs LIST difference at the end of this mail).
2. Tests mustn't rely on case sensitivity in filenames. Fix: don't create files with the same path but different case.

-Steve

On Mon, 29 Dec 2025 at 11:41, Steve Loughran <[email protected]> wrote:

> makes sense, though it isn't good for regression testing s3 code against
> aws s3 itself.
>
> Are there any public iceberg tables hosted by amazon themselves that query
> tests could be run against? I know of public parquet datasets (NOAA
> datasets for example), but no parquet + iceberg ones. A good large table
> free for all to read would be very useful
>
> steve
>
> On Mon, 29 Dec 2025 at 03:06, Jinghe Ma <[email protected]> wrote:
>
>> As discussed in this issue
>> <https://github.com/apache/iceberg/issues/14638>, since the MinIO repo
>> is under maintenance mode, we want to replace MinIO with RustFS in iceberg
>> quick start demo
>> <https://iceberg.apache.org/spark-quickstart/#creating-a-table>. RustFS
>> <https://github.com/rustfs/rustfs> is an open-source and S3-compatible
>> distributed object storage system and is also a good alternative. We have
>> tested the whole replacement configuration locally, and it worked fine.
>> All the configuration changes and the test process are attached in this PR
>> <https://github.com/apache/iceberg/pull/14928>. We want to move the
>> replacement forward; if you guys have any concern about RustFS or any other
>> confusion, please let us know.
>>
>> Looking forward to hearing from you.
>>
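Appendix: the sketch referred to above, illustrating the HEAD vs LIST distinction against the AWS SDK v2 directly rather than the Iceberg or s3a code paths. Endpoint, bucket and credentials are again placeholders. The point is only that after a DELETE the HEAD probe reports the object gone, while (given the asynchronous invalidation noted above) listing the parent prefix may still return the deleted key for a while.

  // Sketch of the two existence probes: HEAD (what BaseS3File.exists() ends
  // up doing) vs LIST of the parent prefix. Placeholders for endpoint,
  // bucket, credentials.
  import java.net.URI;
  import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
  import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
  import software.amazon.awssdk.core.sync.RequestBody;
  import software.amazon.awssdk.regions.Region;
  import software.amazon.awssdk.services.s3.S3Client;
  import software.amazon.awssdk.services.s3.model.*;

  public class DeleteListProbe {
    public static void main(String[] args) {
      try (S3Client s3 = S3Client.builder()
          .endpointOverride(URI.create("http://localhost:9000"))   // assumed local endpoint
          .region(Region.US_EAST_1)                                // ignored by most third-party stores
          .credentialsProvider(StaticCredentialsProvider.create(
              AwsBasicCredentials.create("ACCESS_KEY", "SECRET_KEY")))
          .forcePathStyle(true)
          .build()) {

        String bucket = "rustybucket";
        String key = "probe/data.txt";

        s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
            RequestBody.fromString("hello"));
        s3.deleteObject(DeleteObjectRequest.builder().bucket(bucket).key(key).build());

        // HEAD-based probe: sees the deletion straight away.
        boolean headSaysExists;
        try {
          s3.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build());
          headSaysExists = true;
        } catch (NoSuchKeyException e) {
          headSaysExists = false;
        }

        // LIST-based probe: this is where a stale entry can show up.
        boolean listSaysExists = s3.listObjectsV2(
                ListObjectsV2Request.builder().bucket(bucket).prefix(key).build())
            .contents().stream().anyMatch(o -> o.key().equals(key));

        System.out.println("HEAD sees object: " + headSaysExists
            + ", LIST sees object: " + listSaysExists);
      }
    }
  }

Any Iceberg test which asserts on the LIST-style result immediately after a bulk delete is the kind likely to be flaky against a store with this behaviour.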
