As it was easy enough to set up, I installed rustfs and pointed the s3a test suites against it. These tests try to see how well a store can act like a real filesystem, how closely its divergences match those of AWS S3, and whether all the corner cases found in production (and fixed!) have stayed away. They invariably find multiple regressions per AWS SDK release, and even differences between the two AWS S3 implementations (example: https://github.com/aws/aws-sdk-java-v2/issues/6459).

The tests took 6 minutes and cost $0, which makes them good for local work and for contributors without corporate funding.
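For anyone wanting to reproduce this, here's a minimal sketch of pointing an S3A client at a local S3-compatible endpoint. The endpoint URL, bucket name and credentials are placeholders rather than RustFS defaults, and the hadoop-aws test suites pick up the same fs.s3a.* options from their XML test configuration rather than from code:

  // Minimal sketch: point the s3a connector at a local S3-compatible store.
  // Endpoint, bucket and credentials below are placeholders/assumptions.
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LocalStoreProbe {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.s3a.endpoint", "http://localhost:9000");   // assumed local endpoint
      conf.setBoolean("fs.s3a.path.style.access", true);      // third-party stores usually need path-style access
      conf.set("fs.s3a.access.key", "ACCESS_KEY");            // placeholder credentials
      conf.set("fs.s3a.secret.key", "SECRET_KEY");
      conf.setBoolean("fs.s3a.connection.ssl.enabled", false);

      try (FileSystem fs = FileSystem.get(new URI("s3a://rustybucket/"), conf)) {
        // Simple round trip: create, stat, delete.
        Path p = new Path("/probe/hello.txt");
        fs.create(p, true).close();
        System.out.println(fs.getFileStatus(p));
        fs.delete(p, false);
      }
    }
  }

Path-style access matters here, as most third-party stores don't support virtual-host-style bucket addressing.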
*1. Eventually consistent path listing on delete.*

Looking into the docs, https://deepwiki.com/rustfs/rustfs/5.5-concurrency-management-and-caching#cache-coherence-and-invalidation comes up with "Invalidation is performed asynchronously to avoid blocking the write path." This is not as bad as AWS S3 used to be, where a 404 was cached after a failing HEAD request and listings took time to even find a newly created object. Here it's just DELETE followed by LIST.

*2. List multipart upload operation may not work.*

* The ListMultipartUploads operation always returns an empty list, even when uploads are in progress.
* There are some other multipart-related test stack traces.

*3. Filesystem is case insensitive, at least on a mac.*

[ERROR] ITestS3AFileSystemContract>FileSystemContractBaseTest.testFilesystemIsCaseSensitive:643 File exists s3a://rustybucket/job-00-fork-0003/test/testfilesystemiscasesensitive ==> expected: <false> but was: <true>

*4. getBucketMetadata operation unsupported; returns 501.*

org.apache.hadoop.fs.s3a.AWSUnsupportedFeatureException: getBucketMetadata() on rustybucket: software.amazon.awssdk.services.s3.model.S3Exception: Not Implemented (Service: S3, Status Code: 501, Request ID: null):null: Not Implemented (Service: S3, Status Code: 501, Request ID: null)

This one is trivial, and not unusual with third-party stores. Oh, and you can't use "rustfs" as a bucket name.

Overall, I would avoid using it anywhere you require case sensitivity or list consistency, and would want to spend more time exploring the multipart test failures before being confident you can upload objects many GB in size.

That listing inconsistency brings back bad memories of the old S3, when Hive and Spark queries could miss data. However, Iceberg is designed for inconsistent stores, isn't it? And because I'm only seeing delete/listing inconsistencies, not 404 caching on object entries (which could trigger a file-not-found when trying to read a freshly created object), tests which read manifests or data immediately after creation will not fail.

*For iceberg testing:*

1. Tests to verify bulk deletion through deleteFiles() are likely to fail if they use a list operation to probe for paths rather than HEAD requests. As they use BaseS3File.exists()/getObjectMetadata(), HEAD is used, which is why the inconsistencies didn't surface (there's a sketch of the HEAD vs LIST difference at the end of this mail).
2. Tests mustn't rely on case sensitivity in filenames. Fix: don't create files with the same path but different case.

-Steve

On Mon, 29 Dec 2025 at 11:41, Steve Loughran <[email protected]> wrote:

> makes sense, though it isn't good for regression testing s3 code against
> aws s3 itself.
>
> Are there any public iceberg tables hosted by amazon themselves that query
> tests could be run against? I know of public parquet datasets (NOAA
> datasets for example), but no parquet + iceberg ones. A good large table
> free for all to read would be very useful
>
> steve
>
> On Mon, 29 Dec 2025 at 03:06, Jinghe Ma <[email protected]> wrote:
>
>> As discussed in this issue
>> <https://github.com/apache/iceberg/issues/14638>, since the MinIO repo
>> is under maintenance mode, we want to replace MinIO with RustFS in iceberg
>> quick start demo
>> <https://iceberg.apache.org/spark-quickstart/#creating-a-table>. RustFS
>> <https://github.com/rustfs/rustfs> is an open-source and S3-compatible
>> distributed object storage system and is also a good alternative. We have
>> tested the whole replacement configuration locally, and it worked fine.
>> All the configuration changes and the test process are attached in this PR
>> <https://github.com/apache/iceberg/pull/14928>. We want to move the
>> replacement forward; if you guys have any concern about RustFS or any other
>> confusion, please let us know.
>>
>> Looking forward to hearing from you.
>>
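Appendix: the sketch referred to above, illustrating the HEAD vs LIST distinction against the AWS SDK v2 directly rather than the Iceberg or s3a code paths. Endpoint, bucket and credentials are again placeholders. The point is only that after a DELETE the HEAD probe reports the object gone, while (given the asynchronous invalidation noted above) listing the parent prefix may still return the deleted key for a while.

  // Sketch of the two existence probes: HEAD (what BaseS3File.exists() ends
  // up doing) vs LIST of the parent prefix. Placeholders for endpoint,
  // bucket, credentials.
  import java.net.URI;
  import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
  import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
  import software.amazon.awssdk.core.sync.RequestBody;
  import software.amazon.awssdk.regions.Region;
  import software.amazon.awssdk.services.s3.S3Client;
  import software.amazon.awssdk.services.s3.model.*;

  public class DeleteListProbe {
    public static void main(String[] args) {
      try (S3Client s3 = S3Client.builder()
          .endpointOverride(URI.create("http://localhost:9000"))   // assumed local endpoint
          .region(Region.US_EAST_1)                                // ignored by most third-party stores
          .credentialsProvider(StaticCredentialsProvider.create(
              AwsBasicCredentials.create("ACCESS_KEY", "SECRET_KEY")))
          .forcePathStyle(true)
          .build()) {

        String bucket = "rustybucket";
        String key = "probe/data.txt";

        s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
            RequestBody.fromString("hello"));
        s3.deleteObject(DeleteObjectRequest.builder().bucket(bucket).key(key).build());

        // HEAD-based probe: sees the deletion straight away.
        boolean headSaysExists;
        try {
          s3.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build());
          headSaysExists = true;
        } catch (NoSuchKeyException e) {
          headSaysExists = false;
        }

        // LIST-based probe: this is where a stale entry can show up.
        boolean listSaysExists = s3.listObjectsV2(
                ListObjectsV2Request.builder().bucket(bucket).prefix(key).build())
            .contents().stream().anyMatch(o -> o.key().equals(key));

        System.out.println("HEAD sees object: " + headSaysExists
            + ", LIST sees object: " + listSaysExists);
      }
    }
  }

Any Iceberg test which asserts on the LIST-style result immediately after a bulk delete is the kind likely to be flaky against a store with this behaviour.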
