Martijn Visser created FLINK-39499:
--------------------------------------
Summary: Replace MinIO in e2e tests with an Apache 2.0-licensed S3
alternative
Key: FLINK-39499
URL: https://issues.apache.org/jira/browse/FLINK-39499
Project: Flink
Issue Type: Technical Debt
Components: Tests
Reporter: Martijn Visser
MinIO relicensed from Apache 2.0 to AGPLv3 in April 2021. While test-time
Docker usage doesn't strictly violate the ASF's Category X policy (which governs
artifact inclusion), aligning dev/test tooling with Apache 2.0 is a reasonable
hygiene preference for an Apache project, and it avoids ongoing friction with
contributors who want to minimize AGPL exposure in any form.
The current setup is already broken. common_s3_minio.sh pulls
minio/minio:latest, and MinIO dropped its FS backend around late 2022. Tests that
read pre-staged files (test_batch_wordcount.sh hadoop_minio / presto_minio) now
fail with FileNotFoundException on s3://test-data/words. Write-only tests
(test_file_sink.sh s3 *) still pass because they create their objects via the S3
API. The breakage went unnoticed because run-nightly-tests.sh only includes the
write-only variants.
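The coverage gap can be made explicit with a small check. This is a minimal Python sketch, not part of Flink's test tooling: the function `missing_read_coverage`, the `READS_PRESTAGED` set, and the assumption that each nightly invocation is a string of the form "<script path> <variant> [args]" are all hypothetical. The two read variants are the ones named in this issue.

```python
import os
from typing import Iterable, List, Set, Tuple

# Hypothetical list, taken from this issue: variants that read objects
# pre-staged on disk and therefore depend on MinIO's removed FS backend.
READS_PRESTAGED: Set[Tuple[str, str]] = {
    ("test_batch_wordcount.sh", "hadoop_minio"),
    ("test_batch_wordcount.sh", "presto_minio"),
}


def missing_read_coverage(nightly_invocations: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the pre-staged-read variants that no nightly invocation runs.

    Assumption: each invocation looks like "<script path> <variant> [more args]";
    only the script's file name and its first argument are compared.
    """
    covered: Set[Tuple[str, str]] = set()
    for invocation in nightly_invocations:
        parts = invocation.split()
        if len(parts) >= 2:
            # Strip any directory prefix so "dir/test_x.sh" matches "test_x.sh".
            covered.add((os.path.basename(parts[0]), parts[1]))
    return sorted(READS_PRESTAGED - covered)
```

For example, `missing_read_coverage(["test_file_sink.sh s3"])` reports both test_batch_wordcount.sh variants, which matches the situation described in this issue: only write-only variants run nightly, so the read failures stay invisible.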
We need to investigate what a proper replacement would be. The post at
https://rmoff.net/2026/01/14/alternatives-to-minio-for-single-node-local-s3/
already compares several options. From a quick check, we could consider
S3Proxy (Apache 2.0), which preserves the current "file on disk = S3 object"
semantics via its jclouds filesystem backend. SeaweedFS (Apache 2.0) is an
alternative if we're willing to restructure the affected tests to upload their
data via the S3 API.
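To make the "file on disk = S3 object" semantics concrete, the sketch below derives the bucket and key that each pre-staged file would correspond to. It is a minimal Python illustration, assuming a filesystem-backed store that maps top-level directories to buckets and relative paths to keys; the function `staging_plan` is hypothetical and not part of S3Proxy, jclouds, or Flink.

```python
from pathlib import Path
from typing import List, Tuple


def staging_plan(basedir: Path) -> List[Tuple[str, str, Path]]:
    """List (bucket, key, local_path) for every regular file under basedir.

    Assumed layout: each top-level directory of basedir is a bucket, and a
    file's path relative to that directory, joined with '/', is its key.
    """
    plan: List[Tuple[str, str, Path]] = []
    for bucket_dir in sorted(p for p in basedir.iterdir() if p.is_dir()):
        # rglob also yields sub-directories; only regular files become objects.
        for path in sorted(bucket_dir.rglob("*")):
            if path.is_file():
                key = path.relative_to(bucket_dir).as_posix()
                plan.append((bucket_dir.name, key, path))
    return plan
```

With a base directory containing test-data/words, the plan contains ("test-data", "words", ...), i.e. the object s3://test-data/words that the wordcount tests read. With a filesystem-backed store, placing the file there is the whole staging step. With SeaweedFS, each tuple would instead have to be uploaded, for instance with boto3's `client.upload_file(Filename, Bucket, Key)` pointed at the SeaweedFS S3 endpoint. Keeping the plan separate from the upload would let the same staged directory serve either backend.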
--
This message was sent by Atlassian Jira
(v8.20.10#820010)