Martijn Visser created FLINK-39499:
--------------------------------------

             Summary: Replace MinIO in e2e tests with an Apache 2.0-licensed S3 
alternative
                 Key: FLINK-39499
                 URL: https://issues.apache.org/jira/browse/FLINK-39499
             Project: Flink
          Issue Type: Technical Debt
          Components: Tests
            Reporter: Martijn Visser


MinIO relicensed from Apache 2.0 to AGPLv3 in April 2021. While test-time 
Docker usage doesn't strictly violate ASF's Category X policy (which governs 
artifact inclusion), aligning dev/test tooling with Apache 2.0 is a reasonable 
hygiene preference for an Apache project, and avoids ongoing friction with 
contributors who want to minimize AGPL exposure in any form.

The current setup is already broken. common_s3_minio.sh pulls 
minio/minio:latest, which dropped the FS backend around late 2022, so files 
placed directly in the data directory are no longer served as objects. Tests 
that read pre-staged files (test_batch_wordcount.sh hadoop_minio / 
presto_minio) now fail with FileNotFoundException on s3://test-data/words. 
Write-only tests (test_file_sink.sh s3 *) still pass because they create 
objects via the S3 API. The breakage went unnoticed because only the 
write-only variants are in run-nightly-tests.sh.

We need to investigate what a proper replacement would be. 
https://rmoff.net/2026/01/14/alternatives-to-minio-for-single-node-local-s3/ 
already compares the options. From a quick check, S3Proxy (Apache 2.0) is one 
candidate: it preserves the current "file on disk = S3 object" semantics via 
its jclouds filesystem backend. SeaweedFS (Apache 2.0) is an alternative if 
we're willing to restructure the tests to upload via the S3 API.
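
To illustrate what "upload via S3 API" could look like, here is a minimal 
sketch of a staging helper. It is not taken from the existing e2e scripts: the 
names stage_test_data, s3_run, S3_ENDPOINT and DRY_RUN are hypothetical, and it 
assumes the AWS CLI is installed, credentials are exported in the environment, 
and an S3-compatible container is listening on the assumed endpoint.

```shell
#!/usr/bin/env bash
# Sketch only: stage pre-existing test files through the S3 API instead of
# relying on "file on disk = S3 object" semantics.
# Assumptions (not from the issue): AWS CLI available, credentials exported,
# S3_ENDPOINT points at the local S3-compatible test container.

S3_ENDPOINT="${S3_ENDPOINT:-http://localhost:9000}"  # assumed endpoint
DRY_RUN="${DRY_RUN:-0}"                              # 1 = print commands only

# Run a command, or only print it when DRY_RUN=1.
s3_run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# stage_test_data <local_dir> <bucket>
# Creates the bucket, then uploads every regular file under <local_dir>,
# using the path relative to <local_dir> as the object key.
stage_test_data() {
  local local_dir="$1" bucket="$2" file key
  s3_run aws --endpoint-url "$S3_ENDPOINT" s3 mb "s3://${bucket}"
  find "$local_dir" -type f | sort | while IFS= read -r file; do
    key="${file#"$local_dir"/}"
    s3_run aws --endpoint-url "$S3_ENDPOINT" s3 cp "$file" "s3://${bucket}/${key}"
  done
}
```

With DRY_RUN=1, calling stage_test_data on a directory that contains a file 
named words, with bucket test-data, only prints the aws commands it would run, 
including the upload to s3://test-data/words. Whether such a helper belongs in 
common_s3_minio.sh or in a new script depends on the replacement chosen.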



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
