Steve Loughran created HADOOP-19388:
---------------------------------------
Summary: S3A: Validate bulk delete through Iceberg HadoopFileIO
Key: HADOOP-19388
URL: https://issues.apache.org/jira/browse/HADOOP-19388
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3, test
Affects Versions: 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
Now that Hadoop 3.4.1 has shipped, we can link Iceberg up to it
through reflection: https://github.com/apache/iceberg/pull/10233
However, we can't put a test in there, even one which just talks to
the minio docker image which S3FileIO tests with, because
the tests would only work with Hadoop 3.4.1+.
Proposed: add a validation test here, initially just with a JAR built from the
PR.
Initially this just says "it works as expected".
However, it will go on to become the regression test, "it still works",
so there's no need to wait for downstream tests to be run and failures to be
reported back.
We need a test suite which
* Adds a test-time dependency on iceberg JAR with bulk delete through the
HadoopFileIO class.
* Runs compliance tests, single/multi delete, complex names, directories,
missing paths
* Is parameterized on single/multi delete being enabled in s3a, and on iceberg
being set to use/not use bulk delete.
* Includes IOStats assertions to verify bulk delete was actually used.
* Mixes in some local file:// files so as to validate multiple stores with
different page sizes.
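As a rough illustration of the last point, here is a minimal Java sketch of how
a list of mixed s3a:// and file:// paths might be split by store and then into
pages. It uses only the JDK; the class BulkDeletePaging, its methods and the
page sizes are hypothetical and are not Hadoop or Iceberg APIs. A real test
would take the page size from each store's bulk delete implementation rather
than hard-coding it.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hadoop or Iceberg. */
public class BulkDeletePaging {

  /** Group paths by store, keyed on "scheme://authority" (authority may be empty). */
  public static Map<String, List<URI>> groupByStore(List<URI> paths) {
    Map<String, List<URI>> byStore = new LinkedHashMap<>();
    for (URI p : paths) {
      String authority = p.getAuthority() == null ? "" : p.getAuthority();
      String store = p.getScheme() + "://" + authority;
      byStore.computeIfAbsent(store, k -> new ArrayList<>()).add(p);
    }
    return byStore;
  }

  /** Split one store's paths into pages of at most pageSize entries. */
  public static List<List<URI>> toPages(List<URI> paths, int pageSize) {
    if (pageSize < 1) {
      throw new IllegalArgumentException("page size must be >= 1");
    }
    List<List<URI>> pages = new ArrayList<>();
    for (int i = 0; i < paths.size(); i += pageSize) {
      int end = Math.min(i + pageSize, paths.size());
      pages.add(new ArrayList<>(paths.subList(i, end)));
    }
    return pages;
  }

  public static void main(String[] args) {
    List<URI> paths = List.of(
        URI.create("s3a://bucket/table/data/a.parquet"),
        URI.create("file:///tmp/table/data/b.parquet"),
        URI.create("s3a://bucket/table/data/c.parquet"),
        URI.create("s3a://bucket/table/data/d.parquet"),
        URI.create("file:///tmp/table/data/e.parquet"));

    // Assumed page sizes, chosen only to make the two stores differ.
    Map<String, Integer> pageSizes = Map.of("s3a://bucket", 2, "file://", 1);

    groupByStore(paths).forEach((store, storePaths) -> {
      List<List<URI>> pages = toPages(storePaths, pageSizes.get(store));
      System.out.println(store + " -> " + pages.size() + " page(s): " + pages);
    });
  }
}
```

A test built along these lines could then assert on the number of pages
submitted per store, alongside the IOStats assertions listed above.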
I had started this within HADOOP-19385, with the iceberg JAR as one of the
formats and the new test module to include the base contract test suite.
But as the iceberg JAR is java17+, it rapidly becomes unworkable.
Instead, it will all go into s3a with a new java17 profile which will
* add iceberg jar dependency
* add a new src/test/java17 test source tree.
* contain a minimal abstract base test
* s3a implementation
Once Hadoop is on java17, it can be moved into the main branch.
Note also: until iceberg actually ships with the PR in, this cannot be
merged.