yyanyy commented on pull request #2807:

URL: https://github.com/apache/iceberg/pull/2807#issuecomment-912112920
> Before we start to split the PR into smaller PRs, I think the Iceberg community needs to reach a consensus on public/private vendor integration contributions. The iceberg-aws module is a great example: it provides independent mock unit tests for each small feature. The most important point is that Adobe provides an S3 integration test utility, [com.adobe.testing:s3mock-junit4](https://github.com/adobe/S3Mock), which launches a local mini S3 server that serves the S3 HTTP API (S3Mock pretends to be a real S3 HTTP server by implementing the S3 API on top of a local filesystem directory). The S3Mock simulator has comprehensive test cases to ensure that the local S3 has the same semantics as [AWS S3](https://aws.amazon.com/cn/s3/).
>
> When I implemented [the Aliyun OSS integration](https://github.com/apache/iceberg/pull/2230/files), I thought I should provide a similar object storage simulator to keep the local tests aligned with the public Aliyun OSS, so I provided an [OSSMockApplication](https://github.com/apache/iceberg/pull/2230/files#diff-cae7d6bade136ee5e97da24f979e6352929af6df9d244a3afc3a94770396c1bc) and [TestLocalOSS](https://github.com/apache/iceberg/pull/2230/files#diff-f8329e3691562000032033a485ecc5e30bf6d6a3b7e25e5f8cdd4f4e387b604aR53) to align the semantics. Personally, I would prefer to provide a fully tested simulator for each private vendor integration so that we can build unit tests on top of it to verify correctness.
>
> As we will introduce more and more public/private vendor integrations in the future, I think we should agree on the details of introducing a vendor as soon as possible, and provide a more complete guide for community contributors to follow.
>
> FYI @rdblue & @danielcweeks.

I think in an ideal world we should, but I'm not sure we need to completely block new cloud vendor integration contributions when there is no working backend library for the storage service that can be used in unit tests.
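To make the local-simulator idea discussed above (S3Mock, OSSMockApplication) concrete, here is a minimal Java sketch of only its storage half. `LocalObjectStore` and its methods are hypothetical names chosen for illustration; this is not code from S3Mock, OSSMockApplication, or Iceberg. It skips the HTTP API entirely and just maps object keys to files under a local directory, the way S3Mock is described as keeping objects on the local filesystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical in-process object-store simulator: each object key maps to a file under a
 * local root directory. Limitation compared with real S3/OSS: a key and a longer key that
 * uses it as a "directory" (e.g. "a" and "a/b") cannot coexist, since one is a file and
 * the other needs a directory at the same path.
 */
public class LocalObjectStore {
  private final Path root;

  public LocalObjectStore(Path root) {
    // Normalize once so the containment check in pathFor() is reliable.
    this.root = root.toAbsolutePath().normalize();
  }

  /** Creates a store backed by a fresh temporary directory. */
  public static LocalObjectStore inTempDir() {
    try {
      return new LocalObjectStore(Files.createTempDirectory("local-object-store"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Maps a key such as "db/t/data/f1" to a file under root, rejecting keys that escape it. */
  private Path pathFor(String key) {
    Path path = root.resolve(key).normalize();
    if (path.equals(root) || !path.startsWith(root)) {
      throw new IllegalArgumentException("Invalid object key: " + key);
    }
    return path;
  }

  /** PUT: creates or overwrites the object. */
  public void put(String key, byte[] data) {
    Path path = pathFor(key);
    try {
      Files.createDirectories(path.getParent());
      Files.write(path, data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** GET: empty if the object does not exist (a real SDK would report a "no such key" error). */
  public Optional<byte[]> get(String key) {
    Path path = pathFor(key);
    if (!Files.isRegularFile(path)) {
      return Optional.empty();
    }
    try {
      return Optional.of(Files.readAllBytes(path));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** DELETE: returns true only if an existing object was removed. */
  public boolean delete(String key) {
    Path path = pathFor(key);
    try {
      return Files.isRegularFile(path) && Files.deleteIfExists(path);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** LIST: all keys that start with the given prefix, in lexicographic order. */
  public List<String> list(String prefix) {
    try (Stream<Path> files = Files.walk(root)) {
      return files
          .filter(Files::isRegularFile)
          // Keys always use '/' regardless of the host OS path separator.
          .map(p -> root.relativize(p).toString().replace(p.getFileSystem().getSeparator(), "/"))
          .filter(key -> key.startsWith(prefix))
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A unit test for a storage-backed class would then run against `LocalObjectStore.inTempDir()` instead of a real bucket. The gap between a sketch like this and S3Mock is exactly the point raised in the quoted comment: a simulator is only trustworthy for unit tests if its own semantics are tested against the behaviour of the real service.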
In the aws module we have an [integration test](https://github.com/apache/iceberg/tree/master/aws/src/integration/java/org/apache/iceberg/aws) package that talks to the actual service. However, we don't run those tests on PR submission; they are run manually before each release. I think we should try to make them part of the automated tests so that they catch regressions. With or without a library that provides full functionality for unit testing, I think these integration tests are still valuable.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
