karuppayya commented on pull request #1471: URL: https://github.com/apache/iceberg/pull/1471#issuecomment-947224930
With `S3FileIO`, the default scheme is `s3://`. Writes from different clients will use a scheme that depends on the `io-impl` property, so the manifests might contain a mix of `s3://`, `s3a://`, etc. But the file listing (in DeleteOrphanFiles) will have only a single prefix, which is determined by the client's Hadoop configuration. This will result in orphan files not being cleaned.

When the user is aware that the scheme can be ignored, I think we should provide a configuration to do that. I am not able to come up with a concrete case for the authority (maybe HDFS with and without authority), but that could also be a configuration.

@RussellSpitzer @aokolnychyi @rdblue @flyrain @raptond Your thoughts on this?
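
To make the mismatch concrete, here is a minimal sketch in plain Java. It is not Iceberg's implementation: the class `PathMatchSketch`, the method names, and the two boolean flags are hypothetical stand-ins for the kind of configuration suggested above. It also assumes the locations parse cleanly with `java.net.URI` (for example, no unescaped spaces).

```java
import java.net.URI;

// Hypothetical illustration only; not part of Iceberg's API.
final class PathMatchSketch {

  private PathMatchSketch() {}

  // Builds the string used for comparison, optionally dropping
  // the scheme and/or the authority of the location.
  static String comparisonKey(String location, boolean ignoreScheme, boolean ignoreAuthority) {
    URI uri = URI.create(location);
    String scheme = (ignoreScheme || uri.getScheme() == null) ? "" : uri.getScheme();
    String authority = (ignoreAuthority || uri.getAuthority() == null) ? "" : uri.getAuthority();
    return scheme + "://" + authority + uri.getPath();
  }

  // Returns true when both locations are treated as the same file
  // under the chosen (assumed) flags.
  static boolean sameFile(String a, String b, boolean ignoreScheme, boolean ignoreAuthority) {
    return comparisonKey(a, ignoreScheme, ignoreAuthority)
        .equals(comparisonKey(b, ignoreScheme, ignoreAuthority));
  }
}
```

With both flags off, `s3://bucket/data/a.parquet` and `s3a://bucket/data/a.parquet` are treated as different files; with `ignoreScheme` on, they match. The authority flag plays the same role for something like `hdfs://nn:8020/data/a.parquet` versus `hdfs:///data/a.parquet`.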
