ggershinsky commented on code in PR #5544: URL: https://github.com/apache/iceberg/pull/5544#discussion_r1699473244
########## api/src/main/java/org/apache/iceberg/encryption/EncryptingFileIO.java: ########## @@ -109,14 +111,19 @@ public InputFile newInputFile(ManifestFile manifest) { } } - public InputFile newDecryptingInputFile(String path, ByteBuffer buffer) { - return em.decrypt(wrap(io.newInputFile(path), buffer)); - } - public InputFile newDecryptingInputFile(String path, long length, ByteBuffer buffer) { - // TODO: is the length correct for the encrypted file? It may be the length of the plaintext - // stream - return em.decrypt(wrap(io.newInputFile(path, length), buffer)); + Preconditions.checkArgument( + length > 0, "Cannot safely decrypt table metadata file because its size is not specified"); + + InputFile inputFile = io.newInputFile(path, length); + + if (inputFile.getLength() != length) { Review Comment: > I don't see a test in TestGcmStreams so we should probably add one that validates truncated streams specifically. I'm not sure how. If a file is truncated by exactly 1 block, GCM Streams won't detect that. The upper layer (Avro or Json readers, etc) might detect that or might not, it's not guaranteed. That's why we've added an explicit [requirement](https://github.com/apache/iceberg/blob/main/format/gcm-stream-spec.md#file-length) in the GCM Stream spec to take the length from a trusted source. This is "out of band" from the spec point of view, meaning that we must make sure the length comes from a parent metadata (and not from the file system) everywhere in Iceberg where we decrypt a stream. However, `FileIO.newInputFile(path, length)` implementations are custom; some of them simply ignore the length parameter - and then indeed send a file system request upon `InputFile.getLength()` call. But we can prevent a security breach by making this check (`if (inputFile.getLength() != length)`). In most cases, this won't trigger a HEAD request, because most of FileIO implementations don't ignore the length parameter in `newInputFile(path, length)` and store it. For those few that do ignore it, we need to verify the file length wasn't truncated in the file system. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org