ggershinsky commented on code in PR #5544:
URL: https://github.com/apache/iceberg/pull/5544#discussion_r1699473244


##########
api/src/main/java/org/apache/iceberg/encryption/EncryptingFileIO.java:
##########
@@ -109,14 +111,19 @@ public InputFile newInputFile(ManifestFile manifest) {
     }
   }
 
-  public InputFile newDecryptingInputFile(String path, ByteBuffer buffer) {
-    return em.decrypt(wrap(io.newInputFile(path), buffer));
-  }
-
   public InputFile newDecryptingInputFile(String path, long length, ByteBuffer 
buffer) {
-    // TODO: is the length correct for the encrypted file? It may be the 
length of the plaintext
-    // stream
-    return em.decrypt(wrap(io.newInputFile(path, length), buffer));
+    Preconditions.checkArgument(
+        length > 0, "Cannot safely decrypt table metadata file because its 
size is not specified");
+
+    InputFile inputFile = io.newInputFile(path, length);
+
+    if (inputFile.getLength() != length) {

Review Comment:
   >  I don't see a test in TestGcmStreams so we should probably add one that 
validates truncated streams specifically.
   
   I'm not sure how. If a file is truncated by exactly 1 block, GCM Streams 
won't detect that. The upper layer (Avro or Json readers, etc) might detect 
that or might not, it's not guaranteed. That's why we've added an explicit 
[requirement](https://github.com/apache/iceberg/blob/main/format/gcm-stream-spec.md#file-length)
 in the GCM Stream spec to take the length from a trusted source. This is "out 
of band" from the spec point of view, meaning that we must make sure the length 
comes from a parent metadata (and not from the file system) everywhere in 
Iceberg where we decrypt a stream.
   
   However, `FileIO.newInputFile(path, length)` implementations are custom; 
some of them simply ignore the length parameter - and then indeed send a file 
system request upon `InputFile.getLength()` call. But we can prevent a security 
breach by making this check (`if (inputFile.getLength() != length)`). In most 
cases, this won't trigger a HEAD request, because most of FileIO 
implementations don't ignore the length parameter in `newInputFile(path, 
length)`. For those few that do, we need to verify the file length wasn't 
truncated in the file system.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to