rdblue commented on code in PR #9752:
URL: https://github.com/apache/iceberg/pull/9752#discussion_r1501943394
##########
core/src/main/java/org/apache/iceberg/encryption/AesGcmInputFile.java:
##########
@@ -20,39 +20,33 @@
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.SeekableInputStream;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
public class AesGcmInputFile implements InputFile {
private final InputFile sourceFile;
private final byte[] dataKey;
private final byte[] fileAADPrefix;
private long plaintextLength;
+ /**
+ * Important: sourceFile.getLength() must return the verified plaintext content length, not the
+ * physical file size after encryption. This protects against tampering with the file size in
+ * untrusted storage systems.
+ */
public AesGcmInputFile(InputFile sourceFile, byte[] dataKey, byte[] fileAADPrefix) {
this.sourceFile = sourceFile;
this.dataKey = dataKey;
this.fileAADPrefix = fileAADPrefix;
- this.plaintextLength = -1;
+ this.plaintextLength = sourceFile.getLength();
Review Comment:
I think the problem isn't limited to this path; it also affects other valid uses of the API.

For instance, if you write data into an encrypted Avro file and then read it back, it is valid to open the Avro file without calling `newInputFile(DataFile)`. You could use `newInputFile(String)` instead, and that `InputFile` would return the file length from the object store, which is the encrypted length. If the decryption stream assumes that the incoming `InputFile` reports the plaintext length, there's no way to know or guarantee that it does.

I think this needs to assume that the incoming file reports the encrypted length.
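If the incoming `InputFile` reports the encrypted length, the reader has to derive the plaintext length from it. A minimal sketch of that arithmetic follows. It assumes the AES GCM stream layout is an 8-byte header (magic plus plaintext block size) followed by ciphertext blocks that each wrap up to 1 MiB of plaintext with a 12-byte nonce and a 16-byte GCM tag. These constants are assumed to mirror Iceberg's `Ciphers` values, and `GcmStreamLengths` is a hypothetical name, not an Iceberg class.

```java
// Hypothetical helper: converts the physical (encrypted) length reported by
// storage into the plaintext length. Layout constants are ASSUMED to match
// Iceberg's AES GCM stream format; verify them against Ciphers before use.
final class GcmStreamLengths {
  static final int HEADER_LENGTH = 8; // assumed: 4-byte magic + 4-byte block size
  static final int NONCE_LENGTH = 12; // GCM nonce stored with each block
  static final int TAG_LENGTH = 16; // GCM authentication tag per block
  static final int PLAIN_BLOCK_SIZE = 1024 * 1024; // assumed plaintext block size
  static final int CIPHER_BLOCK_SIZE = PLAIN_BLOCK_SIZE + NONCE_LENGTH + TAG_LENGTH;

  private GcmStreamLengths() {}

  static long plaintextLength(long encryptedLength) {
    long streamLength = encryptedLength - HEADER_LENGTH;
    if (streamLength < 0) {
      throw new IllegalArgumentException("Shorter than the stream header: " + encryptedLength);
    }

    // Every full ciphertext block holds exactly PLAIN_BLOCK_SIZE plaintext bytes.
    long fullBlocks = streamLength / CIPHER_BLOCK_SIZE;
    long lastCipherBytes = streamLength % CIPHER_BLOCK_SIZE;
    if (lastCipherBytes == 0) {
      return fullBlocks * PLAIN_BLOCK_SIZE;
    }

    // A partial final block still carries a complete nonce and tag.
    long lastPlainBytes = lastCipherBytes - NONCE_LENGTH - TAG_LENGTH;
    if (lastPlainBytes < 0) {
      throw new IllegalArgumentException("Truncated final block: " + encryptedLength);
    }
    return fullBlocks * PLAIN_BLOCK_SIZE + lastPlainBytes;
  }
}
```

Under these assumptions, a file with one full block and a 100-byte final block has an encrypted length of 8 + 1048604 + 128 bytes, which maps back to 1048676 plaintext bytes.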
We still need to choose whether to store the plaintext length or the encrypted length in table metadata. From your comment here, it looks like it would be easier to store the plaintext length because that is what the writer returns. I think that makes sense, but we will need to be specific in `EncryptingFileIO` about what is passed through and where it needs to be converted.
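If table metadata stores the plaintext length, one place a conversion could live is a check that maps the metadata value to the expected encrypted length and compares it with what the object store reports, rejecting a mismatched file before decryption. The sketch below is hypothetical (`EncryptedLengthCheck` is not an Iceberg class). It assumes the same layout as above: an 8-byte header, 1 MiB plaintext blocks, and 28 bytes of nonce plus tag per ciphertext block. It also assumes the writer emits no empty trailing block, so an empty plaintext maps to a header-only file; confirm that against the actual writer.

```java
// Hypothetical helper (not an Iceberg class): maps a plaintext length from table
// metadata to the encrypted length expected in storage. Constants are ASSUMED to
// match the AES GCM stream layout; verify them before use.
final class EncryptedLengthCheck {
  static final int HEADER_LENGTH = 8; // assumed: magic + plaintext block size
  static final int NONCE_AND_TAG_LENGTH = 12 + 16; // nonce + GCM tag per block
  static final int PLAIN_BLOCK_SIZE = 1024 * 1024; // assumed plaintext block size

  private EncryptedLengthCheck() {}

  /** Assumes no empty trailing block is written for empty or block-aligned plaintext. */
  static long encryptedLength(long plaintextLength) {
    if (plaintextLength < 0) {
      throw new IllegalArgumentException("Negative plaintext length: " + plaintextLength);
    }
    long fullBlocks = plaintextLength / PLAIN_BLOCK_SIZE;
    long remainder = plaintextLength % PLAIN_BLOCK_SIZE;
    long length = HEADER_LENGTH + fullBlocks * (PLAIN_BLOCK_SIZE + NONCE_AND_TAG_LENGTH);
    if (remainder > 0) {
      // A partial final block adds its own nonce and tag.
      length += remainder + NONCE_AND_TAG_LENGTH;
    }
    return length;
  }

  /** Rejects a file whose stored size disagrees with the plaintext length in metadata. */
  static void validate(long plaintextLengthFromMetadata, long lengthFromStorage) {
    long expected = encryptedLength(plaintextLengthFromMetadata);
    if (expected != lengthFromStorage) {
      throw new IllegalStateException(
          String.format(
              "Expected encrypted length %d but storage reports %d", expected, lengthFromStorage));
    }
  }
}
```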