alamb commented on code in PR #9583:
URL: https://github.com/apache/arrow-rs/pull/9583#discussion_r2966233871
##########
parquet/src/compression.rs:
##########
@@ -231,10 +231,23 @@ mod snappy_codec {
             None => decompress_len(input_buf)?,
         };
         let offset = output_buf.len();
-        output_buf.resize(offset + len, 0);
-        self.decoder
-            .decompress(input_buf, &mut output_buf[offset..])
-            .map_err(|e| e.into())
+        output_buf.reserve(len);
+        // SAFETY: we pass the spare capacity to snappy which will write exactly
+        // `len` bytes on success. The `set_len` below is only reached when
+        // decompression succeeds. `MaybeUninit<u8>` has the same layout as `u8`.
+        let spare = output_buf.spare_capacity_mut();
+        let spare_bytes = unsafe {
+            std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
+        };
+        let n = self
+            .decoder
+            .decompress(input_buf, &mut spare_bytes[..len])
+            .map_err(|e| -> ParquetError { e.into() })?;
Review Comment:
If this returns an error before `set_len` is called, will the buffer be left
in an inconsistent state?
I think bounding the mut slice to `..len` ensures that the call to
`decompress` can't write past the newly reserved bytes.
However, this also passes uninitialized bytes to `decompress` -- how do we
know that `decompress` doesn't read them? Maybe we should add a SAFETY note to
the `decompress` API saying it can't rely on the bytes being initialized 🤔
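To make the first question concrete: here is a minimal, self-contained sketch of the spare-capacity pattern under discussion, with a hypothetical `fake_decompress` standing in for snappy's `decompress` (assumed, like snappy, to only write into the output slice, never read it). Because `set_len` is only reached after success, an early `?` return leaves the `Vec` logically unchanged -- only its capacity grew.

```rust
use std::slice;

// Hypothetical stand-in for snappy's `decompress`: writes exactly
// `input.len()` bytes into `out` and never reads its prior contents.
fn fake_decompress(input: &[u8], out: &mut [u8]) -> Result<usize, String> {
    if out.len() < input.len() {
        return Err("output buffer too small".to_string());
    }
    out[..input.len()].copy_from_slice(input);
    Ok(input.len())
}

// Illustrative helper mirroring the PR's pattern (not the actual parquet API).
fn append_decompressed(input: &[u8], output_buf: &mut Vec<u8>) -> Result<usize, String> {
    let len = input.len(); // stand-in for `decompress_len`
    let offset = output_buf.len();
    output_buf.reserve(len);
    // SAFETY: `MaybeUninit<u8>` has the same layout as `u8`, and the callee
    // is assumed to only *write* into this slice, never read from it.
    let spare = output_buf.spare_capacity_mut();
    let spare_bytes = unsafe {
        slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
    };
    let n = fake_decompress(input, &mut spare_bytes[..len])?;
    // Only reached on success: on an early error return above, `output_buf`'s
    // length was never touched, so the buffer stays in a consistent state.
    unsafe { output_buf.set_len(offset + n) };
    Ok(n)
}

fn main() {
    let mut buf = vec![1u8, 2];
    let n = append_decompressed(&[3, 4, 5], &mut buf).unwrap();
    assert_eq!(n, 3);
    assert_eq!(buf, vec![1, 2, 3, 4, 5]);
    println!("{buf:?}");
}
```

The second concern (the callee observing uninitialized bytes) is exactly what the SAFETY comment must promise on behalf of `decompress`; the sketch encodes that promise only as a comment, which is why documenting it on the `decompress` API itself seems worthwhile.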
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]