alamb commented on code in PR #9583:
URL: https://github.com/apache/arrow-rs/pull/9583#discussion_r2966233871


##########
parquet/src/compression.rs:
##########
@@ -231,10 +231,23 @@ mod snappy_codec {
                 None => decompress_len(input_buf)?,
             };
             let offset = output_buf.len();
-            output_buf.resize(offset + len, 0);
-            self.decoder
-                .decompress(input_buf, &mut output_buf[offset..])
-                .map_err(|e| e.into())
+            output_buf.reserve(len);
+            // SAFETY: we pass the spare capacity to snappy which will write exactly
+            // `len` bytes on success. The `set_len` below is only reached when
+            // decompression succeeds. `MaybeUninit<u8>` has the same layout as `u8`.
+            let spare = output_buf.spare_capacity_mut();
+            let spare_bytes = unsafe {
+                std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
+            };
+            let n = self
+                .decoder
+                .decompress(input_buf, &mut spare_bytes[..len])
+                .map_err(|e| -> ParquetError { e.into() })?;

Review Comment:
   If this returns on error before setting len, will the buffer be left in an inconsistent state?
   
   I think the use of the mut slice ensures that the call to decompress won't overwrite bytes beyond the newly allocated region.
   
   However, this also basically passes uninitialized bytes to decompress -- how do we know that decompress doesn't read them? Maybe we should add a SAFETY warning to the decompress API that says it can't rely on initialized bytes 🤔
   
   
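   For reference, here is a minimal, self-contained sketch of the pattern under discussion. `append_decompressed` is a hypothetical helper, and the `copy_from_slice` stands in for the real snappy `decompress` call; the point is that an early error return before `set_len` leaves the `Vec` with its old length, so only extra (still-uninitialized) capacity is left behind, not inconsistent contents:
   
   ```rust
   // Sketch: write into a Vec's spare capacity, then commit with set_len.
   // `copy_from_slice` plays the role of the decoder writing `n` bytes.
   fn append_decompressed(out: &mut Vec<u8>, src: &[u8]) -> usize {
       let offset = out.len();
       out.reserve(src.len());
       let spare = out.spare_capacity_mut();
       // SAFETY: `MaybeUninit<u8>` has the same layout as `u8`; we only ever
       // write through this slice before reading from it.
       let spare_bytes = unsafe {
           std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
       };
       // Stand-in for the decompressor writing into the spare capacity.
       // If this step returned an error instead, `out.len()` would be
       // unchanged and the Vec would still be in a valid state.
       spare_bytes[..src.len()].copy_from_slice(src);
       // SAFETY: the first `src.len()` spare bytes were initialized above.
       unsafe { out.set_len(offset + src.len()) };
       src.len()
   }
   
   fn main() {
       let mut buf = vec![1u8, 2];
       let n = append_decompressed(&mut buf, &[3, 4, 5]);
       println!("{} {:?}", n, buf); // 3 [1, 2, 3, 4, 5]
   }
   ```
   
   The open question above (whether `decompress` may *read* the uninitialized slice) is not answered by this sketch; it depends on the decoder's contract.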



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
