saethlin opened a new issue #877:
URL: https://github.com/apache/arrow-rs/issues/877


   **Describe the bug**
   The parquet `bit_packing` module does misaligned reads through a raw 
pointer, and `murmur_hash2_64a` constructs a misaligned reference. It's unclear 
to me if these qualify as security problems. They're definitely UB, and the 
fact that the UB is hit in tests suggests it's quite easy to hit in real code, 
but I can't imagine how one could construct an exploit with these.
   
   ```
   running 470 tests
   test 
arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... 
error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 
is required
       --> parquet/src/util/bit_packing.rs:150:12
        |
   150  |     *out = (*in_buf) % (1u32 << 2);
        |            ^^^^^^^^^ accessing memory with alignment 1, but alignment 
4 is required
        |
        = help: this indicates a bug in the program: it performed an invalid 
operation, and caused Undefined Behavior
        = help: see 
https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html 
for further information
   
        = note: inside `util::bit_packing::unpack2_32` at 
parquet/src/util/bit_packing.rs:150:12
   ```
   After patching up that:
   ```
   test column::writer::tests::test_byte_array_statistics ... error: Undefined 
Behavior: accessing memory with alignment 2, but alignment 8 is required
       --> 
/home/ben/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:95:14
        |
   95   |     unsafe { &*ptr::slice_from_raw_parts(data, len) }
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ accessing memory 
with alignment 2, but alignment 8 is required
        |
        = help: this indicates a bug in the program: it performed an invalid 
operation, and caused Undefined Behavior
        = help: see 
https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html 
for further information
   
        = note: inside `std::slice::from_raw_parts::<u64>` at 
/home/ben/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:95:14
   note: inside `util::hash_util::murmur_hash2_64a` at 
parquet/src/util/hash_util.rs:56:25
       --> parquet/src/util/hash_util.rs:56:25
        |
   56   |       let data_bytes_64 = std::slice::from_raw_parts(
        |  _________________________^
   57   | |         &data_bytes[0..len_64] as *const [u8] as *const u64,
   58   | |         len / 8,
   59   | |     );
        | |_____^
   note: inside `util::hash_util::hash_` at parquet/src/util/hash_util.rs:32:13
   ```
   
   For what it's worth, this comment:
   ```
   /// SAFTETY Only safe on platforms which support unaligned loads (like 
x86_64)
   ```
   is inaccurate. [The `ptr` type documents 
this](https://doc.rust-lang.org/1.56.0/std/primitive.pointer.html):
   > Working with raw pointers in Rust is uncommon, typically limited to a few 
patterns. Raw pointers can be unaligned or null. However, when a raw pointer is 
dereferenced (using the * operator), it must be non-null and aligned.
   
   Misaligned access through a pointer is UB on all platforms.
   
   **To Reproduce**
   `MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test` (this 
requires a fair bit of patience, like most projects this crate's tests were not 
designed with a super-slow interpreter in mind).
   
   **Expected behavior**
   Miri does not detect any UB.
   
   **Additional context**
   It seems like the first of these has been known for some time, but actually 
fixing it was punted on in 
https://github.com/apache/arrow-rs/commit/e98efae6c6548682f3dfeb44dac5b695aeefbd90.
   
   I suspect there is other UB. The code surrounding these cases follows 
similar patterns, but I'm not yet sure if miri doesn't detect anything else 
because the code is fine or because the bad cases aren't covered by tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to