saethlin opened a new issue #877:
URL: https://github.com/apache/arrow-rs/issues/877
**Describe the bug**
The parquet `bit_packing` module does misaligned reads through a raw
pointer, and `murmur_hash2_64a` constructs a misaligned reference. It's unclear
to me if these qualify as security problems. They're definitely UB, and the
fact that the UB is hit in tests suggests it's quite easy to hit in real code,
but I can't imagine how one could construct an exploit with these.
```
running 470 tests
test
arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ...
error: Undefined Behavior: accessing memory with alignment 1, but alignment 4
is required
--> parquet/src/util/bit_packing.rs:150:12
|
150 | *out = (*in_buf) % (1u32 << 2);
| ^^^^^^^^^ accessing memory with alignment 1, but alignment
4 is required
|
= help: this indicates a bug in the program: it performed an invalid
operation, and caused Undefined Behavior
= help: see
https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html
for further information
= note: inside `util::bit_packing::unpack2_32` at
parquet/src/util/bit_packing.rs:150:12
```
After patching up that:
```
test column::writer::tests::test_byte_array_statistics ... error: Undefined
Behavior: accessing memory with alignment 2, but alignment 8 is required
-->
/home/ben/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:95:14
|
95 | unsafe { &*ptr::slice_from_raw_parts(data, len) }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ accessing memory
with alignment 2, but alignment 8 is required
|
= help: this indicates a bug in the program: it performed an invalid
operation, and caused Undefined Behavior
= help: see
https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html
for further information
= note: inside `std::slice::from_raw_parts::<u64>` at
/home/ben/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:95:14
note: inside `util::hash_util::murmur_hash2_64a` at
parquet/src/util/hash_util.rs:56:25
--> parquet/src/util/hash_util.rs:56:25
|
56 | let data_bytes_64 = std::slice::from_raw_parts(
| _________________________^
57 | | &data_bytes[0..len_64] as *const [u8] as *const u64,
58 | | len / 8,
59 | | );
| |_____^
note: inside `util::hash_util::hash_` at parquet/src/util/hash_util.rs:32:13
```
For what it's worth, this comment:
```
/// SAFTETY Only safe on platforms which support unaligned loads (like
x86_64)
```
is inaccurate. [The `ptr` type documents
this](https://doc.rust-lang.org/1.56.0/std/primitive.pointer.html):
> Working with raw pointers in Rust is uncommon, typically limited to a few
patterns. Raw pointers can be unaligned or null. However, when a raw pointer is
dereferenced (using the * operator), it must be non-null and aligned.
Misaligned access through a pointer is UB on all platforms.
**To Reproduce**
`MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test` (this
requires a fair bit of patience, like most projects this crate's tests were not
designed with a super-slow interpreter in mind).
**Expected behavior**
Miri does not detect any UB.
**Additional context**
It seems like the first of these has been known for some time, but actually
fixing it was punted on in
https://github.com/apache/arrow-rs/commit/e98efae6c6548682f3dfeb44dac5b695aeefbd90.
I suspect there is other UB. The code surrounding these cases follows
similar patterns, but I'm not yet sure if miri doesn't detect anything else
because the code is fine or because the bad cases aren't covered by tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]