masumi-ryugo opened a new pull request, #9869:
URL: https://github.com/apache/arrow-rs/pull/9869
# Which issue does this PR close?
N/A — found via `cargo-fuzz` libFuzzer harness over `StreamReader::try_new`.
Happy to file a tracking issue first if maintainers prefer.
# Rationale for this change
`MessageReader::maybe_next` decodes the on-wire `meta_len` (after the
existing check that rejects negative values) and the FlatBuffer message's
`bodyLength`, and uses both directly for up-front allocations:
```rust
self.buf.resize(meta_len, 0); //
attacker-controlled
let mut buf = MutableBuffer::from_len_zeroed(message.bodyLength() as usize);
```
A 4-byte input — `00 1b 00 48` — claims a ~1.2 GiB metadata payload via
`meta_len = i32::from_le_bytes(...) = 0x4800_1b00`, driving a 1.2 GiB
`Vec::resize` before any short-read could fail. Roughly a 3×10⁸
amplification factor from input bytes to allocation; OOM-kills the
process on a 2 GiB-rss-limited fuzzer or a memory-constrained service.
Per `SECURITY.md` this is a bug, not a vulnerability (no RCE, no
information disclosure — only availability), but it is reachable from
the public `StreamReader::try_new` entrypoint and is the same shape of
bug that the recent `meta_len`-negative fix addressed.
# What changes are included in this PR?
- Read the metadata bytes via `(&mut
self.reader).take(meta_len).read_to_end(&mut self.buf)`
followed by an explicit length check, so the buffer grows as bytes
actually arrive instead of being eagerly resized to the (untrusted)
declared length.
- Add a `read_body_into_buffer` helper that fills a `MutableBuffer` in
64 KiB chunks via `extend_from_slice`. This preserves the cache-line
aligned allocation that downstream Arrow consumers rely on, while
keeping the high-water-mark allocation proportional to the bytes
actually delivered by the underlying reader.
- Validate `bodyLength` (`i64`) via `usize::try_from`, surfacing a
negative or out-of-range value as a `ParseError` instead of wrapping
silently into a huge `usize` on 64-bit and a different huge `usize`
on 32-bit.
- One regression test, `test_stream_reader_huge_meta_len_does_not_oom`,
that runs the 4-byte fuzzer repro through `StreamReader::try_new` and
asserts a clean `Err`.
# Are these changes tested?
Yes — new unit test as above. `cargo test -p arrow-ipc --release`
(112 unit + 17 integration tests), `cargo clippy -p arrow-ipc
--all-targets -- -D warnings`, and `cargo fmt --check` are all clean.
# Are there any user-facing changes?
Yes — malformed IPC streams that previously triggered a multi-GB
allocation now return an `ArrowError::ParseError` early. No behavior
change for well-formed streams; allocation is still cache-line aligned
and the final buffer shape (`MutableBuffer`) is unchanged.
# Reproducer
4 bytes:
```
0x00 0x1b 0x00 0x48
```
```rust
let bytes: [u8; 4] = [0x00, 0x1b, 0x00, 0x48];
let _ =
arrow_ipc::reader::StreamReader::try_new(std::io::Cursor::new(bytes), None);
```
Before this PR: a ~1.2 GiB allocation completes (or OOM-kills the
process under a memory limit) before `read_exact` discovers there are
0 bytes left and returns an EOF error.
After this PR: `Err(ArrowError::ParseError("Unexpected EOF reading
1207975168 bytes of message metadata, got 0"))`, with peak allocation
on the order of the 64 KiB read chunk plus a small flatbuffer scratch.
# Found via
`cargo-fuzz` libFuzzer harness wrapping `StreamReader::try_new`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]