masumi-ryugo opened a new pull request, #9869:
URL: https://github.com/apache/arrow-rs/pull/9869

   # Which issue does this PR close?
   
   N/A — found via `cargo-fuzz` libFuzzer harness over `StreamReader::try_new`.
   Happy to file a tracking issue first if maintainers prefer.
   
   # Rationale for this change
   
   `MessageReader::maybe_next` decodes the on-wire `meta_len` (after the
   existing check that rejects negative values) and the FlatBuffer message's
   `bodyLength`, and uses both directly for up-front allocations:
   
   ```rust
   self.buf.resize(meta_len, 0);                                    // 
attacker-controlled
   let mut buf = MutableBuffer::from_len_zeroed(message.bodyLength() as usize);
   ```
   
   A 4-byte input — `00 1b 00 48` — claims a ~1.2 GiB metadata payload via
   `meta_len = i32::from_le_bytes(...) = 0x4800_1b00`, driving a 1.2 GiB
   `Vec::resize` before any short-read could fail. Roughly a 3×10⁸
   amplification factor from input bytes to allocation; OOM-kills the
   process on a 2 GiB-rss-limited fuzzer or a memory-constrained service.
   
   Per `SECURITY.md` this is a bug, not a vulnerability (no RCE, no
   information disclosure — only availability), but it is reachable from
   the public `StreamReader::try_new` entrypoint and is the same shape of
   bug that the recent `meta_len`-negative fix addressed.
   
   # What changes are included in this PR?
   
   - Read the metadata bytes via `(&mut 
self.reader).take(meta_len).read_to_end(&mut self.buf)`
     followed by an explicit length check, so the buffer grows as bytes
     actually arrive instead of being eagerly resized to the (untrusted)
     declared length.
   - Add a `read_body_into_buffer` helper that fills a `MutableBuffer` in
     64 KiB chunks via `extend_from_slice`. This preserves the cache-line
     aligned allocation that downstream Arrow consumers rely on, while
     keeping the high-water-mark allocation proportional to the bytes
     actually delivered by the underlying reader.
   - Validate `bodyLength` (`i64`) via `usize::try_from`, surfacing a
     negative or out-of-range value as a `ParseError` instead of wrapping
     silently into a huge `usize` on 64-bit and a different huge `usize`
     on 32-bit.
   - One regression test, `test_stream_reader_huge_meta_len_does_not_oom`,
     that runs the 4-byte fuzzer repro through `StreamReader::try_new` and
     asserts a clean `Err`.
   
   # Are these changes tested?
   
   Yes — new unit test as above. `cargo test -p arrow-ipc --release`
   (112 unit + 17 integration tests), `cargo clippy -p arrow-ipc
   --all-targets -- -D warnings`, and `cargo fmt --check` are all clean.
   
   # Are there any user-facing changes?
   
   Yes — malformed IPC streams that previously triggered a multi-GB
   allocation now return an `ArrowError::ParseError` early. No behavior
   change for well-formed streams; allocation is still cache-line aligned
   and the final buffer shape (`MutableBuffer`) is unchanged.
   
   # Reproducer
   
   4 bytes:
   
   ```
   0x00 0x1b 0x00 0x48
   ```
   
   ```rust
   let bytes: [u8; 4] = [0x00, 0x1b, 0x00, 0x48];
   let _ = 
arrow_ipc::reader::StreamReader::try_new(std::io::Cursor::new(bytes), None);
   ```
   
   Before this PR: a ~1.2 GiB allocation completes (or OOM-kills the
   process under a memory limit) before `read_exact` discovers there are
   0 bytes left and returns an EOF error.
   
   After this PR: `Err(ArrowError::ParseError("Unexpected EOF reading
   1207975168 bytes of message metadata, got 0"))`, with peak allocation
   on the order of the 64 KiB read chunk plus a small flatbuffer scratch.
   
   # Found via
   
   `cargo-fuzz` libFuzzer harness wrapping `StreamReader::try_new`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to