youichi-uda commented on PR #9869: URL: https://github.com/apache/arrow-rs/pull/9869#issuecomment-4365643073
Independent confirmation from a fresh cargo-fuzz harness on this same code path. I'm flagging it here because it's exactly the kind of evidence apache/arrow-rs#5332 was set up to produce.

**Setup**: I added an `arrow-ipc/fuzz/ipc_stream_reader` cargo-fuzz target as part of the fuzz infrastructure proposed in #5332 (branch [`fuzz/initial-harnesses`](https://github.com/masumi-ryugo/arrow-rs/tree/fuzz/initial-harnesses) on my fork). The harness feeds the raw `&[u8]` input straight into `StreamReader::try_new(Cursor::new(data), None)` and iterates over the resulting batches.

**Pre-fix (current `main`, no seed corpus, no dictionary)**:

- libFuzzer hits an OOM in well under 60 seconds of run time.
- The smallest crasher it produces is **4 bytes**: `[0x30, 0x22, 0x32, 0x2f]`. Decoded as a little-endian i32, that is a `meta_len` of 791,814,704 (~755 MiB), which goes straight into `self.buf.resize(meta_len, 0)` before any short read can surface.
- This is a *different* trigger from the `[0x00, 0x1b, 0x00, 0x48]` regression test in this PR, but it has the same root cause and goes through the same code path. Two distinct 4-byte inputs hitting the same OOM is a good sign the regression test isn't over-fitted.

**Post-fix (this PR's code)**:

- The 4-byte repro `[0x30, 0x22, 0x32, 0x2f]` exits in 0 ms with a `ParseError` and no allocation spike. ✓
- 200,000 fuzz runs in 60 s under `-rss_limit_mb=512` (well below libFuzzer's default limit of 2048 MB): **0 OOMs, 0 crashes, peak RSS 121 MiB, 246 edges / 288 features / 29 corpus entries.** That is reasonable coverage even from an empty corpus, which suggests both the `meta_len` and `bodyLength` paths are being exercised.

So this PR cleanly defuses the entire class of "a single length field in the header drives a multi-GB allocation" for `StreamReader`, not just the one specific trigger in the regression test. Happy to keep a periodic libFuzzer run pointed at this once #5332 lands, so we have a place in CI to catch future regressions like this.
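The `meta_len` arithmetic in the pre-fix notes can be checked with a few lines of std-only Rust. This is a sketch, not arrow-rs code: `prefix_as_meta_len` and `to_mib` are hypothetical helpers. The sketch assumes that, as in the Arrow IPC legacy framing that predates the `0xFFFFFFFF` continuation marker, a 4-byte prefix other than the marker is read directly as a little-endian i32 metadata length.

```rust
/// Hypothetical helper (not an arrow-rs API): interpret the first four bytes
/// of a stream as a little-endian i32 metadata length, the way a reader
/// handling the legacy (no continuation marker) framing would.
/// Returns `None` if fewer than four bytes are available.
fn prefix_as_meta_len(data: &[u8]) -> Option<i32> {
    let p = data.get(..4)?;
    Some(i32::from_le_bytes([p[0], p[1], p[2], p[3]]))
}

/// Convert a byte count to MiB for display.
fn to_mib(bytes: i32) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

fn main() {
    // Smallest crasher found by the fuzzer, and the PR's regression input.
    let fuzz_crasher = [0x30u8, 0x22, 0x32, 0x2f];
    let pr_regression = [0x00u8, 0x1b, 0x00, 0x48];

    for input in [&fuzz_crasher[..], &pr_regression[..]] {
        let meta_len = prefix_as_meta_len(input).expect("need 4 bytes");
        println!(
            "{:02x?} -> meta_len = {} (~{:.0} MiB)",
            input,
            meta_len,
            to_mib(meta_len)
        );
    }
}
```

Both prefixes decode to lengths wildly out of proportion to a 4-byte input: 791,814,704 bytes (~755 MiB) for the fuzzer's crasher and 1,207,966,464 bytes (~1152 MiB) for the PR's regression input.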
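One generic way to avoid the "trust the header, then `resize`" pattern is to let the bytes the reader actually supplies, rather than the declared length, drive allocation. This is an illustrative sketch, not necessarily what this PR does. `read_len_prefixed` is a hypothetical helper that returns a `std::io::Error` rather than arrow-rs's `ParseError`, and the 64 KiB initial-capacity cap is an arbitrary assumption.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper (not an arrow-rs API): read exactly `len` bytes from
/// `r` without letting the untrusted `len` size the allocation up front.
fn read_len_prefixed<R: Read>(r: &mut R, len: usize) -> io::Result<Vec<u8>> {
    // Reserve at most 64 KiB up front (arbitrary cap); beyond that the Vec
    // grows only as bytes actually arrive from the reader.
    let mut buf = Vec::with_capacity(len.min(64 * 1024));
    // `take` stops after `len` bytes, so an honest length reads exactly that much.
    let got = r.take(len as u64).read_to_end(&mut buf)?;
    if got < len {
        // Short input: fail instead of zero-filling a huge buffer.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("declared length {} but only {} bytes available", len, got),
        ));
    }
    Ok(buf)
}

fn main() {
    // A 4-byte body claiming ~755 MiB of metadata is rejected cheaply.
    let mut short = Cursor::new(vec![0u8; 4]);
    match read_len_prefixed(&mut short, 791_814_704) {
        Ok(_) => println!("unexpected success"),
        Err(e) => println!("rejected: {}", e),
    }

    // An honest length reads back exactly the declared bytes.
    let mut ok = Cursor::new(vec![1u8, 2, 3, 4]);
    println!("{:?}", read_len_prefixed(&mut ok, 4).unwrap());
}
```

With this shape, a lying header costs at most the initial cap plus the bytes the input really contains, so a 4-byte input can no longer request hundreds of MiB before the short read is noticed.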
