youichi-uda commented on PR #9869:
URL: https://github.com/apache/arrow-rs/pull/9869#issuecomment-4365643073

   Independent confirmation from a fresh cargo-fuzz harness on this same code 
path — flagging this here because it's exactly the kind of evidence 
apache/arrow-rs#5332 was set up to produce.
   
   **Setup**: I added a `arrow-ipc/fuzz/ipc_stream_reader` cargo-fuzz target as 
part of the fuzz infrastructure proposed in #5332 (branch 
[`fuzz/initial-harnesses`](https://github.com/masumi-ryugo/arrow-rs/tree/fuzz/initial-harnesses)
 on my fork). The harness just feeds the raw `&[u8]` input straight into 
`StreamReader::try_new(Cursor::new(data), None)` and then iterates over every batch the reader yields.
   
   **Pre-fix (current `main`, no seed corpus, no libFuzzer dictionary)**:
   - libFuzzer hits an OOM in well under 60 seconds of run time.
   - The smallest crasher it produces is **4 bytes**: `[0x30, 0x22, 0x32, 0x2f]`. 
Decoded as a little-endian i32, that's a `meta_len` of 791,814,704 (~755 MiB), 
which goes straight into `self.buf.resize(meta_len, 0)` before any short read 
can surface.
   - This is a *different* trigger from the `[0x00, 0x1b, 0x00, 0x48]` 
regression test in this PR, but it has the same root cause and goes through the 
same code path. Two distinct 4-byte inputs hitting the same OOM is a good sign 
the regression test isn't over-fitted.
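
As a quick check of the arithmetic in the first bullet, the snippet below decodes both 4-byte inputs the same way. It is a standalone sketch, not arrow-ipc code: `meta_len_from_prefix` and `requested_mib` are hypothetical helper names, and it assumes (as holds for both inputs) that the prefix is not the `0xFFFFFFFF` continuation marker, so the first four bytes are read directly as the metadata length.

```rust
/// Hypothetical helper: interpret a 4-byte stream prefix as a little-endian
/// i32, the way the comment describes the reader deriving `meta_len`.
fn meta_len_from_prefix(prefix: [u8; 4]) -> i32 {
    i32::from_le_bytes(prefix)
}

/// Hypothetical helper: size in whole MiB (rounded down) that a
/// `resize(meta_len, 0)` call would request for this length.
fn requested_mib(meta_len: i32) -> i64 {
    i64::from(meta_len) / (1 << 20)
}
```

For the fuzzer's crasher this gives 791,814,704 bytes (755 MiB, rounded down), and for the regression-test input it gives 1,207,966,464 bytes (1,152 MiB). Both requests are orders of magnitude larger than the four bytes actually supplied.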
   
   **Post-fix (this PR's code)**:
   - The 4-byte repro `[0x30, 0x22, 0x32, 0x2f]` exits in 0 ms with a 
`ParseError`, no allocation spike. ✓
   - 200,000 fuzz runs in 60 s under `-rss_limit_mb=512` (well below the 
libFuzzer default of 2048 MB): **0 OOMs, 0 crashes, peak RSS 121 MiB, 246 edges 
/ 288 features / 29 corpus entries.** That is reasonable coverage for an empty 
starting corpus, suggesting both the `meta_len` and `bodyLength` paths are being exercised.
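
One general way to get the post-fix behavior described above (a fast error with no allocation spike) is to let the buffer grow only as bytes actually arrive, so a bogus length on a short input hits end-of-file before any large allocation happens. The following is a minimal sketch of that pattern under stated assumptions, not a reconstruction of this PR's change: `read_exact_bounded` and `read_length_prefixed` are hypothetical names, errors are plain `std::io::Error` rather than `ArrowError`, and the continuation marker and flatbuffer decoding are skipped.

```rust
use std::io::{self, ErrorKind, Read};

/// Upper bound on the scratch buffer used for each read step.
const CHUNK: usize = 64 * 1024;

/// Hypothetical helper: read exactly `len` bytes without trusting `len`
/// for an up-front allocation. The output Vec grows only by the bytes
/// actually read, so memory stays proportional to the real input size
/// (plus a fixed scratch buffer of at most 64 KiB).
fn read_exact_bounded<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    let mut chunk = vec![0u8; CHUNK.min(len)];
    while buf.len() < len {
        let want = (len - buf.len()).min(chunk.len());
        match reader.read(&mut chunk[..want]) {
            // EOF before `len` bytes: the declared length was a lie.
            Ok(0) => {
                return Err(io::Error::new(
                    ErrorKind::UnexpectedEof,
                    format!("expected {} bytes, got {}", len, buf.len()),
                ));
            }
            Ok(n) => buf.extend_from_slice(&chunk[..n]),
            // Retry reads interrupted by a signal, as `read_exact` does.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(buf)
}

/// Hypothetical helper: read a little-endian i32 length prefix followed by
/// that many metadata bytes. Negative lengths are rejected as invalid data.
fn read_length_prefixed<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let meta_len = i32::from_le_bytes(prefix);
    if meta_len < 0 {
        return Err(io::Error::new(ErrorKind::InvalidData, "negative metadata length"));
    }
    read_exact_bounded(reader, meta_len as usize)
}
```

With the 4-byte crasher as input, `read_length_prefixed` returns an `UnexpectedEof` error after committing only the 64 KiB scratch buffer instead of ~755 MiB. When the total input size is known up front (for example an in-memory `Cursor`), an even simpler guard is to reject any declared length larger than the bytes remaining before allocating anything.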
   
   So this PR cleanly defuses the entire class of "a single length field in the 
header drives a multi-GB allocation" for `StreamReader`, not just the one specific 
trigger in the regression test. Happy to keep a periodic libFuzzer run pointed 
at this once #5332 lands so we have a place to park future regressions like 
this in CI.
   

