omertt27 opened a new pull request, #50154:
URL: https://github.com/apache/arrow/pull/50154
### Rationale for this change
When the `CappedMemoryPool` (2.2 GB limit, added in GH-48105) triggers an
OOM during encoding roundtrip verification, the resulting `Status::OutOfMemory`
propagates back to call sites that used `ARROW_CHECK_OK(...)` or
`.ValueOrDie()`. On a non-OK status both end in `ARROW_LOG(FATAL)` → `std::abort()`,
which is not an exception, so `BEGIN_PARQUET_CATCH_EXCEPTIONS` cannot intercept
it. OSS-Fuzz then reports a process crash instead of a resource-limit event.
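A minimal sketch of why the catch block does not help. It uses a hypothetical stand-in `Status` and a hypothetical `CatchExceptions` wrapper shaped like `BEGIN/END_PARQUET_CATCH_EXCEPTIONS` (a `try`/`catch` that turns a thrown exception into a returned status; mapping it to `IOError` here is an assumption). A returned OOM status passes through untouched and a thrown exception is converted, but `std::abort()` throws nothing, so no `catch` clause ever runs.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status, reduced to what this sketch needs.
struct Status {
  enum class Code { kOk, kOutOfMemory, kIOError };
  Code code = Code::kOk;
  std::string msg;

  static Status OK() { return {}; }
  static Status OutOfMemory(std::string m) { return {Code::kOutOfMemory, std::move(m)}; }
  static Status IOError(std::string m) { return {Code::kIOError, std::move(m)}; }

  bool ok() const { return code == Code::kOk; }
  bool IsOutOfMemory() const { return code == Code::kOutOfMemory; }
  bool IsIOError() const { return code == Code::kIOError; }
};

// Hypothetical wrapper with the same shape as the Parquet catch macros:
// it can only convert *exceptions* into a Status. A std::abort() inside
// `body` terminates the process before any catch clause is considered.
Status CatchExceptions(const std::function<Status()>& body) {
  try {
    return body();  // returned statuses (including OOM) pass straight through
  } catch (const std::exception& e) {
    return Status::IOError(e.what());  // thrown errors become a Status
  }
}
```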
### What changes are included in this PR?
Added `FuzzCheckOk(Status)` in the anonymous namespace of
`fuzz_encoding_internal.cc`:
```cpp
// OOM during fuzzing is an expected soft failure; any other non-OK status
// indicates a real bug and should abort so OSS-Fuzz can report it.
Status FuzzCheckOk(const Status& st) {
  if (st.IsOutOfMemory()) return st;
  ARROW_CHECK_OK(st);
  return Status::OK();
}
```
Replaced six call sites in `TypedFuzzEncoding::Fuzz()` with
`RETURN_NOT_OK(FuzzCheckOk(...))`:
- `reference_array_->ValidateFull()`
- `DecodeArrow(...).ValueOrDie()` (replaced with a check on the returned
`Result`'s status, followed by `std::move(*result)`)
- `array->ValidateFull()` (on roundtrip result)
- `CompareAgainstReference(array)`
- `RunOnDecodedChunks(...)` × 2 (both `arrow_supported()` and non-Arrow
branches)
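A self-contained sketch of the call-site pattern, including the `ValueOrDie()` replacement. Everything here is a hypothetical stand-in, not Arrow's real types: `Status`, `Result`, the `RETURN_NOT_OK` macro, and the helpers `DecodeValues` and `FuzzOnce` are invented for illustration, and the non-OOM branch calls `std::abort()` directly where the real code uses `ARROW_CHECK_OK`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status.
struct Status {
  enum class Code { kOk, kOutOfMemory };
  Code code = Code::kOk;
  std::string msg;
  static Status OK() { return {}; }
  static Status OutOfMemory(std::string m) { return {Code::kOutOfMemory, std::move(m)}; }
  bool ok() const { return code == Code::kOk; }
  bool IsOutOfMemory() const { return code == Code::kOutOfMemory; }
};

// Hypothetical stand-in for arrow::Result<T>: holds either a value or an error.
template <typename T>
class Result {
 public:
  Result(T value) : value_(std::move(value)) {}          // success
  Result(Status status) : status_(std::move(status)) {}  // failure
  const Status& status() const { return status_; }
  T& operator*() { return value_; }

 private:
  Status status_;
  T value_{};
};

// Propagate any non-OK status to the caller.
#define RETURN_NOT_OK(expr)      \
  do {                           \
    Status st_ = (expr);         \
    if (!st_.ok()) return st_;   \
  } while (false)

// Same policy as FuzzCheckOk: OOM is handed back, any other error aborts.
Status FuzzCheckOk(const Status& st) {
  if (st.IsOutOfMemory()) return st;
  if (!st.ok()) {
    std::fprintf(stderr, "Bad status: %s\n", st.msg.c_str());
    std::abort();  // stands in for ARROW_CHECK_OK
  }
  return Status::OK();
}

// Hypothetical decoder that reports OOM when asked for more than `cap` values.
Result<std::vector<int>> DecodeValues(int n, int cap) {
  if (n > cap) return Status::OutOfMemory("cap exceeded");
  return std::vector<int>(n, 7);
}

// Before: *out = DecodeValues(n, cap).ValueOrDie();  (aborts on OOM)
Status FuzzOnce(int n, int cap, std::vector<int>* out) {
  auto result = DecodeValues(n, cap);
  RETURN_NOT_OK(FuzzCheckOk(result.status()));
  *out = std::move(*result);  // only reached when the status is OK
  return Status::OK();
}
```

The point of routing through `FuzzCheckOk` rather than a bare `RETURN_NOT_OK` is that non-OOM failures still abort, so genuine bugs keep surfacing as crashes.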
Three invariant checks are intentionally left as hard aborts because they
indicate actual decoder/encoder bugs, not resource limits:
- `ARROW_CHECK_LE(values_read, read_size)` — decoder returning more values
than requested
- `ARROW_CHECK_EQ(acc.chunks.size(), 0)` — BinaryBuilder invariant
- `ARROW_CHECK_EQ(offset, total_data_size)` — byte count invariant in
`MakeArrow`
### Are these changes tested?
The OOM path is exercised by `fuzzing_memory_pool()` in `fuzz_internal.cc`
whenever an allocation would push the pool past the 2.2 GB cap. The existing
`parquet-encoding-test` suite covers the non-OOM code paths.
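In principle, the OOM branch could also be driven deterministically with a small cap instead of a 2.2 GB input. The sketch below uses a hypothetical `CappedPool` (not Arrow's `CappedMemoryPool` API) that rejects any allocation that would push its outstanding bytes past the limit; measuring the cap on currently allocated bytes rather than lifetime bytes is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status.
struct Status {
  enum class Code { kOk, kOutOfMemory };
  Code code = Code::kOk;
  std::string msg;
  static Status OK() { return {}; }
  static Status OutOfMemory(std::string m) { return {Code::kOutOfMemory, std::move(m)}; }
  bool ok() const { return code == Code::kOk; }
  bool IsOutOfMemory() const { return code == Code::kOutOfMemory; }
};

// Hypothetical capped allocator that tracks byte counts only.
class CappedPool {
 public:
  explicit CappedPool(int64_t limit) : limit_(limit) {}

  // Refuse the request if it would push outstanding bytes past the limit.
  Status Allocate(int64_t size) {
    if (allocated_ + size > limit_) {
      return Status::OutOfMemory("allocation would exceed cap");
    }
    allocated_ += size;
    return Status::OK();
  }

  // Returning bytes makes room for later allocations.
  void Free(int64_t size) { allocated_ -= size; }

  int64_t bytes_allocated() const { return allocated_; }

 private:
  int64_t limit_;
  int64_t allocated_ = 0;
};
```

A rejected request leaves the pool's accounting unchanged, so a test can assert both the OOM status and that nothing leaked.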
Closes #50149