Hello,

We have been fuzzing the C++ Parquet reader for years as part of fuzzing
Arrow C++ on OSS-Fuzz (1). This has helped us find dozens of issues and
make the Parquet reader more robust against edge cases and corrupt
or invalid files.

However, the fuzzing setup had remained largely unchanged, even as the
Parquet reader accrued additional features and complexity.

Recently, my employer QuantStack secured some funding from the Sovereign
Tech Fund for various initiatives on the Arrow project (2). One of them
is to improve the fuzzing setup, and part of that is to improve the
Parquet fuzz target.

The work has already started, and we have integrated a number of changes
to test more features and variations and to expand our seed corpus. For
example, we will now be able to fuzz the reading of encrypted Parquet
files (3).

We welcome any suggestions for further improvements on Parquet fuzzing.

Regards

Antoine.


(1) https://arrow.apache.org/docs/developers/cpp/fuzzing.html

(2)
https://medium.com/@QuantStack/sovereign-tech-agency-invests-in-apache-arrows-future-with-quantstack-d2f84c21c2cc

(3) https://github.com/apache/arrow/pull/48336

