Hello,

We have been fuzzing the C++ Parquet reader for years as part of fuzzing Arrow C++ on OSS-Fuzz (1). This has helped us find dozens of issues and make the Parquet reader more robust against edge cases and corrupt or invalid files. However, the fuzzing setup has remained largely unchanged, even as the Parquet reader accrued additional features and complexity.

Recently, my employer QuantStack secured funding from the Sovereign Tech Fund for various initiatives on the Arrow project (2). One of them is to improve the fuzzing setup, and part of that is to improve the Parquet fuzz target.

The work has already started, and we have integrated a number of changes that test more features and variations and expand our seed corpus. For example, we are now able to fuzz the reading of encrypted Parquet files (3).

We welcome any suggestions for further improvements to Parquet fuzzing.

Regards,
Antoine

(1) https://arrow.apache.org/docs/developers/cpp/fuzzing.html
(2) https://medium.com/@QuantStack/sovereign-tech-agency-invests-in-apache-arrows-future-with-quantstack-d2f84c21c2cc
(3) https://github.com/apache/arrow/pull/48336
