Since other Parquet bombs are already known to exist (for example [1]), perhaps the best we can do is to craft such a file and add it to parquet-testing to help readers test against it.

Andrew

[1]: https://duckdb.org/2024/03/26/42-parquet-a-zip-bomb-for-the-big-data-age

On Tue, Feb 3, 2026 at 10:17 AM Antoine Pitrou <[email protected]> wrote:

>
> Ok, I see that unfortunately parquet-java can emit such data.
>
> Regards
>
> Antoine.
>
>
> On 03/02/2026 at 15:47, Antoine Pitrou wrote:
> >
> > Hello,
> >
> > Using dictionary encoding, it is very easy to create a compression bomb
> > simply by setting bit width = 0. Then you can encode a virtually
> > infinite number of values in a constant (very small) data size. This is
> > an ideal payload for a potential denial of service, either through CPU
> > or memory exhaustion.
> >
> > Looking at the dictionary encoder in Arrow C++, bit width == 0 is only
> > emitted when there are 0 physical values to encode. Do other encoders
> > have different policies? Would it be reasonable to state that bit width
> > == 0 is only allowed if there are zero physical values in the page?
> >
> > Regards
> >
> > Antoine.
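
For anyone who wants to see the size asymmetry concretely, here is a rough Python sketch of the bit width = 0 case from the quoted message. It assumes my reading of the RLE/bit-packed hybrid layout (one bit-width byte, then an RLE run header of varint(run_length << 1), then the repeated value in ceil(bit_width / 8) bytes, which is zero bytes here). The function names are made up for illustration and do not come from any Parquet implementation, and the check at the end only mirrors the rule proposed in the thread, which is not part of the spec.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def indices_payload_bit_width_zero(num_values: int) -> bytes:
    """Hypothetical helper: dictionary-indices payload with bit width 0.

    Layout assumed: [bit width byte][RLE header = varint(run_length << 1)].
    The repeated value takes ceil(0 / 8) = 0 bytes, so nothing follows.
    """
    return bytes([0]) + encode_uvarint(num_values << 1)


def check_indices_payload(data: bytes, num_physical_values: int) -> None:
    """Hypothetical reader-side check for the rule proposed in the thread:
    bit width == 0 is only accepted when the page has no physical values."""
    if not data:
        raise ValueError("missing bit width byte")
    if data[0] == 0 and num_physical_values > 0:
        raise ValueError("bit width 0 with a non-zero number of values")


payload = indices_payload_bit_width_zero(2**31 - 1)
print(len(payload), "bytes claim to hold", 2**31 - 1, "values")
```

Note that, as mentioned above, parquet-java can emit bit width 0 for non-empty pages, so a reader applying this check strictly would reject some existing files; capping the decoded size may be the more practical guard.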
