Since other Parquet bombs are already known to exist (for example [1]),
perhaps the best we can do is to craft such a file and add it to
parquet-testing so that reader implementations can test against it.

Andrew

[1]:
https://duckdb.org/2024/03/26/42-parquet-a-zip-bomb-for-the-big-data-age
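
For anyone crafting such a test file, here is a rough Python sketch of how
small the index payload of a dictionary-encoded data page gets when the bit
width is 0. It assumes the RLE/bit-packing hybrid layout as I read it in the
Parquet format spec: one leading byte for the bit width, then runs whose
ULEB128 header is (num_groups << 1) | 1 for a bit-packed run of
num_groups * 8 values. The function names are hypothetical, and this builds
only the index bytes, not a complete page or file.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def bit_packed_run(num_groups: int, bit_width: int) -> bytes:
    """Build one bit-packed run holding num_groups * 8 indices, all zero.

    Each group of 8 values takes 8 * bit_width bits, i.e. bit_width bytes,
    so the payload is empty when bit_width == 0.
    """
    header = uleb128((num_groups << 1) | 1)  # low bit 1 marks a bit-packed run
    payload = bytes(num_groups * bit_width)  # zero-filled packed values
    return header + payload


def dictionary_index_bytes(num_groups: int, bit_width: int) -> bytes:
    """Index bytes of a dictionary-encoded data page (assumed layout)."""
    return bytes([bit_width]) + bit_packed_run(num_groups, bit_width)


def num_values(num_groups: int) -> int:
    """Number of indices a reader would materialize for the run."""
    return num_groups * 8


if __name__ == "__main__":
    bomb = dictionary_index_bytes(2**30, 0)
    print(len(bomb), "bytes encode", num_values(2**30), "indices")
```

With bit width 0 and 2**30 groups, the sketch produces 6 bytes that claim
2**33 indices, whereas with bit width 1 the same function needs one payload
byte per group. A reader that wants to defend itself would presumably cap the
run length against the value count declared in the page header before
materializing anything.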

On Tue, Feb 3, 2026 at 10:17 AM Antoine Pitrou <[email protected]> wrote:

>
> Ok, I see that unfortunately parquet-java can emit such data.
>
> Regards
>
> Antoine.
>
>
> On 03/02/2026 at 15:47, Antoine Pitrou wrote:
> >
> > Hello,
> >
> > Using dictionary encoding, it is very easy to create a compression bomb
> > simply by setting bit width = 0. Then you can encode a virtually
> > infinite number of values in a constant (very small) data size. This is
> > an ideal payload for a potential denial of service, either through CPU
> > or memory exhaustion.
> >
> > Looking at the dictionary encoder in Arrow C++, bit width == 0 is only
> > emitted when there are 0 physical values to encode. Do other encoders
> > have different policies? Would it be reasonable to state that bit width
> > == 0 is only allowed if there are zero physical values in the page?
> >
> > Regards
> >
> > Antoine.
