I was thinking that the readers could use the files to test limits /
internal controls (like limits on allocation sizes). The value would be for
newer implementations, which may not be aware that these types of
potential compression bombs exist in the wild.
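
As a rough illustration of the kind of limit a reader could test against such a file, here is a minimal Python sketch. It assumes the page body is one bit-width byte followed by RLE/bit-packed hybrid runs, as used for dictionary indices. The names `read_uleb128`, `count_indices` and `max_values` are hypothetical and not taken from any existing reader. It only walks the run headers, so it never allocates the decoded values.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def count_indices(page: bytes, max_values: int) -> int:
    """Count the indices declared by the run headers without decoding them.

    Assumption: `page` is a bit-width byte followed by hybrid runs.
    `max_values` is a caller-chosen limit (hypothetical parameter), for
    example the value count announced in the page header.
    """
    if not page:
        raise ValueError("empty page")
    bit_width = page[0]
    if bit_width > 32:
        raise ValueError("bit width too large for dictionary indices")
    pos = 1
    total = 0
    while pos < len(page):
        header, pos = read_uleb128(page, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values each,
            # occupying bit_width bytes per group (0 bytes if bit_width == 0).
            groups = header >> 1
            count = groups * 8
            pos += groups * bit_width
        else:
            # RLE run: (header >> 1) repetitions of a single value stored
            # in ceil(bit_width / 8) bytes (0 bytes if bit_width == 0).
            count = header >> 1
            pos += (bit_width + 7) // 8
        if pos > len(page):
            raise ValueError("run extends past the end of the page")
        total += count
        if total > max_values:
            raise ValueError("page declares more values than allowed")
    return total


# A 6-byte page: bit width 0, then one RLE run header declaring 2**31 - 1 values.
bomb = bytes([0x00, 0xFE, 0xFF, 0xFF, 0xFF, 0x0F])
try:
    count_indices(bomb, max_values=1_000_000)
except ValueError as exc:
    print("rejected:", exc)
```

Note that a real page may legitimately declare a few more bit-packed values than it holds, because the last group is padded to 8 values, so a practical limit would need some slack.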

Andrew

On Wed, Feb 4, 2026 at 5:52 AM Antoine Pitrou <[email protected]> wrote:

>
> On 03/02/2026 at 19:03, Andrew Lamb wrote:
> > Since other Parquet bombs are already known to exist (for example [1]),
> > perhaps the best we can do is to craft such a file and add it to
> > parquet-testing to help readers test against it.
>
> I guess we could do that (in this case I have a fuzz-generated file on
> hand), however I'm not sure what "testing" could imply, unless readers
> want to build in some kind of protection against compression bombs.
>
> Regards
>
> Antoine.
>
>
> >
> > On Tue, Feb 3, 2026 at 10:17 AM Antoine Pitrou <[email protected]> wrote:
> >
> >>
> >> Ok, I see that unfortunately parquet-java can emit such data.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 03/02/2026 at 15:47, Antoine Pitrou wrote:
> >>>
> >>> Hello,
> >>>
> >>> Using dictionary encoding, it is very easy to create a compression bomb
> >>> simply by setting bit width = 0. Then you can encode a virtually
> >>> infinite number of values in a constant (very small) data size. This is
> >>> an ideal payload for a potential denial of service, either through CPU
> >>> or memory exhaustion.
> >>>
> >>> Looking at the dictionary encoder in Arrow C++, bit width == 0 is only
> >>> emitted when there are 0 physical values to encode. Do other encoders
> >>> have different policies? Would it be reasonable to state that bit width
> >>> == 0 is only allowed if there are zero physical values in the page?
> >>>
> >>> Regards
> >>>
> >>> Antoine.
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
>
>
>
