be some good test files for the relevant repo, plus something in the spec
to say "this is possible".

On Wed, 4 Feb 2026 at 11:32, Andrew Lamb <[email protected]> wrote:

> I was thinking that the readers could use the files to test limits /
> internal controls (like limits on allocation sizes). The value would be for
> newer implementations which may not be aware that these types of
> potential compression bombs exist in the wild.
>
> Andrew
>
> On Wed, Feb 4, 2026 at 5:52 AM Antoine Pitrou <[email protected]> wrote:
>
> >
> > On 03/02/2026 at 19:03, Andrew Lamb wrote:
> > > Since other parquet bombs are already known to exist (for example [1])
> > > perhaps the best we can do is to craft such a file and add it to
> > > parquet-testing to help readers test against it
> >
> > I guess we could do that (in this case I have a fuzz-generated file on
> > hand), however I'm not sure what "testing" could imply, unless readers
> > want to build in some kind of protection against compression bombs.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > >
> > > On Tue, Feb 3, 2026 at 10:17 AM Antoine Pitrou <[email protected]> wrote:
> > >
> > >>
> > >> Ok, I see that unfortunately parquet-java can emit such data.
> > >>
> > >> Regards
> > >>
> > >> Antoine.
> > >>
> > >>
> > >> On 03/02/2026 at 15:47, Antoine Pitrou wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> Using dictionary encoding, it is very easy to create a compression
> > >>> bomb simply by setting bit width = 0. Then you can encode a virtually
> > >>> infinite number of values in a constant (very small) data size. This
> > >>> is an ideal payload for a potential denial of service, either through
> > >>> CPU or memory exhaustion.
> > >>>
> > >>> Looking at the dictionary encoder in Arrow C++, bit width == 0 is
> > >>> only emitted when there are 0 physical values to encode. Do other
> > >>> encoders have different policies? Would it be reasonable to state that
> > >>> bit width == 0 is only allowed if there are zero physical values in
> > >>> the page?
> > >>>
> > >>> Regards
> > >>>
> > >>> Antoine.
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >
> >
> >
> >
>
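To make the bit width = 0 case from the original message concrete, here is a rough Python sketch. It assumes the usual layout of a dictionary-index page body (one byte of bit width followed by RLE/bit-packed hybrid runs, each starting with a ULEB128 header). The function names and the `max_values` limit are hypothetical and not taken from any existing reader; this is only an illustration of the kind of check a reader could run before allocating anything.

```python
def write_uleb128(value):
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def count_claimed_values(page_body, max_values):
    """Count the values the runs claim to hold, without materializing them.

    page_body: 1 byte of bit width, then RLE/bit-packed hybrid runs.
    max_values: hypothetical reader-side limit (an assumption of this sketch).
    """
    bit_width = page_body[0]
    value_bytes = (bit_width + 7) // 8  # 0 bytes when bit_width == 0
    pos = 1
    total = 0
    while pos < len(page_body):
        header, pos = read_uleb128(page_body, pos)
        if header & 1:
            # Bit-packed run: groups of 8 values, bit_width bytes per group.
            groups = header >> 1
            total += groups * 8
            pos += groups * bit_width
        else:
            # RLE run: one repeated value stored in value_bytes bytes.
            total += header >> 1
            pos += value_bytes
        if total > max_values:
            raise ValueError(
                f"page claims more than {max_values} values "
                f"in {len(page_body)} bytes (bit width {bit_width})"
            )
    return total


# One RLE run of a billion values with bit width 0: the run header is the
# only payload, because the repeated value occupies zero bytes.
bomb = bytes([0]) + write_uleb128(10**9 << 1)
print(len(bomb), "bytes claim", count_claimed_values(bomb, 2 * 10**9), "values")
```

A real reader would more likely compare such a count against the page header and its own allocation limits; the sketch only shows that the claimed value count is independent of the data size once the bit width is 0.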
