be some good test files for the relevant repo, plus something in the spec to say "this is possible".

On Wed, 4 Feb 2026 at 11:32, Andrew Lamb <[email protected]> wrote:

> I was thinking that the readers could use the files to test limits /
> internal controls (like limits on allocation sizes). The value would be for
> newer implementations which may not be aware that these types of
> potential compression bombs exist in the wild.
>
> Andrew
>
> On Wed, Feb 4, 2026 at 5:52 AM Antoine Pitrou <[email protected]> wrote:
>
> >
> > On 03/02/2026 at 19:03, Andrew Lamb wrote:
> > > Since other parquet bombs are already known to exist (for example [1])
> > > perhaps the best we can do is to craft such a file and add it to
> > > parquet-testing to help readers test against it
> >
> > I guess we could do that (in this case I have a fuzz-generated file on
> > hand), however I'm not sure what "testing" could imply, unless readers
> > want to build in some kind of protection against compression bombs.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > > On Tue, Feb 3, 2026 at 10:17 AM Antoine Pitrou <[email protected]> wrote:
> > >
> > >>
> > >> Ok, I see that unfortunately parquet-java can emit such data.
> > >>
> > >> Regards
> > >>
> > >> Antoine.
> > >>
> > >>
> > >> On 03/02/2026 at 15:47, Antoine Pitrou wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> Using dictionary encoding, it is very easy to create a compression bomb
> > >>> simply by setting bit width = 0. Then you can encode a virtually
> > >>> infinite number of values in a constant (very small) data size. This is
> > >>> an ideal payload for a potential denial of service, either through CPU
> > >>> or memory exhaustion.
> > >>>
> > >>> Looking at the dictionary encoder in Arrow C++, bit width == 0 is only
> > >>> emitted when there are 0 physical values to encode. Do other encoders
> > >>> have different policies? Would it be reasonable to state that bit width
> > >>> == 0 is only allowed if there are zero physical values in the page?
> > >>>
> > >>> Regards
> > >>>
> > >>> Antoine.
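
For anyone who wants to see the size asymmetry concretely, here is a rough Python sketch of the bit width = 0 case described in the original message. It assumes the dictionary-index layout from the Parquet encoding spec (one byte of bit width, then RLE/bit-packed hybrid runs with ULEB128 headers). The helper names and the max_values limit are made up for illustration; they are not taken from any Parquet implementation or from the spec.

```python
from typing import Tuple


def encode_uleb128(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a ULEB128 varint at buf[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def rle_index_payload(bit_width: int, run_length: int, value: int = 0) -> bytes:
    """Hypothetical helper: bit-width byte followed by a single RLE run."""
    value_bytes = (bit_width + 7) // 8  # 0 bytes when bit_width == 0
    header = encode_uleb128(run_length << 1)  # low bit 0 marks an RLE run
    return bytes([bit_width]) + header + value.to_bytes(value_bytes, "little")


def declared_value_count(payload: bytes, max_values: int) -> int:
    """Sum the values the runs claim to hold, without materializing them.

    max_values is an assumed reader-side limit (for example the page's
    num_values); the scan aborts as soon as the runs claim more than that.
    """
    bit_width = payload[0]
    value_bytes = (bit_width + 7) // 8
    pos = 1
    total = 0
    while pos < len(payload):
        header, pos = decode_uleb128(payload, pos)
        if header & 1:  # bit-packed run: groups of 8 values
            groups = header >> 1
            total += groups * 8
            pos += groups * bit_width  # 8 values * bit_width bits = bit_width bytes
        else:  # RLE run: one repeated value
            total += header >> 1
            pos += value_bytes
        if total > max_values:
            raise ValueError("runs declare more values than the limit allows")
    return total


# A payload that claims 2**31 - 1 dictionary indices.
bomb = rle_index_payload(bit_width=0, run_length=2**31 - 1)
print(len(bomb), declared_value_count(bomb, max_values=2**32))
```

The payload is 6 bytes long yet declares 2**31 - 1 indices, which is the constant-size, near-unbounded-count behaviour in question. Checking the declared count against a limit before allocating is one possible reader-side control; it does not reject bit width 0 outright, since parquet-java can emit such data.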
