On 03/02/2026 at 19:03, Andrew Lamb wrote:
Since other Parquet bombs are already known to exist (for example [1]),
perhaps the best we can do is to craft such a file and add it to
parquet-testing to help readers test against it.
I guess we could do that (in this case I have a fuzz-generated file on
hand). However, I'm not sure what "testing" would imply here, unless
readers want to build in some kind of protection against compression bombs.
Regards
Antoine.
On Tue, Feb 3, 2026 at 10:17 AM Antoine Pitrou wrote:
Ok, I see that unfortunately parquet-java can emit such data.
Regards
Antoine.
On 03/02/2026 at 15:47, Antoine Pitrou wrote:
Hello,
Using dictionary encoding, it is very easy to create a compression bomb
simply by setting bit width = 0. Then you can encode a virtually
infinite number of values in a constant (very small) data size. This is
an ideal payload for a potential denial of service, either through CPU
or memory exhaustion.
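
To make the size argument concrete, here is a minimal Python sketch. It assumes the RLE/bit-packed hybrid layout as I understand it from the Parquet encoding spec: a dictionary-encoded page body starts with one byte holding the bit width, and a bit-packed run is a varint header `(num_groups << 1) | 1` followed by `num_groups * bit_width` bytes of packed data (8 values per group). All function names are hypothetical and not taken from any Parquet implementation.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, the format assumed for run headers."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def bit_packed_run_header(num_groups: int) -> bytes:
    """Header of a bit-packed run; the low bit set to 1 marks it as bit-packed."""
    return encode_varint((num_groups << 1) | 1)


def bit_packed_run_size(num_groups: int, bit_width: int) -> int:
    """Encoded size in bytes: header plus 8 * bit_width bits per group."""
    return len(bit_packed_run_header(num_groups)) + num_groups * bit_width


def zero_width_page_body(num_groups: int) -> bytes:
    """Page body with bit width 0: the run carries no packed data at all."""
    return bytes([0]) + bit_packed_run_header(num_groups)


NUM_GROUPS = 2**27                      # 8 values per group, so 2**30 indices
body = zero_width_page_body(NUM_GROUPS)

print("values encoded:      ", NUM_GROUPS * 8)
print("page body, width 0:  ", len(body), "bytes")
print("same run, width 1:   ", bit_packed_run_size(NUM_GROUPS, 1), "bytes")
```

With bit width 0 the body stays a handful of bytes however many groups the header announces, whereas at bit width 1 the same run needs one byte per group. A reader that trusts the header and materializes the indices (or the dictionary values they point to) does all the work for free input.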
Looking at the dictionary encoder in Arrow C++, bit width == 0 is only
emitted when there are 0 physical values to encode. Do other encoders
have different policies? Would it be reasonable to state that bit width
== 0 is only allowed if there are zero physical values in the page?
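
As a sketch of what such a rule could look like on the reader side (Python, with a hypothetical helper name; this is only the rule proposed here, not something the spec currently states):

```python
def check_dictionary_bit_width(bit_width: int, num_physical_values: int) -> None:
    """Reject a dictionary-encoded page whose bit width is implausible.

    num_physical_values is assumed to be the number of non-null values
    the page claims to contain.
    """
    if not 0 <= bit_width <= 32:
        raise ValueError(f"invalid bit width {bit_width}")
    if bit_width == 0 and num_physical_values > 0:
        # Proposed rule: zero-width indices only for pages with no values.
        raise ValueError(
            f"bit width 0 with {num_physical_values} physical values"
        )


check_dictionary_bit_width(0, 0)      # accepted: empty page
check_dictionary_bit_width(3, 1000)   # accepted: ordinary page
try:
    check_dictionary_bit_width(0, 1000)
except ValueError as exc:
    print("rejected:", exc)
```

A check like this is strict by design, so it would also reject files from any writer that emits bit width 0 for non-empty pages.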
Regards
Antoine.