Ok, I see that unfortunately parquet-java can emit such data.

Regards

Antoine.


On 03/02/2026 at 15:47, Antoine Pitrou wrote:

Hello,

Using dictionary encoding, it is very easy to create a compression bomb
simply by setting bit width = 0. Then you can encode a virtually
infinite number of values in a constant (very small) data size. This is
an ideal payload for a potential denial of service, either through CPU
or memory exhaustion.
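
To make the size asymmetry concrete, here is a minimal Python sketch. It
assumes the RLE/bit-packed hybrid layout described in the Parquet format
specification: one bit-width byte, then a ULEB128 run header equal to
`count << 1` for an RLE run, then the repeated value stored in
ceil(bit_width / 8) bytes. The helper names `uleb128` and
`rle_dict_indices` are hypothetical and not taken from any Parquet
library, and the run length of 2**31 - 1 is only an illustrative choice.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def rle_dict_indices(bit_width: int, run_length: int, value: int = 0) -> bytes:
    """Build a dictionary-index payload made of a single RLE run (sketch)."""
    header = uleb128(run_length << 1)  # low bit 0 marks an RLE run
    # With bit_width == 0 the repeated value occupies zero bytes.
    value_bytes = value.to_bytes((bit_width + 7) // 8, "little")
    return bytes([bit_width]) + header + value_bytes


payload = rle_dict_indices(bit_width=0, run_length=2**31 - 1)
print(len(payload), "bytes describe", 2**31 - 1, "values")
```

Under these assumptions the payload is 6 bytes long (1 bit-width byte plus
a 5-byte run header), and further runs can be appended at the same cost.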

Looking at the dictionary encoder in Arrow C++, bit width == 0 is only
emitted when there are 0 physical values to encode. Do other encoders
have different policies? Would it be reasonable to state that bit width
== 0 is only allowed if there are zero physical values in the page?
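
As a rough sketch of what such a rule could look like in a reader, the
check below rejects a page whose bit width is 0 while physical values are
expected. The function name and its parameters are hypothetical; this only
illustrates the rule as proposed here, which is not part of the Parquet
specification.

```python
def check_dict_bit_width(bit_width: int, num_physical_values: int) -> None:
    """Raise ValueError if a page violates the proposed rule (sketch).

    Proposed rule: bit_width == 0 is only allowed when the page holds
    zero physical values.
    """
    if bit_width == 0 and num_physical_values > 0:
        raise ValueError(
            f"bit width 0 with {num_physical_values} physical values"
        )


check_dict_bit_width(bit_width=0, num_physical_values=0)  # accepted
check_dict_bit_width(bit_width=3, num_physical_values=10)  # accepted
try:
    check_dict_bit_width(bit_width=0, num_physical_values=10)
except ValueError as exc:
    print("rejected:", exc)
```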

Regards

Antoine.




