OK, I see that, unfortunately, parquet-java can emit such data.

Regards

Antoine.

On 03/02/2026 at 15:47, Antoine Pitrou wrote:
Hello,

Using dictionary encoding, it is very easy to create a compression bomb simply by setting bit width = 0. Then you can encode a virtually infinite number of values in a constant (very small) data size. This is an ideal payload for a potential denial of service, either through CPU or memory exhaustion.

Looking at the dictionary encoder in Arrow C++, bit width == 0 is only emitted when there are 0 physical values to encode. Do other encoders have different policies?

Would it be reasonable to state that bit width == 0 is only allowed if there are zero physical values in the page?

Regards

Antoine.
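A minimal Python sketch of why bit width 0 is so cheap, assuming the RLE/bit-packing hybrid layout the Parquet format spec uses for dictionary indices: a 1-byte bit width, then runs whose header is a ULEB128 varint of `run_length << 1`, followed by the repeated value stored in ceil(bit_width / 8) bytes. The names `uleb128`, `encode_single_run` and `check_bit_width` are hypothetical. `check_bit_width` implements the rule proposed above, which is not part of the spec; per the reply, parquet-java can already emit such pages, so a reader enforcing it strictly would reject some existing files.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative int as unsigned LEB128 (varint), as used for run headers."""
    if n < 0:
        raise ValueError("uleb128 requires a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_single_run(num_values: int, bit_width: int, index: int = 0) -> bytes:
    """Hypothetical helper: encode num_values copies of one dictionary index
    as a single RLE run, prefixed with the 1-byte bit width."""
    value_width = (bit_width + 7) // 8   # 0 bytes when bit_width == 0
    header = uleb128(num_values << 1)    # low bit 0 marks an RLE run
    return bytes([bit_width]) + header + index.to_bytes(value_width, "little")


def check_bit_width(bit_width: int, num_physical_values: int) -> None:
    """Proposed (not standardized) rule: bit width 0 only for pages with no
    physical values to decode (taken here to mean non-null values, an assumption)."""
    if bit_width == 0 and num_physical_values > 0:
        raise ValueError(
            f"bit width 0 with {num_physical_values} physical values: "
            "possible compression bomb"
        )


payload = encode_single_run(2**31 - 1, bit_width=0)
print(len(payload))  # 6
```

With bit width 0 the repeated value occupies zero bytes, so 2^31 - 1 indices fit in 6 bytes (1 for the bit width, 5 for the varint header); a reader that materializes them as 32-bit integers would need about 8 GiB.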
