Hello,
With dictionary encoding, it is very easy to create a compression bomb
simply by setting bit width == 0. An arbitrarily large number of values
can then be encoded in a constant (very small) data size, since every
index costs zero bits. This is an ideal payload for a denial of service,
through either CPU or memory exhaustion.
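
To make the size asymmetry concrete, here is a rough sketch in Python. It assumes the Parquet layout for dictionary indices (one byte of bit width, followed by RLE/bit-packed hybrid runs whose header is a ULEB128 varint); the helper names are made up for illustration and do not come from any library.

```python
def encode_uleb128(n: int) -> bytes:
    """Encode a non-negative integer as a ULEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def rle_run(count: int, value: int, bit_width: int) -> bytes:
    """One RLE run: varint(count << 1), then the repeated value
    stored in ceil(bit_width / 8) bytes."""
    header = encode_uleb128(count << 1)
    value_bytes = value.to_bytes((bit_width + 7) // 8, "little")
    return header + value_bytes


def dictionary_indices_payload(count: int, bit_width: int) -> bytes:
    """Hypothetical helper: bit-width byte followed by a single RLE run
    of `count` copies of index 0."""
    return bytes([bit_width]) + rle_run(count, 0, bit_width)


# With bit width 0 the repeated value occupies zero bytes, so only the
# bit-width byte and the run header remain.
payload = dictionary_indices_payload(1_000_000_000, 0)
print(len(payload), "bytes describe 1,000,000,000 indices")
```

A reader that trusts the run header would try to materialize a billion indices from those few bytes. Chaining several such runs grows the claimed value count much faster than the data size.
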
Looking at the dictionary encoder in Arrow C++, bit width == 0 is only
emitted when there are 0 physical values to encode. Do other encoders
have different policies? Would it be reasonable to state that bit width
== 0 is only allowed if there are zero physical values in the page?
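
If such a rule were adopted, the reader-side check could be as small as the sketch below. The function name and parameters are hypothetical, and the upper bound of 32 bits is an assumption about the index width rather than something stated above.

```python
def check_dictionary_bit_width(bit_width: int, num_physical_values: int) -> None:
    """Hypothetical validation of the proposed rule: bit width 0 is
    accepted only when the page holds no physical values."""
    if not 0 <= bit_width <= 32:  # assumed upper bound for dictionary indices
        raise ValueError(f"invalid dictionary bit width: {bit_width}")
    if bit_width == 0 and num_physical_values > 0:
        raise ValueError(
            "bit width 0 with "
            f"{num_physical_values} physical values; refusing to decode"
        )


check_dictionary_bit_width(0, 0)      # empty page: accepted
check_dictionary_bit_width(3, 1000)   # ordinary page: accepted
```

The check runs before any run is decoded, so a rejected page costs neither CPU nor memory beyond reading its header.
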
Regards
Antoine.
- [DISCUSS] Is dictionary bit width == 0 allowed? Antoine Pitrou