I think the question Jianshi meant to ask is whether the null in
{ 'a': null } takes storage space, and I think the answer is no. The key
and value parts of the map are stored as two separate columns in Parquet,
so the key 'a' takes space in the key column, while the null value takes
no space in the value column.
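
To illustrate the idea, here is a rough Python sketch of how map entries could be split into a key column and a value column. It is only a simplified model, not Parquet's actual writer: the function name shred_maps and the boolean value_defined list are made up for this example. Real Parquet records nulls with integer definition levels (which are usually small once run-length encoded) and also uses repetition levels to mark record boundaries, which this sketch leaves out.

```python
def shred_maps(rows):
    """Split a list of dicts into a key column and a value column.

    Simplified model only: value_defined stands in for Parquet's
    definition levels, and record boundaries are not tracked.
    """
    keys = []           # key column: every key is stored
    values = []         # value column: only non-null values are stored
    value_defined = []  # one flag per entry: is the value present?
    for row in rows:
        for key, value in row.items():
            keys.append(key)
            value_defined.append(value is not None)
            if value is not None:
                values.append(value)
    return keys, values, value_defined


keys, values, value_defined = shred_maps([{"a": None}, {"b": 1}, {}])
print(keys)           # the key 'a' is stored even though its value is null
print(values)         # the null contributes nothing to the value data
print(value_defined)
```

In this model the null still costs one flag per entry, but nothing in the value data itself.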
Cheng
On 3/7/15 1:28 AM, Ryan Blue wrote:
On 03/06/2015 12:27 AM, Jianshi Huang wrote:
Hi,
I understand that when a column's value is null, Parquet skips it in
encoding, so it doesn't take storage space.
Is that also the case in a Map column? I think the key will always be
encoded even when the value is null.
Is that correct?
That is correct. The keys must be encoded so that the map can be
reconstructed. { 'a': null } isn't the same as { } and we would have
to encode 'a' even if missing entries and null values are handled the
same in your application.
If you don't want to encode them, then you can remove the keys for
null values from your map before you store the map.
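
As a minimal sketch of that preprocessing step, assuming the map is held as a plain Python dict before it is written (the helper name drop_null_values is made up for this example):

```python
def drop_null_values(mapping):
    """Return a copy of mapping without the entries whose value is None."""
    return {key: value for key, value in mapping.items() if value is not None}


cleaned = drop_null_values({"a": None, "b": 1})
print(cleaned)
```

Note that after this step { 'a': null } and { } are stored identically, so only do it if your application really treats the two the same.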
rb