[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Szadovszky reassigned PARQUET-2256:
-----------------------------------------

    Assignee: Xuwei Fu

> Adding Compression for BloomFilter
> ----------------------------------
>
>                 Key: PARQUET-2256
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2256
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>    Affects Versions: format-2.9.0
>            Reporter: Xuwei Fu
>            Assignee: Xuwei Fu
>            Priority: Major
>
> In current Parquet implementations, if the ndv (number of distinct values)
> is not set for a BloomFilter, most implementations assume an ndv of 1M and
> size the filter from that value and the fpp (false-positive probability).
> So, if fpp is 0.01, the BloomFilter size may grow to about 2 MB for each
> column, which is really huge. Should we support compression for
> BloomFilter, like:
>
> ```
> /**
>  * The compression used in the Bloom filter.
>  **/
> struct Uncompressed {}
> union BloomFilterCompression {
>   1: Uncompressed UNCOMPRESSED;
> +2: CompressionCodec COMPRESSION;
> }
> ```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
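
For reference, the 2 MB figure quoted in the issue can be reproduced with a short sketch. It assumes the split-block sizing formula m = -8 * n / ln(1 - p^(1/8)) and a final round-up to a power of two, which is how some implementations appear to size the filter; the function names below are hypothetical and are not taken from any Parquet library.

```python
import math


def optimal_num_of_bits(ndv: int, fpp: float) -> int:
    """Estimate the bits needed for `ndv` distinct values at false-positive rate `fpp`."""
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    # Assumed sizing formula for a split-block Bloom filter with 8 bits set per value.
    return math.ceil(-8.0 * ndv / math.log(1.0 - fpp ** (1.0 / 8.0)))


def round_up_to_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def bloom_filter_bytes(ndv: int, fpp: float) -> int:
    """Size in bytes after rounding the bit count up to a power of two (assumption)."""
    return round_up_to_power_of_two(optimal_num_of_bits(ndv, fpp)) // 8


if __name__ == "__main__":
    size = bloom_filter_bytes(1_000_000, 0.01)
    print(f"{size} bytes ({size / (1024 * 1024):.1f} MiB)")
```

Under these assumptions the unrounded estimate is roughly 9.7 million bits (about 1.2 MB), and the power-of-two rounding is what pushes the default up to 2 MiB per column. A filter sized this way for a column with far fewer distinct values is mostly zero bits, which is why compression could help.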