Jinpeng Zhou created PARQUET-2323:
-------------------------------------
Summary: Use bit vector to store Prebuffered column chunk index
Key: PARQUET-2323
URL: https://issues.apache.org/jira/browse/PARQUET-2323
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Reporter: Jinpeng Zhou
Fix For: cpp-13.0.0
In https://issues.apache.org/jira/browse/PARQUET-2316 we allowed partial
buffering in the Parquet file reader by storing the prebuffered column chunk
indices in a hash set, and we make a copy of this hash set for each row group
reader.
In extreme cases where numerous columns are prebuffered and multiple row group
readers are created for the same row group, copying the hash set would incur
significant overhead.
Using a bit vector would be a reasonable mitigation, taking 4 KB for 32K
columns (one bit per column).
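
To illustrate the idea, here is a minimal sketch of such a bit vector. The
class name PrebufferedColumnBitmap and its methods are hypothetical and are not
the actual parquet-cpp code; the sketch only assumes one bit per column chunk
index, packed into 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the parquet-cpp implementation): records which
// column chunks of a row group have been prebuffered, one bit per column.
class PrebufferedColumnBitmap {
 public:
  // Allocates ceil(num_columns / 64) zero-initialized 64-bit words.
  explicit PrebufferedColumnBitmap(std::size_t num_columns)
      : num_columns_(num_columns), words_((num_columns + 63) / 64, 0) {}

  // Marks the given column chunk index as prebuffered.
  void Set(std::size_t column) {
    assert(column < num_columns_);
    words_[column / 64] |= (std::uint64_t{1} << (column % 64));
  }

  // Returns true if the given column chunk index was marked as prebuffered.
  bool IsSet(std::size_t column) const {
    assert(column < num_columns_);
    return ((words_[column / 64] >> (column % 64)) & 1u) != 0;
  }

  // Size of the bit storage itself, excluding the vector's bookkeeping.
  std::size_t SizeInBytes() const {
    return words_.size() * sizeof(std::uint64_t);
  }

 private:
  std::size_t num_columns_;
  std::vector<std::uint64_t> words_;
};
```

With 32K (32768) columns the storage is 32768 / 8 = 4096 bytes. Copying it for
each row group reader is a single contiguous copy of that size regardless of
how many columns are marked, whereas a node-based hash set typically allocates
per element and grows with the number of prebuffered columns.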
--
This message was sent by Atlassian Jira
(v8.20.10#820010)