Jinpeng Zhou created PARQUET-2323:
-------------------------------------

             Summary: Use bit vector to store Prebuffered column chunk index
                 Key: PARQUET-2323
                 URL: https://issues.apache.org/jira/browse/PARQUET-2323
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cpp
            Reporter: Jinpeng Zhou
             Fix For: cpp-13.0.0


In https://issues.apache.org/jira/browse/PARQUET-2316 we allowed partial
buffering in the Parquet file reader by storing the indices of the prebuffered
column chunks in a hash set and copying that hash set into each row group reader.

In extreme cases, where numerous columns are prebuffered and multiple
row group readers are created for the same row group, copying the hash set
would incur significant overhead.

Using a bit vector would be a reasonable mitigation: at one bit per column, it
takes 4 KB for 32K columns.
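
As a rough illustration of the idea (not the actual parquet-cpp change), the
sketch below keeps one bit per column chunk in 64-bit words. The class name
PrebufferedColumnBitmap and its methods are hypothetical and exist only for
this example. It assumes column indices are dense integers in
[0, num_columns), which is what makes a fixed-size bitmap applicable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not part of the parquet-cpp API.
// Stores one bit per column chunk: bit i is set if column i was prebuffered.
class PrebufferedColumnBitmap {
 public:
  explicit PrebufferedColumnBitmap(std::size_t num_columns)
      : num_columns_(num_columns), words_((num_columns + 63) / 64, 0) {}

  // Marks a column as prebuffered; out-of-range indices are ignored.
  void MarkPrebuffered(std::size_t column) {
    if (column < num_columns_) {
      words_[column / 64] |= (std::uint64_t{1} << (column % 64));
    }
  }

  // Returns true only for in-range columns whose bit is set.
  bool IsPrebuffered(std::size_t column) const {
    return column < num_columns_ &&
           ((words_[column / 64] >> (column % 64)) & 1u) != 0;
  }

  // Payload size of the bitmap, excluding the vector's own bookkeeping.
  std::size_t SizeInBytes() const {
    return words_.size() * sizeof(std::uint64_t);
  }

 private:
  std::size_t num_columns_;
  std::vector<std::uint64_t> words_;
};
```

For 32K columns this is 32768 / 8 = 4096 bytes regardless of how many columns
are marked, and copying it into each row group reader is a single contiguous
copy. A hash set, by contrast, grows with the number of prebuffered columns
and typically carries per-element bookkeeping on top of the stored integers.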



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
