yordan-pavlov commented on issue #171: URL: https://github.com/apache/arrow-rs/issues/171#issuecomment-991960434
@tustvold I should have read the blog post you linked earlier (https://arrow.apache.org/blog/2019/09/05/faster-strings-cpp-parquet/) before commenting. It appears that the C++ implementation of the Arrow Parquet reader converts plain-encoded fallback pages into a dictionary, similar to the latest approach you described:

> When decoding a ColumnChunk, we first append the dictionary values and indices into an Arrow DictionaryBuilder, and when we encounter the “fall back” portion we use a hash table to convert those values to dictionary-encoded form
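To make the quoted approach concrete, here is a minimal sketch of how I read it, using only the Rust standard library. `DictBuilder`, `append_dictionary_page`, `append_plain_page` and `intern` are hypothetical names for illustration; this is not the arrow-rs or Arrow C++ API, and it assumes string values and `usize` keys.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary builder (not an arrow-rs type).
#[derive(Default)]
struct DictBuilder {
    /// Distinct values, in first-seen order.
    values: Vec<String>,
    /// One key per row, indexing into `values`.
    keys: Vec<usize>,
    /// Hash table mapping a value to its key.
    lookup: HashMap<String, usize>,
}

impl DictBuilder {
    /// Dictionary-encoded page: append the page dictionary, then the indices.
    fn append_dictionary_page(&mut self, dict: &[&str], indices: &[usize]) {
        // Translate each page-local index into a key of the builder's dictionary.
        let remap: Vec<usize> = dict.iter().map(|v| self.intern(v)).collect();
        self.keys.extend(indices.iter().map(|&i| remap[i]));
    }

    /// Plain-encoded "fall back" page: convert each value through the hash table.
    fn append_plain_page(&mut self, values: &[&str]) {
        for v in values {
            let key = self.intern(v);
            self.keys.push(key);
        }
    }

    /// Return the key for `value`, adding it to the dictionary if it is new.
    fn intern(&mut self, value: &str) -> usize {
        if let Some(&key) = self.lookup.get(value) {
            return key;
        }
        let key = self.values.len();
        self.values.push(value.to_string());
        self.lookup.insert(value.to_string(), key);
        key
    }
}
```

With this sketch, appending a dictionary page with dictionary `["a", "b"]` and indices `[0, 1, 1, 0]`, followed by a plain page `["b", "c", "a"]`, leaves the builder with values `["a", "b", "c"]` and keys `[0, 1, 1, 0, 1, 2, 0]`, so the whole column chunk ends up dictionary-encoded.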
