[ https://issues.apache.org/jira/browse/PARQUET-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gang Wu updated PARQUET-2221:
-----------------------------
    Fix Version/s:     (was: format-2.10.0)

> [Format] Encoding spec incorrect for dictionary fallback
> --------------------------------------------------------
>
>                 Key: PARQUET-2221
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2221
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Antoine Pitrou
>            Priority: Critical
>
> The spec for DICTIONARY_ENCODING states that:
> bq. If the dictionary grows too big, whether in size or number of distinct
> values, the encoding will fall back to the plain encoding.
> https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8
> However, the parquet-mr implementation was deliberately changed to a
> different fallback mechanism in
> https://issues.apache.org/jira/browse/PARQUET-52
> I'm assuming the parquet-mr implementation is authoritative here. But then
> the spec is incorrect and should be fixed to reflect expected behavior.
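For readers unfamiliar with the rule under discussion, here is a rough sketch of the fallback as the quoted spec sentence words it: dictionary-encode pages until the dictionary exceeds a limit on entry count or byte size, then write plain pages. The function name `encode_pages`, the parameters `max_entries` and `max_bytes`, and their default values are hypothetical and chosen for illustration only. This does not model the mechanism introduced by PARQUET-52, which is exactly where the issue says the implementation diverges from the spec.

```python
from typing import Dict, List, Tuple

# (encoding name, payload): indices for dictionary pages, raw values for plain pages
Page = Tuple[str, list]


def encode_pages(
    pages: List[List[bytes]],
    max_entries: int = 4,   # hypothetical limit on distinct values
    max_bytes: int = 64,    # hypothetical limit on total dictionary size
) -> Tuple[List[bytes], List[Page]]:
    """Dictionary-encode pages until the dictionary grows too big, then go plain."""
    dictionary: Dict[bytes, int] = {}
    dict_bytes = 0
    fell_back = False
    encoded: List[Page] = []

    for page in pages:
        if not fell_back:
            indices: List[int] = []
            for value in page:
                index = dictionary.get(value)
                if index is None:
                    too_many = len(dictionary) + 1 > max_entries
                    too_large = dict_bytes + len(value) > max_bytes
                    if too_many or too_large:
                        # Limit hit: this page and every later page are plain.
                        # Entries already added for this page stay in the
                        # dictionary unused, which keeps the sketch simple.
                        fell_back = True
                        break
                    index = len(dictionary)
                    dictionary[value] = index
                    dict_bytes += len(value)
                indices.append(index)
            if not fell_back:
                encoded.append(("RLE_DICTIONARY", indices))
                continue
        encoded.append(("PLAIN", list(page)))

    # Dicts preserve insertion order, so this lists values by dictionary index.
    return list(dictionary), encoded


if __name__ == "__main__":
    dict_values, out = encode_pages([[b"a", b"b", b"a"], [b"c", b"d", b"e"], [b"a"]])
    print(dict_values)
    for encoding, payload in out:
        print(encoding, payload)
```

In this sketch, pages written before the limit is reached stay dictionary-encoded, and only the page that hits the limit and the ones after it are written as plain.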