[ https://issues.apache.org/jira/browse/IMPALA-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer updated IMPALA-9578:
------------------------------------
    Labels: parquet  (was: )

> Read/write support for BINARY in Parquet
> ----------------------------------------
>
>                 Key: IMPALA-9578
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9578
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: parquet
>
> In Parquet, both STRING and BINARY are stored using the same physical type,
> BYTE_ARRAY.
> There is a String annotation among the logical types
> (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#string),
> which means UTF-8 encoding (and is of course ignored by Impala).
> Both reading and writing should occur the same way as with STRING.
> There is one potential difference to consider during writing: in ORC,
> BinaryStatistics has no min/max stats (StringStatistics has them). My guess
> for the reason is that binary values are often very large and "random", so the
> stats are likely to need a lot of space while never being used
> successfully for filtering. Note that Parquet is a bit different with its
> per-page statistics and could potentially need even more space for stats.
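
The concern about min/max stats on "random" binary values can be illustrated with a small sketch. It is not Impala, Parquet, or ORC code: the helper names, the page size, and the value size are all hypothetical, values are compared in plain lexicographic byte order, and any truncation or omission of stats that a real writer might apply is not modeled.

```python
import random


def page_min_max_stats(values, page_size):
    """Return one (min, max) pair per page of `page_size` values."""
    stats = []
    for start in range(0, len(values), page_size):
        page = values[start:start + page_size]
        stats.append((min(page), max(page)))
    return stats


def stats_size_bytes(stats):
    """Total bytes needed to store every min and max value untruncated."""
    return sum(len(lo) + len(hi) for lo, hi in stats)


def pages_skippable(stats, probe):
    """Count pages that an equality filter on `probe` could skip."""
    return sum(1 for lo, hi in stats if probe < lo or probe > hi)


# Hypothetical workload: 1000 random 1 KiB blobs, 100 values per page.
VALUE_SIZE = 1024
rng = random.Random(0)
values = [
    bytes(rng.getrandbits(8) for _ in range(VALUE_SIZE))
    for _ in range(1000)
]

stats = page_min_max_stats(values, page_size=100)
probe = bytes([0x80]) * VALUE_SIZE  # arbitrary value to filter on

print("pages:", len(stats))
print("bytes spent on min/max:", stats_size_bytes(stats))
print("pages skippable for probe:", pages_skippable(stats, probe))
```

With random content, each page's min tends to start near 0x00 and its max near 0xFF, so almost any probe value falls inside every page's range and no page can be skipped, while each page still pays for two full-size values in stats.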