[ https://issues.apache.org/jira/browse/IMPALA-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Norbert Luksa reassigned IMPALA-9290:
-------------------------------------

    Assignee: Norbert Luksa

> ORC scanner should support schema evolution between date and timestamp types
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-9290
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9290
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Gabor Kaszab
>            Assignee: Norbert Luksa
>            Priority: Major
>              Labels: orc
>
> *This is the desired use case:*
> 1. Create an ORC table TBL1 with a DATE column.
> 2. Create an ORC table TBL2 with a TIMESTAMP column that has the same
> location as TBL1.
> 3. Insert some DATE values into TBL1 and some TIMESTAMP values into TBL2.
> 4. A select from TBL1 returns both the DATE and the TIMESTAMP values (the
> TIMESTAMP values converted to DATE).
> 5. A select from TBL2 returns both the DATE and the TIMESTAMP values (the
> DATE values converted to TIMESTAMP).
> Without this feature Impala returns an error:
> {code:java}
> ERROR: Type mismatch: table column DATE is map to column timestamp in ORC
> file 'hdfs://localhost:20500/test-warehouse/orc_date_tbl/000000_0_copy_1'
> {code}
> *Note:*
> With https://issues.apache.org/jira/browse/IMPALA-8801 implementing the Date
> type for ORC, it is possible to read date values in ORC format. However,
> writing is still not supported and has to be done by Hive.
> *Let me copy-paste a code review comment from IMPALA-8801 as a suggestion for
> the implementation:*
> We can modify OrcTimestampReader to support reading orc::TimestampVectorBatch
> into Date type slots. In its constructor it knows which kind of slots
> (timestamp or date) it's writing to. So in ReadValue() it can behave
> differently depending on the mode (timestamp values => timestamp slots /
> timestamp values => date slots). We can do the same in OrcDateColumnReader to
> let it support reading ORC Date values into Timestamp type slots.
> Note that the life cycle of an OrcColumnReader is within the life cycle of
> the HdfsOrcScanner, which only reads a split of an ORC file, and an ORC file
> can't have two types for one column (e.g. column1 is timestamp in stripe1 and
> date in stripe2). So we don't need to deal with different batch types in
> UpdateInputBatch().
> BTW, it would be better to add test coverage for this type compatibility
> check in test_scanners.py (see TestOrc.test_type_conversions).
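
To make the suggested approach concrete, here is a minimal standalone sketch of a reader that fixes its conversion mode once in the constructor and then branches in ReadValue(). It is not Impala code: SlotType, DateSlot, TimestampSlot, SketchTimestampReader and the helper functions are hypothetical names invented for illustration, the value representation (seconds and nanoseconds since the Unix epoch, days since the Unix epoch) is an assumption, and time zone handling is ignored entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the slot types; NOT Impala's real definitions.
enum class SlotType { kDate, kTimestamp };

struct DateSlot {
  int32_t days_since_epoch;  // assumed representation: days since 1970-01-01
};

struct TimestampSlot {
  int64_t seconds_since_epoch;  // assumed representation: seconds since epoch
  int64_t nanos;                // sub-second part
};

constexpr int64_t kSecondsPerDay = 86400;

// Floor division, so that timestamps before 1970 land on the correct day
// instead of being truncated toward zero.
inline int64_t FloorDiv(int64_t a, int64_t b) {
  int64_t q = a / b;
  if ((a % b != 0) && ((a < 0) != (b < 0))) --q;
  return q;
}

// Timestamp value => date slot: drop the time-of-day part.
inline DateSlot TimestampToDate(int64_t seconds) {
  return DateSlot{static_cast<int32_t>(FloorDiv(seconds, kSecondsPerDay))};
}

// Date value => timestamp slot: midnight of that day.
inline TimestampSlot DateToTimestamp(int32_t days) {
  return TimestampSlot{static_cast<int64_t>(days) * kSecondsPerDay, 0};
}

// Sketch of a timestamp column reader. The target slot type is known at
// construction time and never changes, so no per-batch type check is needed.
class SketchTimestampReader {
 public:
  explicit SketchTimestampReader(SlotType slot_type) : slot_type_(slot_type) {}

  // Writes one timestamp value from the file into the given slot, converting
  // to a date if the table column is a DATE.
  void ReadValue(int64_t seconds, int64_t nanos, void* slot) const {
    assert(slot != nullptr);
    if (slot_type_ == SlotType::kDate) {
      *static_cast<DateSlot*>(slot) = TimestampToDate(seconds);
    } else {
      *static_cast<TimestampSlot*>(slot) = TimestampSlot{seconds, nanos};
    }
  }

 private:
  const SlotType slot_type_;
};
```

A date column reader could mirror this by calling something like DateToTimestamp() when its slot type is a timestamp. Deciding the mode in the constructor matches the observation that a column has a single type within an ORC file, so the branch depends only on the table schema.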