[ https://issues.apache.org/jira/browse/IMPALA-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norbert Luksa reassigned IMPALA-9290:
-------------------------------------

    Assignee: Norbert Luksa

> ORC scanner should support schema evolution between date and timestamp types
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-9290
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9290
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Gabor Kaszab
>            Assignee: Norbert Luksa
>            Priority: Major
>              Labels: orc
>
> *This is the desired use case:*
> 1. Create an ORC table TBL1 with a DATE column.
> 2. Create an ORC table TBL2 with a TIMESTAMP column that has the same 
> location as TBL1.
> 3. Insert some DATE values into TBL1 and some TIMESTAMP values into TBL2.
> 4. Selecting from TBL1 returns both the DATE values and the TIMESTAMP values
> (the latter converted to DATE).
> 5. Selecting from TBL2 returns both the DATE and TIMESTAMP values (the DATE
> values converted to TIMESTAMP).
> Without this feature, Impala returns an error:
> {code:java}
> ERROR: Type mismatch: table column DATE is map to column timestamp in ORC 
> file 'hdfs://localhost:20500/test-warehouse/orc_date_tbl/000000_0_copy_1'
> {code}
> *Note:*
> With https://issues.apache.org/jira/browse/IMPALA-8801, which implemented the
> DATE type for ORC, Impala can read DATE values from ORC files. However,
> writing is still not supported and has to be done by Hive.
> *Let me copy-paste a code review comment from IMPALA-8801 as a suggestion for 
> the implementation:*
> We can modify OrcTimestampReader to support reading orc::TimestampVectorBatch
> into DATE slots. Its constructor knows which kind of slot (timestamp or date)
> it's writing to, so ReadValue() can behave differently depending on the mode
> (timestamp values => timestamp slots / timestamp values => date slots). We
> can do the same in OrcDateColumnReader to let it read ORC DATE values into
> TIMESTAMP slots.
> Note that the life cycle of an OrcColumnReader is within the life cycle of
> the HdfsOrcScanner, which only reads a split of an ORC file, and an ORC file
> can't have two types for one column (e.g. column1 is timestamp in stripe1 and
> date in stripe2). So we don't need to deal with different batch types in
> UpdateInputBatch().
> BTW, it'd be better to add test coverage for this type compatibility check
> in test_scanners.py (see TestOrc.test_type_conversions).
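The per-value conversions implied by the review comment quoted above can be sketched as below. This is a minimal, self-contained illustration, not Impala code: TargetSlot, OrcTimestamp, TimestampToDate, DateToTimestamp, TimestampReaderSketch and DateReaderSketch are hypothetical names, and the real orc::TimestampVectorBatch and Impala slot layouts are not reproduced. It assumes timestamps are UTC seconds since 1970-01-01 plus nanoseconds (no time zone adjustment) and that a DATE is a day count since the same epoch. As in the comment, each reader fixes its target slot kind in the constructor and branches on it in ReadValue().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: the two kinds of destination slot a reader may write into.
enum class TargetSlot { kTimestamp, kDate };

// Simplified stand-in for one ORC timestamp value (assumption: UTC, no
// time zone adjustment): whole seconds since 1970-01-01 plus nanoseconds.
struct OrcTimestamp {
  int64_t seconds;
  int64_t nanos;  // expected range [0, 999999999]
};

constexpr int64_t kSecondsPerDay = 86400;

// Timestamp -> DATE (days since the epoch). C++ integer division truncates
// toward zero, so a negative remainder is adjusted to floor toward -infinity.
int32_t TimestampToDate(const OrcTimestamp& ts) {
  int64_t days = ts.seconds / kSecondsPerDay;
  if (ts.seconds % kSecondsPerDay < 0) --days;
  return static_cast<int32_t>(days);
}

// DATE -> timestamp at midnight (UTC) of that day.
OrcTimestamp DateToTimestamp(int32_t days) {
  return OrcTimestamp{static_cast<int64_t>(days) * kSecondsPerDay, 0};
}

// Hypothetical reader for timestamp columns: the mode is fixed at construction.
class TimestampReaderSketch {
 public:
  explicit TimestampReaderSketch(TargetSlot target) : target_(target) {}

  // Writes to exactly one of the two outputs, depending on the mode.
  void ReadValue(const OrcTimestamp& value, OrcTimestamp* ts_slot,
                 int32_t* date_slot) const {
    if (target_ == TargetSlot::kDate) {
      *date_slot = TimestampToDate(value);  // timestamp values => date slots
    } else {
      *ts_slot = value;  // timestamp values => timestamp slots
    }
  }

 private:
  TargetSlot target_;
};

// Hypothetical reader for DATE columns, mirroring the one above.
class DateReaderSketch {
 public:
  explicit DateReaderSketch(TargetSlot target) : target_(target) {}

  // Writes to exactly one of the two outputs, depending on the mode.
  void ReadValue(int32_t days, OrcTimestamp* ts_slot,
                 int32_t* date_slot) const {
    if (target_ == TargetSlot::kTimestamp) {
      *ts_slot = DateToTimestamp(days);  // date values => timestamp slots
    } else {
      *date_slot = days;  // date values => date slots
    }
  }

 private:
  TargetSlot target_;
};
```

Flooring rather than truncating matters for pre-1970 values: a timestamp one second before the epoch belongs to day -1 (1969-12-31), not day 0. Any time zone handling the real scanner applies to ORC timestamps would have to happen before this conversion.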



--
This message was sent by Atlassian Jira
(v8.3.4#803005)