Matt McCline created HIVE-8197:
----------------------------------
Summary: Tez and Vectorization Insert into ORC Table with
timestamp column erroneously repeats the last row's column value
Key: HIVE-8197
URL: https://issues.apache.org/jira/browse/HIVE-8197
Project: Hive
Issue Type: Bug
Environment: Tez and Vectorization.
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
While diagnosing why only(?) a Tez and vectorized query with min and max
aggregates always returned the column value of the last row read, I discovered
that the problem was in creating the test table:
{code}
CREATE TABLE alltypesorc_string STORED AS ORC AS SELECT
ctinyint as ctinyint,
to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') as ctimestamp1,
CAST(to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') AS STRING) as stimestamp1
FROM alltypesorc WHERE ctinyint > 0
LIMIT 40;
{code}
I think it is related to what Prasanth mentioned as a possibility: saving a
Timestamp as a Writable object that later gets overwritten. One suspect is the
Writable[] records array in the processOp method of VectorFileSinkOperator.
Or perhaps it is in VectorReduceSinkOperator.
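The suspected failure mode can be illustrated with a minimal, self-contained sketch. This is not Hive code: MutableTimestamp is a hypothetical stand-in for a reusable Writable, and the records array only mimics the buffering pattern suspected above. It assumes the root cause is that one mutable object is stored by reference for every row, which has not been confirmed.
{code}
import java.util.Arrays;

// Hypothetical stand-in for a mutable, reusable Writable (NOT Hive's TimestampWritable).
final class MutableTimestamp {
    private long millis;

    void set(long millis) { this.millis = millis; }

    long get() { return millis; }

    // Returns an independent object holding the current value.
    MutableTimestamp copy() {
        MutableTimestamp c = new MutableTimestamp();
        c.set(millis);
        return c;
    }
}

class WritableReuseSketch {
    // Suspected buggy pattern: every slot points at the same scratch object,
    // so each slot ends up showing the value of the last row written.
    static long[] bufferByReference(long[] rows) {
        MutableTimestamp scratch = new MutableTimestamp();
        MutableTimestamp[] records = new MutableTimestamp[rows.length];
        for (int i = 0; i < rows.length; i++) {
            scratch.set(rows[i]);
            records[i] = scratch; // stores a reference, not a snapshot
        }
        return read(records);
    }

    // Safe pattern: snapshot the value before the scratch object is reused.
    static long[] bufferByCopy(long[] rows) {
        MutableTimestamp scratch = new MutableTimestamp();
        MutableTimestamp[] records = new MutableTimestamp[rows.length];
        for (int i = 0; i < rows.length; i++) {
            scratch.set(rows[i]);
            records[i] = scratch.copy();
        }
        return read(records);
    }

    // Reads the buffered values back after all rows have been processed.
    private static long[] read(MutableTimestamp[] records) {
        long[] out = new long[records.length];
        for (int i = 0; i < records.length; i++) {
            out[i] = records[i].get();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] rows = {10L, 20L, 30L};
        System.out.println(Arrays.toString(bufferByReference(rows))); // [30, 30, 30]
        System.out.println(Arrays.toString(bufferByCopy(rows)));      // [10, 20, 30]
    }
}
{code}
If the bug does follow this pattern, the by-reference output matches the symptom in the summary: every buffered row carries the last row's timestamp.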
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)