data corruption with multi-table insert
---------------------------------------
Key: HIVE-1968
URL: https://issues.apache.org/jira/browse/HIVE-1968
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Joydeep Sen Sarma
I had to run a conversion process to compute a checksum (sum(hash(all-columns)))
of a table and convert it to a different compression format. Trying to be
clever, I did both in a single pass, using something equivalent to:
from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
insert overwrite table table_to_be_generated select i.col1, i.col2
insert overwrite table table_to_be_converted_checksum select sum(hash(i.val));
The plan looked correct. However, the data produced was erroneous: the
checksums and the data were both wrong (and consistent with each other). I know
this because:
- the checksum computed by the above query did not match the checksum of the
input table when calculated separately
- the checksum of the data output by this query (first insert clause) did not
match the input table's checksum (neither the one computed by the query above,
nor the one computed separately)
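The verification above relies on sum(hash(all-columns)) being order-independent: addition is commutative, so a table and its converted copy should produce the same checksum even if rows are written out in a different order. A minimal Python sketch of that idea, assuming each table is a list of row tuples, is below. `row_hash` and `table_checksum` are hypothetical names, and SHA-256 is only a stand-in for Hive's built-in hash(), which computes different values, so these numbers cannot be compared with Hive output. The sketch also hashes each row once, whereas the query above hashes the per-row value val a second time.

```python
import hashlib


def row_hash(row):
    """Hypothetical stand-in for Hive's hash(col1, col2, ...).

    Hashes the string form of every column with SHA-256 and keeps the first
    8 bytes as an unsigned integer. This is NOT Hive's hash() function.
    """
    # Join columns with a separator that is unlikely to appear in the data,
    # so ("ab", "c") and ("a", "bc") hash differently.
    data = "\x1f".join(str(col) for col in row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def table_checksum(rows):
    """sum(hash(all-columns)) over a table given as an iterable of row tuples.

    Addition is commutative, so the result does not depend on row order.
    """
    return sum(row_hash(row) for row in rows)


source = [(1, "a"), (2, "b"), (3, "c")]
converted = [(3, "c"), (1, "a"), (2, "b")]  # same rows, different order
corrupted = [(3, "c"), (1, "a"), (2, "x")]  # one value changed

print(table_checksum(source) == table_checksum(converted))  # True
print(table_checksum(source) == table_checksum(corrupted))  # False
```

Because the checksum ignores row order, a mismatch between the input and output checksums means the set of rows itself differs (changed, missing, or extra rows), not merely the file layout.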
Later, I broke this query up into two independent queries, and the data and
checksums were correct (i.e., they all matched). So it seems that some data
corruption is happening in multi-table insert (MTI).