LOAD DATA IF NOT EXISTS functionality
-------------------------------------
Key: HIVE-2889
URL: https://issues.apache.org/jira/browse/HIVE-2889
Project: Hive
Issue Type: Improvement
Components: Import/Export
Affects Versions: 0.8.1
Reporter: Sean McNamara
Fix For: 0.9.0
*Background:*
The behavior of LOAD DATA LOCAL INPATH has changed. It used to return an
error when asked to copy in a file that already existed in the
table/partition. Now it renames the incoming file with a _copy_N suffix
(e.g. test_b_copy_1.bz2), so the file always goes into HDFS.
*Original discussion:*
http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCB8D2849.14F69%25sean.mcnamara%40webtrends.com%3E
*Issue:*
There is no longer an atomic way to insert files into Hive and guarantee that
the same file won't go in twice. Using OVERWRITE is not a workaround, because
it deletes the other logs already in the table/partition.
*Example:*
{{/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_a.bz2' INTO TABLE
logs PARTITION(ds='2012-03-19', hr='23')"
/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE
logs PARTITION(ds='2012-03-19', hr='23')"
/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE
logs PARTITION(ds='2012-03-19', hr='23')"
/usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE
logs PARTITION(ds='2012-03-19', hr='23')"}}
*Result:*
{{test_a.bz2
test_b.bz2
test_b_copy_1.bz2
test_b_copy_2.bz2}}
_test_b data was inserted 3 times, which is not the desired behavior in this
instance._
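The renaming seen in the result above can be modelled with a small sketch. This is only an illustration of the observed naming pattern, not Hive's actual implementation; the function names {{destination_name}} and {{load}} are hypothetical, and the partition is represented as a plain in-memory list.

```python
import os


def destination_name(existing, filename):
    """Return the name a file ends up with under rename-on-collision."""
    if filename not in existing:
        return filename
    # Split "test_b.bz2" into ("test_b", ".bz2") and probe _copy_1, _copy_2, ...
    stem, ext = os.path.splitext(filename)
    n = 1
    while f"{stem}_copy_{n}{ext}" in existing:
        n += 1
    return f"{stem}_copy_{n}{ext}"


def load(existing, filename):
    """Model of the current LOAD DATA behavior: the load never fails."""
    name = destination_name(existing, filename)
    existing.append(name)
    return name


# Replay the four loads from the example above.
partition = []
for f in ["test_a.bz2", "test_b.bz2", "test_b.bz2", "test_b.bz2"]:
    load(partition, f)
print(partition)
```

Because the load never fails, the caller has no signal that test_b was already present.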
*Proposal:*
Add an _IF NOT EXISTS_ flag to indicate copy semantics. If the log file does
not exist in the table/partition, the log would go in normally. If the log
does exist in the table/partition, hive would return an error and exit with a
non-zero exit code.
*Proposed HiveQL Example:*
{{LOAD DATA LOCAL IF NOT EXISTS INPATH 'test_a.bz2' INTO TABLE logs
PARTITION(ds='2012-03-19', hr='23')}}
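The intended semantics of the proposal could be sketched as follows. This is a model under stated assumptions, not an implementation: {{AlreadyLoadedError}}, {{load_if_not_exists}} and {{main}} are hypothetical names, the partition is an in-memory set, and the non-zero exit code is assumed to be 1. A client-side check like this is not atomic across concurrent loaders, which is why the check would need to happen inside Hive itself.

```python
import sys


class AlreadyLoadedError(Exception):
    """Raised when the file is already present in the table/partition."""


def load_if_not_exists(existing, filename):
    """Model of the proposed IF NOT EXISTS semantics: fail instead of renaming."""
    if filename in existing:
        raise AlreadyLoadedError(f"{filename} already exists in partition")
    existing.add(filename)


def main(filenames):
    """Load files in order; return a process-style exit code (assumed 0 or 1)."""
    partition = set()
    for f in filenames:
        try:
            load_if_not_exists(partition, f)
        except AlreadyLoadedError as err:
            print(f"FAILED: {err}", file=sys.stderr)
            return 1
    return 0


# The second load of test_b.bz2 is rejected rather than stored as a copy.
exit_code = main(["test_a.bz2", "test_b.bz2", "test_b.bz2"])
```

With this behavior a driving script can treat a non-zero exit as "already loaded" and skip the file, instead of silently inserting it twice.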