[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747668#action_12747668 ]
Todd Lipcon commented on HIVE-718:
----------------------------------
Not sure how that actually helps - if we use an algorithm like:
{code}
for each file to be moved:
  while not successful:
    come up with a random name
    try to move src file to the random name
    if it fails due to dst already existing, try again with a new random name
{code}
then we'd lose atomicity/isolation - since files are moved one at a time,
readers would see a partial load in the middle of the operation.
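
A minimal sketch of the per-file loop above, to make the isolation problem concrete. It uses the local filesystem as a stand-in for HDFS, so none of this is Hadoop's API. The helper names (move_with_random_name, load_files) and the observe callback are hypothetical, and os.link followed by os.unlink is just one way to get a move that fails instead of overwriting when the destination exists.

```python
import os
import tempfile
import uuid


def move_with_random_name(src_path, dst_dir):
    """Move src_path into dst_dir under a random name, retrying on collisions."""
    while True:
        candidate = os.path.join(dst_dir, uuid.uuid4().hex)
        try:
            # os.link raises FileExistsError if candidate already exists,
            # so an existing file is never overwritten.
            os.link(src_path, candidate)
        except FileExistsError:
            continue  # name collision: try again with a new random name
        os.unlink(src_path)  # drop the old name once the new one exists
        return candidate


def load_files(src_paths, dst_dir, observe):
    """Move files one at a time, reporting what a reader would see after each."""
    for src_path in src_paths:
        move_with_random_name(src_path, dst_dir)
        observe(len(os.listdir(dst_dir)))


with tempfile.TemporaryDirectory() as root:
    src_dir = os.path.join(root, "src")
    dst_dir = os.path.join(root, "dst")
    os.mkdir(src_dir)
    os.mkdir(dst_dir)
    sources = []
    for value in ("a", "b", "c"):
        path = os.path.join(src_dir, value + ".txt")
        with open(path, "w") as handle:
            handle.write(value + "\n")
        sources.append(path)
    reader_views = []
    load_files(sources, dst_dir, reader_views.append)
    left_in_src = os.listdir(src_dir)

print(reader_views)  # [1, 2, 3]
```

Each individual move is safe, but a reader listing the destination between moves sees 1, then 2, then 3 files, which is the partial load described above.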
We can't use that retry-on-collision approach with atomic directory renames
either, since Hadoop has the wacky behavior that move("srcdir", "dstdir") will
create "dstdir/srcdir" if dstdir already exists, rather than failing.
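
For illustration only: Python's shutil.move happens to follow the same convention on a local filesystem (an existing destination directory receives the source inside it), so it can serve as a rough analog of the behavior described above. This is not Hadoop code, and the file names part-0 and part-1 are made up for the example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "srcdir")
    dst = os.path.join(root, "dstdir")

    # First load: dstdir does not exist yet, so srcdir simply becomes dstdir.
    os.mkdir(src)
    open(os.path.join(src, "part-0"), "w").close()
    shutil.move(src, dst)
    after_first = sorted(os.listdir(dst))

    # Second load: dstdir now exists, so the move does not fail.
    # Instead srcdir ends up nested as dstdir/srcdir.
    os.mkdir(src)
    open(os.path.join(src, "part-1"), "w").close()
    shutil.move(src, dst)
    after_second = sorted(os.listdir(dst))
    nested = sorted(os.listdir(os.path.join(dst, "srcdir")))

print(after_first)   # ['part-0']
print(after_second)  # ['part-0', 'srcdir']
print(nested)        # ['part-1']
```

Because the second move succeeds instead of reporting that the destination exists, there is no failure to retry on, and the new data lands one level too deep.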
> Load data inpath into a new partition without overwrite does not move the file
> ------------------------------------------------------------------------------
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for
> partitioned tables. The select after the first load returns nothing, while
> the select after the second load returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p (value string) partitioned by (ds string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> a 2009-08-01
> b 2009-08-01
> c 2009-08-01
> {code}