[
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739170#action_12739170
]
Todd Lipcon commented on HIVE-718:
----------------------------------
bq. I think it's not acceptable for a failed "insert" to corrupt the original
data of the table.
Then we definitely have to move an entire directory of files in at once;
otherwise an insert can partially succeed.
bq. We never have a table with sub directories (instead of files) inside. We
will need some testing to make sure it actually works.
This is going to be a necessity to do non-overwrite loads into a
table/partition, right?
bq. For unique name, maybe we can just prepend the job id.
This isn't always available (e.g., when running LOAD DATA from the CLI). I think
we're stuck with java.util.UUID, as ugly as it may be.
I've spent the last hour or so trying to figure out any other way of generating
a unique name inside a subdirectory. Because of the semantics of
FileSystem.mkdirs and FileSystem.rename, I don't believe there's any way of
doing this. mkdirs doesn't return false in the case that the directory already
exists, and if you rename(src, dst), and dst already exists as a directory, it
will move src *inside* of dst.
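To make the UUID idea concrete, here is a minimal sketch, not the patch itself. It uses java.nio.file on a local filesystem as a stand-in for Hadoop's FileSystem (an assumption made so the example runs without Hadoop), and the class name, method name, and "load_" prefix are hypothetical. The point is only that the destination name is fresh, so the rename never targets an existing directory and the whole staged directory moves in one step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical illustration; local-filesystem analogue of the HDFS case.
class UniqueLoadDir {
    // Move a fully staged directory into the table directory under a
    // UUID-based name, so the destination cannot already exist.
    public static Path moveIntoTable(Path stagedDir, Path tableDir) throws IOException {
        Files.createDirectories(tableDir);
        Path dst = tableDir.resolve("load_" + UUID.randomUUID());
        // Single rename of the whole directory: either all staged files
        // appear under the table, or none do.
        return Files.move(stagedDir, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Keep staging and table under one parent so the rename stays
        // on the same filesystem.
        Path root = Files.createTempDirectory("hive718");
        Path table = root.resolve("tmp_insert_test_p");
        Path staged = Files.createDirectories(root.resolve("staging"));
        Files.write(staged.resolve("insert.txt"), "a\nb\nc\n".getBytes());

        Path loaded = moveIntoTable(staged, table);
        System.out.println("loaded into " + loaded.getFileName());
    }
}
```

Two loads into the same table directory get distinct subdirectories, which is what a non-overwrite load needs. Whether Hive can read tables containing such subdirectories is the separate question raised above.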
> Load data inpath into a new partition without overwrite does not move the file
> ------------------------------------------------------------------------------
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for
> partitioned tables. The select after the first load returns nothing, while
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> a 2009-08-01
> b 2009-08-01
> c 2009-08-01
> {code}