[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739170#action_12739170
 ] 

Todd Lipcon commented on HIVE-718:
----------------------------------

bq. I think it's not acceptable for a failed "insert" to corrupt the original 
data of the table. 

Then we definitely have to move an entire directory of files in at once. 
Otherwise an insert can partially succeed.

bq. We never have a table with sub directories (instead of files) inside. We 
will need some testing to make sure it actually works.

This is going to be a necessity to do non-overwrite loads into a 
table/partition, right?

bq. For unique name, maybe we can just prepend the job id.

This isn't always available (e.g. when running LOAD DATA from the CLI). I think 
we're stuck with java.util.UUID, as ugly as it may be.

I've spent the last hour or so trying to figure out any other way of generating 
a unique name inside a subdirectory. Because of the semantics of 
FileSystem.mkdirs and FileSystem.rename, I don't believe there's any way of 
doing this. mkdirs doesn't return false in the case that the directory already 
exists, and if you rename(src, dst), and dst already exists as a directory, it 
will move src *inside* of dst.
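
To make the idea concrete, here is a rough sketch of "stage the files in one 
directory, then move the whole directory in under a UUID-based name". It is 
only an illustration under stated assumptions: it uses java.nio.file on a local 
filesystem rather than Hadoop's FileSystem (whose mkdirs/rename semantics 
differ, as described above), and the class name, method names and the "load" 
prefix are all made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical helper, not part of Hive or Hadoop.
class UniqueSubdirLoad {

    // Build a name that does not depend on a job id being available.
    static String uniqueName(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    // Move a fully staged directory into the partition directory in one step,
    // under a fresh name, so the files show up all at once or not at all.
    static Path publish(Path stagedDir, Path partitionDir) throws IOException {
        // Succeeds whether or not the directory already exists, so it cannot
        // be used to detect a name collision.
        Files.createDirectories(partitionDir);

        // The UUID makes it very unlikely that the target already exists, so
        // we never hit the "destination is an existing directory" case.
        Path target = partitionDir.resolve(uniqueName("load"));
        return Files.move(stagedDir, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hive718-sketch");
        Path staged = Files.createDirectory(root.resolve("staging"));
        Files.write(staged.resolve("insert.txt"),
                "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));

        Path loaded = publish(staged, root.resolve("ds=2009-08-01"));
        System.out.println("loaded into " + loaded);
    }
}
```

A second non-overwrite load into the same partition would land in a different 
UUID-named subdirectory, which is why tables with subdirectories would have to 
work for this approach.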

> Load data inpath into a new partition without overwrite does not move the file
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-718
>                 URL: https://issues.apache.org/jira/browse/HIVE-718
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>         Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p ( value string) partitioned by (ds string) 
> > stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> a       2009-08-01
> b       2009-08-01
> c       2009-08-01
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
