[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

Zheng Shao (JIRA) Tue, 04 Aug 2009 15:06:30 -0700

    [ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739197#action_12739197 ]


Zheng Shao commented on HIVE-718:
---------------------------------

bq. Zheng, aren't buckets separate subdirs? They work, so sub-dirs should be fine.

Buckets are files, not directories.

I tried adding a directory to a table and then ran a select on it. Apparently
the Hadoop FileInputFormat does not accept the subdirectory:

{code}
> select * from zshao_tt;
OK
Failed with exception java.io.IOException:Not a file: hdfs://dfs1.data.facebook.com:9000/user/facebook/warehouse/zshao_tt/a
09/08/04 14:49:38 ERROR exec.FetchTask: Failed with exception java.io.IOException:Not a file: hdfs://dfs1.data.facebook.com:9000/user/facebook/warehouse/zshao_tt/a
java.io.IOException: Not a file: hdfs://dfs1.data.facebook.com:9000/user/facebook/warehouse/zshao_tt/a
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:231)
        at org.apache.hadoop.hive.ql.exec.FetchTask.getRecordReader(FetchTask.java:236)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:291)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:368)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:306)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
{code}

I discussed this with Ashish offline. I think we still want inserts to be
atomic. As a result, we may need to manually expand the input directory into
a list of files and feed those files (instead of the directories) into the
map/reduce jobs. The code that sets the job inputs is in ExecDriver.java and
MapRedTask.java, where we set the JobConf.
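
To illustrate the expansion step, here is a minimal sketch in plain Java. It
uses java.nio on the local filesystem as a stand-in for HDFS, and the class and
method names (InputExpander, expandToFiles) are hypothetical, not existing Hive
code. The real change would presumably list the files through the Hadoop
FileSystem API and set them as the job's input paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InputExpander {

    // Hypothetical helper: recursively expand an input path into the regular
    // files beneath it, so that no directory is ever handed to the job.
    public static List<Path> expandToFiles(Path input) throws IOException {
        try (Stream<Path> walk = Files.walk(input)) {
            return walk.filter(Files::isRegularFile)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small table directory that contains one file and one
        // sub-directory with a second file (assumed layout for illustration).
        Path table = Files.createTempDirectory("zshao_tt");
        Files.write(table.resolve("part-0"), "x\n".getBytes());
        Path sub = Files.createDirectory(table.resolve("a"));
        Files.write(sub.resolve("part-1"), "y\n".getBytes());

        // Print the expanded file list relative to the table directory.
        for (Path file : expandToFiles(table)) {
            System.out.println(table.relativize(file));
        }
    }
}
```

With a layout like zshao_tt/a above, the list contains only regular files,
including the ones inside a, so the input format would never see a directory.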

What do you think?


> Load data inpath into a new partition without overwrite does not move the file
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-718
>                 URL: https://issues.apache.org/jira/browse/HIVE-718
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>         Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the select after the second load returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p (value string) partitioned by (ds string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> a       2009-08-01
> b       2009-08-01
> c       2009-08-01
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
