Sushanth Sowmyan created HCATALOG-580:
-----------------------------------------

             Summary: Optimizations in HCAT-538 break e2e tests
                 Key: HCATALOG-580
                 URL: https://issues.apache.org/jira/browse/HCATALOG-580
             Project: HCatalog
          Issue Type: Bug
    Affects Versions: 0.5
         Environment: RH 5.8 (on AWS)
Hadoop 1.1.2.17 (build)
HCat 0.5 (build)
            Reporter: Sushanth Sowmyan
             Fix For: 0.5


The optimizations introduced by HCATALOG-538 break dynamic partitioning in the 
e2e tests. They assume that if the first child of a directory is itself a 
directory, then all of its siblings are directories, and that if the first 
child is a file, then all of its siblings are files. That assumption is 
incorrect.

(Admittedly, one half of that assumption, namely that a directory whose first 
child is a file is a leaf directory, is not an unreasonable premise, although 
it is still incorrect.)

The problem is that the behaviour of the underlying FileOutputCommitter and 
OutputFormat determines whether you get files or directories, and whether any 
_temporary directories are left behind, for example.

In the case I tested, a "leaf" directory contains a _temporary directory 
followed by part files. The optimization sees the _temporary directory first, 
finds nothing inside it, and so does not mkdir any parent. It then concludes 
that the remaining children are directories too, moves on to the part file, 
and tries to rename it directly without mkdir-ing its parent directory.
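
The layout described above can be reproduced with a small local-filesystem 
sketch. This is only an illustration in Python using the standard library, 
not the HCatalog code: the function names, the partition directory name 
ds=20110924 and the part file names are all hypothetical, and listings are 
sorted purely to make the order deterministic. It shows the first-child 
heuristic misclassifying the leaf directory, and a per-child traversal that 
checks every entry and creates the parent before each rename.

```python
import os
import tempfile


def first_child_says_all_dirs(path):
    """Flawed heuristic: classify all children by inspecting only the first."""
    children = sorted(os.listdir(path))
    return bool(children) and os.path.isdir(os.path.join(path, children[0]))


def move_leaf_files(src, dst):
    """Per-child traversal: test each entry and mkdir the parent before renaming."""
    moved = []
    for name in sorted(os.listdir(src)):
        s = os.path.join(src, name)
        d = os.path.join(dst, name)
        if os.path.isdir(s):
            # Recurse; an empty directory such as _temporary moves nothing.
            moved.extend(move_leaf_files(s, d))
        else:
            # Create the destination parent on demand, then rename the file.
            os.makedirs(os.path.dirname(d), exist_ok=True)
            os.rename(s, d)
            moved.append(d)
    return moved


def build_layout(root):
    """Build a hypothetical leaf directory: an empty _temporary plus part files."""
    leaf = os.path.join(root, "ds=20110924")
    os.makedirs(os.path.join(leaf, "_temporary"))
    for name in ("part-m-00000", "part-m-00001"):
        with open(os.path.join(leaf, name), "w") as f:
            f.write("row\n")
    return leaf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "scratch")
        dst = os.path.join(tmp, "table")
        leaf = build_layout(src)
        print("heuristic thinks all children are directories:",
              first_child_says_all_dirs(leaf))
        print("files moved:", len(move_leaf_files(src, dst)))
```

Checking each child costs one extra type test per entry, but it does not 
depend on what the committer or output format happens to leave behind.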

The e2e test conf in question is Pig_Checkin_7:
{code}
                                 'num' => 7
                                ,'hcat_prep'=>q\drop table if exists pig_checkin_7;
create table pig_checkin_7 (name string, age int) partitioned by (ds string) STORED AS TEXTFILE;\
                                ,'pig' => q\a = load 'studentparttab30k' using org.apache.hcatalog.pig.HCatLoader();
b = foreach a generate name, age, ds;
store b into 'pig_checkin_7' using org.apache.hcatalog.pig.HCatStorer();\,
                                ,'result_table' => 'pig_checkin_7',
                                ,'sql'   => "select name, age, ds from studentparttab30k;",
                                ,'floatpostprocess' => 1
                                ,'delimiter' => '       '
                                }
{code}
