Sushanth Sowmyan created HCATALOG-580:
-----------------------------------------
Summary: Optimizations in HCAT-538 break e2e tests
Key: HCATALOG-580
URL: https://issues.apache.org/jira/browse/HCATALOG-580
Project: HCatalog
Issue Type: Bug
Affects Versions: 0.5
Environment: RH 5.8 (on AWS)
Hadoop 1.1.2.17 (build)
HCat 0.5 (build)
Reporter: Sushanth Sowmyan
Fix For: 0.5
The optimizations introduced in HCATALOG-538 break dynamic partitioning in the
e2e tests. They assume that if the first child of a directory is a directory,
then all of its children are directories, and that if the first child is a
file, then all of its children are files. That assumption is incorrect.
(Admittedly, one half of that assumption, namely that a directory whose first
child is a file is a leaf directory, is not a bad premise, although it is
still incorrect.)
The problem is that the behaviour of the underlying FileOutputCommitter and
OutputFormat determines whether you get files or directories, and whether any
_temporary directories are left behind, for example.
In the case I tested, there is a _temporary directory in a "leaf" directory,
followed by part files. The optimization sees the _temporary directory, finds
nothing inside it, and so does not mkdir any parent. It then decides that the
remaining children are directories too, moves on to the part file, and tries
to rename it directly without mkdir-ing its parent directory.
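To make the failure mode concrete, here is a minimal sketch of the scenario. It uses java.nio.file on the local filesystem as a stand-in for the Hadoop FileSystem API, so it is not the HCatalog code. The class name, method names, the partition directory "ds=20110924" and the file name "part-m-00000" are all made up for illustration. It assumes children are listed in name order, so that _temporary comes before the part file, and it shows one possible safer approach (check every child, and create the parent before each rename), which is not necessarily the fix that should go in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; local-filesystem stand-in for the Hadoop FileSystem calls.
final class PartitionMoveSketch {

    // The flawed heuristic: classify a whole directory by its first child only.
    static boolean firstChildIsDirectory(List<Path> children) {
        return !children.isEmpty() && Files.isDirectory(children.get(0));
    }

    // Per-child check: true if at least one child is a regular file.
    static boolean containsAnyFile(List<Path> children) {
        return children.stream().anyMatch(Files::isRegularFile);
    }

    // Lists children sorted by name (assumption: "_temporary" sorts before "part-*").
    static List<Path> sortedChildren(Path dir) {
        try (Stream<Path> s = Files.list(dir)) {
            return s.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Moves every file under src to the same relative path under dest.
    // Each child is inspected individually, and the target's parent directory
    // is created before the rename, so a mixed directory cannot trip it up.
    static void moveTree(Path src, Path dest) {
        for (Path child : sortedChildren(src)) {
            Path target = dest.resolve(child.getFileName());
            if (Files.isDirectory(child)) {
                moveTree(child, target);
            } else {
                try {
                    Files.createDirectories(target.getParent());
                    Files.move(child, target);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
    }

    // Builds the layout from the report: an empty _temporary directory
    // next to a part file inside a "leaf" partition directory.
    static Path buildScenario() {
        try {
            Path root = Files.createTempDirectory("hcat580");
            Path leaf = Files.createDirectories(root.resolve("src/ds=20110924"));
            Files.createDirectory(leaf.resolve("_temporary"));
            Files.write(leaf.resolve("part-m-00000"), "alice\t20\n".getBytes());
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = buildScenario();
        List<Path> children = sortedChildren(root.resolve("src/ds=20110924"));
        System.out.println("first child is a directory: " + firstChildIsDirectory(children));
        System.out.println("directory also contains a file: " + containsAnyFile(children));
        moveTree(root.resolve("src"), root.resolve("dest"));
        System.out.println("part file moved: "
                + Files.exists(root.resolve("dest/ds=20110924/part-m-00000")));
    }
}
```

The point of the sketch is only that the first child says nothing reliable about its siblings: the leaf directory above has a directory as its first child and still contains a part file that needs its parent created before the rename.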
The e2e test conf in question is Pig_Checkin_7:
{code}
{
'num' => 7
,'hcat_prep'=>q\drop table if exists pig_checkin_7;
create table pig_checkin_7 (name string, age int) partitioned by (ds string) STORED AS TEXTFILE;\
,'pig' => q\a = load 'studentparttab30k' using org.apache.hcatalog.pig.HCatLoader();
b = foreach a generate name, age, ds;
store b into 'pig_checkin_7' using org.apache.hcatalog.pig.HCatStorer();\,
,'result_table' => 'pig_checkin_7',
,'sql' => "select name, age, ds from studentparttab30k;",
,'floatpostprocess' => 1
,'delimiter' => ' '
}
{code}