[
https://issues.apache.org/jira/browse/HIVE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977699#comment-13977699
]
Prasanth J commented on HIVE-6958:
----------------------------------
The reason for this failure, is related to the behaviour of UNION. INSERT
queries with UNION ALL will create sub-directories under table/partition
directory. For example:
{code}
insert overwrite table outputTbl1
SELECT *
FROM (
SELECT key, count(1) as values from inputTbl1 group by key
UNION ALL
SELECT key, count(1) as values from inputTbl1 group by key
) a;
{code}
for the above query, the warehouse/outputTbl1 directory will have 2
sub-directories corresponding to each SELECT queries like
warehouse/outputTbl1/15/, warehouse/outputTbl1/16/. Here 15 and 16 are operator
identifiers
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java#L223
This special case (having directory under table) happens only for union insert.
All other cases will have files underneath the table directory for
unpartitioned tables. But the metastore utils for updating the fast stats are
not aware of this directory structure (it expects files underneath table
directory). The Warehouse.getFileStatusesForUnpartitionedTable() recurses only
one level under table directory if it is unpartitioned table
https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java#L540.
For union insert, if only 1 level is recursed you will get only the folder
sizes and not the actual file sizes. Folder sizes are different for different
OSes. It looks like original diff was generated using Mac OS X and the new diff
was generated using Centos. Both the diffs are *wrong* as they return folder
size as opposed to file sizes.
1) One way to fix this is to change the recurse level to a value greater than
1.
2) Another way would be to fix UNION to create files instead of directories. To
resolve filename conflict it can append the operator id to filename.
[~ashutoshc]/[~jdere] do you guys have any thoughts about this?
> update union_remove_*, other tests for hadoop-2
> -----------------------------------------------
>
> Key: HIVE-6958
> URL: https://issues.apache.org/jira/browse/HIVE-6958
> Project: Hive
> Issue Type: Bug
> Components: Tests
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-6958.1.patch
>
>
> Update q.out files to match totalSize for Linux platform.
--
This message was sent by Atlassian JIRA
(v6.2#6252)