Owen O'Malley created HIVE-12638:
------------------------------------
Summary: Hive should not create empty files in partitions
Key: HIVE-12638
URL: https://issues.apache.org/jira/browse/HIVE-12638
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Owen O'Malley
Currently Hive creates empty files for buckets with no rows in a directory. I
believe this was originally because the SMB (sort-merge bucket) and bucket
joins require the files to be present to get InputSplits. For some customers,
this behavior leads to the creation of more than 200,000 empty ORC files per
hour on a cluster (with peaks of more than 725,000 per hour). We've also seen
instances where a single DataNode is involved in 5,600 of these empty ORC files
within a 2-minute period.
This causes significant stress on HDFS at both the NameNode and DataNode and is
completely unnecessary.
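A minimal sketch of the alternative behavior follows. It is written in Python for illustration only and is not Hive's writer code. The idea is to open a bucket's file only when its first row arrives, so a bucket with no rows never produces a file. The function name write_buckets, the bucket_NNNNN file naming, and the key % num_buckets bucketing rule are hypothetical assumptions. They are not Hive's actual bucketing hash or file names.

```python
import os
import tempfile


def write_buckets(rows, num_buckets, out_dir, lazy=True):
    """Write (key, value) rows into per-bucket files under out_dir.

    Hypothetical sketch: bucket = key % num_buckets (not Hive's real hash).
    lazy=False mimics the current behavior: every bucket file is created up
    front, so empty buckets become empty files. lazy=True opens a bucket file
    only when its first row arrives. Returns the sorted list of file names.
    """
    handles = {}

    def handle_for(bucket):
        # Open the bucket's file on first use only.
        if bucket not in handles:
            path = os.path.join(out_dir, f"bucket_{bucket:05d}")
            handles[bucket] = open(path, "w")
        return handles[bucket]

    if not lazy:
        # Eager mode: materialize every bucket file, even if it stays empty.
        for b in range(num_buckets):
            handle_for(b)

    try:
        for key, value in rows:
            handle_for(key % num_buckets).write(f"{key}\t{value}\n")
    finally:
        # Close everything so file sizes are final before listing.
        for fh in handles.values():
            fh.close()
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    rows = [(1, "a"), (5, "b"), (9, "c")]  # all keys map to bucket 1 of 4
    for lazy in (False, True):
        with tempfile.TemporaryDirectory() as d:
            files = write_buckets(rows, 4, d, lazy=lazy)
            empty = [f for f in files if os.path.getsize(os.path.join(d, f)) == 0]
            print(f"lazy={lazy}: created={len(files)} empty={len(empty)}")
```

With rows whose keys all land in bucket 1 of 4, the eager mode leaves three empty files and the lazy mode writes a single non-empty one. A reader that expects one file per bucket, such as the SMB join path mentioned above, would then need to treat a missing bucket file as an empty bucket. That handling is not shown here.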
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)