[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Zhang reassigned HIVE-1492:
--------------------------------

    Assignee: Ning Zhang

> FileSinkOperator should remove duplicated files from the same task based on file sizes
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1492.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to
> retain only one file for each task. A task could produce multiple files due
> to failed attempts or speculative runs. The largest file should be retained
> rather than the first file for each task.
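
The selection rule described in the issue (keep the largest file per task rather than the first one seen) could be sketched roughly as follows. This is an illustration only, not the code in HIVE-1492.patch and not Hive's actual Utilities.removeTempOrDuplicateFiles(). The OutputFile class, the taskIdOf() helper, and the assumed "<taskId>_<attemptId>" file naming are all hypothetical stand-ins, and the sketch works on plain name/size pairs instead of the Hadoop FileSystem API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LargestFilePerTask {

    // Hypothetical stand-in for a directory listing entry: file name plus length in bytes.
    static final class OutputFile {
        final String name;
        final long size;

        OutputFile(String name, long size) {
            this.name = name;
            this.size = size;
        }
    }

    // Assumption: file names look like "<taskId>_<attemptId>", e.g. "000001_0".
    // Names without an underscore are treated as their own task ID.
    static String taskIdOf(String fileName) {
        int sep = fileName.lastIndexOf('_');
        return sep < 0 ? fileName : fileName.substring(0, sep);
    }

    // For each task ID, keep the largest file seen so far.
    // On a size tie, the file encountered first is kept.
    static Map<String, OutputFile> keepLargestPerTask(Iterable<OutputFile> files) {
        Map<String, OutputFile> kept = new HashMap<>();
        for (OutputFile f : files) {
            kept.merge(taskIdOf(f.name), f,
                    (existing, candidate) -> candidate.size > existing.size ? candidate : existing);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<OutputFile> files = Arrays.asList(
                new OutputFile("000000_0", 120L),   // e.g. output of a failed first attempt
                new OutputFile("000000_1", 4096L),  // e.g. output of a later attempt
                new OutputFile("000001_0", 2048L));

        for (Map.Entry<String, OutputFile> e : keepLargestPerTask(files).entrySet()) {
            System.out.println("task " + e.getKey() + " keeps " + e.getValue().name);
        }
    }
}
```

In a real cleanup step, every file that is not the kept one for its task would then be deleted. That part is omitted here because it depends on the file system API in use.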