[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
     [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1492:
---------------------------------

        Fix Version/s:     (was: 0.7.0)
    Affects Version/s:     (was: 0.7.0)
          Component/s: Query Processor

> FileSinkOperator should remove duplicated files from the same task based on file sizes
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
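[Editor's note] The issue description above outlines the intended behavior: when a task leaves several output files behind (failed attempts, speculative runs), the cleanup should keep the largest file per task rather than the first one encountered. The following is a minimal Java sketch of that policy only, not the actual Hive patch; the real change lives in Utilities.removeTempOrDuplicateFiles() and operates on the Hadoop FileSystem API. The class name, helper method, and the "<taskId>_<attemptId>" file-name convention assumed here are illustrative simplifications.

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch of the dedup policy described in HIVE-1492: among the output
     * files produced for the same task, retain the largest one and delete
     * the rest. Not the Hive implementation; assumes file names of the
     * form "<taskId>_<attemptId>" (e.g. "000001_0", "000001_1").
     */
    public class LargestFilePerTask {

      /** Extracts the task id from a name like "000001_0" (assumed convention). */
      private static String taskIdOf(String fileName) {
        int idx = fileName.lastIndexOf('_');
        return idx >= 0 ? fileName.substring(0, idx) : fileName;
      }

      /** Deletes all but the largest file for each task id in the given directory. */
      public static void removeDuplicateTaskFiles(File dir) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
          return;
        }
        // First pass: remember the largest file seen for each task id.
        Map<String, File> largestPerTask = new HashMap<>();
        for (File f : files) {
          String taskId = taskIdOf(f.getName());
          File best = largestPerTask.get(taskId);
          if (best == null || f.length() > best.length()) {
            largestPerTask.put(taskId, f);
          }
        }
        // Second pass: delete every file that is not the retained (largest) one.
        for (File f : files) {
          if (largestPerTask.get(taskIdOf(f.getName())) != f) {
            f.delete();
          }
        }
      }
    }

Comparing by size rather than taking the first file per task is the point of the fix: a partially written file from a failed or speculative attempt is typically shorter than the complete output, so keeping the largest file is a reasonable heuristic for keeping the complete one.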
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
     [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1492:
-------------------------------

    Fix Version/s: 0.6.0

> FileSinkOperator should remove duplicated files from the same task based on file sizes
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0, 0.7.0
>
>         Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
     [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1492:
-----------------------------

    Attachment: HIVE-1492_branch-0.6.patch

Uploading a patch for branch-0.6.

> FileSinkOperator should remove duplicated files from the same task based on file sizes
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
     [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1492:
-------------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 0.7.0
       Resolution: Fixed

I just committed. Thanks Ning!

> FileSinkOperator should remove duplicated files from the same task based on file sizes
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1492.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
     [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1492:
-----------------------------

    Attachment: HIVE-1492.patch

> FileSinkOperator should remove duplicated files from the same task based on file sizes
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>
>         Attachments: HIVE-1492.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
     [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1492:
-----------------------------

               Status: Patch Available  (was: Open)
    Affects Version/s: 0.7.0

> FileSinkOperator should remove duplicated files from the same task based on file sizes
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>
>         Attachments: HIVE-1492.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.