[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-08-27 Thread Carl Steinbach (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1492:
---------------------------------

Fix Version/s: (was: 0.7.0)
Affects Version/s: (was: 0.7.0)
  Component/s: Query Processor

> FileSinkOperator should remove duplicated files from the same task based on 
> file sizes
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to 
> retain only one file for each task. A task could produce multiple files due 
> to failed attempts or speculative runs. The largest file should be retained 
> rather than the first file for each task. 
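
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from HIVE-1492.patch: the class `LargestFilePerTask`, the `OutputFile` holder, and the helper `taskIdOf` are hypothetical names, and the sketch assumes output files are named `<taskId>_<attemptId>` so that files from different attempts of one task share a prefix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LargestFilePerTask {

    /** Hypothetical stand-in for a file status: a file name and its length in bytes. */
    static final class OutputFile {
        final String name;
        final long length;

        OutputFile(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /**
     * Assumption: names look like "<taskId>_<attemptId>".
     * Returns the part before the last underscore, or the whole name if there is none.
     */
    static String taskIdOf(String fileName) {
        int idx = fileName.lastIndexOf('_');
        return idx < 0 ? fileName : fileName.substring(0, idx);
    }

    /** Returns the names of the files to keep: the largest file seen for each task ID. */
    static List<String> filesToKeep(List<OutputFile> files) {
        // LinkedHashMap keeps task IDs in first-seen order, so the result is deterministic.
        Map<String, OutputFile> largest = new LinkedHashMap<>();
        for (OutputFile f : files) {
            String taskId = taskIdOf(f.name);
            OutputFile current = largest.get(taskId);
            // Replace only when strictly larger, so ties keep the first file seen.
            if (current == null || f.length > current.length) {
                largest.put(taskId, f);
            }
        }
        List<String> keep = new ArrayList<>();
        for (OutputFile f : largest.values()) {
            keep.add(f.name);
        }
        return keep;
    }
}
```

Every file not returned by `filesToKeep` would be a candidate for deletion. Comparing sizes rather than keeping the first file matters because a failed or killed attempt may have left a truncated output behind.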

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-07-29 Thread He Yongqiang (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1492:
-------------------------------

Fix Version/s: 0.6.0

> FileSinkOperator should remove duplicated files from the same task based on 
> file sizes
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0, 0.7.0
>
>         Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch




[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-07-29 Thread Ning Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1492:
-----------------------------

Attachment: HIVE-1492_branch-0.6.patch

Uploading a patch for branch-0.6.

> FileSinkOperator should remove duplicated files from the same task based on 
> file sizes
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch




[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-07-29 Thread He Yongqiang (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1492:
-------------------------------

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.7.0
   Resolution: Fixed

I just committed. Thanks, Ning!

> FileSinkOperator should remove duplicated files from the same task based on 
> file sizes
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1492.patch




[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-07-28 Thread Ning Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1492:
-----------------------------

Attachment: HIVE-1492.patch

> FileSinkOperator should remove duplicated files from the same task based on 
> file sizes
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>         Attachments: HIVE-1492.patch




[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-07-28 Thread Ning Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1492:
-----------------------------

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0

> FileSinkOperator should remove duplicated files from the same task based on 
> file sizes
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-1492
>                 URL: https://issues.apache.org/jira/browse/HIVE-1492
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>         Attachments: HIVE-1492.patch
