[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-03-10 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-4815:
-
Labels: perfomance  (was: )

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
  Labels: perfomance
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch, 
 MAPREDUCE-4815.v15.patch, MAPREDUCE-4815.v16.patch, MAPREDUCE-4815.v17.patch, 
 MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, 
 MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, 
 MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-03-10 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-4815:
-
Issue Type: Improvement  (was: Bug)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
  Labels: perfomance
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch, 
 MAPREDUCE-4815.v15.patch, MAPREDUCE-4815.v16.patch, MAPREDUCE-4815.v17.patch, 
 MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, 
 MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, 
 MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-03-06 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v17.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch, 
 MAPREDUCE-4815.v15.patch, MAPREDUCE-4815.v16.patch, MAPREDUCE-4815.v17.patch, 
 MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, 
 MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, 
 MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-26 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v16.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch, 
 MAPREDUCE-4815.v15.patch, MAPREDUCE-4815.v16.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-24 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v15.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch, 
 MAPREDUCE-4815.v15.patch, MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, 
 MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-23 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v14.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch, 
 MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, 
 MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, 
 MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-20 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v13.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-20 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v13.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-20 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: (was: MAPREDUCE-4815.v13.patch)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-18 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v12.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, 
 MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v11.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, 
 MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, 
 MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: (was: MAPREDUCE-4815.v11.patch)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-02-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v11.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, 
 MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, 
 MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, 
 MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-01-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: (was: MAPREDUCE-4815.v10.patch)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-01-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v10.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-01-26 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v10.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v3.patch, 
 MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, 
 MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-01-13 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v9.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, 
 MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-11-11 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---
Attachment: MAPREDUCE-4815.v8.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, 
 MAPREDUCE-4815.v8.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Open  (was: Patch Available)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v7.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Patch Available  (was: Open)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-21 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v5.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-21 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Open  (was: Patch Available)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-21 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Patch Available  (was: Open)

In this patch, the first task recovery will recover all succeeded tasks. 
Therefore, the rest tasks recovery will not do anything. 

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-21 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v6.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-21 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Patch Available  (was: Open)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-21 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Open  (was: Patch Available)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Open  (was: Patch Available)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v4.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: (was: MAPREDUCE-4815.v4.patch)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v4.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Patch Available  (was: Open)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-12 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: (was: MAPREDUCE-4815.v2.patch)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li

 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-12 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: (was: MAPREDUCE-4815.v1.patch)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li

 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-12 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Open  (was: Patch Available)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-12 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v3.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-12 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Patch Available  (was: Open)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-05 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Open  (was: Patch Available)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v1.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-05 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v2.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v1.patch, MAPREDUCE-4815.v2.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-05 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Status: Patch Available  (was: Open)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v1.patch, MAPREDUCE-4815.v2.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-04 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

Attachment: MAPREDUCE-4815.v1.patch

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-4815.v1.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2014-08-04 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-4815:
---

 Assignee: Siqi Li  (was: Arun C Murthy)
Affects Version/s: 2.4.1
   Status: Patch Available  (was: Open)

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v1.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)