[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524601#comment-14524601
 ] 

Hadoop QA commented on MAPREDUCE-4882:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12566626/MAPREDUCE-4882.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5507/console |


This message was automatically generated.

> Error in estimating the length of the output file in Spill Phase
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4882
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 1.0.3
>         Environment: Any Environment
>            Reporter: Lijie Xu
>            Assignee: Jerry Chen
>              Labels: patch
>         Attachments: MAPREDUCE-4882.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The sortAndSpill() method in MapTask.java estimates the length of the
> output file incorrectly.
> When "bufend < bufstart", the "long size" should be
> "(bufvoid - bufstart) + bufend", not "(bufvoid - bufend) + bufstart".
> Here is the original code in MapTask.java:
>   private void sortAndSpill() throws IOException, ClassNotFoundException,
>                                      InterruptedException {
>     //approximate the length of the output file to be the length of the
>     //buffer + header lengths for the partitions
>     long size = (bufend >= bufstart
>         ? bufend - bufstart
>         : (bufvoid - bufend) + bufstart) +
>                 partitions * APPROX_HEADER_LENGTH;
>     FSDataOutputStream out = null;
> --
> I ran a test with "TeraSort". A snippet from the mapper's log follows:
> MapTask: Spilling map output: record full = true
> MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
> MapTask: kvstart = 262142; kvend = 131069; length = 655360
> MapTask: Finished spill 3
> In this case, the spill size should be (199229440 - 157286200) + 10485460 =
> 52428700 bytes (about 52 MB), because 524287 records were spilled and each
> record takes 100 bytes.
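
The arithmetic in the description can be checked with a small standalone sketch. This is not the MapTask code: the class name, method names, and the headerLength parameter are hypothetical stand-ins, and only the two wrap-around formulas and the logged values come from the report above.

```java
// Sketch only: a standalone model of the estimate, not the actual MapTask code.
// Class, method, and parameter names are hypothetical.
final class SpillSizeEstimate {

    /** Bytes between bufstart and bufend in a circular buffer of size bufvoid. */
    static long bufferedBytes(int bufstart, int bufend, int bufvoid) {
        return bufend >= bufstart
            ? (long) bufend - bufstart
            // Wrapped case: tail of the buffer plus the part before bufend.
            : (long) (bufvoid - bufstart) + bufend;
    }

    /** The wrapped-case formula as quoted in the report, for comparison. */
    static long buggyBufferedBytes(int bufstart, int bufend, int bufvoid) {
        return bufend >= bufstart
            ? (long) bufend - bufstart
            : (long) (bufvoid - bufend) + bufstart;
    }

    /** Buffer bytes plus an approximate per-partition header length. */
    static long estimate(int bufstart, int bufend, int bufvoid,
                         int partitions, int headerLength) {
        return bufferedBytes(bufstart, bufend, bufvoid)
            + (long) partitions * headerLength;
    }

    public static void main(String[] args) {
        // Values taken from the mapper log quoted above.
        int bufstart = 157286200, bufend = 10485460, bufvoid = 199229440;
        System.out.println(bufferedBytes(bufstart, bufend, bufvoid));      // 52428700
        System.out.println(buggyBufferedBytes(bufstart, bufend, bufvoid)); // 346030180
    }
}
```

With the logged values, the quoted formula yields 346030180 bytes, which is larger than bufvoid itself and so cannot be the amount of data held in the buffer.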



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563472#comment-13563472
 ] 

Hadoop QA commented on MAPREDUCE-4882:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566626/MAPREDUCE-4882.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3281//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3281//console

This message is automatically generated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-26 Thread Lijie Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563459#comment-13563459
 ] 

Lijie Xu commented on MAPREDUCE-4882:
-

[~jerrychenhf]
Thanks. I checked this patch and think it is correct. In fact, I have run many 
jobs with this change and found nothing abnormal. If I find any problems 
with this change, I will report them.



[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-26 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563424#comment-13563424
 ] 

Jerry Chen commented on MAPREDUCE-4882:
---

[~gelesh]
If multiple local dirs are configured, the map task chooses the spill file 
directory on the local disks according to the estimated size. A wrong estimate 
may lead to a wrong decision, such as choosing a directory with less free 
space based on the given (wrong) size while the actual spill is larger, which 
causes a disk-full error even though another disk directory may have enough 
space available.
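
As a rough illustration of the point above, the sketch below models a size-driven directory choice. It is a simplified, hypothetical picker (the class name, method name, directory names, and free-space figures are all made up for illustration) and does not reproduce Hadoop's actual allocation logic. It only shows that the choice depends entirely on the estimate that is passed in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of choosing a spill directory by estimated size.
// Not Hadoop's allocator; names and numbers are illustrative only.
final class SpillDirChoice {

    /** Returns the first directory whose free space covers the estimate, or null if none does. */
    static String pickDir(Map<String, Long> freeBytesByDir, long estimatedSize) {
        for (Map.Entry<String, Long> e : freeBytesByDir.entrySet()) {
            if (e.getValue() >= estimatedSize) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Long> dirs = new LinkedHashMap<>();
        dirs.put("/disk1", 40_000_000L);   // made-up free space
        dirs.put("/disk2", 100_000_000L);  // made-up free space

        long actual = 52_428_700L;         // spill size from the report
        // An estimate that is too small selects a directory the real spill cannot fit in.
        System.out.println(pickDir(dirs, 30_000_000L));
        // An accurate estimate selects a directory with enough room.
        System.out.println(pickDir(dirs, actual));
        // An estimate that is too large rejects every directory, although one would fit.
        System.out.println(pickDir(dirs, 346_030_180L));
    }
}
```

In this model an estimate in either direction goes wrong: too small, and the spill lands on a disk that fills up; too large, and a directory with sufficient space is passed over.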




[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-25 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562621#comment-13562621
 ] 

Gelesh commented on MAPREDUCE-4882:
---

Could you please share what impact this has?
