[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-11-06 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-5958:
--
   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Assignee: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.6.0
>
> Attachments: HADOOP-5958-v2.patch, MAPREDUCE-5958v3.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).
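As a rough illustration of the mismatch described above (plain Java, not Hadoop's actual Merger code; the variable names and numbers are hypothetical), the sketch below contrasts a progress ratio built from the compressed on-disk position with one built from uncompressed bytes read:

{code:java}
// Minimal sketch of the progress mismatch. Assumes a single on-disk segment
// compressed 4:1; values are illustrative, not taken from Hadoop internals.
public class ProgressSketch {
    public static void main(String[] args) {
        long rawSegmentLength = 100_000_000L;  // uncompressed size, as from Segment.getRawDataLength()
        long compressedLength = 25_000_000L;   // size of the compressed on-disk file

        // Suppose the reducer has consumed half of the segment's records.
        long compressedPosition = compressedLength / 2;     // what getPosition() reflects for an on-disk reader
        long uncompressedBytesRead = rawSegmentLength / 2;  // what a bytesRead-style counter would report

        float before = (float) compressedPosition / rawSegmentLength;    // 0.125 instead of 0.5
        float after  = (float) uncompressedBytesRead / rawSegmentLength; // 0.5, as expected

        System.out.printf("progress before fix = %.3f, after fix = %.3f%n", before, after);
    }
}
{code}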



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5958:
--
Attachment: MAPREDUCE-5958v3.patch

I'd really like to see this fixed for 2.6.  I went ahead and fixed the existing 
unit tests so they really do a merge and monitor the progress as values are 
retrieved from the iterator.
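A hypothetical, self-contained sketch of that testing pattern (it does not use Hadoop's Merger or TestMerger APIs; the class and method names are made up for illustration): drive a merge-like iterator to completion and check that the reported progress, measured in uncompressed bytes, rises as values are consumed instead of stalling and then jumping to 1.0.

{code:java}
import java.util.Iterator;
import java.util.List;

// Stand-in for a merged key/value stream whose progress is tracked in raw
// (uncompressed) bytes, mirroring the idea behind IFile.Reader.bytesRead.
public class ProgressMonitoringSketch {
    static class MergedStream implements Iterator<String> {
        private final List<String> values;
        private final long totalRawBytes;
        private long rawBytesRead = 0;
        private int index = 0;

        MergedStream(List<String> values) {
            this.values = values;
            this.totalRawBytes = values.stream().mapToLong(String::length).sum();
        }

        public boolean hasNext() { return index < values.size(); }

        public String next() {
            String v = values.get(index++);
            rawBytesRead += v.length();  // count uncompressed bytes as they are handed out
            return v;
        }

        float getProgress() {
            return totalRawBytes == 0 ? 1.0f : (float) rawBytesRead / totalRawBytes;
        }
    }

    public static void main(String[] args) {
        MergedStream stream = new MergedStream(List.of("aaaa", "bbbb", "cccc", "dddd"));
        float last = 0.0f;
        while (stream.hasNext()) {
            stream.next();
            float p = stream.getProgress();
            if (p < last) throw new AssertionError("progress went backwards: " + p);
            last = p;
        }
        if (Math.abs(last - 1.0f) > 1e-6f) throw new AssertionError("progress never reached 1.0");
        System.out.println("progress advanced monotonically to " + last);
    }
}
{code}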

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Assignee: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Attachments: HADOOP-5958-v2.patch, MAPREDUCE-5958v3.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-10-10 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5958:
--
Target Version/s: 2.6.0
   Fix Version/s: (was: 2.4.1)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Assignee: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Attachments: HADOOP-5958-v2.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Attachment: HADOOP-5958-v2.patch

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
> Attachments: HADOOP-5958-v2.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Attachment: (was: HADOOP-5958.patch)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
> Attachments: HADOOP-5958-v2.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Status: Patch Available  (was: Open)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.2.1, 2.3.0, 2.2.0, 2.4.0
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
> Attachments: HADOOP-5958.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Attachment: HADOOP-5958.patch

Patch updated with the correct path (the previous one failed Hadoop QA).

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
> Attachments: HADOOP-5958.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Attachment: (was: HADOOP-5958.patch)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Status: Open  (was: Patch Available)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.2.1, 2.3.0, 2.2.0, 2.4.0
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
> Attachments: HADOOP-5958.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Description: 
If the map output is compressed (_mapreduce.map.output.compress_ set to _true_) 
then the reduce task progress may be highly underestimated.

In the reduce phase (but also in the merge phase), the progress of a reduce 
task is computed as the ratio between the number of processed bytes and the 
total number of bytes. But:

- the total number of bytes is computed by summing up the uncompressed segment 
sizes (_Merger.Segment.getRawDataLength()_)

- the number of processed bytes is taken from the position of the current 
_IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer 
to the position in the underlying on-disk file (which may be compressed)

Thus, if the map outputs are compressed, the progress may be underestimated 
(e.g., with only one on-disk map output file whose compressed size is 25% of 
the original, the reduce task progress during the reduce phase will range 
between 0 and 0.25 and then artificially jump to 1.0).

A patch is attached: the number of processed bytes is now computed from 
_IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
already returns exactly this field).


  was:
If the map output is compressed (_mapreduce.map.output.compress_ set to _true_) 
then the reduce task progress may be highly underestimated.

In the reduce phase (but also in the merge phase), the progress of a reduce 
task is computed as the ratio between the number of processed bytes and the 
total number of bytes. But:

- the total number of bytes is computed by summing up the uncompressed segment 
sizes (_Merger.Segment.getRawDataLength()_)

- the number of processed bytes is taken from the position of the current 
_IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer 
to the position in the underlying on-disk file (which may be compressed)

Thus, if the map output are compressed then the progress may be underestimated 
(e.g., with only one on-disk map output file whose compressed size is 25% of 
the original, the reduce task progress during the reduce phase will range 
between 0 and 0.25 and then artificially jump to 1.0).

A patch is attached: the number of processed bytes is now computed from 
_IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
already returns exactly this field).



> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
> Attachments: HADOOP-5958.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Fix Version/s: 2.4.1
   Status: Patch Available  (was: Open)

See the attached file: HADOOP-5958.patch

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.2.1, 2.3.0, 2.2.0, 2.4.0
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Fix For: 2.4.1
>
> Attachments: HADOOP-5958.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Status: Open  (was: Patch Available)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.2.1, 2.3.0, 2.2.0, 2.4.0
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Attachments: HADOOP-5958.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Attachment: HADOOP-5958.patch

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
> Attachments: HADOOP-5958.patch
>
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-07-05 Thread Emilio Coppa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emilio Coppa updated MAPREDUCE-5958:


Status: Patch Available  (was: Open)

> Wrong reduce task progress if map output is compressed
> --
>
> Key: MAPREDUCE-5958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.2.1, 2.3.0, 2.2.0, 2.4.0
>Reporter: Emilio Coppa
>Priority: Minor
>  Labels: progress, reduce
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to 
> _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce 
> task is computed as the ratio between the number of processed bytes and the 
> total number of bytes. But:
> - the total number of bytes is computed by summing up the uncompressed 
> segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is taken from the position of the current 
> _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to 
> the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed, the progress may be underestimated 
> (e.g., with only one on-disk map output file whose compressed size is 25% of 
> the original, the reduce task progress during the reduce phase will range 
> between 0 and 0.25 and then artificially jump to 1.0).
> A patch is attached: the number of processed bytes is now computed from 
> _IFile.Reader.bytesRead_ (if the reader is in memory, _getPosition()_ 
> already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.2#6252)