[
https://issues.apache.org/jira/browse/HADOOP-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ravi Gummadi updated HADOOP-5572:
---------------------------------
Attachment: HADOOP-5572.v1.2.patch
Made code changes as per Jothi's first four comments.
# Check if progress is being updated correctly and works fine with new Reducer
API
Progress is not updated with the new Reducer API while records are being fed to
the reducer, so reduce task progress stays at 66.66% and jumps to 100% when the
task is done. Maybe we need to file a separate JIRA for the new API so that it
updates progress the way the old API does.
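To illustrate the 66.66% plateau, here is a minimal model, assuming the reduce
task has three equally weighted phases (copy, sort, reduce). The class and
method names are hypothetical and are not taken from the patch; the only point
is that if the fraction of records fed to the reducer is never reported, the
overall value stays at 2/3 until the task completes.

```java
public class ReducePhaseProgress {
    // Assumption: three equally weighted phases (copy, sort, reduce).
    static final int PHASES = 3;

    /**
     * Overall task progress in [0, 1].
     *
     * @param completedPhases  number of phases already finished (0..3)
     * @param recordsProcessed records fed to the reducer so far in the current phase
     * @param totalRecords     total records expected in the current phase
     */
    static double overall(int completedPhases, long recordsProcessed, long totalRecords) {
        double inPhase = totalRecords == 0 ? 0.0 : (double) recordsProcessed / totalRecords;
        return (completedPhases + inPhase) / PHASES;
    }

    public static void main(String[] args) {
        // Reduce phase started, but per-record progress is never reported.
        System.out.println(overall(2, 0, 100));
        // Reduce phase with per-record progress reported halfway through.
        System.out.println(overall(2, 50, 100));
    }
}
```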
# Merger: Remove Collections.sort() in the beginning
OK. Removed sort() at the beginning of merge() and changed the callers to pass
sorted segments to merge() if there are more than ioSortFactor segments.
Changed mergeParts() to call merge() with sorted segments if there are more
than ioSortFactor segments. Earlier, mergeParts() passed unsorted segments to
merge(), and the segments were sorted only after the first intermediate merge,
so the first merge was not merging the smallest segments.
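A rough sketch of the caller-side change described above, not the patch code
itself. Segments are modeled by their sizes only, the helper names are
hypothetical, and for simplicity the first pass is assumed to take exactly
ioSortFactor segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FirstMergeSelection {
    /**
     * Hypothetical caller-side helper: sorts segment sizes in ascending order
     * only when an intermediate merge will be needed, i.e. when there are more
     * than ioSortFactor segments. Otherwise the order is left untouched.
     */
    static List<Long> segmentsForMerge(List<Long> sizes, int ioSortFactor) {
        List<Long> result = new ArrayList<>(sizes);
        if (result.size() > ioSortFactor) {
            Collections.sort(result);
        }
        return result;
    }

    /** Segments consumed by the first intermediate merge (simplified). */
    static List<Long> firstPass(List<Long> segments, int factor) {
        return new ArrayList<>(segments.subList(0, Math.min(factor, segments.size())));
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(50L, 10L, 40L, 20L, 30L);
        // With sorted input, the first pass merges the smallest segments.
        System.out.println(firstPass(segmentsForMerge(sizes, 3), 3));
        // With unsorted input, it merges whatever happens to come first.
        System.out.println(firstPass(sizes, 3));
    }
}
```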
Removed the sort() call after each intermediate merge; the merged segment is
now inserted into the sorted segment list instead. This could improve
performance, as calling sort() with O(n log n) complexity after each
intermediate merge is costly.
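The idea of inserting into an already sorted list instead of re-sorting can be
sketched as follows. This is an illustration under assumptions, not the patch
code: the Segment class here is a hypothetical stand-in that carries only a
name and a length, and the insertion point is found with
Collections.binarySearch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortedSegmentInsert {
    /** Hypothetical stand-in for a merge segment; only its length matters here. */
    static final class Segment {
        final String name;
        final long length;

        Segment(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /** Orders segments by ascending length. */
    static final Comparator<Segment> BY_LENGTH = Comparator.comparingLong(s -> s.length);

    /**
     * Inserts seg into a list that is already sorted by length, keeping it
     * sorted. The position is located in O(log n) comparisons, so no full
     * O(n log n) sort is needed after each intermediate merge.
     */
    static void insertSorted(List<Segment> sorted, Segment seg) {
        int pos = Collections.binarySearch(sorted, seg, BY_LENGTH);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -(pos + 1);
        }
        sorted.add(pos, seg);
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        insertSorted(segments, new Segment("a", 30));
        insertSorted(segments, new Segment("b", 10));
        insertSorted(segments, new Segment("merged", 20));
        for (Segment s : segments) {
            System.out.println(s.name + " " + s.length);
        }
    }
}
```

With an ArrayList the add() call still shifts elements, so the insert is linear
in the worst case, but it avoids the comparisons of a full sort on every pass.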
# Can we do better than relying on writesCounter to determine if the final
merge needs to be included in the calculation or not?
I couldn't see a cleaner/better way of doing this.
Attaching patch with the above changes. Please review and provide your comments.
> The map progress value should have a separate phase for doing the final sort.
> -----------------------------------------------------------------------------
>
> Key: HADOOP-5572
> URL: https://issues.apache.org/jira/browse/HADOOP-5572
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Ravi Gummadi
> Attachments: HADOOP-5572.patch, HADOOP-5572.v1.1.patch,
> HADOOP-5572.v1.2.patch, HADOOP-5572.v1.patch
>
>
> Currently, the final spill and sort doesn't record any progress while it
> runs, leading to the perception that the map is done, but "stuck".