[jira] [Updated] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-901:

Release Note: Efficient implementation of MapReduce framework counters.

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Fix For: 0.23.0
Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch_2, mr-901-trunk-v1.patch

I think we should move all of the Counters that the framework updates into a single class called TaskMetrics. TaskMetrics would have specific fields for each of the metrics like input records, input bytes, output records, etc. It would both reduce the serialized size of the heartbeats (by shrinking the Counters down to just the user's counters) and decrease the latency for updates to the JobTracker (since Counters are sent at most 1/minute instead of 1/heartbeat).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-901:

Attachment: MAPREDUCE-901.patch_2

Updated patch to fix a couple of unit tests, which now pass. I think with MAPREDUCE-279, the counter limits are less interesting given that they do not affect other jobs - unlike MRv1 where this would affect the JobTracker. For now, I propose we commit this and re-visit Counter limits via a follow-on (blocker) jira. Thoughts?

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Fix For: 0.23.0
Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch_2, mr-901-trunk-v1.patch
[jira] [Updated] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-901:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks Tom. I just committed this. Thanks Luke!

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Fix For: 0.23.0
Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch_2, mr-901-trunk-v1.patch
[jira] [Updated] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-901:

Attachment: MAPREDUCE-901.patch

Patch ported from y-merge branch for ensuring we can merge MAPREDUCE-901 to trunk. Credit, of course, goes to Luke.

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, mr-901-trunk-v1.patch
[jira] [Updated] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-901:

Fix Version/s: 0.23.0
Status: Patch Available (was: Open)

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Fix For: 0.23.0
Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, mr-901-trunk-v1.patch
[jira] Updated: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Lu updated MAPREDUCE-901:
--
Attachment: mr-901-trunk-v1.patch

The v1 patch passes all existing counters tests, so it is probably good enough to bother Hudson. This patch is basically a refactor of the counters framework. Blame Java generics for the syntax :)

Highlights of the patch:
* No (intentional) user API changes.
* Make mapred.Counters.Counter (legacy) and mapreduce.Counter (new) abstract.
** The existing constructors are already package-private; one test needed fixing to use the proper API instead of 'new'.
* Make mapred.Counters.Group (legacy) and mapreduce.CounterGroup interfaces.
** The existing constructors are already package-private; EventReader needed fixing to use the proper API instead of 'new'.
* Implement AbstractCounters, with the generic counter group in mapreduce.AbstractCounterGroup and the framework counter group in mapreduce.FrameworkCounterGroup.
** Framework group counters have efficient in-memory (fixed arrays of long and a lightweight counter facade) and serialized ((vint ordinal, vint value)* tuples) representations. Framework counters can be easily declared in CounterGroupFactory.
* mapred.Counters and mapreduce.Counters contain the delta of AbstractCounters and use mixins to adapt to the different interfaces.
** Much of the generics dancing is to support a type-safe Iterable in both the legacy and new Counters interfaces.

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch, mr-901-trunk-v1.patch
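The in-memory and serialized representations mentioned in the v1 patch comment (a fixed array of long, written out as (ordinal, value) tuples) can be illustrated with a minimal sketch. This is not the code from the patch: the class name `FrameworkCounterSketch`, the `Kind` enum and its constants, and the base-128 varint helpers are all assumptions made for illustration, and the varint here is only a stand-in for Hadoop's own vint encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class FrameworkCounterSketch {
    /** Hypothetical framework counters; each constant owns one array slot. */
    public enum Kind { MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS, REDUCE_INPUT_RECORDS, SPILLED_RECORDS }

    // In-memory form: one long per enum constant, no per-counter name strings.
    private final long[] values = new long[Kind.values().length];

    public void increment(Kind kind, long delta) { values[kind.ordinal()] += delta; }

    public long get(Kind kind) { return values[kind.ordinal()]; }

    // Base-128 varint: 7 payload bits per byte, high bit set while more bytes follow.
    // (Illustrative stand-in, not Hadoop's vint format.)
    private static void writeVarLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    private static long readVarLong(ByteBuffer in) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    /** Serialized form: a count, then (ordinal, value) pairs for non-zero counters only. */
    public byte[] serialize() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int nonZero = 0;
        for (long v : values) {
            if (v != 0) {
                nonZero++;
            }
        }
        writeVarLong(out, nonZero);
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0) {
                writeVarLong(out, i);
                writeVarLong(out, values[i]);
            }
        }
        return out.toByteArray();
    }

    public static FrameworkCounterSketch deserialize(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        FrameworkCounterSketch group = new FrameworkCounterSketch();
        long count = readVarLong(in);
        for (long i = 0; i < count; i++) {
            int ordinal = (int) readVarLong(in);
            group.values[ordinal] = readVarLong(in);
        }
        return group;
    }
}
```

Because only the ordinal and the value are written, a group with two small non-zero counters serializes to a handful of bytes, whereas a name-based encoding would also carry each counter's name string.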
[jira] Updated: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Lu updated MAPREDUCE-901:
--
Attachment: FrameworkCounterGroup.java

Looking through the patches, it seems that the existing approach requires sweeping changes to internal APIs across pretty much all MapReduce components. I'm proposing a new approach to this issue: refactor the counter framework (while keeping the existing API) so that it's possible to have different implementations of counter groups, one of which is the FrameworkCounterGroup (see attached file). This way, we can get the benefit of a more efficient implementation for framework counters without changing the client code.

The problem is a little complicated by the need to support both the old and new counters interfaces (both marked public and stable). I'm working on a more complete patch to minimize the code duplication between the new and old counter code, but I'm confident that the resulting code will be more concise and more general (able to support any future framework counters in different groups) than the existing approach.

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch
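The idea in the comment above, several counter-group implementations behind one unchanged client-facing API, might look roughly like the sketch below. Everything here is hypothetical: the `CounterGroup` interface is a simplified stand-in rather than Hadoop's `mapreduce.CounterGroup`, and `GenericGroup`, `FrameworkGroup`, and the `FrameworkCounter` constants are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterGroupsSketch {
    /** Simplified client-facing API; callers never see which implementation they hold. */
    public interface CounterGroup {
        void increment(String counterName, long delta);
        long get(String counterName);
    }

    /** General-purpose group for user counters: names are arbitrary strings. */
    public static final class GenericGroup implements CounterGroup {
        private final Map<String, Long> values = new HashMap<>();

        @Override
        public void increment(String counterName, long delta) {
            values.merge(counterName, delta, Long::sum);
        }

        @Override
        public long get(String counterName) {
            return values.getOrDefault(counterName, 0L);
        }
    }

    /** Hypothetical fixed set of framework counters. */
    public enum FrameworkCounter { INPUT_RECORDS, INPUT_BYTES, OUTPUT_RECORDS }

    /** Compact group for framework counters: a fixed array indexed by enum ordinal. */
    public static final class FrameworkGroup implements CounterGroup {
        private final long[] values = new long[FrameworkCounter.values().length];

        @Override
        public void increment(String counterName, long delta) {
            values[FrameworkCounter.valueOf(counterName).ordinal()] += delta;
        }

        @Override
        public long get(String counterName) {
            return values[FrameworkCounter.valueOf(counterName).ordinal()];
        }
    }

    /** Client code written once against the interface works with either group. */
    public static long record(CounterGroup group, String counterName, long delta) {
        group.increment(counterName, delta);
        return group.get(counterName);
    }
}
```

The client-side `record` method is identical for both groups, which is the property the proposal relies on: the efficient representation can be introduced without touching callers.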
[jira] Updated: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-901:

Attachment: MAPREDUCE-901.patch

Preliminary patch while I'm blocked by MAPREDUCE-917.

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Arun C Murthy
Fix For: 0.21.0
Attachments: 901_1.patch, 901_1.patch, MAPREDUCE-901.patch
[jira] Updated: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated MAPREDUCE-901:
--
Attachment: 901_1.patch

Attaching a patch for review. I am still testing the patch. Also, a little cleanup is required, especially w.r.t. naming variables/fields in the classes. I will do that in a follow-up patch. Some points on the approach:
1) Defined a class TaskMetrics that has methods for updating the counters defined in o.a.h.mapreduce.TaskCounter.java. It also provides a utility method to update framework counters that aren't defined in TaskCounter.java. Examples of such counters are those the framework defines in the counter group FileSystemCounters. For the TaskCounter counters, the RPC is optimized. For framework counters like the FileSystemCounters, the RPC uses the Counters serialization.
2) The above is serialized out as part of the TaskStatus object in the heartbeats.
3) In TaskInProgress.java, the TIP's Counters are updated with the above counters obtained in the heartbeat.

Would really appreciate a review on this one. And yes, this looks like a good thing to have for the jiras MAPREDUCE-220 and MAPREDUCE-718.

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Devaraj Das
Fix For: 0.21.0
Attachments: 901_1.patch
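The three points above (dedicated update methods for the TaskCounter counters, a fallback for other framework counters, and the receiver updating its own counters from the heartbeat) can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: `TaskMetricsSketch`, its field and method names, the counter name strings, and the choice to treat heartbeat values as cumulative totals that overwrite the receiver's copy are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class TaskMetricsSketch {
    // Dedicated fields for counters known at compile time (cheap to update and to send).
    private long mapInputRecords;
    private long mapOutputRecords;

    // Fallback for framework counters whose names depend on the runtime environment,
    // such as per-filesystem byte counts.
    private final Map<String, Long> otherCounters = new HashMap<>();

    public void addMapInputRecords(long n) { mapInputRecords += n; }

    public void addMapOutputRecords(long n) { mapOutputRecords += n; }

    /** Utility path for counters that have no dedicated field. */
    public void addOther(String name, long n) { otherCounters.merge(name, n, Long::sum); }

    /**
     * Receiver side: copy this snapshot into the task's counter table.
     * Assumption: the snapshot carries cumulative totals, so values overwrite.
     */
    public void applyTo(Map<String, Long> taskCounters) {
        taskCounters.put("MAP_INPUT_RECORDS", mapInputRecords);
        taskCounters.put("MAP_OUTPUT_RECORDS", mapOutputRecords);
        taskCounters.putAll(otherCounters);
    }
}
```

A caller would update the sketch on the task side (for example `metrics.addMapInputRecords(1)` per record) and invoke `applyTo` on the receiving side each time a snapshot arrives; applying the same snapshot twice leaves the table unchanged because values are overwritten, not summed.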
[jira] Updated: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated MAPREDUCE-901:
--
Attachment: 901_1.patch

That was my bad. *sigh* Attached is the correct patch. The TaskMetrics has a Counters field, but that's mostly to take care of counters related to the FileSystemCounters, which depend on the FileSystem in use, etc.

Move Framework Counters into a TaskMetric structure
---
Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Devaraj Das
Fix For: 0.21.0
Attachments: 901_1.patch, 901_1.patch