[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926072#action_12926072 ]

Hudson commented on MAPREDUCE-1881:

Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See [https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/])

Improve TaskTrackerInstrumentation

Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881-v3.patch, mapreduce-1881-v4.patch, mapreduce-1881.patch

The TaskTrackerInstrumentation class provides a useful way to capture key events at the TaskTracker for use in various reporting tools, but it is currently rather limited, because only one TaskTrackerInstrumentation can be added to a given TaskTracker and this object receives minimal information about tasks (only their IDs). I propose enhancing the functionality through two changes:

# Support a comma-separated list of TaskTrackerInstrumentation classes rather than just a single one in the JobConf, and report events to all of them.
# Make the reportTaskLaunch and reportTaskEnd methods in TaskTrackerInstrumentation receive a reference to a whole Task object rather than just its TaskAttemptID. It might also be useful to make the latter receive the task's final state, i.e. failed, killed, or successful.

I'm just posting this here to get a sense of whether this is a good idea. If people think it's okay, I will make a patch against trunk that implements these changes.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
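The two changes proposed in the issue description can be pictured with a small sketch. This is not the code from any of the attached patches: SketchTask, SketchInstrumentation, CompositeSketchInstrumentation and RecordingInstrumentation are hypothetical stand-ins invented here so the example compiles without Hadoop on the classpath, and the final state is modelled as a plain string purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Task; it carries only what the sketch needs.
final class SketchTask {
    final String attemptId;
    final String finalState; // e.g. "FAILED", "KILLED", "SUCCEEDED"; null while running

    SketchTask(String attemptId, String finalState) {
        this.attemptId = attemptId;
        this.finalState = finalState;
    }
}

// Hypothetical stand-in for TaskTrackerInstrumentation: every callback is a no-op
// by default, and the callbacks receive a whole task object rather than just an ID.
class SketchInstrumentation {
    void reportTaskLaunch(SketchTask task) { }
    void reportTaskEnd(SketchTask task) { }
}

// Forwards each event to every delegate, in the order the delegates were given.
final class CompositeSketchInstrumentation extends SketchInstrumentation {
    private final List<SketchInstrumentation> delegates;

    CompositeSketchInstrumentation(List<SketchInstrumentation> delegates) {
        this.delegates = new ArrayList<>(delegates); // defensive copy
    }

    @Override
    void reportTaskLaunch(SketchTask task) {
        for (SketchInstrumentation d : delegates) {
            d.reportTaskLaunch(task);
        }
    }

    @Override
    void reportTaskEnd(SketchTask task) {
        for (SketchInstrumentation d : delegates) {
            d.reportTaskEnd(task);
        }
    }
}

// Example listener that simply records the events it receives.
final class RecordingInstrumentation extends SketchInstrumentation {
    final List<String> events = new ArrayList<>();

    @Override
    void reportTaskLaunch(SketchTask task) {
        events.add("launch:" + task.attemptId);
    }

    @Override
    void reportTaskEnd(SketchTask task) {
        events.add("end:" + task.attemptId + ":" + task.finalState);
    }
}
```

Wrapping two RecordingInstrumentation objects in one CompositeSketchInstrumentation and reporting a launch followed by an end leaves the same two entries in each recorder's events list, which is the "report events to all of them" behavior the description asks for.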
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900840#action_12900840 ]

Matei Zaharia commented on MAPREDUCE-1881:

Thanks, Arun!
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898992#action_12898992 ]

Luke Lu commented on MAPREDUCE-1881:

One nit for the test code: you could have used a mock (we have mockito in trunk) to avoid manually writing an instrumentation class for verification. I also suggest that you refactor the instrumentation object creation code into a static factory method so that you can unit test the expected behavior as well.

The code looks fine otherwise, provided the above concerns (especially the latter) are addressed and Hudson finds no related issues.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899031#action_12899031 ]

Matei Zaharia commented on MAPREDUCE-1881:

By instrumentation creation code, do you mean the one in TaskTracker? I can do that.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899035#action_12899035 ]

Luke Lu commented on MAPREDUCE-1881:

Yes, I meant something like:

{{static TaskTrackerInstrumentation createInstrumentation(JobConf conf);}} in the TaskTracker class

or

{{static TaskTrackerInstrumentation create(JobConf conf);}} in the TaskTrackerInstrumentation class

Though I prefer the latter, either way is fine, as long as you write a unit test for it.
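A static factory of the kind suggested above might look roughly like the following. This is only a sketch under stated assumptions: it takes the raw comma-separated property value as a String instead of a JobConf, and SketchListener, NoopSketchListener and SketchListenerFactory are hypothetical names, not Hadoop classes. Rejecting an empty value with an exception is also an assumption, chosen to match the concern raised elsewhere in this thread that an empty setting should not silently turn into a no-op.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TaskTrackerInstrumentation.
class SketchListener {
    void taskLaunched(String attemptId) { }
}

// A trivial listener used to exercise the factory.
class NoopSketchListener extends SketchListener { }

final class SketchListenerFactory {
    private SketchListenerFactory() { }

    // Instantiates each comma-separated class name through its no-arg constructor.
    // An empty or blank value is rejected instead of yielding an empty list.
    static List<SketchListener> create(String classNames) {
        if (classNames == null || classNames.trim().isEmpty()) {
            throw new IllegalArgumentException("no instrumentation class configured");
        }
        List<SketchListener> result = new ArrayList<>();
        for (String name : classNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas such as "A,,B"
            }
            try {
                Class<? extends SketchListener> cls =
                        Class.forName(trimmed).asSubclass(SketchListener.class);
                result.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("cannot create " + trimmed, e);
            }
        }
        return result;
    }
}
```

Because the creation logic sits in one static method, a unit test can call it directly with a class-name string and check both the returned instances and the failure cases, without starting a TaskTracker.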
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899205#action_12899205 ]

Luke Lu commented on MAPREDUCE-1881:

v4 looks fine to me. Thanks Matei.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898787#action_12898787 ]

Matei Zaharia commented on MAPREDUCE-1881:

BTW, the test failures in the previous Hudson output seem to be unrelated to this patch. Let me know if it looks good to commit with the new additions.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898264#action_12898264 ]

Hadoop QA commented on MAPREDUCE-1881:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12451711/mapreduce-1881-v2b.patch against trunk revision 984707.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/607/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/607/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/607/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/607/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897178#action_12897178 ]

Luke Lu commented on MAPREDUCE-1881:

I have no issue with the statusUpdate method. I get where you're coming from :) But I question whether many users will want to do the same thing. I'm curious about many useful instrumentation classes being written. Adding features (especially redundant ones), IMO, doesn't necessarily make Hadoop better but rather bloated and harder to maintain. You know, perfection is attained not when no more can be added, but when no more can be removed.

Another thing about the patch is that if the instrumentation class is specified as an empty string, it silently defaults to the composite class with an empty list (essentially a noop instrumentation), which is a change from the existing behavior, where an exception would be thrown.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897334#action_12897334 ]

Philip Zeyliger commented on MAPREDUCE-1881:

I'll chime in that I'm using the instrumentation classes and find them a useful way to listen to some events that are otherwise hard to get at.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897348#action_12897348 ]

Arun C Murthy commented on MAPREDUCE-1881:

I'm trying to understand the proposal... please help me. Currently you can define multiple 'sinks' for the same data via CompositeContext. Thus you can define multiple listeners and each will get the same data. Is that sufficient for this use case?
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897367#action_12897367 ]

Luke Lu commented on MAPREDUCE-1881:

The instrumentation class is related to but not dependent on metrics frameworks. Some of the events are actually not collected in the regular metrics, so there is an expert-level config property, mapreduce.tasktracker.instrumentation, to specify a subclass of TaskTrackerInstrumentation, which contains all the overridable callbacks. The default value for the property is the TaskTrackerMetricsInst class, which currently implements the Updater interface to collect tasktracker metrics in the mapred metrics context. Similarly for metrics v2, TaskTrackerMetricsSource would be the default.

Matei and others want to use the overridable instrumentation property to hook in other listeners, for things that are not strictly metrics related, like statusUpdate, which is useful for his project, which does two-level scheduling :) He can achieve this with the addition of the statusUpdate method in TaskTrackerInstrumentation. To make adding more instrumentation classes (while preserving the existing instrumentation like metrics) slightly easier (IMO, a user-defined composite class is just as easy), he wants to make the property a list of classes so that the events are fired for each instance of the specified classes.

The latter part of the patch would add a composite instrumentation class that dispatches all the events to all the instances of the specified instrumentation classes. Currently the patch lacks unit tests for the composite class. I can see problems down the road maintaining the class, like making sure it doesn't block in one of the classes that can potentially do RPCs etc., and properly handling exceptions in the delegate objects.
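The exception-handling concern raised for the composite class can be illustrated with a small sketch. Nothing here is taken from the patch: StatusListener and IsolatingDispatcher are hypothetical names, the callback signature is invented and only loosely modelled on the statusUpdate method mentioned in this thread, and collecting failures in a list is a placeholder for whatever logging a real implementation would use. The sketch addresses only exceptions thrown by delegates; it does nothing about a delegate that blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface, loosely modelled on a statusUpdate hook.
interface StatusListener {
    void statusUpdate(String attemptId, String state);
}

// Dispatches to every delegate and keeps going when one of them throws.
final class IsolatingDispatcher implements StatusListener {
    private final List<StatusListener> delegates;

    // Placeholder for logging: one entry per delegate failure.
    final List<String> failures = new ArrayList<>();

    IsolatingDispatcher(List<StatusListener> delegates) {
        this.delegates = new ArrayList<>(delegates); // defensive copy
    }

    @Override
    public void statusUpdate(String attemptId, String state) {
        for (StatusListener d : delegates) {
            try {
                d.statusUpdate(attemptId, state);
            } catch (RuntimeException e) {
                // A misbehaving listener must not starve the listeners after it.
                failures.add(e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
    }
}
```

With a first delegate that always throws and a second that records its input, the second still receives the update and the dispatcher ends up with exactly one recorded failure. Guarding against a delegate that blocks (for example on an RPC) would need a different mechanism, such as handing events to a separate thread, which this sketch deliberately leaves out.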
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897442#action_12897442 ]

Luke Lu commented on MAPREDUCE-1881:

The jobtracker and tasktracker instrumentation was introduced in HADOOP-3772, which contains more background info.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897047#action_12897047 ]

Luke Lu commented on MAPREDUCE-1881:

Having to route through a specific implementation of a composite object could lead to situations that users cannot override without changing library code. Currently, we can measure the overhead of instrumentation by comparing with a noop instrumentation. Forcing it through the composite object incurs the overhead of a loop construct and doubles the number of method calls, which may or may not be acceptable for a given user application (it's not you or I who should decide whether it's acceptable or not).

IMO, you don't even need the composite class in the official Hadoop source to support multiple listeners; it adds minor convenience at the cost of a maintenance burden for Hadoop developers. The user instrumentation feature is supposedly only for experts who know how to write a more complex instrumentation class than a trivial composite class.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897051#action_12897051 ]

Luke Lu commented on MAPREDUCE-1881:

BTW, the v2b patch looks fine besides the necessity question.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897114#action_12897114 ]

Matei Zaharia commented on MAPREDUCE-1881:

By necessity, do you mean why should Hadoop provide this feature rather than letting users implement it themselves? The answer is pretty simple: since many users will want to do the same thing, it makes sense to put it into the platform instead of asking them all to reinvent it. The goal of the JIRA process is not to minimize changes to Hadoop; it's to make Hadoop better. One can imagine many useful instrumentation classes being written that people will combine (already, lots of people are using the default metrics one).

I actually opened this issue because I'm working on a project where I want to programmatically launch a TaskTracker with an extra instrumentation class on top of the ones the user configured in mapred-site.xml. I could do it by setting the parameter to a composite class, and then passing it the old parameter, but it felt more natural to add support for multiple instrumentation objects and just append to the user's list. I care more about the second part of the issue (statusUpdate callback) though, because my project can't work at all without that.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881.patch
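The "append to the user's list" use case in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: `java.util.Properties` stands in for Hadoop's configuration object, the property name `job.tracker.instrumentation` is taken from another comment in this thread, and `InstrumentationList` and the `com.example.*` class names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper; Properties is a stand-in for the Hadoop configuration.
class InstrumentationList {
    // Property name as quoted elsewhere in this thread (an assumption here).
    static final String KEY = "job.tracker.instrumentation";

    // Appends one more class name to whatever the user already configured,
    // keeping the value a comma-separated list.
    static void append(Properties conf, String extraClassName) {
        String current = conf.getProperty(KEY, "").trim();
        String updated = current.isEmpty()
                ? extraClassName
                : current + "," + extraClassName;
        conf.setProperty(KEY, updated);
    }
}
```

With a comma-separated list, the launcher does not need to know which class the user configured; it only extends the string.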
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896663#action_12896663 ]

Luke Lu commented on MAPREDUCE-1881:

Having to be routed via the default composite object is really what I object to. There are many reasons why it's a bad idea. However, there is a simple fix to your patch: if there is only one class specified in the config, use it as the top-level class instead of a delegate in the default composite class. This way, convenience is preserved for users (arguably experts :) who want to mess with the tasktracker implementation), and the default composite class implementation is overridable.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881.patch
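The fix suggested in the comment above (use a single configured class directly, and only wrap in a composite when there are several) might look roughly like this. The `Reporter` interface, `FanOutReporter`, and `Reporters.create` are hypothetical names invented for this sketch; they are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified callback interface.
interface Reporter {
    void reportTaskLaunch(String taskAttemptId);
}

// Hypothetical composite that forwards each call to every delegate.
class FanOutReporter implements Reporter {
    private final List<Reporter> delegates;

    FanOutReporter(List<Reporter> delegates) {
        this.delegates = new ArrayList<>(delegates); // defensive copy
    }

    @Override
    public void reportTaskLaunch(String taskAttemptId) {
        for (Reporter r : delegates) {
            r.reportTaskLaunch(taskAttemptId);
        }
    }
}

class Reporters {
    // One configured instance: return it unwrapped, so calls are not routed
    // through a composite. Zero or several: wrap them in the composite.
    static Reporter create(List<Reporter> configured) {
        if (configured.size() == 1) {
            return configured.get(0);
        }
        return new FanOutReporter(configured);
    }
}
```

A side effect of this design is that a user can still plug in their own composite by naming it as the single configured class.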
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896769#action_12896769 ]

Matei Zaharia commented on MAPREDUCE-1881:

Sure, I can do that. I think the biggest risk with the composite object is adding a method to TaskTrackerInstrumentation that we forget to add in the composite. While this is bad, it would arguably get noticed faster if the composite object is used by default than if it isn't.

Any other opinions on this? Any other reasons to avoid it? I really see the composite object as not much different from having a for loop at every call site. Supporting multiple instrumentation objects is clearly useful (the same way that most classes in the JDK with events support multiple listeners). The question then is how to do it. Since the instrumentation interface isn't designed in such a way that an implementation knows who called it (i.e. can tell whether it went through a composite object), it seems OK to use a composite object to route calls to a list of implementations.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881.patch
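One way to catch the risk raised in the comment above (a method added to the base class but forgotten in the composite) is a reflection-based check that could run in a unit test. This is a different technique from anything proposed in the thread, offered only as a sketch; `BaseInstrumentation`, `ForgetfulComposite`, and `OverrideCheck` are hypothetical classes, and the `statusUpdate` signature is assumed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class with no-op callbacks.
class BaseInstrumentation {
    void reportTaskLaunch(String taskAttemptId) { }
    void reportTaskEnd(String taskAttemptId) { }
    void statusUpdate(String taskAttemptId, String newState) { }
}

// Hypothetical composite that forwards two callbacks but forgot statusUpdate.
class ForgetfulComposite extends BaseInstrumentation {
    private final List<BaseInstrumentation> delegates = new ArrayList<>();

    @Override
    void reportTaskLaunch(String taskAttemptId) {
        for (BaseInstrumentation d : delegates) d.reportTaskLaunch(taskAttemptId);
    }

    @Override
    void reportTaskEnd(String taskAttemptId) {
        for (BaseInstrumentation d : delegates) d.reportTaskEnd(taskAttemptId);
    }
}

class OverrideCheck {
    // Lists the overridable methods declared in 'base' that 'sub' does not redeclare.
    static List<String> missingOverrides(Class<?> base, Class<?> sub) {
        List<String> missing = new ArrayList<>();
        for (Method m : base.getDeclaredMethods()) {
            int mod = m.getModifiers();
            if (m.isSynthetic() || Modifier.isStatic(mod)
                    || Modifier.isPrivate(mod) || Modifier.isFinal(mod)) {
                continue; // these cannot or need not be overridden
            }
            try {
                sub.getDeclaredMethod(m.getName(), m.getParameterTypes());
            } catch (NoSuchMethodException e) {
                missing.add(m.getName());
            }
        }
        return missing;
    }
}
```

Calling `OverrideCheck.missingOverrides(BaseInstrumentation.class, ForgetfulComposite.class)` reports `statusUpdate` as the forgotten method.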
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896181#action_12896181 ]

Matei Zaharia commented on MAPREDUCE-1881:

The point of this change is to allow the user to specify a comma-separated list of classes in the job.tracker.instrumentation field instead of a single class. Asking users to specify the composite class, and then go set another property somewhere else, is needlessly inconvenient.

What is the problem with the current approach? If the user only specifies one instrumentation class (as they do today), only that one class will be used, and the behavior will be exactly the same as today (except that calls get routed through the composite object first). If the user lists multiple classes, multiple classes will be used.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881.patch
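To make the single-class versus list behavior described above concrete, here is a minimal sketch of parsing such a property value. `InstrumentationNames` is a hypothetical helper and the `com.example.*` names are placeholders; how the actual patch reads the configuration is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading a comma-separated list of class names.
class InstrumentationNames {
    // Splits the value on commas, trims whitespace, and skips empty entries.
    // A value naming a single class yields a one-element list, so existing
    // single-class configurations keep working unchanged.
    static List<String> parse(String propertyValue) {
        List<String> names = new ArrayList<>();
        if (propertyValue == null) {
            return names;
        }
        for (String part : propertyValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

For example, `parse("com.example.A")` gives a list with one name, and `parse("com.example.A, com.example.B")` gives two.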
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895845#action_12895845 ]

Luke Lu commented on MAPREDUCE-1881:

The problem with the v2 patch is that the composite instrumentation class is always used and not pluggable. IMO, I would not change the semantics of job.tracker.instrumentation to a list of classes. You can add a composite instrumentation class that looks for job.tracker.instrumentation.composite.classes (or something like that). BTW, although the composite class is a convenience, users who want the feature can already implement it without changing the Hadoop code.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881.patch
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895073#action_12895073 ]

Matei Zaharia commented on MAPREDUCE-1881:

Sounds good, I will add a composite class then. I used for loops because other listener systems in Hadoop, such as the JobTrackerListener, use them as well.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881.patch
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894602#action_12894602 ]

Luke Lu commented on MAPREDUCE-1881:

The main point of having an instrumentation class is to hide the implementation details behind the instrumentation interface. Using for loops in the instrumentation client code is really jarring. I'd recommend implementing a composite instrumentation class if you want to send events to multiple instrumentation implementations. That way the client code is not changed, and you can implement more advanced logic (such as sending events only to a subset of instrumentation objects based on some rules) without changing client code.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881.patch
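The composite approach recommended above can be illustrated with a small Java sketch. The `TaskEventSink` interface and both classes are hypothetical stand-ins, not Hadoop's actual TaskTrackerInstrumentation API, and task attempts are reduced to plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified instrumentation interface.
interface TaskEventSink {
    void reportTaskLaunch(String taskAttemptId);
    void reportTaskEnd(String taskAttemptId);
}

// Composite: is itself a TaskEventSink and forwards each event to its delegates,
// so client code keeps calling a single object and needs no for loops.
class CompositeSink implements TaskEventSink {
    private final List<TaskEventSink> delegates;

    CompositeSink(List<TaskEventSink> delegates) {
        this.delegates = new ArrayList<>(delegates); // defensive copy
    }

    @Override
    public void reportTaskLaunch(String taskAttemptId) {
        for (TaskEventSink d : delegates) {
            d.reportTaskLaunch(taskAttemptId);
        }
    }

    @Override
    public void reportTaskEnd(String taskAttemptId) {
        for (TaskEventSink d : delegates) {
            d.reportTaskEnd(taskAttemptId);
        }
    }
}

// Trivial delegate used to show the fan-out.
class CountingSink implements TaskEventSink {
    int launches;
    int ends;

    @Override
    public void reportTaskLaunch(String taskAttemptId) { launches++; }

    @Override
    public void reportTaskEnd(String taskAttemptId) { ends++; }
}
```

Routing rules, such as forwarding only to a subset of delegates, would live inside the composite's methods, which is the flexibility the comment points to.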
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892410#action_12892410 ]

Scott Chen commented on MAPREDUCE-1881:

+1 The patch looks good to me.

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881.patch
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880594#action_12880594 ]

Matei Zaharia commented on MAPREDUCE-1881:

One other suggestion: A statusUpdate callback should be added to TaskTrackerInstrumentation to let it know when a task changes state (e.g. from RUNNING to COMMIT_PENDING). If this is done, then there's probably no need to modify reportTaskLaunch and reportTaskEnd (which means no changes to existing clients).

Improve TaskTrackerInstrumentation
Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Priority: Minor
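A statusUpdate callback of the kind suggested above might be shaped roughly as follows. Everything here is an assumption made for illustration: the `TaskState` values beyond RUNNING and COMMIT_PENDING, the `statusUpdate` parameter list, and the class names `SketchInstrumentation` and `TransitionLog` do not come from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for task states; the real Hadoop enum may differ.
enum TaskState { RUNNING, COMMIT_PENDING, SUCCEEDED, FAILED, KILLED }

// Hypothetical, simplified instrumentation base class. The launch/end
// callbacks keep their ID-only signatures, so existing subclasses would
// not need to change.
class SketchInstrumentation {
    void reportTaskLaunch(String taskAttemptId) { }
    void reportTaskEnd(String taskAttemptId) { }

    // Assumed shape of the new callback: invoked when a task changes state.
    // The default is a no-op, so subclasses opt in by overriding it.
    void statusUpdate(String taskAttemptId, TaskState oldState, TaskState newState) { }
}

// Example subclass that records every transition it is told about.
class TransitionLog extends SketchInstrumentation {
    final List<String> transitions = new ArrayList<>();

    @Override
    void statusUpdate(String taskAttemptId, TaskState oldState, TaskState newState) {
        transitions.add(taskAttemptId + ": " + oldState + " -> " + newState);
    }
}
```

Because the new method has a no-op default, adding it leaves existing instrumentation subclasses source-compatible, which matches the "no changes to existing clients" point in the comment.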