[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-479:
Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]
Status: Resolved (was: Patch Available)

+1 I committed this. Thanks Jiaqi!

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479-1.patch, MAPREDUCE-479-2.patch, MAPREDUCE-479-3.patch, MAPREDUCE-479-4.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
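The issue description's goal, tracing every shuffle from the map attempt that served the data to the reduce attempt that fetched it, can be sketched as a small log post-processor. This is a minimal sketch under stated assumptions: the thread does not show the final clienttrace layout, so the comma-separated `key: value` fields and the `mapID` / `reduceID` field names below are hypothetical stand-ins, and lines without a reduce ID (the pre-patch situation) are simply skipped.

```python
import re
from collections import defaultdict

# Hypothetical clienttrace fields; the real layout produced by the patch may differ.
# Matching only known keys avoids picking up a logger prefix such as "clienttrace: ".
FIELD_RE = re.compile(r"\b(src|dest|bytes|op|mapID|reduceID): ([^,]*)")


def parse_clienttrace(line: str) -> dict:
    """Extract the known 'key: value' fields from one clienttrace message."""
    return {key: value.strip() for key, value in FIELD_RE.findall(line)}


def shuffle_edges(lines) -> dict:
    """Map each reduce attempt to the (map attempt, bytes) pairs it fetched.

    Lines that carry no reduce ID cannot be attributed and are skipped.
    """
    edges = defaultdict(list)
    for line in lines:
        fields = parse_clienttrace(line)
        if "mapID" in fields and "reduceID" in fields:
            edges[fields["reduceID"]].append(
                (fields["mapID"], int(fields.get("bytes", "0")))
            )
    return dict(edges)
```

With only the map ID logged, each record yields one end of the edge; with both IDs, the per-reduce lists above give the full map-to-reduce shuffle graph.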
[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated MAPREDUCE-479:
Release Note: Adds Reduce Attempt ID to ClientTrace log messages, and adds Reduce Attempt ID to HTTP query string sent to mapOutputServlet. Extracts partition number from attempt ID. (was: Adds Reduce Attempt ID to ClientTrace log messages, and adds Reduce Attempt ID to HTTP query string sent to mapOutputServlet.)
Status: Patch Available (was: Open)

Did a microbenchmark of shuffle durations with and without the added reduce attempt ID transmission and reduce partition number extraction. Shuffle times before and after this patch are statistically comparable: a chi-squared test for similarity of the two shuffle-time distributions gives a p-value of 0.23, so the null hypothesis that both samples come from the same distribution is not rejected. The benchmark therefore shows no measurable performance impact from this patch.

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479-1.patch, MAPREDUCE-479-2.patch, MAPREDUCE-479-3.patch, MAPREDUCE-479-4.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.
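The benchmark comment describes a chi-squared test comparing shuffle-time distributions before and after the patch. The comment does not say how the times were binned, so the sketch below is an assumption: a two-sample chi-squared test of homogeneity over caller-chosen bin edges, using only the Python standard library (the survival function uses the closed forms for integer degrees of freedom). A large p-value means the test found no evidence of a difference; it does not prove the distributions are identical.

```python
import math
from bisect import bisect_right


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable with integer degrees of freedom df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2          # Q_2(x) = e^(-x/2)
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1  # Q_1(x) = erfc(sqrt(x/2))
    # Recurrence: Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_homogeneity(before, after, edges):
    """Two-sample chi-squared test on shuffle times binned at `edges`.

    H0: both samples come from the same distribution.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    if not before or not after:
        raise ValueError("both samples must be non-empty")
    nbins = len(edges) + 1
    counts = [[0] * nbins, [0] * nbins]
    for row, sample in enumerate((before, after)):
        for t in sample:
            counts[row][bisect_right(edges, t)] += 1
    n = [sum(r) for r in counts]
    total = n[0] + n[1]
    stat, used = 0.0, 0
    for j in range(nbins):
        col = counts[0][j] + counts[1][j]
        if col == 0:
            continue  # empty in both samples: no expected count, skip the bin
        used += 1
        for i in (0, 1):
            expected = n[i] * col / total
            stat += (counts[i][j] - expected) ** 2 / expected
    df = used - 1  # (bins - 1) * (samples - 1), with two samples
    return stat, df, chi2_sf(stat, df)
```

Usage would be `stat, df, p = chi2_homogeneity(times_without_patch, times_with_patch, edges)`, where the variable names and the choice of `edges` are placeholders for whatever measurements and bucketing the benchmark used.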
[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-479:
Status: Open (was: Patch Available)

bq. That would be suboptimal... it's not actually a parameter in the request and maintaining it as a necessary side-effect requires future versions to preserve it.

I remain opposed to adding a string to the query to be logged on the remote side. If you want to make the case for _replacing_ the partition with the attempt ID, and extracting the partition from it on the TaskTracker side, I would be +0 on that approach.

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479-1.patch, MAPREDUCE-479-2.patch, MAPREDUCE-479-3.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.
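Replacing the partition with the attempt ID, as suggested above, only works if the TaskTracker can recover the partition from the attempt ID. A minimal Python sketch, assuming the usual `attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>` layout and that a reduce task's number equals its partition; inside Hadoop itself `TaskAttemptID.forName` does this parsing, and the helper names here (`parse_attempt_id`, `reduce_partition`) are hypothetical.

```python
from typing import NamedTuple


class AttemptId(NamedTuple):
    jt_identifier: str   # JobTracker identifier, e.g. its start timestamp
    job_number: int
    task_type: str       # "m" for map, "r" for reduce
    task_number: int     # for a reduce task, assumed equal to its partition
    attempt_number: int


def parse_attempt_id(s: str) -> AttemptId:
    """Split a task attempt ID string into its components."""
    parts = s.split("_")
    if len(parts) != 6 or parts[0] != "attempt" or parts[3] not in ("m", "r"):
        raise ValueError(f"not a task attempt ID: {s!r}")
    # int() also raises ValueError if a numeric field is malformed.
    return AttemptId(parts[1], int(parts[2]), parts[3], int(parts[4]), int(parts[5]))


def reduce_partition(attempt_id: str) -> int:
    """Derive the partition number a reduce attempt is fetching."""
    parsed = parse_attempt_id(attempt_id)
    if parsed.task_type != "r":
        raise ValueError(f"not a reduce attempt: {attempt_id!r}")
    return parsed.task_number
```

For example, `reduce_partition("attempt_200707121733_0003_r_000005_0")` yields partition 5, and a second attempt of the same task (`..._r_000005_1`) maps to the same partition, so the query string needs only the attempt ID.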
[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated MAPREDUCE-479:
Status: Open (was: Patch Available)

Will submit a new patch that adds the reduce attempt ID, eliminating the assumption that no two attempts will run on the same host, in case that assumption breaks under post-0.20 scheduling.

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.
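Why the host-based assumption matters can be shown with a toy example: if the map side can only identify a fetch by requesting host and partition, two attempts of the same reduce on one host collapse into a single, ambiguous record, while the attempt ID keeps them apart. The `Fetch` record and the second same-host attempt are hypothetical, standing in for the custom-scheduler scenario the comment anticipates.

```python
from collections.abc import Callable, Hashable
from typing import NamedTuple


class Fetch(NamedTuple):
    reduce_host: str      # host the shuffle request came from
    partition: int        # partition requested (the pre-patch query parameter)
    reduce_attempt: str   # reduce attempt ID (what the patch adds)


def group_fetches(fetches: list[Fetch], key: Callable[[Fetch], Hashable]) -> dict:
    """Group fetches by whatever identifier the map side is able to log."""
    groups: dict = {}
    for f in fetches:
        groups.setdefault(key(f), []).append(f)
    return groups


fetches = [
    Fetch("node7", 5, "attempt_200707121733_0003_r_000005_0"),
    # Hypothetical: a second attempt of the same reduce placed on the same host.
    Fetch("node7", 5, "attempt_200707121733_0003_r_000005_1"),
]

by_host = group_fetches(fetches, lambda f: (f.reduce_host, f.partition))
by_attempt = group_fetches(fetches, lambda f: f.reduce_attempt)
print(len(by_host), len(by_attempt))  # 1 2
```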
[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated MAPREDUCE-479:
Release Note: Adds Reduce Attempt ID to ClientTrace log messages, and adds Reduce Attempt ID to HTTP query string sent to mapOutputServlet. (was: Adds Reduce ID to ClientTrace log messages. Explicitly uses new mapreduce.JobID for compatibility with updated TaskID constructor.)
Status: Patch Available (was: Open)

I would prefer adding the reduce attempt ID to the HTTP query string, because this eliminates the need to assume that no two attempts of the same task can run on the same node; I can see scenarios where a custom scheduler breaks this assumption and makes tracing very complicated. The additional network traffic from adding the reduce attempt ID should be minimal, and much smaller than the total data shuffled in a typical job.

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479-1.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.
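The network-cost argument above can be made concrete by building a shuffle request URL with and without the extra parameter and measuring the difference. This sketch assumes the `job`, `map` and `reduce` query parameters of the 0.20-era `/mapOutput` servlet; the `reduceAttempt` parameter name is hypothetical, since the thread does not name the parameter the patch uses.

```python
from urllib.parse import urlencode


def map_output_url(host, port, job_id, map_attempt, partition, reduce_attempt=None):
    """Build a shuffle fetch URL, optionally carrying the reduce attempt ID."""
    params = [("job", job_id), ("map", map_attempt), ("reduce", str(partition))]
    if reduce_attempt is not None:
        # Hypothetical parameter name for the added reduce attempt ID.
        params.append(("reduceAttempt", reduce_attempt))
    return f"http://{host}:{port}/mapOutput?{urlencode(params)}"


job = "job_200707121733_0003"
map_attempt = "attempt_200707121733_0003_m_000001_0"
reduce_attempt = "attempt_200707121733_0003_r_000005_0"

before = map_output_url("node3", 50060, job, map_attempt, 5)
after = map_output_url("node3", 50060, job, map_attempt, 5, reduce_attempt)
overhead = len(after.encode()) - len(before.encode())
print(overhead)  # 51
```

Under these assumptions the extra cost is a few dozen bytes per fetch request, a fixed amount independent of how much map output the request transfers.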
[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated MAPREDUCE-479:
Status: Patch Available (was: Open)

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479-1.patch, MAPREDUCE-479-2.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.
[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated MAPREDUCE-479:
Attachment: MAPREDUCE-479-2.patch

Updated, correct patch.

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479-1.patch, MAPREDUCE-479-2.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.
[jira] Updated: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace
[ https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated MAPREDUCE-479:
Attachment: MAPREDUCE-479.patch

Cleaned-up patch for the newly branched tree.

Add reduce ID to shuffle clienttrace
Key: MAPREDUCE-479
URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
Fix For: 0.21.0
Attachments: HADOOP-6013.patch, MAPREDUCE-479.patch

Current clienttrace messages from shuffles note only the destination map ID but not the source reduce ID. Having both source and destination ID of each shuffle enables full tracing of execution.