[ https://issues.apache.org/jira/browse/MAPREDUCE-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571628#comment-14571628 ]
Jason Lowe commented on MAPREDUCE-6354:
---------------------------------------

Thanks for updating the patch, Chang, and sorry for taking so long to respond. Now that I've thought about it further, there's still an issue: existing users who have a custom log4j.properties and do not update it to set up the new ShuffleHandler.audit logger will get shuffle connections logged by default. Therefore it may make more sense to log connections at the debug or trace level rather than info, so users have to go out of their way to enable them if they want to see them. When making this change we should still update log4j.properties in the source, but instead of setting anything there we should just add a commented-out directive that sets the ShuffleHandler.audit logger to debug/trace, with a comment above it saying it can be uncommented to enable logging of shuffle connections.

Also, in the interest of making the log message more efficient, we probably only need to log the job ID and the reducer number. The main purpose of logging this information is to track which reducers from which jobs are connecting to the NM at a particular time, to help narrow down jobs that are spamming NMs. The list of map IDs is probably not that useful in that context. If we really want to know that, arguably we should also log the data size being sent and when the send completes, but that may be too much logging.

Maybe we can log the job and reducer number at the debug level, and the map ID and data size at the trace level? The latter should be logged not when the futures are set up but when they are executed, since we don't want to log that we are sending data to a reducer until we are actually sending it. That's a bit tricky to orchestrate but doable, and we could defer the detailed trace logging to another JIRA if desired.
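A minimal sketch of the two-level scheme suggested above, offered only as an illustration and not as the actual ShuffleHandler patch. Assumptions: the class and method names (`ShuffleAuditSketch`, `logConnection`, `sendMapOutput`) are hypothetical, the `org.apache.hadoop.mapred` package prefix on the audit logger name is assumed, and it uses `java.util.logging` (FINE standing in for debug, FINEST for trace) rather than log4j so it runs with no extra dependencies. The trace message is emitted inside the returned task, so nothing is logged when the send is merely set up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the proposed shuffle audit logging; not Hadoop's
 * actual ShuffleHandler code. java.util.logging is used so the example is
 * self-contained: FINE approximates debug, FINEST approximates trace.
 */
public class ShuffleAuditSketch {
    // Dedicated audit logger so connection logging can be switched on
    // independently. The package prefix in the name is an assumption.
    static final Logger AUDIT =
        Logger.getLogger("org.apache.hadoop.mapred.ShuffleHandler.audit");

    // Debug level: one concise line per shuffle request, carrying only
    // the job ID and the reducer number.
    static void logConnection(String jobId, int reduceId) {
        if (AUDIT.isLoggable(Level.FINE)) {
            AUDIT.fine("shuffle for " + jobId + " reducer " + reduceId);
        }
    }

    // Trace level: per-map-output detail. Creating the task logs nothing;
    // the message is written only when the task runs, i.e. when the data
    // is actually being sent.
    static Runnable sendMapOutput(String jobId, int reduceId,
                                  String mapId, long bytes) {
        return () -> {
            // ... the actual transfer of the map output would happen here ...
            if (AUDIT.isLoggable(Level.FINEST)) {
                AUDIT.finest("sent " + bytes + " bytes of " + mapId
                    + " for " + jobId + " to reducer " + reduceId);
            }
        };
    }
}
```

With the logger left at its inherited default level (INFO unless configured otherwise), neither method produces output, which matches the goal of not logging connections for users who never touch the new logger. In log4j 1.x properties syntax, and assuming the logger name above, the commented-out directive would look like `#log4j.logger.org.apache.hadoop.mapred.ShuffleHandler.audit=DEBUG`, with `TRACE` in place of `DEBUG` to also get the per-map detail.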
> shuffle handler should log connection info
> ------------------------------------------
>
> Key: MAPREDUCE-6354
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6354
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Chang Li
> Assignee: Chang Li
> Attachments: MAPREDUCE-6354.2.patch, MAPREDUCE-6354.3.patch,
> MAPREDUCE-6354.4.patch, MAPREDUCE-6354.5.patch, MAPREDUCE-6354.6.patch,
> MAPREDUCE-6354.patch
>
>
> Currently, the shuffle handler only logs connection info in debug mode; we want to
> log that info in a more concise way.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)