[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571628#comment-14571628
 ] 

Jason Lowe commented on MAPREDUCE-6354:
---------------------------------------

Thanks for updating the patch, Chang.  Sorry for taking so long to respond.

Now that I've thought about it further, there's still an issue: existing users 
who have a custom log4j.properties and do not update it to set up the new 
ShuffleHandler.audit logger will get shuffle connections logged by default.  
Therefore it may make more sense to log connections at the debug or trace level 
rather than info, so users have to go out of their way to enable them if they 
want to see them.  When making this change we should still update 
log4j.properties in the source, but instead of setting anything in there we 
should just add a commented-out directive that sets the ShuffleHandler.audit 
logger to debug/trace, with a comment above it saying it can be uncommented to 
enable logging of shuffle connections.
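
For illustration, the commented-out directive could look roughly like the 
fragment below.  This is only a sketch in log4j 1.x properties syntax: the 
fully qualified logger name (the org.apache.hadoop.mapred prefix) and the 
choice of DEBUG over TRACE are assumptions, not something settled in the patch.

```properties
# Shuffle connection logging from the ShuffleHandler.
# Uncomment the following line to enable logging of shuffle connections.
# (Logger name and level are assumed here for illustration.)
#log4j.logger.org.apache.hadoop.mapred.ShuffleHandler.audit=DEBUG
```

Because the line ships commented out, users with an older custom 
log4j.properties and users with the new default see the same behavior: nothing 
is logged unless someone opts in.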

Also, in the interest of making the log message more efficient, we probably 
only need to log the job ID and the reducer number.  The main purpose of 
logging this information is to track which reducers from which jobs are 
connecting to the NM at a particular time, to help narrow down jobs that are 
spamming NMs.  The list of map IDs is probably not that useful in that context.  
If we really want to know that, arguably we should also be logging the data 
size being sent and when the transfer completes, but that could be too much 
logging.  Maybe we can log the job and reducer number at the debug level and 
the map ID and data size at the trace level?  The latter should be logged not 
when the futures are set up but when they are executed, as we don't want to 
log that we are sending data to a reducer until we are actually sending it.  
A bit tricky to orchestrate but doable, and we could defer the detailed trace 
logging to another JIRA if desired.
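
To make the two-level idea concrete, here is a minimal, self-contained sketch.  
It is not the ShuffleHandler code: it uses java.util.logging as a stand-in so 
it runs without extra dependencies (FINE plays the role of debug, FINEST the 
role of trace), and the class name, method names, logger name string, and 
job/map IDs are all made up for illustration.  The Runnable stands in for the 
deferred send, so the trace message is emitted only when the send actually 
runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class ShuffleAuditSketch {
    // Hypothetical audit logger; the real name and logging API may differ.
    static final Logger AUDIT = Logger.getLogger("ShuffleHandler.audit");

    // "Debug" level: only the job ID and the reducer number.
    public static void logConnection(String jobId, int reducer) {
        if (AUDIT.isLoggable(Level.FINE)) {
            AUDIT.fine("shuffle connection: job=" + jobId + " reducer=" + reducer);
        }
    }

    // "Trace" level: per-map detail, meant to be called when data is really sent.
    public static void logTransfer(String jobId, int reducer, String mapId, long bytes) {
        if (AUDIT.isLoggable(Level.FINEST)) {
            AUDIT.finest("shuffle send: job=" + jobId + " reducer=" + reducer
                + " map=" + mapId + " bytes=" + bytes);
        }
    }

    // Collects messages in memory so the effect of each level can be inspected.
    public static class CapturingHandler extends Handler {
        public final List<String> messages = new ArrayList<>();
        @Override public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }
        @Override public void flush() { }
        @Override public void close() { }
    }

    // Runs one fake connection at the given logger level and returns what was logged.
    public static List<String> demo(Level level) {
        CapturingHandler handler = new CapturingHandler();
        AUDIT.setUseParentHandlers(false); // keep the console quiet
        AUDIT.addHandler(handler);
        AUDIT.setLevel(level);
        try {
            logConnection("job_0001", 3);
            // Set up the deferred send now; the trace line is logged only when it runs.
            Runnable send = () -> logTransfer("job_0001", 3, "map_0007", 4096L);
            send.run();
        } finally {
            AUDIT.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println("INFO:   " + demo(Level.INFO).size() + " message(s)");
        System.out.println("FINE:   " + demo(Level.FINE).size() + " message(s)");
        System.out.println("FINEST: " + demo(Level.FINEST).size() + " message(s)");
    }
}
```

With the logger at INFO nothing is captured, at FINE only the connection line 
is captured, and at FINEST both lines are captured.  The isLoggable guards 
avoid building the message strings when the level is disabled, which matters 
on a busy NM.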

> shuffle handler should log connection info
> ------------------------------------------
>
>                 Key: MAPREDUCE-6354
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6354
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: MAPREDUCE-6354.2.patch, MAPREDUCE-6354.3.patch, 
> MAPREDUCE-6354.4.patch, MAPREDUCE-6354.5.patch, MAPREDUCE-6354.6.patch, 
> MAPREDUCE-6354.patch
>
>
> Currently, the shuffle handler only logs connection info in debug mode; we 
> want to log that info in a more concise way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
