[https://issues.apache.org/jira/browse/TEZ-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775320#comment-17775320]

Ayush Saxena commented on TEZ-4451:
-----------------------------------

Sounds cool. I checked the Hadoop code: it does log the FileSystem statistics on every FS close, based on a config. I was thinking we could log the entire {{TezChild}} statistics somewhere here:

[https://github.com/apache/tez/blob/master/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezChild.java#L319]

It calls {{FileSystem.closeAllForUGI(childUGI);}}, so aggregated statistics per {{TezChild}} could be logged there. But I don't think there is anything like that today, unless I write a util in Tez that gets all the FS instances and then aggregates them, and I don't think FS exposes all the FS instances per UGI either...
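
A rough sketch of what such a Tez-side util could look like, assuming we had some way to collect one statistics object per FS instance used by the child (which, as said, FS does not seem to expose per UGI today). {{FsStats}} and {{FsStatsAggregator}} are hypothetical names, not Hadoop classes; the fields only loosely mirror the bytes-read/bytes-written/read-ops/write-ops counters kept by {{FileSystem.Statistics}}.
{code:java}
import java.util.List;

// Hypothetical per-FileSystem counters; NOT a Hadoop class. The fields loosely
// mirror the bytes/ops counters kept by FileSystem.Statistics.
final class FsStats {
    final String scheme;
    final long bytesRead;
    final long bytesWritten;
    final long readOps;
    final long writeOps;

    FsStats(String scheme, long bytesRead, long bytesWritten, long readOps, long writeOps) {
        this.scheme = scheme;
        this.bytesRead = bytesRead;
        this.bytesWritten = bytesWritten;
        this.readOps = readOps;
        this.writeOps = writeOps;
    }

    @Override
    public String toString() {
        return "scheme=" + scheme + ", bytesRead=" + bytesRead + ", bytesWritten=" + bytesWritten
            + ", readOps=" + readOps + ", writeOps=" + writeOps;
    }
}

// Hypothetical Tez-side util: sums the counters of every FS instance a TezChild
// used, so one aggregated line could be logged when the child closes its FSs.
final class FsStatsAggregator {
    private FsStatsAggregator() {
    }

    static FsStats aggregate(List<FsStats> perInstance) {
        long bytesRead = 0;
        long bytesWritten = 0;
        long readOps = 0;
        long writeOps = 0;
        for (FsStats s : perInstance) {
            bytesRead += s.bytesRead;
            bytesWritten += s.bytesWritten;
            readOps += s.readOps;
            writeOps += s.writeOps;
        }
        return new FsStats("all", bytesRead, bytesWritten, readOps, writeOps);
    }
}
{code}
For example, aggregating an {{hdfs}} instance (100 bytes read, 20 written, 3 read ops, 1 write op) with an {{s3a}} one (50 bytes read, 2 read ops) gives bytesRead=150, bytesWritten=20, readOps=5, writeOps=1.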

The stream-level one I think I can get done. It can be put in the finally block of {{TaskRunner2Callable}}:
{code:java}
      // Snapshot the IOStatistics gathered in the current thread's context
      // and log them against this task attempt.
      String ioStats =
          IOStatisticsContext.getCurrentIOStatisticsContext().snapshot().toString();
      if (StringUtils.isNotEmpty(ioStats)) {
        LOG.info("TaskAttemptId={}, IOStatistics={}", task.getTaskAttemptID(), ioStats);
      }
{code}
Maybe guard it with a config, if it turns out to be costly. I will take this up in a few days unless someone else wants to take it.
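
To illustrate the "cover it with a conf" idea, a minimal self-contained sketch, assuming a boolean switch read once from configuration. The key {{tez.task.log.iostatistics}} and the class {{IoStatsLoggingSketch}} are hypothetical; {{java.util.Properties}}, a {{Supplier<String>}} and a {{Consumer<String>}} stand in for the Tez configuration, the {{IOStatisticsContext}} snapshot and {{LOG.info}} so that it compiles without Hadoop on the classpath.
{code:java}
import java.util.Properties;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of config-guarded IOStatistics logging in a finally block.
final class IoStatsLoggingSketch {
    // Hypothetical config key; not an existing Tez property.
    static final String LOG_IO_STATS_KEY = "tez.task.log.iostatistics";

    private final boolean logIoStats;
    // Stands in for IOStatisticsContext.getCurrentIOStatisticsContext().snapshot().toString().
    private final Supplier<String> ioStatsSnapshot;
    // Stands in for LOG.info(...).
    private final Consumer<String> log;

    private IoStatsLoggingSketch(boolean logIoStats, Supplier<String> ioStatsSnapshot,
            Consumer<String> log) {
        this.logIoStats = logIoStats;
        this.ioStatsSnapshot = ioStatsSnapshot;
        this.log = log;
    }

    // Reads the flag once; logging stays off unless explicitly enabled.
    static IoStatsLoggingSketch fromConf(Properties conf, Supplier<String> ioStatsSnapshot,
            Consumer<String> log) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty(LOG_IO_STATS_KEY, "false"));
        return new IoStatsLoggingSketch(enabled, ioStatsSnapshot, log);
    }

    void runTask(String taskAttemptId, Runnable task) {
        try {
            task.run();
        } finally {
            // Runs on success and on failure; the snapshot is only taken when enabled,
            // so the disabled path costs a single boolean check.
            if (logIoStats) {
                String ioStats = ioStatsSnapshot.get();
                if (ioStats != null && !ioStats.isEmpty()) {
                    log.accept("TaskAttemptId=" + taskAttemptId + ", IOStatistics=" + ioStats);
                }
            }
        }
    }
}
{code}
Because the check sits in the finally block, stats are logged for failed attempts too, and with the flag off the snapshot is never taken.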

> ThreadLevel IO Stats Support for TEZ
> ------------------------------------
>
>                 Key: TEZ-4451
>                 URL: https://issues.apache.org/jira/browse/TEZ-4451
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Harshit Gupta
>            Priority: Major
>
> Dump IO Statistics for each of the tasks in the log.
> This will require upgrading Tez to use Hadoop-3.3.9-SNAPSHOT
>  
> cc: [~rbalamohan] [~abstractdog] [~mthakur] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)