[
https://issues.apache.org/jira/browse/TEZ-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775320#comment-17775320
]
Ayush Saxena commented on TEZ-4451:
-----------------------------------
Sounds cool. I checked the Hadoop code: it logs the FileSystem statistics
on every FS close, based on a config. I was thinking we could log the
statistics for the entire {{TezChild}} somewhere here:
[https://github.com/apache/tez/blob/master/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezChild.java#L319]
It calls {{FileSystem.closeAllForUGI(childUGI);}}, so aggregated statistics
per {{TezChild}} could be logged at that point. But I don't think anything
like that exists, unless I add a util in Tez that collects all the FS instances
and aggregates their statistics. I don't think FS exposes all the FS instances
per UGI either.
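For illustration only, the aggregation step of such a util could look roughly
like the sketch below. It assumes each FS instance's statistics have already
been read into a plain name-to-value map. {{FsStatsAggregator}} and
{{aggregate}} are hypothetical names, not existing Tez or Hadoop APIs, and how
to enumerate the instances per UGI is still the open question.
{code:java}
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tez or Hadoop.
final class FsStatsAggregator {
  private FsStatsAggregator() {
  }

  /** Sums counters that share a name across all per-instance maps. */
  static Map<String, Long> aggregate(List<Map<String, Long>> perInstance) {
    // TreeMap keeps the counters sorted by name, so the logged line is stable.
    Map<String, Long> total = new TreeMap<>();
    for (Map<String, Long> stats : perInstance) {
      for (Map.Entry<String, Long> e : stats.entrySet()) {
        total.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return total;
  }
}
{code}
The result could then be logged once per {{TezChild}}, just before the
{{closeAllForUGI}} call.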
I think I can get the stream one done. It can go in the finally block of
{{TaskRunner2Callable}}.
{code:java}
// Snapshot the IOStatistics collected for the current thread.
String ioStats =
    IOStatisticsContext.getCurrentIOStatisticsContext().snapshot().toString();
// Skip the log line when nothing was collected.
if (StringUtils.isNotEmpty(ioStats)) {
  LOG.info("TaskAttemptId={}, IOStatistics={}", task.getTaskAttemptID(),
      ioStats);
}
{code}
Maybe cover it with a conf, if it is costly. I will take this up in a few days
unless anyone else wants to take it.
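A rough sketch of the conf guard, under stated assumptions: the flag name
{{tez.task.log.iostatistics.enabled}} and the class {{IoStatsLogGuard}} are
hypothetical, and the snapshot is passed in as a {{Supplier}} so that the
possibly costly call is never made when the flag is off. In Tez the flag would
presumably be read from the task {{Configuration}}, and the supplier would wrap
the {{IOStatisticsContext}} call shown above.
{code:java}
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Tez or Hadoop.
final class IoStatsLogGuard {
  // Hypothetical config key, named here only for illustration.
  static final String ENABLED_KEY = "tez.task.log.iostatistics.enabled";

  private IoStatsLogGuard() {
  }

  /** Returns the line to log, or empty if logging is off or there is nothing to log. */
  static Optional<String> messageFor(boolean enabled, String taskAttemptId,
      Supplier<String> snapshot) {
    if (!enabled) {
      // The supplier is not invoked, so no snapshot cost is paid.
      return Optional.empty();
    }
    String ioStats = snapshot.get();
    if (ioStats == null || ioStats.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(
        "TaskAttemptId=" + taskAttemptId + ", IOStatistics=" + ioStats);
  }
}
{code}
The caller would pass the returned message, if present, to {{LOG.info}} in the
finally block.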
> ThreadLevel IO Stats Support for TEZ
> ------------------------------------
>
> Key: TEZ-4451
> URL: https://issues.apache.org/jira/browse/TEZ-4451
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Harshit Gupta
> Priority: Major
>
> Dump IO Statistics for each of the tasks in the log.
> This will require upgrading Tez to use Hadoop-3.3.9-SNAPSHOT
>
> cc: [~rbalamohan] [~abstractdog] [~mthakur]