[
https://issues.apache.org/jira/browse/TEZ-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811931#comment-17811931
]
Steve Loughran commented on TEZ-4451:
-------------------------------------
A write of 204 bytes to an output stream; nothing else collected. Has the output
stream been closed? Were any input streams used?
Stat names are intended to be shared across the different stores:
* org.apache.hadoop.fs.statistics.StoreStatisticNames : store-level stats,
including fs api "op_" operations
* org.apache.hadoop.fs.statistics.StreamStatisticNames : stream stats; the javadocs
should explain
* full set of s3a stats: org.apache.hadoop.fs.s3a.Statistic ; includes
description and type.
Types:
* counters: aggregation is "add"
* gauges: can go up and down; they don't really aggregate well, but adding is the
approach used
* minimums: minimum value of something (usually a duration or a data read/write
size). Aggregation: smallest wins
* maximums: maximum value of something (usually a duration or a data read/write
size). Aggregation: largest wins
* means: average value, usually of a duration. Aggregation: compute the new
average of the combined values
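The aggregation rules above can be sketched roughly as follows. This is a minimal Python illustration, not the Hadoop implementation: the function names are hypothetical, and a mean is modelled as a `(samples, sum)` pair, which is an assumption based on the `samples=` and `sum=` fields printed in the log output.

```python
# Hypothetical sketch of the per-type aggregation rules; not Hadoop code.

def merge_counter(a, b):
    # counters: aggregation is "add"
    return a + b

def merge_gauge(a, b):
    # gauges: adding is the approach used, even though it is a poor fit
    return a + b

def merge_minimum(a, b):
    # minimums: smallest wins
    return min(a, b)

def merge_maximum(a, b):
    # maximums: largest wins
    return max(a, b)

def merge_mean(a, b):
    # means: combine sample counts and sums, then derive the new average
    return (a[0] + b[0], a[1] + b[1])

def mean_value(m):
    samples, total = m
    return total / samples if samples else 0.0

# Two sets that each saw one list request, taking 119 and 154 time units.
merged = merge_mean((1, 119), (1, 154))
print(merged, mean_value(merged))
```

Keeping the sample count alongside the sum is what lets two means be combined correctly; averaging the two averages would only be right when both sides have the same number of samples.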
A lot of the stats collected are durations, each of which has a counter, min, max
and mean; these are updated on each call and aggregated into the current set. This
lets you see things like the mean time for an operation, but also when there's a
very slow outlier.
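To show why tracking min, max and mean together is useful, here is a small Python sketch of a duration tracker. The class and its fields are hypothetical and only mimic the idea; the durations are made-up values, not measurements.

```python
# Hypothetical duration tracker: one counter, min, max and mean per operation.

class DurationStats:
    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = None
        self.maximum = None

    def record(self, duration):
        # Called once per operation with that call's duration.
        self.count += 1
        self.total += duration
        self.minimum = duration if self.minimum is None else min(self.minimum, duration)
        self.maximum = duration if self.maximum is None else max(self.maximum, duration)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

stats = DurationStats()
for d in (120, 118, 125, 2400):  # made-up durations; the last is a slow outlier
    stats.record(d)

# The mean (690.75) hides how fast most calls were; the max (2400) exposes the outlier.
print(stats.count, stats.minimum, stats.maximum, stats.mean)
```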
Input/output streams only update their filesystem and context statistics in
close(), rather than on every read/write call. The s3a input stream also updates
on unbuffer(), so apps like Impala can get updated values while still keeping the
stream open.
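A toy Python model of that behaviour, to show why stats from a stream that is still open do not appear at the filesystem level. Everything here is hypothetical (class names, the `stream_read_bytes` key as used here, the publish-a-delta approach); it is a sketch of the pattern, not the s3a code.

```python
# Hypothetical model: stream stats stay local until close() or unbuffer().

class FileSystemStats:
    """Stand-in for the filesystem-level (or thread-context) statistics."""
    def __init__(self):
        self.counters = {}

    def add(self, key, value):
        self.counters[key] = self.counters.get(key, 0) + value


class SketchInputStream:
    def __init__(self, data, fs_stats):
        self._data = data
        self._pos = 0
        self._fs_stats = fs_stats
        self._unpublished_bytes = 0
        self._closed = False

    def read(self, n):
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        self._unpublished_bytes += len(chunk)  # tracked locally only
        return chunk

    def _publish(self):
        # Push what has accumulated since the last publish, then reset,
        # so repeated calls never double count.
        self._fs_stats.add("stream_read_bytes", self._unpublished_bytes)
        self._unpublished_bytes = 0

    def unbuffer(self):
        self._publish()  # stream stays open, stats become visible

    def close(self):
        if not self._closed:
            self._publish()
            self._closed = True


fs_stats = FileSystemStats()
stream = SketchInputStream(b"x" * 204, fs_stats)
stream.read(100)
print(fs_stats.counters)  # still empty: nothing published yet
stream.unbuffer()
print(fs_stats.counters)  # now shows the 100 bytes read so far
stream.close()
```

In this model, a task that reads or writes through a stream but never closes it would leave the filesystem-level counters unchanged, which is one possible explanation for statistics that look incomplete.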
{code}
bin/hadoop fs -ls $BUCKET
Found 1 items
drwxrwxrwx - stevel stevel 0 2024-01-29 15:05 s3a://stevel-london/user/stevel/target
2024-01-29 15:05:15,265 [shutdown-hook-0] INFO statistics.IOStatisticsLogging (IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics: counters=((action_http_head_request=1)
(audit_request_execution=3)
(audit_span_creation=4)
(object_list_request=2)
(object_metadata_request=1)
(op_get_file_status=1)
(op_glob_status=1)
(op_list_status=1)
(store_io_request=3));
gauges=();
minimums=((action_http_head_request.min=812)
(object_list_request.min=119)
(op_get_file_status.min=943)
(op_glob_status.min=958)
(op_list_status.min=163));
maximums=((action_http_head_request.max=812)
(object_list_request.max=154)
(op_get_file_status.max=943)
(op_glob_status.max=958)
(op_list_status.max=163));
means=((action_http_head_request.mean=(samples=1, sum=812, mean=812.0000))
(object_list_request.mean=(samples=2, sum=273, mean=136.5000))
(op_get_file_status.mean=(samples=1, sum=943, mean=943.0000))
(op_glob_status.mean=(samples=1, sum=958, mean=958.0000))
(op_list_status.mean=(samples=1, sum=163, mean=163.0000)));
{code}
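If you want to compare dumps like the one above across tasks, the text can be parsed with a few regular expressions. This is a rough Python sketch that assumes the layout is exactly as printed in this log (sections named counters, gauges, minimums, maximums and means, each terminated by `);`); the function name is hypothetical and the format may differ between Hadoop versions.

```python
import re

# Assumed layout, taken from the log output above; not a stable, documented format.
SECTION = re.compile(
    r"\b(counters|gauges|minimums|maximums|means)=\((.*?)\);", re.DOTALL)
SIMPLE = re.compile(r"\(([\w.]+)=(-?\d+)\)")
MEAN = re.compile(
    r"\(([\w.]+)=\(samples=(\d+), sum=(-?\d+), mean=(-?[\d.]+)\)\)")

def parse_iostatistics(text):
    """Parse an IOStatistics log dump into a dict of per-section dicts."""
    result = {}
    for name, body in SECTION.findall(text):
        if name == "means":
            result[name] = {
                key: {"samples": int(n), "sum": int(s), "mean": float(m)}
                for key, n, s, m in MEAN.findall(body)
            }
        else:
            result[name] = {key: int(v) for key, v in SIMPLE.findall(body)}
    return result

# A shortened copy of the dump above.
SAMPLE = """IOStatistics: counters=((object_list_request=2)
(store_io_request=3));
gauges=();
minimums=((object_list_request.min=119));
maximums=((object_list_request.max=154));
means=((object_list_request.mean=(samples=2, sum=273, mean=136.5000)));
"""

parsed = parse_iostatistics(SAMPLE)
print(parsed["counters"])
print(parsed["means"]["object_list_request.mean"])
```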
> ThreadLevel IO Stats Support for TEZ
> ------------------------------------
>
> Key: TEZ-4451
> URL: https://issues.apache.org/jira/browse/TEZ-4451
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Harshit Gupta
> Assignee: Ayush Saxena
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Dump IO Statistics for each of the tasks in the log.
> This will require upgrading Tez to use Hadoop-3.3.9-SNAPSHOT
>
> cc: [~rbalamohan] [~abstractdog] [~mthakur]