Steve Loughran created HADOOP-18190: ---------------------------------------
Summary: s3a prefetching streams to collect iostats on prefetching operations
Key: HADOOP-18190
URL: https://issues.apache.org/jira/browse/HADOOP-18190
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran

There is a lot more happening in reads, so there is a lot more to collect and publish in IO stats. We want to view these in a summary at the end of processes and also get them from the stream while it is active.

Some useful ones would seem to be:

counters
* is in memory: using 0 or 1 here lets aggregation reports count the total number of memory-cached files
* prefetching operations executed
* errors during prefetching

gauges
* number of blocks in cache
* total size of blocks
* active prefetches
* active memory used

duration tracking (count/min/max/average)
* time to fetch a block
* time queued before the actual fetch begins
* time a reader is blocked waiting for a block fetch to complete

and some info on cache use itself
* number of blocks discarded unread
* number of prefetched blocks later used
* number of backward seeks to a prefetched block
* number of forward seeks to a prefetched block

The key ones I care about are:
# memory consumption
# whether we can determine when the cache is working (reads with cache hits) and when it is not (misses, wasted prefetches)
# time blocked on executors

The stats need to be accessible on a stream even after it is closed, and aggregated into the FS. Once we get per-thread stats contexts, we can publish there too and collect in worker threads for reporting in task commits.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
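A minimal sketch of how a prefetching stream might track the counters, gauges and count/min/max/average durations listed above. It assumes a hypothetical `PrefetchStreamStats` class whose method names and statistic keys (`cache.hits`, `prefetch.fetch.ms`, ...) are made up for illustration. It is not Hadoop's `IOStatistics` API and not how S3A actually implements this. `snapshot()` returns a copy that stays readable after the stream is closed. `aggregateCountersInto()` sums only the monotonic counters into a filesystem-wide map, which is one possible reading of "aggregated into the FS". Snapshots taken during concurrent updates may be momentarily inconsistent.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical sketch: class, method and key names are illustrative, NOT Hadoop's IOStatistics API.
public class PrefetchStreamStats {
  // Thread-safe count/min/max/mean tracker for durations in milliseconds.
  static final class DurationStat {
    final AtomicLong count = new AtomicLong();
    final AtomicLong total = new AtomicLong();
    final LongAccumulator min = new LongAccumulator(Math::min, Long.MAX_VALUE);
    final LongAccumulator max = new LongAccumulator(Math::max, Long.MIN_VALUE);
    void record(long millis) {
      count.incrementAndGet();
      total.addAndGet(millis);
      min.accumulate(millis);
      max.accumulate(millis);
    }
    // Emits name.count/.min/.max/.mean; min/max/mean are 0 if nothing was recorded.
    void addTo(Map<String, Long> out, String name) {
      long c = count.get();
      out.put(name + ".count", c);
      out.put(name + ".min", c == 0 ? 0L : min.get());
      out.put(name + ".max", c == 0 ? 0L : max.get());
      out.put(name + ".mean", c == 0 ? 0L : total.get() / c); // integer mean
    }
  }

  // Counters only ever increase, so they can be summed across streams.
  static final List<String> COUNTERS = List.of("prefetch.count", "prefetch.errors",
      "cache.hits", "cache.misses", "cache.discarded.unread");
  // Gauges rise and fall as blocks are cached/evicted and prefetches start/finish.
  static final List<String> GAUGES = List.of("cache.blocks", "cache.bytes", "prefetch.active");
  // Keys are fixed in the constructor, so later updates only touch the AtomicLongs.
  private final Map<String, AtomicLong> values = new TreeMap<>();
  final DurationStat queued = new DurationStat(), fetch = new DurationStat();
  final DurationStat readerBlocked = new DurationStat();

  PrefetchStreamStats() {
    COUNTERS.forEach(k -> values.put(k, new AtomicLong()));
    GAUGES.forEach(k -> values.put(k, new AtomicLong()));
  }
  private void add(String key, long delta) { values.get(key).addAndGet(delta); }

  // A prefetch leaves the executor queue after waiting queuedMillis.
  void prefetchStarted(long queuedMillis) {
    add("prefetch.active", 1);
    queued.record(queuedMillis);
  }

  // A prefetch completes; on success its block joins the cache.
  void prefetchFinished(long fetchMillis, long blockBytes, boolean failed) {
    add("prefetch.active", -1);
    add("prefetch.count", 1);
    fetch.record(fetchMillis);
    if (failed) {
      add("prefetch.errors", 1);
      return;
    }
    add("cache.blocks", 1);
    add("cache.bytes", blockBytes);
  }

  // A hit is served from the cache; a miss blocks the reader until the fetch completes.
  void blockRequested(boolean hit, long blockedMillis) {
    add(hit ? "cache.hits" : "cache.misses", 1);
    if (!hit) {
      readerBlocked.record(blockedMillis);
    }
  }

  // Evicting a block that was never read counts as a wasted prefetch.
  void blockEvicted(long blockBytes, boolean wasRead) {
    add("cache.blocks", -1);
    add("cache.bytes", -blockBytes);
    if (!wasRead) {
      add("cache.discarded.unread", 1);
    }
  }

  // Point-in-time copy; stays usable after the stream is closed.
  Map<String, Long> snapshot() {
    Map<String, Long> out = new TreeMap<>();
    values.forEach((k, v) -> out.put(k, v.get()));
    queued.addTo(out, "prefetch.queued.ms");
    fetch.addTo(out, "prefetch.fetch.ms");
    readerBlocked.addTo(out, "reader.blocked.ms");
    return out;
  }

  // Sums only the counters (not gauges or durations) into filesystem-wide totals.
  void aggregateCountersInto(Map<String, Long> fsTotals) {
    COUNTERS.forEach(k -> fsTotals.merge(k, values.get(k).get(), Long::sum));
  }
}
```

In this sketch, gauges are left out of the filesystem totals because summing per-stream block counts after the streams have closed says little about live memory use. The cache hit/miss and discarded-unread counters are what would show whether prefetching pays off.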