[jira] [Commented] (IMPALA-8544) Expose additional S3A / S3Guard metrics

Steve Loughran (JIRA) Tue, 21 May 2019 03:54:16 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844719#comment-16844719
 ]


Steve Loughran commented on IMPALA-8544:
----------------------------------------

The trouble with thread-local tracking is that operations span threads, e.g. 
writes are uploaded block-by-block in the thread pool, rename/copy will soon 
do the same, etc. This is why the current per-thread statistics underreport, 
while the aggregate value overreports on a per-query basis (see the aggregate 
stats in a _SUCCESS file for a Spark query as an example).

w.r.t. exposing our stream statistics method, -1 as is:

# it removes the option for us to change that data structure.
# the fields are all non-atomic, non-volatile values so that the cost of 
incrementing them is ~0. If things are being collected, that may change.
# if people start wrapping/proxying streams, it ceases to be valid.

If you want to have some per-input-stream statistics, it'd be better to have 
something which all input streams can implement, so that HDFS, ABFS, etc. can 
also implement it. I'll take suggestions as to what the best design is here, 
given the goal of keeping the cost of incrementing the statistics low.
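
As a starting point for discussion, here is a minimal sketch of what a 
store-neutral interface could look like. Everything in it is hypothetical: 
StreamStatisticsSource, snapshotStatistics(), CountingInputStream, 
WrappingInputStream and the key names are invented for illustration and are 
not existing Hadoop APIs. The idea is that callers only ever see an immutable 
snapshot, so the internal data structure can still change, the counters stay 
plain fields, and a wrapping stream stays valid by delegating.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store-neutral interface; NOT an existing Hadoop API. */
interface StreamStatisticsSource {
  /** Returns a point-in-time copy, so the internal fields stay private. */
  Map<String, Long> snapshotStatistics();
}

/** Stand-in for a store's input stream (S3A, ABFS, HDFS, ...). */
class CountingInputStream extends FilterInputStream
    implements StreamStatisticsSource {
  // Plain fields: no atomics, no volatile, so an increment is a simple add.
  // A snapshot taken from another thread may therefore be slightly stale.
  private long bytesRead;
  private long readOperations;

  CountingInputStream(InputStream in) {
    super(in);
  }

  // Only bulk reads are counted in this sketch; single-byte read() is not.
  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    int n = super.read(b, off, len);
    readOperations++;
    if (n > 0) {
      bytesRead += n;
    }
    return n;
  }

  @Override
  public Map<String, Long> snapshotStatistics() {
    Map<String, Long> copy = new HashMap<>();
    copy.put("bytes_read", bytesRead);            // assumed key name
    copy.put("read_operations", readOperations);  // assumed key name
    return Collections.unmodifiableMap(copy);
  }
}

/** A wrapper delegates instead of casting to a concrete stream class. */
class WrappingInputStream extends FilterInputStream
    implements StreamStatisticsSource {
  WrappingInputStream(InputStream in) {
    super(in);
  }

  @Override
  public Map<String, Long> snapshotStatistics() {
    if (in instanceof StreamStatisticsSource) {
      return ((StreamStatisticsSource) in).snapshotStatistics();
    }
    return Collections.<String, Long>emptyMap();
  }
}
```

A caller such as Impala would then probe for the interface on whatever stream 
it was handed and read the snapshot, rather than depending on a specific 
store's statistics class.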

As usual, anything which goes near the filesystem APIs will need spec updates, 
tests for all the stores to implement etc. We need that to stop us breaking 
your code later.

The other thing to consider is passing more of a stats context down to 
read/write/copy/commit operations so that the work done across threads can be 
tied back to the final operation. e.g. every async write would update some 
counters which in outputstream.close() would be merged back into the classic 
per-thread counters.
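
The stats-context idea above can be sketched roughly as follows. This is an 
illustration under stated assumptions, not how S3A is implemented: 
OperationStats, ThreadStats and BlockOutputStreamSketch are hypothetical 
names, the "upload" does no real IO, and a ThreadLocal long[] stands in for 
the classic per-thread counters. Each async block upload updates the shared 
per-operation context, and close() merges the totals back into the counters 
of the thread that owns the stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-operation context passed to every async upload. */
class OperationStats {
  // LongAdder keeps concurrent increments from pool threads cheap.
  final LongAdder bytesUploaded = new LongAdder();
  final LongAdder blocksUploaded = new LongAdder();
}

/** Stand-in for the classic per-thread counters. */
class ThreadStats {
  static final ThreadLocal<long[]> BYTES_WRITTEN =
      ThreadLocal.withInitial(() -> new long[1]);
}

/** Stand-in for a block output stream; performs no real IO. */
class BlockOutputStreamSketch implements AutoCloseable {
  private final ExecutorService pool;
  private final OperationStats stats = new OperationStats();
  private final List<Future<?>> uploads = new ArrayList<>();

  BlockOutputStreamSketch(ExecutorService pool) {
    this.pool = pool;
  }

  /** Queues one block; the counters are updated on a pool thread. */
  void uploadBlock(byte[] block) {
    uploads.add(pool.submit(() -> {
      stats.bytesUploaded.add(block.length);
      stats.blocksUploaded.increment();
    }));
  }

  /** Waits for all uploads, then merges into the caller's thread counters. */
  @Override
  public void close() throws Exception {
    for (Future<?> upload : uploads) {
      upload.get();
    }
    ThreadStats.BYTES_WRITTEN.get()[0] += stats.bytesUploaded.sum();
  }
}
```

With this shape, the thread that opened and closed the stream sees the bytes 
written on its behalf, even though the pool threads did the work.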

+[~DanielZhou] for his thoughts on ABFS stats gathering

> Expose additional S3A / S3Guard metrics
> ---------------------------------------
>
>                 Key: IMPALA-8544
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8544
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>              Labels: s3
>
> S3A / S3Guard internally collects several useful metrics that we should 
> consider exposing to Impala users. The full list of statistics can be found 
> in {{o.a.h.fs.s3a.Statistic}}. The stats include: the number of S3 operations 
> performed (put, get, etc.), invocation counts for various {{FileSystem}} 
> methods, stream statistics (bytes read, written, etc.), etc.
> Some interesting stats that stand out:
>  * "stream_aborted": "Count of times the TCP stream was aborted" - the number 
> of TCP connection aborts, a high value would indicate performance issues
>  * "stream_read_exceptions" : "Number of exceptions invoked on input streams" 
> - incremented whenever an {{IOException}} is caught while reading (these 
> exceptions don't always get propagated to Impala because they trigger a retry)
>  * "store_io_throttled": "Requests throttled and retried" - looks like it 
> tracks the number of times the fs retries an operation because the original 
> request hit a throttling exception
>  * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - 
> looks like it tracks the number of times the fs retries S3Guard operations
>  * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled 
> events" - similar to "store_io_throttled" but looks like it is specific to 
> S3Guard
> We should consider how to expose these metrics via Impala logs / runtime 
> profiles.
> There are a few options:
>  * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard 
> via the {{FileSystem#getStorageStatistics}} method; the 
> {{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics, 
> however, I think the stats might be aggregated globally, which would make it 
> hard to create per-query specific metrics
>  * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it 
> is per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} 
> extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some 
> API (haven't looked into this yet)
>  * {{S3AInputStream#toString}} dumps the statistics from 
> {{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and 
> {{S3AFileSystem#toString}} dumps them all as well
>  * {{S3AFileSystem}} updates the stats in 
> {{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, 
> etc.)
> Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared 
> across threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
