[ 
https://issues.apache.org/jira/browse/HADOOP-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825927#comment-15825927
 ] 

Steve Loughran commented on HADOOP-13453:
-----------------------------------------

They're going to have to go into that file because those are the metrics 
published by the S3A filesystem when deployed, returned by S3AStorageStatistics 
in a call to {{S3AFileSystem.getStorageStatistics()}}, and printed in 
{{S3AFileSystem.toString()}}. We could choose whether to add the specific 
metrics to every S3A FS instance; that's something to consider. Listing the 
values but returning 0 for all gauges and counters is the most consistent.

Don't worry about the class length: if you look at it in detail, there are two 
nested classes plus support methods explicitly for output/input streams; you 
don't need to go there. The rest of the code is fairly simple:

# Add new values to {{org.apache.hadoop.fs.s3a.Statistic}}, with the prefix 
{{s3guard_}}
# In {{S3AInstrumentation}}, add counters to the array {{COUNTERS_TO_CREATE}} 
and gauges to {{GAUGES_TO_CREATE}}
# Pass an instance of the instrumentation down to S3Guard
# Have the code call {{incrementCounter}} and 
{{incrementGauge}}/{{decrementGauge}} as appropriate
# I'd like simple counters for {{s3guard_enabled}} and 
{{s3guard_authoritative}}, which will be 0 when there's no S3Guard running and 
1 when the respective booleans are set. Why? Remote visibility.
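
The steps in the list above could look roughly like the sketch below. It is deliberately self-contained: {{Statistic}}, {{Instrumentation}} and {{Guard}} are hypothetical stand-ins for the real Hadoop classes, {{s3guard_store_puts}} is an invented counter name, and the real {{S3AInstrumentation}} has a different constructor and registers its counters with the Hadoop metrics system. Treat it as an illustration of the wiring, not as the actual patch.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Self-contained sketch; none of these types are the real Hadoop classes. */
final class S3GuardMetricsSketch {

  /** Stand-in for org.apache.hadoop.fs.s3a.Statistic; new entries use the s3guard_ prefix. */
  enum Statistic {
    S3GUARD_ENABLED("s3guard_enabled"),
    S3GUARD_AUTHORITATIVE("s3guard_authoritative"),
    // Hypothetical extra counter, invented for this sketch.
    S3GUARD_STORE_PUTS("s3guard_store_puts");

    private final String symbol;

    Statistic(String symbol) {
      this.symbol = symbol;
    }

    String getSymbol() {
      return symbol;
    }
  }

  /** Stand-in for S3AInstrumentation: one long counter per listed statistic. */
  static final class Instrumentation {
    static final Statistic[] COUNTERS_TO_CREATE = {
      Statistic.S3GUARD_ENABLED,
      Statistic.S3GUARD_AUTHORITATIVE,
      Statistic.S3GUARD_STORE_PUTS
    };

    private final Map<Statistic, AtomicLong> counters = new EnumMap<>(Statistic.class);

    Instrumentation() {
      // Every counter is created up front, so it reads 0 until something increments it.
      for (Statistic s : COUNTERS_TO_CREATE) {
        counters.put(s, new AtomicLong());
      }
    }

    void incrementCounter(Statistic s, long count) {
      counters.get(s).addAndGet(count);
    }

    long getCounterValue(Statistic s) {
      return counters.get(s).get();
    }
  }

  /** Stand-in for the S3Guard code that is handed the instrumentation. */
  static final class Guard {
    private final Instrumentation instrumentation;

    Guard(Instrumentation instrumentation, boolean authoritative) {
      this.instrumentation = instrumentation;
      // 0/1 flags exposed as counters, for remote visibility.
      instrumentation.incrementCounter(Statistic.S3GUARD_ENABLED, 1);
      if (authoritative) {
        instrumentation.incrementCounter(Statistic.S3GUARD_AUTHORITATIVE, 1);
      }
    }

    void put(String path) {
      // A real implementation would write to the metadata store here.
      instrumentation.incrementCounter(Statistic.S3GUARD_STORE_PUTS, 1);
    }
  }

  public static void main(String[] args) {
    Instrumentation metrics = new Instrumentation();
    Guard guard = new Guard(metrics, false);
    guard.put("/example/path");
    for (Statistic s : Instrumentation.COUNTERS_TO_CREATE) {
      System.out.println(s.getSymbol() + " = " + metrics.getCounterValue(s));
    }
  }
}
```

Because the counters are created whether or not a {{Guard}} is ever constructed, an instance without S3Guard still lists the {{s3guard_}} values and reports 0 for them.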

You make a good point: "where are the tests?" The answer is that the metrics 
can be used to test the internal state of the S3 classes, and so are 
implicitly tested there. 

Take a look at {{ITestS3ADirectoryPerformance}} for a key example of this: our 
test cases use the counters of the various HTTP operations as the means to 
verify that API calls work as expected. (Note that S3Guard, by reducing the 
number of these calls, has complicated the tests.)

That is, you verify the counters work by asserting that they change as you 
perform operations against the DFS. See 
http://steveloughran.blogspot.co.uk/2016/04/distributed-testing-making-use-of.html
for more of my thinking here.
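
As a rough illustration of that testing style, here is a minimal self-contained sketch. {{CountingFs}} and {{CounterDiff}} are hypothetical stand-ins written for this example and the counter names are only illustrative; the real tests run against an {{S3AFileSystem}} instance and its statistics rather than a stub.

```java
import java.util.HashMap;
import java.util.Map;

/** Self-contained sketch of "assert on the counter delta" testing. */
final class CounterDiffSketch {

  /** Hypothetical filesystem stub that only counts the requests it would issue. */
  static final class CountingFs {
    private final Map<String, Long> counters = new HashMap<>();

    long getCounter(String name) {
      return counters.getOrDefault(name, 0L);
    }

    private void increment(String name) {
      counters.merge(name, 1L, Long::sum);
    }

    void getFileStatus(String path) {
      // A real client would issue a metadata request here.
      increment("object_metadata_requests");
    }

    void listStatus(String path) {
      // A real client would issue a list request here.
      increment("object_list_requests");
    }
  }

  /** Remembers a counter's starting value and reports how far it has moved since. */
  static final class CounterDiff {
    private final CountingFs fs;
    private final String name;
    private long start;

    CounterDiff(CountingFs fs, String name) {
      this.fs = fs;
      this.name = name;
      reset();
    }

    void reset() {
      start = fs.getCounter(name);
    }

    long diff() {
      return fs.getCounter(name) - start;
    }

    void assertDiffEquals(long expected) {
      long actual = diff();
      if (actual != expected) {
        throw new AssertionError(name + ": expected delta " + expected + " but was " + actual);
      }
    }
  }

  public static void main(String[] args) {
    CountingFs fs = new CountingFs();
    CounterDiff lists = new CounterDiff(fs, "object_list_requests");
    CounterDiff heads = new CounterDiff(fs, "object_metadata_requests");

    fs.listStatus("/data");

    // The operation under test should cost exactly one list call and no metadata calls.
    lists.assertDiffEquals(1);
    heads.assertDiffEquals(0);
    System.out.println("counter assertions passed");
  }
}
```

Asserting on the delta rather than the absolute value keeps such a test independent of whatever operations ran before it on the same filesystem instance.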

bq. Sorry for the basic question, i'm really new for work on Hadoop code base.

Happy to explain my reasoning. We've all started off staring at a vast amount 
of code that we don't understand; there are still big bits of Hadoop that I 
don't go near.

> S3Guard: Instrument new functionality with Hadoop metrics.
> ----------------------------------------------------------
>
>                 Key: HADOOP-13453
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13453
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Ai Deng
>
> Provide Hadoop metrics showing operational details of the S3Guard 
> implementation.
> The metrics will be implemented in this ticket:
> ● S3GuardRechecksNthPercentileLatency (MutableQuantiles) - Percentile time
> spent in rechecks attempting to achieve consistency. Repeated for multiple
> percentile values of N. This metric is an indicator of the additional
> latency cost of running S3A with S3Guard.
> ● S3GuardRechecksNumOps (MutableQuantiles) - Number of times a consistency
> recheck was required while attempting to achieve consistency.
> ● S3GuardStoreNthPercentileLatency (MutableQuantiles) - Percentile time
> spent in operations against the consistent store, including both write
> operations during file system mutations and read operations during file
> system consistency checks. Repeated for multiple percentile values of N.
> This metric is an indicator of latency to the consistent store
> implementation.
> ● S3GuardConsistencyStoreNumOps (MutableQuantiles) - Number of operations
> against the consistent store, including both write operations during file
> system mutations and read operations during file system consistency checks.
> ● S3GuardConsistencyStoreFailures (MutableCounterLong) - Number of failures
> during operations against the consistent store implementation.
> ● S3GuardConsistencyStoreTimeouts (MutableCounterLong) - Number of timeouts
> during operations against the consistent store implementation.
> ● S3GuardInconsistencies (MutableCounterLong) - Count of times S3Guard
> failed to achieve consistency, even after exhausting all rechecks. A high
> count may indicate unexpected out-of-band modification of the S3 bucket
> contents, such as by an external tool that does not make corresponding
> updates to the consistent store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

