[ 
https://issues.apache.org/jira/browse/HADOOP-19920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089029#comment-18089029
 ] 

ASF GitHub Bot commented on HADOOP-19920:
-----------------------------------------

pan3793 opened a new pull request, #8549:
URL: https://github.com/apache/hadoop/pull/8549

   
   ### Description of PR
   
   Fix flaky `TestPrometheusMetricsSink` tests, such as the failures in this run: 
https://github.com/kokonguyen191/hadoop/actions/runs/27467848524/job/81194100665
   
   ```
   Error:  Errors:
   Error:  org.apache.hadoop.metrics2.sink.TestPrometheusMetricsSink.testPublish
   Error:    Run 1: TestPrometheusMetricsSink.testPublish:60 » Metrics Metrics source TestMetrics already exists!
   Error:    Run 2: TestPrometheusMetricsSink.testPublish:60 » Metrics Metrics source TestMetrics already exists!
   Error:    Run 3: TestPrometheusMetricsSink.testPublish:60 » Metrics Metrics source TestMetrics already exists!
   [INFO]
   Error:  org.apache.hadoop.metrics2.sink.TestPrometheusMetricsSink.testPublishFlush
   Error:    Run 1: TestPrometheusMetricsSink.testPublishFlush:159 The first metric should not exist after flushing ==> expected: <false> but was: <true>
   Error:    Run 2: TestPrometheusMetricsSink.testPublishFlush:137 » Metrics Metrics source TestMetrics already exists!
   Error:    Run 3: TestPrometheusMetricsSink.testPublishFlush:137 » Metrics Metrics source TestMetrics already exists!
   [INFO]
   Error:  org.apache.hadoop.metrics2.sink.TestPrometheusMetricsSink.testPublishMultiple
   Error:    Run 1: TestPrometheusMetricsSink.testPublishMultiple:112 The expected first metric line is missing from prometheus metrics output ==> expected: <true> but was: <false>
   Error:    Run 2: TestPrometheusMetricsSink.testPublishMultiple:95 » Metrics Metrics source TestMetrics1 already exists!
   Error:    Run 3: TestPrometheusMetricsSink.testPublishMultiple:95 » Metrics Metrics source TestMetrics1 already exists!
   [INFO]
   [INFO]
   Error:  Tests run: 5385, Failures: 0, Errors: 3, Skipped: 201
   ```
   
   The root cause is the refCount-based global singleton:
   
   - `DefaultMetricsSystem.instance()` returns a JVM-global singleton shared by 
all tests in hadoop-common.
   - `init()` short-circuits (returns early without incrementing refCount) if 
monitoring is already `true`.
   - `shutdown()` only clears `allSources`/`allSinks` when refCount hits 0.
   
   So if any test (one of this class's own methods, or any other metrics test in 
the module) throws before reaching its inline `stop()`/`shutdown()` call, the 
singleton is left with `monitoring=true` and its registered sources leaked. 
Each subsequent test's `init()` is then a no-op, the sources are never cleared, 
and the result is cascading "Metrics source TestMetrics already exists!" errors 
plus stale-data assertion failures. That is exactly what the CI shows, 
including on surefire reruns.
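   
   The failure mode can be reproduced with a small toy model. This is only a 
sketch: `ToyMetricsSystem` and `LeakDemo` are hypothetical names invented for 
illustration, and the model mirrors just the three behaviours listed above 
(shared singleton, short-circuiting `init()`, refCount-gated `shutdown()`), not 
Hadoop's actual implementation. The `try`/`finally` variant is one possible 
way to guarantee cleanup and is not necessarily what this PR does.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;

   /** Toy stand-in for a refCount-based metrics singleton; NOT Hadoop's real class. */
   final class ToyMetricsSystem {
       static final ToyMetricsSystem INSTANCE = new ToyMetricsSystem();

       private final Map<String, Object> sources = new HashMap<>();
       private boolean monitoring;
       private int refCount;

       synchronized void init() {
           if (monitoring) {
               return; // short-circuit: refCount is NOT incremented
           }
           refCount++;
           monitoring = true;
       }

       synchronized void register(String name, Object source) {
           if (sources.containsKey(name)) {
               throw new IllegalStateException("Metrics source " + name + " already exists!");
           }
           sources.put(name, source);
       }

       synchronized void shutdown() {
           if (refCount <= 0) {
               return; // already shut down
           }
           if (--refCount > 0) {
               return; // other users remain, keep sources
           }
           monitoring = false;
           sources.clear(); // sources are cleared only when refCount hits 0
       }

       synchronized int sourceCount() {
           return sources.size();
       }
   }

   final class LeakDemo {
       /** Mimics a test whose cleanup is inline, so a failure skips it. */
       static void testWithoutCleanup(boolean fail) {
           ToyMetricsSystem ms = ToyMetricsSystem.INSTANCE;
           ms.init();
           ms.register("TestMetrics", new Object());
           if (fail) {
               throw new AssertionError("simulated assertion failure");
           }
           ms.shutdown(); // never reached when the test fails
       }

       /** Same test body, but cleanup runs even when the body throws. */
       static void testWithCleanup(boolean fail) {
           ToyMetricsSystem ms = ToyMetricsSystem.INSTANCE;
           try {
               ms.init();
               ms.register("TestMetrics", new Object());
               if (fail) {
                   throw new AssertionError("simulated assertion failure");
               }
           } finally {
               ms.shutdown();
           }
       }
   }
   ```
   
   In this model, one failing call to `testWithoutCleanup(true)` leaves 
`TestMetrics` registered, so every later call fails in `register(...)` until 
something calls `shutdown()`. With `testWithCleanup`, a failure in one call 
does not affect the next.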
   
   ### How was this patch tested?
   
   Relies on the GitHub Actions (GHA) CI run passing.
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(HADOOP-19920)?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   ### AI Tooling
   
   Contains content generated by Claude Opus 4.8




> Fix flaky TestPrometheusMetricsSink
> -----------------------------------
>
>                 Key: HADOOP-19920
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19920
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Cheng Pan
>            Priority: Major
>



