[ 
https://issues.apache.org/jira/browse/GOBBLIN-1598?focusedWorklogId=713997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713997
 ]

ASF GitHub Bot logged work on GOBBLIN-1598:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jan/22 20:12
            Start Date: 24/Jan/22 20:12
    Worklog Time Spent: 10m 
      Work Description: phet commented on a change in pull request #3454:
URL: https://github.com/apache/gobblin/pull/3454#discussion_r791097576



##########
File path: 
gobblin-metrics-libs/gobblin-metrics-base/src/test/java/org/apache/gobblin/metrics/metric/filter/MetricNameRegexFilterTest.java
##########
@@ -42,5 +42,11 @@ public void matchesTest() {
     Assert.assertTrue(metricNameRegexFilter2.matches("test1", 
mock(Metric.class)));
     Assert.assertFalse(metricNameRegexFilter2.matches("test2", 
mock(Metric.class)));
     Assert.assertFalse(metricNameRegexFilter2.matches("test3", 
mock(Metric.class)));
+    MetricNameRegexFilter metricNameRegexFilter3 = new 
MetricNameRegexFilter("GobblinService\\..*\\.RunningStatus");

Review comment:
       I don't want to go for overkill, yet I'd hope to validate the connection 
between this regex hard-coded here and what's constructed in the impl:
   ```
   ServiceMetricNames.GOBBLIN_SERVICE_PREFIX + "\\..*\\." + 
ServiceMetricNames.RUNNING_STATUS
   ```
   maybe add a static method there to constructs the regex and then have the 
test call it to set `metricNameRegexFilter3`?
   

##########
File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
##########
@@ -399,6 +400,7 @@ public synchronized void setActive(boolean active) {
       } else { //Mark the DagManager inactive.
         log.info("Inactivating the DagManager. Shutting down all DagManager 
threads");
         this.scheduledExecutorPool.shutdown();
+        RootMetricContext.get().removeMatching(new 
MetricNameRegexFilter(ServiceMetricNames.GOBBLIN_SERVICE_PREFIX + "\\..*\\." + 
ServiceMetricNames.RUNNING_STATUS));

Review comment:
       let's add a comment here about background info (since a bit complex how 
several pieces conspire to create the issue)... something like:
   
   the DMThread's metrics mappings follow the lifecycle of the DMThread itself 
and so are lost by DM deactivation-reactivation.  the `RootMetricContext` OTOH 
is a (persistent) singleton.  to avoid XYZ exception by the RMC preventing 
(re-)add of a metric already known, remove all metrics that a new DMThread 
thread would attempt to add (in `DagManagerThread::initialize`) whenever 
running post-re-enablement




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 713997)
    Time Spent: 40m  (was: 0.5h)

> Fix metrics already exist issue in dag manager
> ----------------------------------------------
>
>                 Key: GOBBLIN-1598
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1598
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Zihan Li
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> 2021/06/12 07:11:32.067 ERROR [DagManager] [pool-19-thread-3] 
> [gobblin-service-war] [] Exception encountered in 
> org.apache.gobblin.service.modules.orchestration.DagManager$DagManagerThread
> java.lang.IllegalArgumentException: A metric named 
> GobblinService.testGroup.testFlow.RunningStatus already exists
>  at 
> org.apache.gobblin.metrics.InnerMetricContext.register(InnerMetricContext.java:266)
>  
>  at org.apache.gobblin.metrics.MetricContext.register(MetricContext.java:409) 
>  at 
> org.apache.gobblin.service.modules.orchestration.DagManager$DagManagerThread.initialize(DagManager.java:661)
>  
>  at 
> org.apache.gobblin.service.modules.orchestration.DagManager$DagManagerThread.run(DagManager.java:493)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_172]
>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [?:1.8.0_172]
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_172]
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [?:1.8.0_172]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_172]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_172]
>  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> {code}
> Likely root cause:
> If we activate/deactivate DagManager several times, DagManagerThreads 
> alongside with their flowGauges maps will be recreated, but RootMetricContext 
> will still stay the same. So the newly created threads will incorrectly think 
> that the flow was not registered previously, and try to register it again.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to