[ https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski closed FLINK-35764. ---------------------------------- Fix Version/s: 2.0.0 1.19.2 1.20.1 Resolution: Fixed Thanks for reporting and fixing the issue [~auroflow]. Merged to master as 4fae64cb011 Merged to release-1.20 as c4dd2a6550a Merged to release-1.19 as 98f1349bb4b > TimerGauge is incorrect when update is called during a measurement > ------------------------------------------------------------------ > > Key: FLINK-35764 > URL: https://issues.apache.org/jira/browse/FLINK-35764 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics > Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0 > Reporter: Liu Liu > Assignee: Liu Liu > Priority: Major > Labels: pull-request-available > Fix For: 2.0.0, 1.19.2, 1.20.1 > > > *Description* > Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state > (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as > follows: > - {{markStart()}} bumps {{currentMeasurementStartTS}} and > {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}} > - {{currentCount}} stores the time in the state within this update interval > - When calling {{{}markEnd(){}}}, the time since > {{currentMeasurementStartTS}} is added to the {{currentCount}} > - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added > to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the > gauge value > The intent is that a state can span across two update intervals (by calling > {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be > incorrect, since the time between {{markStart()}} and {{update()}} will be > counted in the first update interval by {{{}update(){}}}, and then in the > second update interval by {{{}markEnd(){}}}, so the measurement will be > larger than the correct value. The correct solution is to only add the time > since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}. > *Test case* > {{@Test }} > {{void testUpdateBeforeMarkingEnd() { }} > {{ ManualClock clock = new ManualClock(42_000_000);}} > {{ // this timer gauge measures 2 update intervals}} > {{ TimerGauge gauge = new TimerGauge(clock, 2 * > View.UPDATE_INTERVAL_SECONDS);}} > {{ long UPDATE_INTERVAL_MILLIS = > TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS); // 5000 ms}} > {{ long SLEEP = 10; // 10 ms}} > {{ // interval 1}} > {{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, > TimeUnit.MILLISECONDS); // *(1)}} > {{ gauge.markStart(); }} > {{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }} > {{ gauge.update();}} > {{ // interval 2}} > {{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }} > {{ gauge.markEnd();}} > {{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, > TimeUnit.MILLISECONDS); // *(1)}} > {{ gauge.update(); }} > {{ // expected: 2, actual: 3}} > {{ assertThat(gauge.getValue()).isEqualTo(SLEEP / > View.UPDATE_INTERVAL_SECONDS); }} > {{}}} > The current test cases in {{TimerGaugeTest}} do not catch this bug, because > the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and > the code does not simulate the time passed outside the state ({{{}*(1){}}} in > the code above). > *Proposed changes* > * In {{{}TimerGauge{}}}, only add the time since {{currentUpdateTS}} to > {{currentCount}} in {{markEnd()}} > * Add the test case above to {{{}TimerGaugeTest{}}}, and adjust other test > cases -- This message was sent by Atlassian Jira (v8.20.10#820010)