[ 
https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-35764.
----------------------------------
    Fix Version/s: 2.0.0
                   1.19.2
                   1.20.1
       Resolution: Fixed

Thanks for reporting and fixing the issue [~auroflow].

Merged to master as 4fae64cb011
Merged to release-1.20 as c4dd2a6550a
Merged to release-1.19 as 98f1349bb4b

> TimerGauge is incorrect when update is called during a measurement
> ------------------------------------------------------------------
>
>                 Key: FLINK-35764
>                 URL: https://issues.apache.org/jira/browse/FLINK-35764
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0
>            Reporter: Liu Liu
>            Assignee: Liu Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0, 1.19.2, 1.20.1
>
>
> *Description*
> Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state 
> (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as 
> follows:
>  - {{markStart()}} bumps {{currentMeasurementStartTS}} and 
> {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
>  - {{currentCount}} stores the time in the state within this update interval
>  - When calling {{{}markEnd(){}}}, the time since 
> {{currentMeasurementStartTS}} is added to the {{currentCount}}
>  - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added 
> to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the 
> gauge value
> The intent is that a state can span across two update intervals (by calling 
> {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be 
> incorrect, since the time between {{markStart()}} and {{update()}} will be 
> counted in the first update interval by {{{}update(){}}}, and then in the 
> second update interval by {{{}markEnd(){}}}, so the measurement will be 
> larger than the correct value. The correct solution is to only add the time 
> since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.
> *Test case*
> {{@Test  }}
> {{void testUpdateBeforeMarkingEnd() {  }}
> {{    ManualClock clock = new ManualClock(42_000_000);}}
> {{    // this timer gauge measures 2 update intervals}}
> {{    TimerGauge gauge = new TimerGauge(clock, 2 * 
> View.UPDATE_INTERVAL_SECONDS);}}
> {{    long UPDATE_INTERVAL_MILLIS = 
> TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS);  // 5000 ms}}
> {{    long SLEEP = 10;  // 10 ms}}
> {{    // interval 1}}
> {{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, 
> TimeUnit.MILLISECONDS);  // *(1)}}
> {{    gauge.markStart();  }}
> {{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
> {{    gauge.update();}}
> {{    // interval 2}}
> {{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
> {{    gauge.markEnd();}}
> {{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, 
> TimeUnit.MILLISECONDS);  // *(1)}}
> {{    gauge.update();  }}
> {{    // expected: 2, actual: 3}}
> {{    assertThat(gauge.getValue()).isEqualTo(SLEEP / 
> View.UPDATE_INTERVAL_SECONDS);  }}
> {{}}}
> The current test cases in {{TimerGaugeTest}} do not catch this bug, because 
> the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and 
> the code does not simulate the time passed outside the state ({{{}*(1){}}} in 
> the code above).
> *Proposed changes*
>  * In {{{}TimerGauge{}}}, only add the time since {{currentUpdateTS}} to 
> {{currentCount}} in {{markEnd()}}
>  * Add the test case above to {{{}TimerGaugeTest{}}}, and adjust other test 
> cases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to