[jira] [Commented] (FLINK-29825) Improve benchmark stability

Yanfei Lei (Jira) Tue, 07 Feb 2023 02:31:20 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-29825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685205#comment-17685205
 ]


Yanfei Lei commented on FLINK-29825:
------------------------------------

[~lindong] 
Thanks for proposing the algorithm. I wrote a 
[script|https://github.com/fredia/flink-benchmarks/blob/FLINK-29825/check_regression.py]
 to test it briefly, and the new algorithm shows better sensitivity than the 
existing median-based method.
I ran two kinds of tests:
1. For benchmarks where a regression has occurred:
    a. With appropriate parameters, the new algorithm has higher precision 
and recall on most benchmarks.
    b. The new algorithm finds the regression sooner. The current 
algorithm has to wait until the median window slides into the regressed 
interval, so a regression may go unreported for several days after it occurs.
2. For noisy benchmarks:
    a. The new algorithm produces fewer false positives on most benchmarks, 
such as fireProcessingTimers of Flink (Java11) and fireProcessingTimers of Flink.
    b. For a benchmark whose regression is hidden in the noise (such as 
serializerTuple of Flink (Java11)), the new algorithm still detects it, while 
the existing median-based method does not.

In my opinion, the new algorithm is concise and efficient, and it also 
avoids the effects of baselines distorted by a regression.
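
To illustrate the delay described in 1.b, here is a minimal sketch of a 
median-window check run on synthetic data. It is not the logic of 
regression_report.py or of the proposed algorithm: the function names, the 
window sizes, the 5% threshold, and the scores are all assumptions chosen 
for illustration.

```python
from statistics import median


def median_window_regressed(scores, baseline_size=20, recent_size=5, threshold=0.05):
    """Hypothetical median-based check (higher score = better).

    Compares the median of the most recent `recent_size` runs with the
    median of the `baseline_size` runs immediately before them.
    """
    if len(scores) < baseline_size + recent_size:
        return False  # not enough history yet
    baseline = median(scores[-(baseline_size + recent_size):-recent_size])
    recent = median(scores[-recent_size:])
    return (baseline - recent) / baseline > threshold


def first_detection(scores, **kwargs):
    """Replay the history run by run; return the index of the run at which
    the check first fires, or None if it never fires."""
    for i in range(1, len(scores) + 1):
        if median_window_regressed(scores[:i], **kwargs):
            return i - 1
    return None


# Synthetic history: 30 stable runs, then a 10% drop starting at run index 30.
scores = [100.0] * 30 + [90.0] * 10
detected_at = first_detection(scores)
print(detected_at)
```

With these assumed settings the check first fires at run index 32, two runs 
after the drop at index 30, because the recent median only moves once more 
than half of the recent window is regressed. A larger recent window lengthens 
that delay.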

> Improve benchmark stability
> ---------------------------
>
>                 Key: FLINK-29825
>                 URL: https://issues.apache.org/jira/browse/FLINK-29825
>             Project: Flink
>          Issue Type: Improvement
>          Components: Benchmarks
>    Affects Versions: 1.17.0
>            Reporter: Yanfei Lei
>            Assignee: Yanfei Lei
>            Priority: Minor
>
> Currently, regressions are detected by a simple script that may produce false 
> positives and false negatives, especially for benchmarks with small absolute 
> values, where small value changes cause large percentage changes. See 
> [here|https://github.com/apache/flink-benchmarks/blob/master/regression_report.py#L132-L136]
>  for details.
> In addition, all benchmarks are executed on one physical machine, so 
> hardware issues may affect performance, as in "[FLINK-18614] Performance 
> regression 2020.07.13".
>  
> This ticket aims to improve the precision and recall of the regression-check 
> script.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
