[ https://issues.apache.org/jira/browse/FLINK-33588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tongtong Zhu updated FLINK-33588:
---------------------------------
          Flags: Patch,Important
       Language: java
    Description: 
When the Flink task is first started, the checkpoint data is null because no data has been collected yet, and Percentile throws a NullArgumentException when calculating the percentile. After multiple tests, I found that the checkpoint statistics need an initial value when the checkpoint data is null (i.e. at the beginning of the task) to solve this problem.

The following is the exception log for the bug:

2023-09-13 15:02:54,608 ERROR org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler [] - Unhandled exception.
org.apache.commons.math3.exception.NullArgumentException: input array
    at org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1650) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:129) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:84) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:58) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.AbstractAccessExecutionGraphHandler.handleRequest(AbstractAccessExecutionGraphHandler.java:68) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:87) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) [?:1.8.0_151]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) [?:1.8.0_151]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_151]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

  was:When the Flink task is first started, the checkpoint data is null because no data has been collected yet, and Percentile throws a NullArgumentException when calculating the percentile. After multiple tests, I found that the checkpoint statistics need an initial value when the checkpoint data is null (i.e. at the beginning of the task) to solve this problem.

> Fix Flink Checkpointing Statistics Bug
> --------------------------------------
>
>                 Key: FLINK-33588
>                 URL: https://issues.apache.org/jira/browse/FLINK-33588
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.5, 1.16.0, 1.17.0, 1.15.2, 1.14.6, 1.18.0, 1.17.1
>            Reporter: Tongtong Zhu
>            Priority: Major
>             Fix For: 1.19.0, 1.18.1
>
>
> When the Flink task is first started, the checkpoint data is null because
> no data has been collected yet, and Percentile throws a
> NullArgumentException when calculating the percentile. After multiple
> tests, I found that the checkpoint statistics need an initial value when
> the checkpoint data is null (i.e. at the beginning of the task) to solve
> this problem.
> The following is the exception log for the bug:
> 2023-09-13 15:02:54,608 ERROR org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler [] - Unhandled exception.
> org.apache.commons.math3.exception.NullArgumentException: input array
>     at org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1650) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:129) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:84) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:58) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.rest.handler.job.AbstractAccessExecutionGraphHandler.handleRequest(AbstractAccessExecutionGraphHandler.java:68) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:87) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) [?:1.8.0_151]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) [?:1.8.0_151]
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_151]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_151]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
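
For illustration only, here is a minimal sketch of the kind of guard the description asks for: return a defined value instead of passing a null array to the percentile computation when no checkpoint statistics exist yet. The class and method names (SafeQuantile, quantileOrNaN) are hypothetical and are not part of Flink. The nearest-rank computation is a dependency-free stand-in for commons-math Percentile, and returning NaN is an assumption about a reasonable initial value, not the patch attached to this issue.

```java
import java.util.Arrays;

/** Hypothetical helper, not Flink code: guards a quantile lookup against missing data. */
public class SafeQuantile {

    /**
     * Returns the nearest-rank quantile of the given samples, or NaN when
     * there are no samples yet. The quantile argument must be in [0, 1].
     */
    public static double quantileOrNaN(double[] values, double quantile) {
        if (quantile < 0.0 || quantile > 1.0) {
            throw new IllegalArgumentException("quantile must be in [0, 1]: " + quantile);
        }
        // Before the first checkpoint completes there may be no samples at all;
        // this is the case that reaches the percentile code as a null array.
        if (values == null || values.length == 0) {
            return Double.NaN;
        }
        // Sort a copy so the caller's array is left untouched.
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Nearest-rank: the smallest rank that covers the requested fraction.
        int rank = (int) Math.ceil(quantile * sorted.length);
        int index = Math.max(0, rank - 1);
        return sorted[index];
    }
}
```

With a guard like this, a request for checkpoint statistics made right after job start would get NaN quantiles rather than an unhandled exception in the REST handler. Whether NaN, zero, or an omitted field is the right choice depends on what the REST response consumers expect.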