Marcono1234 created FLINK-32896: ----------------------------------- Summary: Incorrect `Map.computeIfAbsent(..., ...::new)` usage which misinterprets key as initial capacity Key: FLINK-32896 URL: https://issues.apache.org/jira/browse/FLINK-32896 Project: Flink Issue Type: Bug Affects Versions: 1.17.1 Reporter: Marcono1234
The are multiple cases in the code which look like this: {code} map.computeIfAbsent(..., ArrayList::new) {code} Not only does this create a new collection (here an {{ArrayList}}), but {{computeIfAbsent}} also passes the map key as argument to the mapping function, so instead of calling the no-args constructor such as {{new ArrayList<>()}}, this actually calls the constructor with {{int}} initial capacity parameter, such as {{new ArrayList<>(initialCapacity)}}. For some cases where this occurs in the Flink code I am not sure if it is intended, but in same cases this does not seem to be intended. Here are the cases I found: - [{{HsFileDataIndexImpl:163}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsFileDataIndexImpl.java#L163C70-L163C84] - [{{HsSpillingStrategy:128}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L128C68-L128C82] - [{{HsSpillingStrategy:134}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L134C63-L134C77] - [{{HsSpillingStrategy:140}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L140C63-L140C77] - [{{HsSpillingStrategy:145}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L145C70-L145C84] - [{{HsSpillingStrategy:151}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L151C65-L151C79] - [{{HsSpillingStrategy:157}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L157C65-L157C79] - [{{HsSpillingStrategyUtils:76}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategyUtils.java#L76C74-L76C88] - [{{ProducerMergedPartitionFileIndex:171}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/file/ProducerMergedPartitionFileIndex.java#L171C75-L171C89] - [{{TestingSpillingInfoProvider:201}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L201C56-L201C70] - [{{TestingSpillingInfoProvider:208}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L208C54-L208C66] - [{{TestingSpillingInfoProvider:216}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L216C54-L216C66] - [{{SourceCoordinatorConcurrentAttemptsTest:269}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/source/coordinator/SourceCoordinatorConcurrentAttemptsTest.java#L269C53-L269C65] This can lead either to runtime exceptions in case the map key is negative, since the collection constructors reject negative initial capacity values, or it can lead to bad performance if the key (which is misinterpreted as initial capacity) is pretty low, such as 0, or is pretty large and therefore pre-allocates a lot of memory for the collection. Regardless of whether or not these cases are intended it might be good to replace them with lambda expressions to make this more explicit: {code} map.computeIfAbsent(..., capacity -> new ArrayList<>(capacity)) {code} or {code} map.computeIfAbsent(..., k -> new ArrayList<>()) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)