Marcono1234 created FLINK-32896:
-----------------------------------

             Summary: Incorrect `Map.computeIfAbsent(..., ...::new)` usage 
which misinterprets key as initial capacity
                 Key: FLINK-32896
                 URL: https://issues.apache.org/jira/browse/FLINK-32896
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.17.1
            Reporter: Marcono1234


The are multiple cases in the code which look like this:
{code}
map.computeIfAbsent(..., ArrayList::new)
{code}

Not only does this create a new collection (here an {{ArrayList}}), but 
{{computeIfAbsent}} also passes the map key as argument to the mapping 
function, so instead of calling the no-args constructor such as {{new 
ArrayList<>()}}, this actually calls the constructor with {{int}} initial 
capacity parameter, such as {{new ArrayList<>(initialCapacity)}}.

For some cases where this occurs in the Flink code I am not sure if it is 
intended, but in same cases this does not seem to be intended. Here are the 
cases I found:

- 
[{{HsFileDataIndexImpl:163}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsFileDataIndexImpl.java#L163C70-L163C84]
- 
[{{HsSpillingStrategy:128}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L128C68-L128C82]
- 
[{{HsSpillingStrategy:134}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L134C63-L134C77]
- 
[{{HsSpillingStrategy:140}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L140C63-L140C77]
- 
[{{HsSpillingStrategy:145}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L145C70-L145C84]
- 
[{{HsSpillingStrategy:151}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L151C65-L151C79]
- 
[{{HsSpillingStrategy:157}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategy.java#L157C65-L157C79]
- 
[{{HsSpillingStrategyUtils:76}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/HsSpillingStrategyUtils.java#L76C74-L76C88]
- 
[{{ProducerMergedPartitionFileIndex:171}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/file/ProducerMergedPartitionFileIndex.java#L171C75-L171C89]
- 
[{{TestingSpillingInfoProvider:201}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L201C56-L201C70]
- 
[{{TestingSpillingInfoProvider:208}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L208C54-L208C66]
- 
[{{TestingSpillingInfoProvider:216}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/TestingSpillingInfoProvider.java#L216C54-L216C66]
- 
[{{SourceCoordinatorConcurrentAttemptsTest:269}}|https://github.com/apache/flink/blob/9546f8243a24e7b45582b6de6702f819f1d73f97/flink-runtime/src/test/java/org/apache/flink/runtime/source/coordinator/SourceCoordinatorConcurrentAttemptsTest.java#L269C53-L269C65]

This can lead either to runtime exceptions in case the map key is negative, 
since the collection constructors reject negative initial capacity values, or 
it can lead to bad performance if the key (which is misinterpreted as initial 
capacity) is pretty low, such as 0, or is pretty large and therefore 
pre-allocates a lot of memory for the collection.


Regardless of whether or not these cases are intended it might be good to 
replace them with lambda expressions to make this more explicit:
{code}
map.computeIfAbsent(..., capacity -> new ArrayList<>(capacity))
{code}
or
{code}
map.computeIfAbsent(..., k -> new ArrayList<>())
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to