[ https://issues.apache.org/jira/browse/FLINK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kostas Kloudas updated FLINK-18352:
-----------------------------------
    Fix Version/s: 1.12.0
                   1.10.2
                   1.11.0

> org.apache.flink.core.execution.DefaultExecutorServiceLoader not thread safe
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-18352
>                 URL: https://issues.apache.org/jira/browse/FLINK-18352
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.10.0
>            Reporter: Marcos Klein
>            Assignee: Kostas Kloudas
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0, 1.10.2, 1.12.0
>
>
> The singleton *org.apache.flink.core.execution.DefaultExecutorServiceLoader*
> is not thread-safe because the *java.util.ServiceLoader* class is not
> thread-safe.
> [https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#Concurrency]
>
> This can leave the *ServiceLoader* in an inconsistent state for processes
> that attempt to self-heal. The process/container then has to be bounced in
> the hope that the race condition does not recur.
> [https://stackoverflow.com/questions/60391499/apache-flink-cannot-find-compatible-factory-for-specified-execution-target-lo]
>
> Additionally, the following stack traces have been seen when using
> *org.apache.flink.streaming.api.environment.RemoteStreamEnvironment*
> instances.
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: 2
>     at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:61)
>     at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
>     at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
>     at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
>     at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
> {code}
>
> {code:java}
> java.util.NoSuchElementException: null
>     at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:59)
>     at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
>     at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
>     at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
>     at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
> {code}
>
> The workaround when using the *StreamExecutionEnvironment* implementations
> is to write a custom, thread-safe implementation of
> *DefaultExecutorServiceLoader* and pass it to the
> *StreamExecutionEnvironment* constructors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
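
The thread-safe loader that the workaround calls for could take roughly the following shape. This is a minimal sketch under stated assumptions, not the change that was made in Flink: `TargetFactory` and `PerCallServiceLookup` are hypothetical names introduced only for illustration, and the only real API used is `java.util.ServiceLoader`. The idea is to avoid sharing a `ServiceLoader` (and its lazy iterator) between threads by creating a fresh one for every lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Predicate;

// Hypothetical stand-in for a factory SPI; this is NOT a Flink interface.
interface TargetFactory {
    boolean isCompatibleWith(String executionTarget);
}

// Hypothetical helper: looks up service providers without ever sharing a
// ServiceLoader instance between threads.
final class PerCallServiceLookup<T> {
    private final Class<T> serviceType;
    private final ClassLoader classLoader;

    PerCallServiceLookup(Class<T> serviceType, ClassLoader classLoader) {
        this.serviceType = serviceType;
        this.classLoader = classLoader;
    }

    // Returns every discovered provider that passes the filter.
    List<T> findAll(Predicate<? super T> filter) {
        // A new ServiceLoader per call: its lazy iterator is confined to the
        // calling thread, so concurrent callers cannot corrupt each other's
        // iteration state.
        ServiceLoader<T> loader = ServiceLoader.load(serviceType, classLoader);
        List<T> matches = new ArrayList<>();
        for (T candidate : loader) {
            if (filter.test(candidate)) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```

A caller would construct one `PerCallServiceLookup<TargetFactory>` and invoke `findAll(f -> f.isCompatibleWith(target))` from any thread. Creating a loader per call repeats provider discovery on every lookup; guarding a single shared loader with a lock is the other obvious option if that cost matters.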