[ 
https://issues.apache.org/jira/browse/FLINK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas updated FLINK-18352:
-----------------------------------
    Component/s: Client / Job Submission

> org.apache.flink.core.execution.DefaultExecutorServiceLoader not thread safe
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-18352
>                 URL: https://issues.apache.org/jira/browse/FLINK-18352
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.10.0
>            Reporter: Marcos Klein
>            Assignee: Kostas Kloudas
>            Priority: Major
>              Labels: pull-request-available
>
> The singleton nature of the  
> *org.apache.flink.core.execution.DefaultExecutorServiceLoader* class is not 
> thread-safe due to the fact that *java.util.ServiceLoader* class is not 
> thread-safe.
> [https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#Concurrency]
>  
> This can result in *ServiceLoader* class entering into an inconsistent state 
> for processes which attempt to self-heal. This then requires bouncing the 
> process/container in the hopes the race condition does not re-occur.
> [https://stackoverflow.com/questions/60391499/apache-flink-cannot-find-compatible-factory-for-specified-execution-target-lo]
>  
> Additionally the following stack traces have been seen when using a 
> *org.apache.flink.streaming.api.environment.RemoteStreamEnvironment* 
> instances.
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: 2
>     at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:61)
>     at 
> java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
>     at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
>     at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
>     at 
> org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
>     at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
>     at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
> {code}
>  
> {code:java}
> java.util.NoSuchElementException: null
>     at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:59)
>     at 
> java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
>     at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
>     at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
>     at 
> org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
>     at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
>     at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
> {code}
> The workaround for using the ***StreamExecutionEnvironment* implementations 
> is to write a custom implementation of *DefaultExecutorServiceLoader* which 
> is thread-safe and pass that to the *StreamExecutionEnvironment* constructors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to