[ 
https://issues.apache.org/jira/browse/FLINK-20284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806739#comment-17806739
 ] 

Thai Luong commented on FLINK-20284:
------------------------------------

Dear [~hxbks2ks] , [~dian.fu] ,

 

This issue still happens while using FlinkRunner 1.16 and Flink 1.16.3.

 

I would like to share my `docker-compose` for further investigation.

 

```
version: '3.8'
services:
jobmanager:
image:flink:1.16.3-java11
networks:
-flink-network
ports:
-"8081:8081"
command:jobmanager
environment:
-|
FLINK_PROPERTIES=jobmanager.rpc.address: jobmanager
jobmanager.memory.process.size: 2g
-BEAM_WORKER_POOL_IN_DOCKER_VM=1
-DOCKER_MAC_CONTAINER=1

taskmanager:
image:flink:1.16.3-java11
networks:
-flink-network
ports:
-"8100-8200:8100-8200"
depends_on:
-jobmanager
-beamworker
command:taskmanager
scale:1
environment:
-|
FLINK_PROPERTIES=jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 1
taskmanager.memory.process.size: 2g
-BEAM_WORKER_POOL_IN_DOCKER_VM=1
-DOCKER_MAC_CONTAINER=1

beamworker:
image:apache/beam_python3.11_sdk:latest
networks:
-flink-network
entrypoint:/opt/apache/beam/boot
command:--worker_pool
volumes:
-./data/sample.txt:/data/sample.txt
-./results:/results


networks:
flink-network:
name:flink-network
```

> Error happens in TaskExecutor when closing JobMaster connection if there was 
> a python UDF
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-20284
>                 URL: https://issues.apache.org/jira/browse/FLINK-20284
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>    Affects Versions: 1.11.0, 1.12.0
>            Reporter: Zhu Zhu
>            Assignee: Huang Xingbo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.3, 1.12.0
>
>
> When a TaskExecutor successfully finished running a python UDF task and 
> disconnecting from JobMaster, errors below will happen. This error, however, 
> seems not affect job execution at the moment.
> {code:java}
> 2020-11-20 17:05:21,932 INFO  
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService [] - 1 Beam Fn 
> Logging clients still connected during shutdown.
> 2020-11-20 17:05:21,938 WARN  
> org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer        [] - Hanged up 
> for unknown endpoint.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task       
>              [] - Source: Custom Source -> select: (f0) -> select: 
> (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> (b0c2104dd8f87bb1caf0c83586c22a51) switched from RUNNING to FINISHED.
> 2020-11-20 17:05:22,126 INFO  org.apache.flink.runtime.taskmanager.Task       
>              [] - Freeing task resources for Source: Custom Source -> select: 
> (f0) -> select: (add_one(f0) AS a) -> to: Tuple2 -> Sink: Streaming select 
> table sink (1/1)#0 (b0c2104dd8f87bb1caf0c83586c22a51).
> 2020-11-20 17:05:22,128 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - 
> Un-registering task and sending final execution state FINISHED to JobManager 
> for task Source: Custom Source -> select: (f0) -> select: (add_one(f0) AS a) 
> -> to: Tuple2 -> Sink: Streaming select table sink (1/1)#0 
> b0c2104dd8f87bb1caf0c83586c22a51.
> 2020-11-20 17:05:22,156 INFO  
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot 
> TaskSlot(index:0, state:ACTIVE, resource profile: 
> ResourceProfile{cpuCores=1.0000000000000000, taskHeapMemory=384.000mb 
> (402653174 bytes), taskOffHeapMemory=0 bytes, managedMemory=512.000mb 
> (536870920 bytes), networkMemory=128.000mb (134217730 bytes)}, allocationId: 
> b67c3307dcf93757adfb4f0f9f7b8c7b, jobId: d05f32162f38ec3ec813c4621bc106d9).
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Remove job 
> d05f32162f38ec3ec813c4621bc106d9 from job leader monitoring.
> 2020-11-20 17:05:22,157 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close 
> JobManager connection for job d05f32162f38ec3ec813c4621bc106d9.
> 2020-11-20 17:05:23,064 ERROR 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.rejectedExecution
>  [] - Failed to submit a listener notification task. Event loop shut down?
> java.lang.NoClassDefFoundError: 
> org/apache/beam/vendor/grpc/v1p26p0/io/netty/util/concurrent/GlobalEventExecutor$2
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.startThread(GlobalEventExecutor.java:227)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor.execute(GlobalEventExecutor.java:215)
>  
> ~[blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1089)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  
> [blob_p-bd7a5d615512eb8a2e856e7c1630a0c22fca7cf3-ff27946fda7e2b8cb24ea56d505b689e:1.12-SNAPSHOT]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.GlobalEventExecutor$2
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 
> ~[?:1.8.0_261]
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:418) 
> ~[?:1.8.0_261]
>         at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:63)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>         at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:72)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>         at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:49)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:351) 
> ~[?:1.8.0_261]
>         ... 11 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to