[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2020-01-22 Thread Gary Yao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021053#comment-17021053
 ] 

Gary Yao commented on FLINK-15308:
--

If affectsVersion is 1.10.0 and fixVersion is 1.10.0, I think we can remove the 
release note since there is no behavioral change compared to 1.9.

> Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1
> --
>
> Key: FLINK-15308
> URL: https://issues.apache.org/jira/browse/FLINK-15308
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.10.0
> Environment: $ git log
> commit 4b54da2c67692b1c9d43e1184c00899b0151b3ae
> Author: bowen.li 
> Date: Tue Dec 17 17:37:03 2019 -0800
> Reporter: Feng Jiajie
> Assignee: Yingjie Cao
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Attachments: image-2019-12-19-10-55-30-644.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The job worked well with the default flink-conf.yaml plus 
> pipelined-shuffle compression enabled:
> {code:java}
> taskmanager.numberOfTaskSlots: 1
> taskmanager.network.pipelined-shuffle.compression.enabled: true
> {code}
> But when I set taskmanager.numberOfTaskSlots to 4 or 6:
> {code:java}
> taskmanager.numberOfTaskSlots: 6
> taskmanager.network.pipelined-shuffle.compression.enabled: true
> {code}
> the job failed:
> {code:java}
> $ bin/flink run -m yarn-cluster -p 16 -yjm 1024m -ytm 12288m ~/flink-example-1.0-SNAPSHOT.jar
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data/build/flink/flink-dist/target/flink-1.10-SNAPSHOT-bin/flink-1.10-SNAPSHOT/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data/sa_cluster/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2019-12-18 15:04:40,514 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/data/build/flink/flink-dist/target/flink-1.10-SNAPSHOT-bin/flink-1.10-SNAPSHOT/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
> 2019-12-18 15:04:40,514 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/data/build/flink/flink-dist/target/flink-1.10-SNAPSHOT-bin/flink-1.10-SNAPSHOT/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
> 2019-12-18 15:04:40,907 INFO  org.apache.flink.yarn.YarnClusterDescriptor - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2019-12-18 15:04:41,084 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=12288, numberTaskManagers=1, slotsPerTaskManager=6}
> 2019-12-18 15:04:42,344 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1576573857638_0026
> 2019-12-18 15:04:42,370 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1576573857638_0026
> 2019-12-18 15:04:42,371 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
> 2019-12-18 15:04:42,372 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> 2019-12-18 15:04:45,388 INFO  org.apache.flink.yarn.YarnClusterDescriptor - YARN application has been deployed successfully.
> 2019-12-18 15:04:45,390 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Found Web Interface debugboxcreate431x3.sa:36162 of application 'application_1576573857638_0026'.
> Job has been submitted with JobID 9140c70769f4271cc22ea8becaa26272
> 
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 9140c70769f4271cc22ea8becaa26272)
>   at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
>   at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
>   at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
>   at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:664)
>   at 
> 

[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-20 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000982#comment-17000982
 ] 

Yingjie Cao commented on FLINK-15308:
-

Fixed via 8525c378b91c16245d2e0456d423ed39f5c9b330 on master.

Fixed via b87fc76ace24c69423037e68220091cb2965ac3e on release-1.10.


[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-19 Thread Feng Jiajie (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999892#comment-16999892
 ] 

Feng Jiajie commented on FLINK-15308:
-

Really looking forward to it.


[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-19 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999868#comment-16999868
 ] 

Yingjie Cao commented on FLINK-15308:
-

The problem is caused by a race between multiple Netty threads. The simplest 
fix may be to make the BufferCompressor/BufferDecompressor utilities 
thread-safe; however, that could complicate the network stack. After an 
offline discussion, we decided to disable data compression for pipelined mode 
in release-1.10. We may add the feature back if a better solution is found in 
the future.
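
The kind of race described above can be illustrated with a minimal Java sketch. This is not Flink's actual BufferCompressor: ScratchCompressor, compressThreadConfined, and runConcurrently are hypothetical names, and the run-length encoding is only a stand-in for a real codec. The sketch assumes the compressor reuses one mutable scratch buffer per instance, which makes a shared instance unsafe, and it shows thread confinement via ThreadLocal as one possible way to avoid sharing. The change actually merged for this issue disabled compression for pipelined mode instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CompressorRaceSketch {

    /** Hypothetical compressor that reuses one scratch buffer per instance. */
    static final class ScratchCompressor {
        // Mutable state: two threads calling compress() on the same instance
        // could overwrite each other's intermediate output.
        private final byte[] scratch = new byte[1024];

        /** Run-length encodes the input as (count, value) pairs. */
        byte[] compress(byte[] input) {
            if (input.length > scratch.length / 2) {
                throw new IllegalArgumentException("input too large for this sketch");
            }
            int out = 0;
            for (int i = 0; i < input.length; ) {
                int run = 1;
                while (i + run < input.length && input[i + run] == input[i] && run < 127) {
                    run++;
                }
                scratch[out++] = (byte) run;
                scratch[out++] = input[i];
                i += run;
            }
            return Arrays.copyOf(scratch, out);
        }
    }

    // Thread confinement: each thread lazily gets its own compressor instance,
    // so the scratch buffer is never shared.
    private static final ThreadLocal<ScratchCompressor> PER_THREAD =
            ThreadLocal.withInitial(ScratchCompressor::new);

    static byte[] compressThreadConfined(byte[] input) {
        return PER_THREAD.get().compress(input);
    }

    /** Compresses from several threads at once and reports whether every result was correct. */
    static boolean runConcurrently(int threads, int rounds) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final byte value = (byte) (t + 1);
                results.add(pool.submit(() -> {
                    byte[] input = new byte[100];
                    Arrays.fill(input, value);
                    byte[] expected = {100, value}; // one run of 100 equal bytes
                    for (int r = 0; r < rounds; r++) {
                        if (!Arrays.equals(expected, compressThreadConfined(input))) {
                            return false;
                        }
                    }
                    return true;
                }));
            }
            boolean ok = true;
            for (Future<Boolean> f : results) {
                if (!f.get()) {
                    ok = false;
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("all results correct: " + runConcurrently(6, 1000));
    }
}
```

The trade-off in this sketch is memory: every thread holds its own scratch buffer. Synchronizing access to a single shared instance would save memory but serialize the threads.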


[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-18 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999708#comment-16999708
 ] 

Yingjie Cao commented on FLINK-15308:
-

[~fengjiajie] I also reproduced it.


[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-18 Thread Feng Jiajie (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999706#comment-16999706
 ] 

Feng Jiajie commented on FLINK-15308:
-

Hi [~kevin.cyj] ,

I can reproduce the problem every time.

YARN cluster: 3 nodes (8 cores, 32 GB)
{code:java}
$ cat flink-conf.yaml | grep -v '^#' | grep -v '^$'
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.memory.total-process.size: 1024m
taskmanager.numberOfTaskSlots: 6
parallelism.default: 1
taskmanager.network.pipelined-shuffle.compression.enabled: true
jobmanager.execution.failover-strategy: region
{code}


[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-18 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999694#comment-16999694
 ] 

Yingjie Cao commented on FLINK-15308:
-

[~fengjiajie] I cannot reproduce the problem in my test environment. Are there 
any other settings?

!image-2019-12-19-10-55-30-644.png!


[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-18 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998934#comment-16998934
 ] 

Yingjie Cao commented on FLINK-15308:
-

[~fengjiajie] Thanks for reporting the issue and sharing the code. I'll try to 
reproduce the problem.

> Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1
> --
>
> Key: FLINK-15308
> URL: https://issues.apache.org/jira/browse/FLINK-15308
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.10.0
> Environment: $ git log
> commit 4b54da2c67692b1c9d43e1184c00899b0151b3ae
> Author: bowen.li
> Date: Tue Dec 17 17:37:03 2019 -0800
> Reporter: Feng Jiajie
> Assignee: Yingjie Cao
> Priority: Blocker
>

[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-18 Thread Feng Jiajie (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998931#comment-16998931
 ] 

Feng Jiajie commented on FLINK-15308:
-

The code is here: https://github.com/fengjiajie/my-flink-test/tree/master/src/main

Run command:

bin/flink run -m yarn-cluster -p 16 -yjm 1024m -ytm 8192m 
~/laputa-flink-example-1.0-SNAPSHOT.jar

> Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1
> --
>
> Key: FLINK-15308
> URL: https://issues.apache.org/jira/browse/FLINK-15308
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.10.0
> Environment: $ git log
> commit 4b54da2c67692b1c9d43e1184c00899b0151b3ae
> Author: bowen.li
> Date: Tue Dec 17 17:37:03 2019 -0800
> Reporter: Feng Jiajie
> Assignee: Yingjie Cao
> Priority: Blocker
>

[jira] [Commented] (FLINK-15308) Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1

2019-12-17 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998892#comment-16998892
 ] 

Yingjie Cao commented on FLINK-15308:
-

[~fengjiajie] Could you share your code? I would like to see if I can reproduce 
the problem locally.

> Job failed when enable pipelined-shuffle.compression and numberOfTaskSlots > 1
> --
>
> Key: FLINK-15308
> URL: https://issues.apache.org/jira/browse/FLINK-15308
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.10.0
> Environment: $ git log
> commit 4b54da2c67692b1c9d43e1184c00899b0151b3ae
> Author: bowen.li
> Date: Tue Dec 17 17:37:03 2019 -0800
> Reporter: Feng Jiajie
> Priority: Blocker
>