[ https://issues.apache.org/jira/browse/FLINK-34202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819180#comment-17819180 ]
Lorenzo Affetti commented on FLINK-34202:
-----------------------------------------

Update: CI001 looks just a bit overcommitted:

!Screenshot 2024-02-21 at 09.45.18.png|width=731,height=191!

All of these Java processes are running at the moment:
{code:java}
agent05 6537 0.0 0.0 17004 1412 ? S 18:35 0:00 /bin/sh -c cd '/__w/2/s/flink-table/flink-table-planner' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.lang=ALL-UNNAMED' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED' '--add-opens=java.base/java.time=ALL-UNNAMED' '--add-opens=java.base/java.math=ALL-UNNAMED' '--add-opens=java.base/java.nio=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/2/s/flink-table/flink-table-planner/target/surefire/surefirebooter-20240221103036252_109.jar' '/__w/2/s/flink-table/flink-table-planner/target/surefire' '2024-02-21T10-23-54_036-jvmRun1' 'surefire-20240221103036252_105tmp' 'surefire_34-20240221103036252_107tmp'
agent05 6538 0.0 0.0 17004 1424 ? S 18:35 0:00 /bin/sh -c cd '/__w/2/s/flink-table/flink-table-planner' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.lang=ALL-UNNAMED' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED' '--add-opens=java.base/java.time=ALL-UNNAMED' '--add-opens=java.base/java.math=ALL-UNNAMED' '--add-opens=java.base/java.nio=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/2/s/flink-table/flink-table-planner/target/surefire/surefirebooter-20240221103036252_110.jar' '/__w/2/s/flink-table/flink-table-planner/target/surefire' '2024-02-21T10-23-54_036-jvmRun2' 'surefire-20240221103036252_106tmp' 'surefire_35-20240221103036252_108tmp'
agent05 6542 231 1.9 4738124 1289100 ? Sl 18:35 3:37 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.math=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED -Xmx1536m -jar /__w/2/s/flink-table/flink-table-planner/target/surefire/surefirebooter-20240221103036252_110.jar /__w/2/s/flink-table/flink-table-planner/target/surefire 2024-02-21T10-23-54_036-jvmRun2 surefire-20240221103036252_106tmp surefire_35-20240221103036252_108tmp
agent05 6547 168 2.3 4631284 1545452 ? Sl 18:35 2:38 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.math=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED -Xmx1536m -jar /__w/2/s/flink-table/flink-table-planner/target/surefire/surefirebooter-20240221103036252_109.jar /__w/2/s/flink-table/flink-table-planner/target/surefire 2024-02-21T10-23-54_036-jvmRun1 surefire-20240221103036252_105tmp surefire_34-20240221103036252_107tmp
agent03 8139 5.4 2.9 19693224 1963424 ? Sl 17:50 2:35 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -XX:+IgnoreUnrecognizedVMOptions --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED -classpath /__w/1/s/.mvn/wrapper/maven-wrapper.jar -Dmaven.multiModuleProjectDirectory=/__w/1/s org.apache.maven.wrapper.MavenWrapperMain -Dmaven.repo.local=/__w/1/.m2/repository -Dmaven.wagon.http.pool=false -Dorg.slf4j.simpleLogger.showDateTime=true -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSS -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn --no-snapshot-updates -B -Dflink.hadoop.version=2.10.2 --settings /__w/1/s/tools/ci/google-mirror-settings.xml -Dfast -Pskip-webui-build -Dlog.dir=/__w/_temp/debug_files -Dlog4j.configurationFile=file:///__w/1/s/tools/ci/log4j.properties -Dflink.tests.with-openssl -Dflink.tests.check-segment-multiple-free -Darchunit.freeze.store.default.allowStoreUpdate=false -Dpekko.rpc.force-invocation-serialization -Dflink.hadoop.version=2.10.2 -pl flink-tests, verify
agent04 12833 1.7 0.0 0 0 ? Z 18:13 0:25 [java] <defunct>
agent04 15877 1.3 0.0 0 0 ? Z 18:13 0:19 [java] <defunct>
agent04 16126 1.4 0.0 0 0 ? Z 18:13 0:20 [java] <defunct>
200 18382 2629 5.4 15012168 3609384 ? Ssl Jan18 1291718:08 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.392.b08-4.el8.x86_64/jre/bin/java -server -Dinstall4j.jvmDir=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.392.b08-4.el8.x86_64/jre -Dexe4j.moduleName=/opt/sonatype/nexus/bin/nexus -XX:+UnlockDiagnosticVMOptions -Dinstall4j.launcherId=245 -Dinstall4j.swt=false -Di4jv=0 -Di4jv=0 -Di4jv=0 -Di4jv=0 -Di4jv=0 -Xms2703m -Xmx2703m -XX:MaxDirectMemorySize=2703m -Djava.util.prefs.userRoot=/nexus-data/javaprefs -XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:LogFile=../sonatype-work/nexus3/log/jvm.log -XX:-OmitStackTraceInFastThrow -Djava.net.preferIPv4Stack=true -Dkaraf.home=. -Dkaraf.base=. -Dkaraf.etc=etc/karaf -java.util.logging.config.file=etc/karaf/java.util.logging.properties -Dkaraf.data=../sonatype-work/nexus3 -Dkaraf.log=../sonatype-work/nexus3/log -Djava.io.tmpdir=../sonatype-work/nexus3/tmp -Dkaraf.startLocalConsole=false -Djdk.tls.ephemeralDHKeySize=2048 -Djava.endorsed.dirs=lib/endorsed -Di4j.vpt=true -classpath /opt/sonatype/nexus/.install4j/i4jruntime.jar:/opt/sonatype/nexus/lib/boot/nexus-main.jar:/opt/sonatype/nexus/lib/boot/activation-1.1.1.jar:/opt/sonatype/nexus/lib/boot/jakarta.xml.bind-api-2.3.3.jar:/opt/sonatype/nexus/lib/boot/jaxb-runtime-2.3.3.jar:/opt/sonatype/nexus/lib/boot/txw2-2.3.3.jar:/opt/sonatype/nexus/lib/boot/istack-commons-runtime-3.0.10.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.main-4.3.9.jar:/opt/sonatype/nexus/lib/boot/osgi.core-7.0.0.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.specs.activator-4.3.9.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.diagnostic.boot-4.3.9.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.jaas.boot-4.3.9.jar com.install4j.runtime.launcher.UnixLauncher run 9d17dc87 0 0 org.sonatype.nexus.karaf.NexusMain
agent03 22044 1.2 0.0 0 0 ? Z 18:07 0:22 [java] <defunct>
agent05 23299 34.2 4.5 19662424 3020476 ? Sl 18:23 4:43 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -XX:+IgnoreUnrecognizedVMOptions --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED -classpath /__w/2/s/.mvn/wrapper/maven-wrapper.jar -Dmaven.multiModuleProjectDirectory=/__w/2/s org.apache.maven.wrapper.MavenWrapperMain -Dmaven.repo.local=/__w/2/.m2/repository -Dmaven.wagon.http.pool=false -Dorg.slf4j.simpleLogger.showDateTime=true -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSS -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn --no-snapshot-updates -B -Dflink.hadoop.version=2.10.2 --settings /__w/2/s/tools/ci/google-mirror-settings.xml -Dfast -Pskip-webui-build -Dlog.dir=/__w/_temp/debug_files -Dlog4j.configurationFile=file:///__w/2/s/tools/ci/log4j.properties -Dflink.tests.with-openssl -Dflink.tests.check-segment-multiple-free -Darchunit.freeze.store.default.allowStoreUpdate=false -Dpekko.rpc.force-invocation-serialization -Dflink.hadoop.version=2.10.2 -pl flink-table,flink-table/flink-sql-parser,flink-table/flink-table-common,flink-table/flink-table-api-java,flink-table/flink-table-api-scala,flink-table/flink-table-api-bridge-base,flink-table/flink-table-api-java-bridge,flink-table/flink-table-api-scala-bridge,flink-table/flink-table-api-java-uber,flink-table/flink-sql-client,flink-table/flink-sql-gateway-api,flink-table/flink-sql-gateway,flink-table/flink-table-planner,flink-table/flink-table-planner-loader,flink-table/flink-table-planner-loader-bundle,flink-table/flink-table-runtime,flink-table/flink-table-code-splitter,flink-table/flink-table-test-utils, verify
agent03 24292 1.1 0.0 0 0 ? Z 18:07 0:20 [java] <defunct>
agent03 24475 1.1 0.0 0 0 ? Z 18:07 0:19 [java] <defunct>
agent04 25593 0.0 0.0 17004 5572 ? S 18:36 0:00 /bin/sh -c cd /__w/2/s/flink-tests && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED -Xmx1536m -jar /__w/2/s/flink-tests/target/surefire/surefirebooter1389210316768519672.jar /__w/2/s/flink-tests/target/surefire 2024-02-21T09-57-00_677-jvmRun4 surefire7018523912167266799tmp surefire_1897583157620381530347tmp
agent04 25605 131 2.7 4783948 1796980 ? Sl 18:36 1:14 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED -Xmx1536m -jar /__w/2/s/flink-tests/target/surefire/surefirebooter1389210316768519672.jar /__w/2/s/flink-tests/target/surefire 2024-02-21T09-57-00_677-jvmRun4 surefire7018523912167266799tmp surefire_1897583157620381530347tmp
root 26976 0.0 0.0 112792 1992 pts/0 R+ 18:37 0:00 grep --color=auto java
agent03 27978 0.0 0.0 17004 5432 ? S 18:31 0:00 /bin/sh -c cd '/__w/1/s/flink-tests' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240221095043142_572.jar' '/__w/1/s/flink-tests/target/surefire' '2024-02-21T09-50-36_899-jvmRun1' 'surefire-20240221095043142_570tmp' 'surefire_189-20240221095043142_571tmp'
agent03 27981 125 4.0 5963448 2689132 ? Sl 18:31 8:06 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED -Xmx1536m -jar /__w/1/s/flink-tests/target/surefire/surefirebooter-20240221095043142_572.jar /__w/1/s/flink-tests/target/surefire 2024-02-21T09-50-36_899-jvmRun1 surefire-20240221095043142_570tmp surefire_189-20240221095043142_571tmp
agent03 28372 0.0 0.0 17004 5612 ? S 18:36 0:00 /bin/sh -c cd '/__w/1/s/flink-tests' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240221095043142_596.jar' '/__w/1/s/flink-tests/target/surefire' '2024-02-21T09-50-36_899-jvmRun4' 'surefire-20240221095043142_594tmp' 'surefire_190-20240221095043142_595tmp'
agent03 28379 143 1.6 4642024 1059432 ? Sl 18:36 1:14 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED -Xmx1536m -jar /__w/1/s/flink-tests/target/surefire/surefirebooter-20240221095043142_596.jar /__w/1/s/flink-tests/target/surefire 2024-02-21T09-50-36_899-jvmRun4 surefire-20240221095043142_594tmp surefire_190-20240221095043142_595tmp
agent04 28408 5.8 4.0 19652536 2679676 ? Sl 17:56 2:22 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -XX:+IgnoreUnrecognizedVMOptions --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED -classpath /__w/2/s/.mvn/wrapper/maven-wrapper.jar -Dmaven.home=/__w/2 -Dmaven.multiModuleProjectDirectory=/__w/2/s org.apache.maven.wrapper.MavenWrapperMain -Dmaven.repo.local=/__w/2/.m2/repository -Dmaven.wagon.http.pool=false -Dorg.slf4j.simpleLogger.showDateTime=true -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSS -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn --no-snapshot-updates -B -Dflink.hadoop.version=2.10.2 -Dscala-2.12 --settings /__w/2/s/tools/ci/google-mirror-settings.xml -Dfast -Pskip-webui-build -Dlog.dir=/__w/_temp/debug_files -Dlog4j.configurationFile=file:///__w/2/s/tools/ci/log4j.properties -Dflink.tests.with-openssl -Dflink.tests.check-segment-multiple-free -Darchunit.freeze.store.default.allowStoreUpdate=false -Dpekko.rpc.force-invocation-serialization -Dflink.hadoop.version=2.10.2 -Dscala-2.12 -pl flink-tests, verify
agent04 31966 0.0 0.0 17004 1476 ? S 18:33 0:00 /bin/sh -c cd /__w/2/s/flink-tests && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED -Xmx1536m -jar /__w/2/s/flink-tests/target/surefire/surefirebooter7921915626739830970.jar /__w/2/s/flink-tests/target/surefire 2024-02-21T09-57-00_677-jvmRun2 surefire2539620150285362830tmp surefire_1868293788085014629717tmp
agent04 31971 157 4.5 5721044 3029500 ? Sl 18:33 6:20 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED -Xmx1536m -jar /__w/2/s/flink-tests/target/surefire/surefirebooter7921915626739830970.jar /__w/2/s/flink-tests/target/surefire 2024-02-21T09-57-00_677-jvmRun2 surefire2539620150285362830tmp surefire_1868293788085014629717tmp
{code}
Output of
{code:java}
ps -aux | grep java | awk '{print $1}' | grep agent | sort | uniq{code}
is:
{code:java}
agent03
agent04
agent05{code}
So, 3 agents are running in parallel.

For further investigation, I need to better understand the CI architecture, for example how the CI workers are configured and what their degree of parallelism is.

Just throwing an idea out there: maybe it is simply a matter of reducing the parallelism accepted by the worker. CI jobs would stay pending for longer, but they would not fail at runtime because of timeouts.

I will get back to you once I have clarified the situation.

> python tests take suspiciously long in some of the cases
> --------------------------------------------------------
>
> Key: FLINK-34202
> URL: https://issues.apache.org/jira/browse/FLINK-34202
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Affects Versions: 1.17.2, 1.19.0, 1.18.1
> Reporter: Matthias Pohl
> Assignee: Xingbo Huang
> Priority: Critical
> Labels: pull-request-available, test-stability
> Attachments: Screenshot 2024-02-21 at 09.45.18.png
>
>
> [This release-1.18 build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603&view=logs&j=3e4dd1a2-fe2f-5e5d-a581-48087e718d53&t=b4612f28-e3b5-5853-8a8b-610ae894217a] has the python stage running into a timeout without any obvious reason. The [python stage run for JDK17|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56603&view=logs&j=b53e1644-5cb4-5a3b-5d48-f523f39bcf06] was also getting close to the 4h timeout.
> I'm creating this issue for documentation purposes.
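
To put rough numbers on the overcommitment described in the comment above, the `ps` rows could be aggregated per agent user. The following is only a minimal sketch and was not run on CI001: the function name `summarize_by_user` is hypothetical, the sample rows reuse the numeric columns from the listing above with the command column shortened to `java`, and the column layout assumed is the usual `ps aux` one (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).

```python
from collections import defaultdict


def summarize_by_user(ps_lines, user_prefix="agent"):
    """Aggregate live process count, %CPU and RSS (KiB) per user from `ps aux` rows.

    Hypothetical helper for illustration; zombie processes (STAT starting
    with "Z") are skipped because they hold no memory or CPU.
    """
    summary = defaultdict(lambda: {"procs": 0, "cpu": 0.0, "rss_kib": 0})
    for line in ps_lines:
        # Assumed layout: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[0].startswith(user_prefix):
            continue
        if fields[7].startswith("Z"):
            continue
        entry = summary[fields[0]]
        entry["procs"] += 1
        entry["cpu"] += float(fields[2])
        entry["rss_kib"] += int(fields[5])
    return dict(summary)


# Rows taken from the listing above, with the command column abbreviated.
sample = [
    "agent05 6542 231 1.9 4738124 1289100 ? Sl 18:35 3:37 java",
    "agent05 6547 168 2.3 4631284 1545452 ? Sl 18:35 2:38 java",
    "agent04 12833 1.7 0.0 0 0 ? Z 18:13 0:25 [java] <defunct>",
    "agent03 27981 125 4.0 5963448 2689132 ? Sl 18:31 8:06 java",
    "root 26976 0.0 0.0 112792 1992 pts/0 R+ 18:37 0:00 grep --color=auto java",
]

if __name__ == "__main__":
    for user, stats in sorted(summarize_by_user(sample).items()):
        print(user, stats["procs"], stats["cpu"], stats["rss_kib"])
```

Feeding it the full output of `ps aux` (one row per list element) would give a per-agent total that can be compared against the cores and memory of the host, which may help when deciding how far to reduce the worker parallelism.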
--
This message was sent by Atlassian Jira
(v8.20.10#820010)