[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100592#comment-17100592 ]

Caizhi Weng edited comment on FLINK-16636 at 5/6/20, 8:58 AM:
--

Hi,

After some more investigation, I'm afraid I have to conclude that this is not a bug. The memory size of our testing container is simply too small.

I used the [native memory tracking tool|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html] to track the native memory usage of all test cases, and I'm posting the final memory usage below. See [here|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr022.html] for an explanation of each category.

{code}
RSS: 2928128 (in 1KB blocks)
Total: reserved=4679436KB, committed=3198040KB
- Java Heap (reserved=2097152KB, committed=1740800KB)
    (mmap: reserved=2097152KB, committed=1740800KB)
- Class (reserved=1257953KB, committed=248449KB)
    (classes #29687)
    (malloc=25057KB #113855)
    (mmap: reserved=1232896KB, committed=223392KB)
- Thread (reserved=56902KB, committed=56902KB)
    (thread #56)
    (stack: reserved=55456KB, committed=55456KB)
    (malloc=167KB #287)
    (arena=1279KB #110)
- Code (reserved=279028KB, committed=180816KB)
    (malloc=29428KB #38259)
    (mmap: reserved=249600KB, committed=151388KB)
- GC (reserved=139125KB, committed=125901KB)
    (malloc=28533KB #81265)
    (mmap: reserved=110592KB, committed=97368KB)
- Compiler (reserved=179KB, committed=179KB)
    (malloc=48KB #444)
    (arena=131KB #3)
- Internal (reserved=801672KB, committed=801664KB)
    (malloc=801632KB #83153)
    (mmap: reserved=40KB, committed=32KB)
- Symbol (reserved=33730KB, committed=33730KB)
    (malloc=32059KB #274943)
    (arena=1670KB #1)
- Native Memory Tracking (reserved=9283KB, committed=9283KB)
    (malloc=21KB #254)
    (tracking overhead=9262KB)
- Arena Chunk (reserved=316KB, committed=316KB)
    (malloc=316KB)
- Unknown (reserved=4096KB, committed=0KB)
    (mmap: reserved=4096KB, committed=0KB)
{code}

We see that, besides the heap memory, more than 1GB of native memory is in use. The most suspicious category is "Internal", which uses up to 800MB of native memory. I don't know what "Internal" covers (the category documentation explains it only roughly), and a more detailed stack trace did not give me any information either. Moreover, the "Internal" memory drops to a small value from time to time, so I don't think there is a native memory leak here. The "Code" and "Class" memory usage is also somewhat large, but this is expected, as we generate a lot of Java code when running SQL.

Note that besides the two surefire processes, the maven process and other processes also consume memory. So it seems that we should either enlarge the memory size of the container, lower the heap size limit, or run these test cases in a single process.

It's true that JDK8 has some native memory leaks, but they are not a big deal for tests that run for about 30 minutes. Those leaks would take tens of hours to exhaust native memory.

I'm not familiar with changing these test settings. [~rmetzger] could you point out what changes I should make to run the tests for flink-table-planner-blink with only 1 process?

was (Author: tsreaper):

Hi,

After a few more investigations I'm afraid I have to conclude that this is not a bug. It's just that the memory size of our testing container is too small.

I use the [native memory tracking tool|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html] to track the native memory usage of all test cases, and I'll post the final memory usage below. Click [here|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr022.html] to see the explanation for each category.

{code}
RSS: 2928128 (in 1KB blocks)
Total: reserved=4679436KB, committed=3198040KB
- Java Heap (reserved=2097152KB, committed=1740800KB)
    (mmap: reserved=2097152KB, committed=1740800KB)
- Class (reserved=1257953KB,
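As a rough cross-check of the "1GB+" figure, the committed sizes in the report above can be summed per category. The sketch below is illustrative only: {{NmtSummaryMath}} and its methods are made-up names, and the numbers are copied by hand from the report. Reports of this shape are typically produced by starting the JVM with {{-XX:NativeMemoryTracking=summary}} and running {{jcmd <pid> VM.native_memory summary}}; the comment does not say which exact invocation was used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that re-adds the committed sizes from the NMT report above. */
final class NmtSummaryMath {

    /** Committed KB per NMT category, copied from the report in this comment. */
    static Map<String, Long> committedKb() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("Java Heap", 1740800L);
        m.put("Class", 248449L);
        m.put("Thread", 56902L);
        m.put("Code", 180816L);
        m.put("GC", 125901L);
        m.put("Compiler", 179L);
        m.put("Internal", 801664L);
        m.put("Symbol", 33730L);
        m.put("Native Memory Tracking", 9283L);
        m.put("Arena Chunk", 316L);
        m.put("Unknown", 0L);
        return m;
    }

    /** Sum over all categories; should match the "Total: committed=" line. */
    static long totalCommittedKb() {
        long sum = 0L;
        for (long kb : committedKb().values()) {
            sum += kb;
        }
        return sum;
    }

    /** Everything the JVM has committed outside the Java heap. */
    static long nonHeapCommittedKb() {
        return totalCommittedKb() - committedKb().get("Java Heap");
    }

    public static void main(String[] args) {
        System.out.println("total committed KB:    " + totalCommittedKb());
        System.out.println("non-heap committed KB: " + nonHeapCommittedKb());
    }
}
```

The categories add up to the reported total of 3198040KB, and subtracting the Java heap leaves 1457240KB (about 1.4GB) of committed native memory per JVM.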
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100592#comment-17100592 ]

Caizhi Weng edited comment on FLINK-16636 at 5/6/20, 8:46 AM:
--

Hi,

After some more investigation, I'm afraid I have to conclude that this is not a bug. The memory size of our testing container is simply too small.

I used the [native memory tracking tool|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html] to track the native memory usage of all test cases, and I'm posting the final memory usage below. See [here|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr022.html] for an explanation of each category.

{code}
RSS: 2928128 (in 1KB blocks)
Total: reserved=4679436KB, committed=3198040KB
- Java Heap (reserved=2097152KB, committed=1740800KB)
    (mmap: reserved=2097152KB, committed=1740800KB)
- Class (reserved=1257953KB, committed=248449KB)
    (classes #29687)
    (malloc=25057KB #113855)
    (mmap: reserved=1232896KB, committed=223392KB)
- Thread (reserved=56902KB, committed=56902KB)
    (thread #56)
    (stack: reserved=55456KB, committed=55456KB)
    (malloc=167KB #287)
    (arena=1279KB #110)
- Code (reserved=279028KB, committed=180816KB)
    (malloc=29428KB #38259)
    (mmap: reserved=249600KB, committed=151388KB)
- GC (reserved=139125KB, committed=125901KB)
    (malloc=28533KB #81265)
    (mmap: reserved=110592KB, committed=97368KB)
- Compiler (reserved=179KB, committed=179KB)
    (malloc=48KB #444)
    (arena=131KB #3)
- Internal (reserved=801672KB, committed=801664KB)
    (malloc=801632KB #83153)
    (mmap: reserved=40KB, committed=32KB)
- Symbol (reserved=33730KB, committed=33730KB)
    (malloc=32059KB #274943)
    (arena=1670KB #1)
- Native Memory Tracking (reserved=9283KB, committed=9283KB)
    (malloc=21KB #254)
    (tracking overhead=9262KB)
- Arena Chunk (reserved=316KB, committed=316KB)
    (malloc=316KB)
- Unknown (reserved=4096KB, committed=0KB)
    (mmap: reserved=4096KB, committed=0KB)
{code}

We see that, besides the heap memory, more than 1GB of native memory is in use. The most suspicious category is "Internal", which uses up to 800MB of native memory. I don't know what "Internal" covers (the category documentation explains it only roughly), and a more detailed stack trace did not give me any information either. Moreover, the "Internal" memory drops to a small value from time to time, so I don't think there is a native memory leak here. The "Code" and "Class" memory usage is also somewhat large, but this is expected, as we generate a lot of Java code when running SQL.

Note that besides the two surefire processes, the maven process and other processes also consume memory. So it seems that we should either enlarge the memory size of the container, lower the heap size limit, or run these test cases in a single process.

It's true that JDK8 has some native memory leaks, but they are not a big deal for tests that run for about 30 minutes. Those leaks would take tens of hours to exhaust native memory.

was (Author: tsreaper):

Hi,

After a few more investigations I'm afraid I have to conclude that this is not a bug. It's just that the memory size of our testing container is too small.

I use the [native memory tracking tool|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html] to track the native memory usage of all test cases, and I'll post the final memory usage below. Click [here|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr022.html] to see the explanation for each category.

{code}
RSS: 2928128 (in 1KB blocks)
Total: reserved=4679436KB, committed=3198040KB
- Java Heap (reserved=2097152KB, committed=1740800KB)
    (mmap: reserved=2097152KB, committed=1740800KB)
- Class (reserved=1257953KB, committed=248449KB)
    (classes #29687)
    (malloc=25057KB #113855)
    (mmap: reserved=1232896KB, committed=223392KB)
-
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093099#comment-17093099 ]

Robert Metzger edited comment on FLINK-16636 at 4/27/20, 7:24 AM:
--

[~TsReaper]: Yes, you can log the memory usage, either through Java or from bash. I recently used something similar to debug build failures (in that case, to log the disk space usage): https://github.com/rmetzger/flink/commit/10a3af2a83745c5868034de24b613289fe81cda2

[~ykt836]: Yes, we could run the tests on JDK9. However, if we want to run on something newer than JDK8, I would propose JDK11, because we've already set up the infrastructure for it. That said, I'm generally not convinced that this is the right approach to fix the issue: we would lose JDK8 test coverage for the blink planner tests. I would rather first look into running the tests serially. The blink planner tests currently need ~30m, and I believe doubling that time would not be an issue.

was (Author: rmetzger):

[~TsReaper]: Yes, you can log the memory usage. Either through Java, or from bash. I've used something similar recently to debug build failures (in that case to log the disk space usage): https://github.com/rmetzger/flink/commit/10a3af2a83745c5868034de24b613289fe81cda2

[~ykt836]: We could run the tests in JDK9, however if we want to run above JDK8, I would propose JDK11, because we've set up the infrastructure for JDK11 already. However, I'm generally not convinced that this is the right approach to fix the issue. We would loose test coverage for JDK8 for the blink planner tests. I would rather first look into running the tests serially. The blink planner tests currently need ~30m. I believe it would not be an issue to double that time.
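Logging the memory usage "through Java" could look roughly like the sketch below. It is an assumption about how such logging might be done, not the code from the linked commit: {{MemoryUsageLogger}}, {{parseKb}}, {{readProcStatus}} and {{snapshot}} are hypothetical names. Heap numbers come from the standard {{MemoryMXBean}}; RSS and total virtual memory are read from {{/proc/self/status}}, which exists only on Linux, so both are reported as -1 elsewhere.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; not part of Flink or of the linked commit. */
final class MemoryUsageLogger {

    /** Returns the kB value of a "Key:   12345 kB" line, or -1 if the key is absent. */
    static long parseKb(List<String> statusLines, String key) {
        String prefix = key + ":";
        for (String line : statusLines) {
            if (line.startsWith(prefix)) {
                String digits = line.substring(prefix.length()).replaceAll("[^0-9]", "");
                return digits.isEmpty() ? -1L : Long.parseLong(digits);
            }
        }
        return -1L;
    }

    /** Reads /proc/self/status (Linux only); returns an empty list if it is unavailable. */
    static List<String> readProcStatus() {
        Path status = Paths.get("/proc/self/status");
        try {
            return Files.isReadable(status)
                    ? Files.readAllLines(status, StandardCharsets.UTF_8)
                    : Collections.<String>emptyList();
        } catch (IOException e) {
            return Collections.emptyList();
        }
    }

    /** Builds one log line with heap usage, total virtual memory (VmSize) and RSS (VmRSS). */
    static String snapshot(String label) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        List<String> status = readProcStatus();
        return String.format(
                "[mem] %s heapUsed=%dKB heapCommitted=%dKB vmSize=%dKB rss=%dKB",
                label,
                heap.getUsed() / 1024,
                heap.getCommitted() / 1024,
                parseKb(status, "VmSize"),
                parseKb(status, "VmRSS"));
    }

    public static void main(String[] args) {
        System.out.println(snapshot("after TableEnvironmentITCase"));
    }
}
```

Calling {{snapshot}} after each test case would give one line per test, which should make it easier to see which test pushes the RSS up.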
> TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is the instance and exception stack: > https://api.travis-ci.org/v3/job/663408376/log.txt > But there is not too much helpful information there, maybe a accidental maven > problem. > {code} > 09:55:07.703 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test > (integration-tests) on project flink-table-planner-blink_2.11: There are test > failures. > 09:55:07.703 [ERROR] > 09:55:07.703 [ERROR] Please refer to > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports > for the individual test results. > 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, > [date]-jvmRun[N].dump and [date].dumpstream. > 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without > properly saying goodbye. VM crash or System.exit called? 
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? > 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: >
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090646#comment-17090646 ] Kurt Young edited comment on FLINK-16636 at 4/23/20, 2:13 PM: -- [~rmetzger] Is it possible to use JDK9 for executing these cases? was (Author: ykt836): [~rmetzger] Is it possible to use JDK9 to executing these cases? > TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is the instance and exception stack: > https://api.travis-ci.org/v3/job/663408376/log.txt > But there is not too much helpful information there, maybe a accidental maven > problem. > {code} > 09:55:07.703 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test > (integration-tests) on project flink-table-planner-blink_2.11: There are test > failures. > 09:55:07.703 [ERROR] > 09:55:07.703 [ERROR] Please refer to > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports > for the individual test results. > 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, > [date]-jvmRun[N].dump and [date].dumpstream. > 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without > properly saying goodbye. VM crash or System.exit called? 
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? > 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510) > 09:55:07.704 [ERROR] at > 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:382) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) > 09:55:07.704 [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > 09:55:07.704 [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > 09:55:07.704 [ERROR] at >
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090348#comment-17090348 ]

Caizhi Weng edited comment on FLINK-16636 at 4/23/20, 7:39 AM:
---

To solve just this issue, we can force a full GC after large IT cases such as {{HashAggITCase}}. But I'm afraid other test cases will eventually trigger similar issues as development goes on.

[~rmetzger] is it possible to log the current memory usage (including heap, total_vm, rss, etc.) after each test case (maybe by adding some functionality to {{TestLogger}})? I see that in the daily cron tests, the blink planner tests run with both jdk8 and jdk11. The memory leak in {{ProcessBuilder}} is fixed in jdk9 according to the [bug report|https://bugs.openjdk.java.net/browse/JDK-8054841]; if the memory usage varies greatly between the two JDKs, that leak might be the cause. With this logging we can also discover which test cases use a lot of native memory.

was (Author: tsreaper):

For just solving this issue we can force a full GC after large IT cases such as {{HashAggITCase}}. But I'm afraid other test cases will eventually trigger similar issues with the developing going on.

[~rmetzger] is it possible to log current memory usage (including heap, total_vm, rss, etc.) after each test case (maybe add some functionality into {{TestLogger}})? I see that in daily cron tests blink planner tests will run with both jdk8 and jdk11. The memory leak of {{ProcessBuilder}} is fixed in jdk9 according to the [bug report|https://bugs.openjdk.java.net/browse/JDK-8054841], if the memory usage with different jdk varies greatly then it might be the case.
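Forcing a full GC after a large IT case and logging the heap before and after might be sketched as follows. In Flink this would presumably hook into {{TestLogger}} or a JUnit rule; to stay self-contained, the sketch wraps a plain {{Runnable}} instead, and {{GcAfterTest}} and its methods are hypothetical names. Note that {{System.gc()}} is only a request, and the JVM ignores it when started with {{-XX:+DisableExplicitGC}}.

```java
/** Hypothetical helper; in Flink this logic would more likely live in TestLogger or a JUnit rule. */
final class GcAfterTest {

    /** Currently used heap in KB, as seen by the Runtime API. */
    static long usedHeapKb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / 1024;
    }

    /** Runs the test body, requests a GC afterwards, and returns a log line with both readings. */
    static String runThenCollect(String testName, Runnable testBody) {
        testBody.run();
        long before = usedHeapKb();
        System.gc(); // a request only; the JVM is free to ignore it
        long after = usedHeapKb();
        return String.format(
                "[gc] %s usedHeapBefore=%dKB usedHeapAfter=%dKB", testName, before, after);
    }

    public static void main(String[] args) {
        // Stand-in for a large IT case: allocate 64MB that becomes garbage right away.
        String line = runThenCollect("HashAggITCase", () -> {
            byte[][] garbage = new byte[64][];
            for (int i = 0; i < garbage.length; i++) {
                garbage[i] = new byte[1 << 20];
            }
        });
        System.out.println(line);
    }
}
```

This only covers the heap; whether it also lowers the RSS of the process depends on the collector returning memory to the operating system, which is not guaranteed.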
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090334#comment-17090334 ]

Robert Metzger edited comment on FLINK-16636 at 4/23/20, 7:17 AM:
--

Thanks a lot for looking so deeply into this. Sadly, I don't know enough about JVM memory management. Maybe [~azagrebin] has an idea, as he has worked on Flink's memory management a lot recently.

was (Author: rmetzger):

Thanks a lot for looking to deeply into this. Sadly, I don't know enough about JVM memory management. Maybe [~azagrebin] has an idea, as he has worked on Flink's memory management a lot recently.
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467 ]

Caizhi Weng edited comment on FLINK-16636 at 4/23/20, 6:38 AM:
---

I ran the IT cases on my local machine and read some of the surefire source code. Although I'm not 100% sure about the cause of this problem, the following is my guess:

The surefire plugin needs to periodically check whether the maven process (its parent process) is still running, so that it can stop itself when the maven process exits unexpectedly (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112] and eventually reaches the [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds a shell command with a {{ProcessBuilder}} and executes it with {{UNIXProcess.forkAndExec}} (see the stack in [[~rmetzger]'s post|https://issues.apache.org/jira/browse/FLINK-16636?focusedCommentId=17087385=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17087385]).

-It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). But as the shell command is executed by forking a new process, the memory usage may double when this surefire check happens, thus reaching the hard 8GB limit of the container.-

-Some IT cases use a lot of heap memory (especially {{HashAggITCase}}), but according to my local {{top}} command, the memory usage of all tests currently seems to stay just below what is needed to trigger a full GC (the memory usage of each JVM goes up to about 2000MB). As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen during them.-

-So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit can solve this issue.-

was (Author: tsreaper):

I've tried to run the IT cases on my local machine and read some surefire source code. Although I'm not 100% sure about the reason of this problem, the following is my guessing:

Surefire plugin needs to periodically check whether the maven process (its parent process) is still running so that it can stop itself when the maven process unexpectedly exits (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112] and finally goes into [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds up a shell command with a {{ProcessBuilder}} and executes it with {{UNIXProcess.forkAndExec}} (see stack in [[~rmetzger]'s post|https://issues.apache.org/jira/browse/FLINK-16636?focusedCommentId=17087385=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17087385]).

It's true that we've limited each JVM with 2GB heap size (a total of 4GB heap size). But as the shell command is executed by forking a new process, when this surefire check happens, it is possible that the memory usage doubles, thus reaching the hard 8GB limit of the container.

There are some IT cases which uses a real lot of heap memory (especially {{HashAggITCase}}), but according to my local {{top}} command, currently the memory usage of all tests seems to be precisely a bit lower than what is needed to trigger a full GC (memory usage of each JVM goes up to about 2000MB). As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen in their turn.

So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit can solve this issue.
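The parent-process check described in this comment boils down to spawning a short-lived shell command from the JVM. The following is a simplified sketch of that pattern, not surefire's actual code: {{ParentLivenessCheck}} is a hypothetical class, and {{ps -p <pid>}} is assumed here as the probe (the command surefire really runs is in the linked {{PpidChecker}} source).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified stand-in for the kind of check PpidChecker performs. */
final class ParentLivenessCheck {

    /** Builds the shell command; "ps -p <pid>" is assumed to exit with 0 only if the process exists. */
    static List<String> command(long pid) {
        return Arrays.asList("/bin/sh", "-c", "ps -p " + pid + " > /dev/null");
    }

    /** Spawns the command through ProcessBuilder; every call forks a child process. */
    static boolean isAlive(long pid) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pid)).start();
        if (!process.waitFor(10, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("liveness check for pid " + pid + " timed out");
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to pid 1, which exists on any running Linux system.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : 1L;
        System.out.println("pid " + pid + " alive: " + isAlive(pid));
    }
}
```

Because {{isAlive}} starts a new process on every call, a periodic check like this exercises the {{forkAndExec}} path repeatedly for as long as the forked test JVM runs.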
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467 ] Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 9:39 AM: --- I've tried running the IT cases on my local machine and read some of the Surefire source code. Although I'm not 100% sure about the cause of this problem, my guess is as follows: The Surefire plugin needs to periodically check whether the Maven process (its parent process) is still running, so that it can stop itself when the Maven process exits unexpectedly (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112], which eventually reaches the [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds a shell command with a {{ProcessBuilder}} and executes it via {{UNIXProcess.forkAndExec}} (see the stack trace in [[~rmetzger]'s post|https://issues.apache.org/jira/browse/FLINK-16636?focusedCommentId=17087385=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17087385]). It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). But because the shell command is executed by forking a new process, the memory usage may double when this Surefire check happens, thus reaching the hard 8GB limit of the container. 
Some IT cases use a great deal of heap memory (especially {{HashAggITCase}}), but according to {{top}} on my local machine, the memory usage of all tests currently seems to stay just below the level needed to trigger a full GC (the memory usage of each JVM goes up to about 2000MB). As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen while they are running. So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit should solve this issue. was (Author: tsreaper): I've tried running the IT cases on my local machine and read some of the Surefire source code. Although I'm not 100% sure about the cause of this problem, my guess is as follows: The Surefire plugin needs to periodically check whether the Maven process (its parent process) is still running, so that it can stop itself when the Maven process exits unexpectedly (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112], which eventually reaches the [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds a shell command with a {{ProcessBuilder}} and executes it via {{UNIXProcess.forkAndExec}}. It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). But because the shell command is executed by forking a new process, the memory usage may double when this Surefire check happens, thus reaching the hard 8GB limit of the container. 
Some IT cases use a great deal of heap memory (especially {{HashAggITCase}}), but according to {{top}} on my local machine, the memory usage of all tests currently seems to stay just below the level needed to trigger a full GC (the memory usage of each JVM goes up to about 2000MB). As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen while they are running. So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit should solve this issue. > TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m >
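The parent-liveness check described in the comment above can be sketched roughly as follows. This is only an illustration of the "fork a shell to look up a pid" pattern, not the actual Surefire code: the class {{ParentAliveCheck}}, both method names, and the exact {{ps}} invocation are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT the real PpidChecker: it only illustrates the pattern.
final class ParentAliveCheck {

    // Pure helper: true if some line of the `ps` output is exactly the given pid.
    static boolean outputMentionsPid(String psOutput, long pid) {
        String wanted = Long.toString(pid);
        for (String line : psOutput.split("\n")) {
            if (line.trim().equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Forks /bin/sh on every call to ask `ps` whether the pid exists.
    static boolean isAlive(long pid) throws IOException, InterruptedException {
        // ProcessBuilder.start() is the call that throws an IOException such as
        // "error=12, Cannot allocate memory" when the OS refuses to create the child.
        Process process = new ProcessBuilder("/bin/sh", "-c", "ps -o pid= -p " + pid)
                .redirectErrorStream(true)
                .start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return outputMentionsPid(output.toString(), pid);
    }
}
```

The point of the sketch is that every periodic check creates a child process of a JVM that may already be holding about 2GB, which is why the check is a plausible place for an allocation failure to show up.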
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467 ] Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 9:19 AM: --- I've tried running the IT cases on my local machine and read some of the Surefire source code. Although I'm not 100% sure about the cause of this problem, my guess is as follows: The Surefire plugin needs to periodically check whether the Maven process (its parent process) is still running, so that it can stop itself when the Maven process exits unexpectedly (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112], which eventually reaches the [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds a shell command with a {{ProcessBuilder}} and executes it via {{UNIXProcess.forkAndExec}}. It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). But because the shell command is executed by forking a new process, the memory usage may double when this Surefire check happens, thus reaching the hard 8GB limit of the container. Some IT cases use a great deal of heap memory (especially {{HashAggITCase}}), but according to {{top}} on my local machine, the memory usage of all tests currently seems to stay just below the level needed to trigger a full GC (the memory usage of each JVM goes up to about 2000MB). 
As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen while they are running. So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit should solve this issue. was (Author: tsreaper): I've tried running the IT cases on my local machine and read some of the Surefire source code. Although I'm not 100% sure about the cause of this problem, my guess is as follows: The Surefire plugin needs to periodically check whether the Maven process (its parent process) is still running, so that it can stop itself when the Maven process exits unexpectedly (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112], which eventually reaches the [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds a shell command with a {{ProcessBuilder}} and executes it via {{UNIXProcess.forkAndExec}}. It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). But because the shell command is executed by forking a new process, the memory usage may double when this Surefire check happens, thus reaching the hard 8GB limit of the container. Some IT cases use a great deal of heap memory (especially {{HashAggITCase}}), but according to {{top}} on my local machine, the memory usage of all tests currently seems to stay just below the level needed to trigger a full GC (the memory usage of each JVM goes up to about 2200MB). 
As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen while they are running. So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit should solve this issue. > TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is the instance and exception stack: > https://api.travis-ci.org/v3/job/663408376/log.txt > But there is not much helpful information there; maybe an accidental
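The arithmetic behind the hypothesis in the comment above can be written out as a small sketch. It takes the comment's working assumption at face value (that a fork can momentarily double the memory usage), which is not a measured value. The class and method names are hypothetical, the count of two JVMs is inferred from "2GB each, 4GB in total", and 8GB is taken as 8192MB.

```java
// Hypothetical back-of-the-envelope check; all names are illustrative only.
final class ForkMemoryBudget {

    // Combined memory usage of all forked test JVMs, in MB.
    static long steadyStateMb(int jvmCount, long perJvmMb) {
        return jvmCount * perJvmMb;
    }

    // Pessimistic peak if forking momentarily doubles the usage (the comment's assumption).
    static long hypotheticalPeakMb(int jvmCount, long perJvmMb) {
        return 2 * steadyStateMb(jvmCount, perJvmMb);
    }

    // True if the pessimistic peak would not fit into the container.
    static boolean exceedsLimit(int jvmCount, long perJvmMb, long containerLimitMb) {
        return hypotheticalPeakMb(jvmCount, perJvmMb) > containerLimitMb;
    }

    public static void main(String[] args) {
        long containerLimitMb = 8L * 1024; // assumed: 8GB container limit
        for (long perJvmMb : new long[] {2000, 2200}) {
            System.out.println(perJvmMb + "MB per JVM: peak "
                    + hypotheticalPeakMb(2, perJvmMb) + "MB, exceeds limit: "
                    + exceedsLimit(2, perJvmMb, containerLimitMb));
        }
    }
}
```

With about 2000MB per JVM the pessimistic peak is 8000MB, which leaves only 192MB under the limit for the Maven process and everything else in the container; with the 2200MB figure from the earlier revision of the comment it is 8800MB, which is over the limit.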
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467 ] Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 9:18 AM: --- I've tried running the IT cases on my local machine and read some of the Surefire source code. Although I'm not 100% sure about the cause of this problem, my guess is as follows: The Surefire plugin needs to periodically check whether the Maven process (its parent process) is still running, so that it can stop itself when the Maven process exits unexpectedly (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112], which eventually reaches the [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds a shell command with a {{ProcessBuilder}} and executes it via {{UNIXProcess.forkAndExec}}. It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). But because the shell command is executed by forking a new process, the memory usage may double when this Surefire check happens, thus reaching the hard 8GB limit of the container. Some IT cases use a great deal of heap memory (especially {{HashAggITCase}}), but according to {{top}} on my local machine, the memory usage of all tests currently seems to stay just below the level needed to trigger a full GC (the memory usage of each JVM goes up to about 2200MB). 
As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen while they are running. So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit should solve this issue. was (Author: tsreaper): I've tried running the IT cases on my local machine and read some of the Surefire source code. Although I'm not 100% sure about the cause of this problem, my guess is as follows: The Surefire plugin needs to periodically check whether the Maven process (its parent process) is still running, so that it can stop itself when the Maven process exits unexpectedly (see {{processCheckerJob}} in [this code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]). To achieve this, it periodically calls [{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112], which eventually reaches the [{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163] method. In this method, it builds a shell command with a {{ProcessBuilder}} and executes it via {{UNIXProcess.forkAndExec}}. It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). But because the shell command is executed by forking a new process, the memory usage may double when this Surefire check happens, thus reaching the hard 8GB limit of the container. Some IT cases use a great deal of heap memory (especially {{HashAggITCase}}), but according to {{top}} on my local machine, the memory usage of all tests currently seems to stay just below the level needed to trigger a full GC. 
As {{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two test cases to run, the memory failure is most likely to happen while they are running. So it seems that the only thing that goes wrong is the heap limit we apply to the testing JVMs. I think lowering the heap limit a bit should solve this issue. > TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is the instance and exception stack: > https://api.travis-ci.org/v3/job/663408376/log.txt > But there is not much helpful information there; maybe an accidental Maven > problem. > {code} > 09:55:07.703 [ERROR]
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088330#comment-17088330 ] Robert Metzger edited comment on FLINK-16636 at 4/21/20, 6:17 AM: -- bq. About the memory usage, what's the total memory size for each of the testing Linux machines? What's the memory size each JVM requires and how many JVMs will run in parallel when testing? On Travis, the machines have 8 GB of main memory. By default, Surefire is configured with a 2GB heap limit and runs 2 forks per CPU. See: https://github.com/apache/flink/blob/master/pom.xml#L1486 / https://github.com/apache/flink/blob/master/pom.xml#L99 / https://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html#forkCount was (Author: rmetzger): About the memory usage, what's the total memory size for each of the testing Linux machines? What's the memory size each JVM requires and how many JVMs will run in parallel when testing? On Travis, the machines have 8 GB of main memory. By default, Surefire is configured with a 2GB heap limit and runs 2 forks per CPU. See: https://github.com/apache/flink/blob/master/pom.xml#L1486 / https://github.com/apache/flink/blob/master/pom.xml#L99 / https://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html#forkCount > TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is the instance and exception stack: > https://api.travis-ci.org/v3/job/663408376/log.txt > But there is not much helpful information there; maybe an accidental Maven > problem. 
> {code} > 09:55:07.703 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test > (integration-tests) on project flink-table-planner-blink_2.11: There are test > failures. > 09:55:07.703 [ERROR] > 09:55:07.703 [ERROR] Please refer to > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports > for the individual test results. > 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, > [date]-jvmRun[N].dump and [date].dumpstream. > 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without > properly saying goodbye. VM crash or System.exit called? > 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:382) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297) > 09:55:07.704 [ERROR] at >
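A note on the {{Process Exit Code: 137}} line in the log above: by the usual shell convention, an exit status of 128 + N means the process was terminated by signal N, so 137 corresponds to signal 9 (SIGKILL). That is what one would expect to see if the kernel's OOM killer or the container runtime killed the forked JVM, although the log by itself does not show who sent the signal. A minimal sketch of the decoding, with a hypothetical class and method name:

```java
// Hypothetical helper for reading exit codes such as the 137 in the log above.
final class ExitCodes {

    // Returns the signal number if the exit code follows the 128+N convention, else -1.
    static int terminatingSignal(int exitCode) {
        return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        // 137 - 128 = 9, i.e. SIGKILL on Linux.
        System.out.println("exit code 137 -> signal " + terminatingSignal(137));
    }
}
```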
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088308#comment-17088308 ] Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 6:06 AM: --- Hi [~rmetzger], thanks for reporting. Is the heap dump available now? I didn't find heap dump files in the .tar.gz. From my current investigation, it seems to be caused by memory leaks. According to [this|https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed] and [this|https://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run] Stack Overflow posts, Maven forks another JVM process to run tests. So if the current JVM grabs too much memory, the forked JVM will also require this much memory. If the OS cannot allocate this much memory, "java.io.IOException: error=12, Cannot allocate memory" will occur. I'm going to investigate why the memory usage is high. It would be helpful if a heap dump from jmap or a similar tool were available. About the memory usage, what's the total memory size for each of the testing Linux machines? What's the memory size each JVM requires and how many JVMs will run in parallel when testing? was (Author: tsreaper): Hi [~rmetzger], thanks for reporting. Is the heap dump available now? I didn't find heap dump files in the .tar.gz. From my current investigation, it seems to be caused by memory leaks. According to [this|https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed] and [this|https://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run] Stack Overflow posts, Maven forks another JVM process to run tests. So if the current JVM grabs too much memory, the forked JVM will also require this much memory. If the OS cannot allocate this much memory, "java.io.IOException: error=12, Cannot allocate memory" will occur. 
I'm going to investigate why the memory usage is high. It would be helpful if a heap dump from jmap or a similar tool were available. > TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is the instance and exception stack: > https://api.travis-ci.org/v3/job/663408376/log.txt > But there is not much helpful information there; maybe an accidental Maven > problem. > {code} > 09:55:07.703 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test > (integration-tests) on project flink-table-planner-blink_2.11: There are test > failures. > 09:55:07.703 [ERROR] > 09:55:07.703 [ERROR] Please refer to > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports > for the individual test results. > 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, > [date]-jvmRun[N].dump and [date].dumpstream. > 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without > properly saying goodbye. VM crash or System.exit called? 
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? > 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar >
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088308#comment-17088308 ] Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 5:58 AM: --- Hi [~rmetzger], thanks for reporting. Is the heap dump available now? I didn't find heap dump files in the .tar.gz. From my current investigation, it seems to be caused by memory leaks. According to [this|https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed] and [this|https://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run] Stack Overflow posts, Maven forks another JVM process to run tests. So if the current JVM grabs too much memory, the forked JVM will also require this much memory. If the OS cannot allocate this much memory, "java.io.IOException: error=12, Cannot allocate memory" will occur. I'm going to investigate why the memory usage is high. It would be helpful if a heap dump from jmap or a similar tool were available. was (Author: tsreaper): Hi [~rmetzger], thanks for reporting. Is the heap dump available now? I didn't find heap dump files in the .tar.gz. > TableEnvironmentITCase is crashing on Travis > > > Key: FLINK-16636 > URL: https://issues.apache.org/jira/browse/FLINK-16636 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.11.0 >Reporter: Jark Wu >Assignee: Caizhi Weng >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Here is the instance and exception stack: > https://api.travis-ci.org/v3/job/663408376/log.txt > But there is not much helpful information there; maybe an accidental Maven > problem. > {code} > 09:55:07.703 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test > (integration-tests) on project flink-table-planner-blink_2.11: There are test > failures. 
> 09:55:07.703 [ERROR] > 09:55:07.703 [ERROR] Please refer to > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports > for the individual test results. > 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, > [date]-jvmRun[N].dump and [date].dumpstream. > 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without > properly saying goodbye. VM crash or System.exit called? > 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target > && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m > -Dmvn.forkNumber=1 -XX:+UseG1GC -jar > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar > > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire > 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp > surefire_43192129054983363633tmp > 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log > 09:55:07.703 [ERROR] Process Exit Code: 137 > 09:55:07.703 [ERROR] Crashed tests: > 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase > 09:55:07.703 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:382) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297) > 09:55:07.704 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246) >
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087385#comment-17087385 ]

Robert Metzger edited comment on FLINK-16636 at 4/20/20, 5:51 AM:
--
Instance in master build on Travis: https://travis-ci.org/github/apache/flink/jobs/676881597

The debugging files (https://s3.amazonaws.com/flink-logs-us/travis-artifacts/apache/flink/43406/43406.5.tar.gz) contain this:

{code}
# Created at 2020-04-19T23:07:44.872
java.io.IOException: Stream closed
    at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.Reader.read(Reader.java:100)
    at java.util.Scanner.readInput(Scanner.java:804)
    at java.util.Scanner.findWithinHorizon(Scanner.java:1685)
    at java.util.Scanner.hasNextLine(Scanner.java:1500)
    at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
    at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
    at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
    at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

# Created at 2020-04-19T23:07:44.879
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 14686
    at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
    at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

# Created at 2020-04-19T23:20:36.789
java.io.IOException: Cannot run program "/bin/sh": error=12, Cannot allocate memory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:351)
    at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
    at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
    at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=12, Cannot allocate memory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 11 more
#
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085786#comment-17085786 ]

Yu Li edited comment on FLINK-16636 at 4/17/20, 2:01 PM:
-

Another instance in release-1.10 cron build: https://api.travis-ci.org/v3/job/675870831/log.txt

was (Author: carp84):
Another instance in release-1.10 cron job: https://api.travis-ci.org/v3/job/675870831/log.txt

> TableEnvironmentITCase is crashing on Travis
>
> Key: FLINK-16636
> URL: https://issues.apache.org/jira/browse/FLINK-16636
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.11.0
> Reporter: Jark Wu
> Assignee: Caizhi Weng
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Here is the instance and exception stack:
> https://api.travis-ci.org/v3/job/663408376/log.txt
> But there is not much helpful information there; maybe an accidental Maven problem.
> {code}
> 09:55:07.703 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-table-planner-blink_2.11: There are test failures.
> 09:55:07.703 [ERROR]
> 09:55:07.703 [ERROR] Please refer to /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports for the individual test results.
> 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
> 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire 2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
> 09:55:07.704 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:382)
> 09:55:07.704 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297)
> 09:55:07.704 [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
> 09:55:07.704 [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
> 09:55:07.704 [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
> 09:55:07.704 [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
> 09:55:07.704 [ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> 09:55:07.704 [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> 09:55:07.704 [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> 09:55:07.704 [ERROR] at
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079109#comment-17079109 ]

Caizhi Weng edited comment on FLINK-16636 at 4/9/20, 9:40 AM:
--

[~ykt836] The NPE is caused by {{ScalarQueryITCase#testScalarSubQueryException}}. This is a test case in which one node is intended to fail. If the node containing the sink is up but not yet initialized when the job fails, {{Utils.CollectHelper#open}} is never called, but {{Utils.CollectHelper#close}} is still called shortly afterwards, which causes the NPE.

However, this does not seem to be related to the crash of {{TableEnvironmentITCase}} (the NPE also occurs when the blink_planner test module passes all its tests). Currently I suspect that {{TableUtilsStreamingITCase}} is just coincidental.

I'm going to fix the NPE problem first and then extend {{TableEnvironmentITCase}}, {{BatchAbstractTestBase}} and {{TableUtilsStreamingITCase}} with the {{TestLogger}} class, so that we get a more detailed log the next time this problem arises.

[~rmetzger] Did you notice anything abnormal with the JVM heap?

was (Author: tsreaper):
[~ykt836] NPE is caused by {{ScalarQueryITCase#testScalarSubQueryException}}. This is a test case with one node intended to fail. If the node containing the sink is up but not initialized when the job fails, {{Utils.CollectHelper#open}} will not be called but {{Utils.CollectHelper#close}} will be called soon, thus causing the NPE.

But this does not seem to be related with the crashing of {{TableEnvironmentITCase}}. Currently I suspect that {{TableUtilsStreamingITCase}} is just accidental.

I'm going to fix the NPE problem first and then extend {{TableEnvironmentITCase}}, {{BatchAbstractTestBase}} and {{TableUtilsStreamingITCase}} with {{TestLogger}} class so that when this problem arises next time, we can have a more detailed log.

[~rmetzger] Did you notice anything abnormal with the JVM heap?
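
For readers unfamiliar with this kind of lifecycle bug, the open/close ordering problem described in the comment above can be reproduced in a few lines. This is only a minimal sketch and not Flink's actual {{Utils.CollectHelper}} code: the classes UnguardedSink and GuardedSink and the buffer field are hypothetical stand-ins, and the null check is one possible fix, not necessarily the one that was merged.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectHelperSketch {

    /** Hypothetical sink whose state only exists after open() has run. */
    static class UnguardedSink {
        private List<String> buffer; // stays null until open() is called

        void open() {
            buffer = new ArrayList<>();
        }

        void write(String record) {
            buffer.add(record);
        }

        /** Throws NullPointerException if open() was never called. */
        int close() {
            return buffer.size();
        }
    }

    /** Same hypothetical sink, but close() tolerates a skipped open(). */
    static class GuardedSink {
        private List<String> buffer;

        void open() {
            buffer = new ArrayList<>();
        }

        void write(String record) {
            buffer.add(record);
        }

        /** Returns the number of collected records, or 0 if never opened. */
        int close() {
            return buffer == null ? 0 : buffer.size();
        }
    }

    public static void main(String[] args) {
        // Simulates a job that fails after the sink node is up
        // but before open() has been invoked on it.
        try {
            new UnguardedSink().close();
            System.out.println("unguarded close: no exception");
        } catch (NullPointerException e) {
            System.out.println("unguarded close: NullPointerException");
        }

        System.out.println("guarded close: " + new GuardedSink().close());
    }
}
```

The guarded variant simply treats "never opened" as "nothing collected", so a teardown path that runs after a failed startup no longer masks the original failure with a secondary NPE.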
[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis
[ https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077172#comment-17077172 ]

Robert Metzger edited comment on FLINK-16636 at 4/7/20, 12:35 PM:
--

The change is merged. Builds coming in tonight might reveal additional information.

Another failure: https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7139=logs=e25d5e7e-2a9c-5589-4940-0b638d75a414=294c2388-20e6-57a2-5721-91db544b1e69
https://travis-ci.org/github/apache/flink/jobs/671996427?utm_medium=notification_source=slack

was (Author: rmetzger):
The change is merged. Builds coming in tonight might reveal additional information.

Another failure: https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7139=logs=e25d5e7e-2a9c-5589-4940-0b638d75a414=294c2388-20e6-57a2-5721-91db544b1e69