[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-05-06 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100592#comment-17100592
 ] 

Caizhi Weng edited comment on FLINK-16636 at 5/6/20, 8:58 AM:
--

Hi,

After some further investigation, I'm afraid I have to conclude that this is not 
a bug. It's just that the memory size of our testing container is too small.

I used the [native memory tracking tool|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html]
to track the native memory usage of all test cases; the final memory usage is 
posted below. Click 
[here|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr022.html]
to see the explanation for each category.

{code}
RSS: 2928128 (in 1KB blocks)

Total: reserved=4679436KB, committed=3198040KB
- Java Heap (reserved=2097152KB, committed=1740800KB)
    (mmap: reserved=2097152KB, committed=1740800KB)

- Class (reserved=1257953KB, committed=248449KB)
    (classes #29687)
    (malloc=25057KB #113855)
    (mmap: reserved=1232896KB, committed=223392KB)

- Thread (reserved=56902KB, committed=56902KB)
    (thread #56)
    (stack: reserved=55456KB, committed=55456KB)
    (malloc=167KB #287)
    (arena=1279KB #110)

- Code (reserved=279028KB, committed=180816KB)
    (malloc=29428KB #38259)
    (mmap: reserved=249600KB, committed=151388KB)

- GC (reserved=139125KB, committed=125901KB)
    (malloc=28533KB #81265)
    (mmap: reserved=110592KB, committed=97368KB)

- Compiler (reserved=179KB, committed=179KB)
    (malloc=48KB #444)
    (arena=131KB #3)

- Internal (reserved=801672KB, committed=801664KB)
    (malloc=801632KB #83153)
    (mmap: reserved=40KB, committed=32KB)

- Symbol (reserved=33730KB, committed=33730KB)
    (malloc=32059KB #274943)
    (arena=1670KB #1)

- Native Memory Tracking (reserved=9283KB, committed=9283KB)
    (malloc=21KB #254)
    (tracking overhead=9262KB)

- Arena Chunk (reserved=316KB, committed=316KB)
    (malloc=316KB)

- Unknown (reserved=4096KB, committed=0KB)
    (mmap: reserved=4096KB, committed=0KB)
{code}

We see that besides heap memory, more than 1GB of additional native memory is 
committed. The most suspicious category is "Internal", which uses up to 800MB of 
native memory, but I don't know what this "Internal" is (it's explained only 
roughly in the category documentation) and a more detailed stack trace doesn't 
give me any information either. Besides, this "Internal" memory drops to a 
small value from time to time, so I don't think there is a native memory leak 
here.

We also have somewhat large "Code" and "Class" memory usage, but this is also 
normal, as we generate lots of Java code when running SQL.

Note that besides the two surefire processes, the maven process and other 
processes will also consume memory. So it seems that we should either enlarge 
the memory size of the container, make the heap size limit smaller, or just run 
these test cases with one single process.
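
To make the arithmetic behind these numbers explicit, here is a minimal Java sketch. It only re-adds the committed sizes from the summary above; the class name {{NmtBudget}} is made up for illustration, and the last figure assumes that both surefire forks have roughly the same RSS, which was not measured here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NmtBudget {

    /** Committed sizes in KB, copied from the NMT summary above. */
    static Map<String, Long> committedKb() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("Java Heap", 1740800L);
        m.put("Class", 248449L);
        m.put("Thread", 56902L);
        m.put("Code", 180816L);
        m.put("GC", 125901L);
        m.put("Compiler", 179L);
        m.put("Internal", 801664L);
        m.put("Symbol", 33730L);
        m.put("Native Memory Tracking", 9283L);
        m.put("Arena Chunk", 316L);
        m.put("Unknown", 0L);
        return m;
    }

    /** Sum of all committed categories. */
    static long totalKb(Map<String, Long> committed) {
        long sum = 0;
        for (long kb : committed.values()) {
            sum += kb;
        }
        return sum;
    }

    /** Everything the JVM has committed outside the Java heap. */
    static long nonHeapKb(Map<String, Long> committed) {
        return totalKb(committed) - committed.get("Java Heap");
    }

    public static void main(String[] args) {
        Map<String, Long> committed = committedKb();
        long nonHeap = nonHeapKb(committed);
        System.out.printf("committed total:    %d KB%n", totalKb(committed));
        System.out.printf("non-heap committed: %d KB (~%d MB)%n", nonHeap, nonHeap / 1024);

        // Assumption: the second surefire fork has a similar RSS to this one.
        long rssKb = 2928128L;
        System.out.printf("two forks at this RSS: ~%d MB%n", 2 * rssKb / 1024);
    }
}
```

The categories add up to the reported total of 3198040KB, of which 1457240KB (about 1.4GB) lies outside the heap, so a 2GB heap limit alone says little about how much memory one test JVM really needs.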

It's true that JDK8 has some native memory leaks, but they're not a big deal 
for tests that run for about 30 minutes. Those leaks would take tens of hours 
to finally eat up the native memory.

I'm not familiar with changing these test settings. [~rmetzger] could you point 
out what changes I should make if I want to run the tests for 
flink-table-planner-blink with only 1 process?



[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-05-06 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100592#comment-17100592
 ] 

Caizhi Weng edited comment on FLINK-16636 at 5/6/20, 8:46 AM:
--

Hi,

After some further investigation, I'm afraid I have to conclude that this is not 
a bug. It's just that the memory size of our testing container is too small.

I used the [native memory tracking tool|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html]
to track the native memory usage of all test cases; the final memory usage is 
posted below. Click 
[here|https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr022.html]
to see the explanation for each category.

{code}
RSS: 2928128 (in 1KB blocks)

Total: reserved=4679436KB, committed=3198040KB
- Java Heap (reserved=2097152KB, committed=1740800KB)
    (mmap: reserved=2097152KB, committed=1740800KB)

- Class (reserved=1257953KB, committed=248449KB)
    (classes #29687)
    (malloc=25057KB #113855)
    (mmap: reserved=1232896KB, committed=223392KB)

- Thread (reserved=56902KB, committed=56902KB)
    (thread #56)
    (stack: reserved=55456KB, committed=55456KB)
    (malloc=167KB #287)
    (arena=1279KB #110)

- Code (reserved=279028KB, committed=180816KB)
    (malloc=29428KB #38259)
    (mmap: reserved=249600KB, committed=151388KB)

- GC (reserved=139125KB, committed=125901KB)
    (malloc=28533KB #81265)
    (mmap: reserved=110592KB, committed=97368KB)

- Compiler (reserved=179KB, committed=179KB)
    (malloc=48KB #444)
    (arena=131KB #3)

- Internal (reserved=801672KB, committed=801664KB)
    (malloc=801632KB #83153)
    (mmap: reserved=40KB, committed=32KB)

- Symbol (reserved=33730KB, committed=33730KB)
    (malloc=32059KB #274943)
    (arena=1670KB #1)

- Native Memory Tracking (reserved=9283KB, committed=9283KB)
    (malloc=21KB #254)
    (tracking overhead=9262KB)

- Arena Chunk (reserved=316KB, committed=316KB)
    (malloc=316KB)

- Unknown (reserved=4096KB, committed=0KB)
    (mmap: reserved=4096KB, committed=0KB)
{code}

We see that besides heap memory, more than 1GB of additional native memory is 
committed. The most suspicious category is "Internal", which uses up to 800MB of 
native memory, but I don't know what this "Internal" is (it's explained only 
roughly in the category documentation) and a more detailed stack trace doesn't 
give me any information either. Besides, this "Internal" memory drops to a 
small value from time to time, so I don't think there is a native memory leak 
here.

We also have somewhat large "Code" and "Class" memory usage, but this is also 
normal, as we generate lots of Java code when running SQL.

Note that besides the two surefire processes, the maven process and other 
processes will also consume memory. So it seems that we should either enlarge 
the memory size of the container, make the heap size limit smaller, or just run 
these test cases with one single process.

It's true that JDK8 has some native memory leaks, but they're not a big deal 
for tests that run for about 30 minutes. Those leaks would take tens of hours 
to finally eat up the native memory.



[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-27 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093099#comment-17093099
 ] 

Robert Metzger edited comment on FLINK-16636 at 4/27/20, 7:24 AM:
--

[~TsReaper]: Yes, you can log the memory usage, either through Java or from 
bash.
I've used something similar recently to debug build failures (in that case to 
log the disk space usage): 
https://github.com/rmetzger/flink/commit/10a3af2a83745c5868034de24b613289fe81cda2

[~ykt836]: Yes, we could run the tests in JDK9. However, if we want to run on 
something newer than JDK8, I would propose JDK11, because we've already set up 
the infrastructure for JDK11.
That said, I'm generally not convinced that this is the right approach to fix 
the issue. We would lose test coverage for JDK8 for the blink planner tests. I 
would rather first look into running the tests serially. The blink planner 
tests currently need ~30m. I believe it would not be an issue to double that 
time.



was (Author: rmetzger):
[~TsReaper]: Yes, you can log the memory usage. Either through Java, or from 
bash.
I've used something similar recently to debug build failures (in that case to 
log the disk space usage): 
https://github.com/rmetzger/flink/commit/10a3af2a83745c5868034de24b613289fe81cda2

[~ykt836]: We could run the tests in JDK9, however if we want to run above 
JDK8, I would propose JDK11, because we've set up the infrastructure for JDK11 
already.
However, I'm generally not convinced that this is the right approach to fix the 
issue. We would lose test coverage for JDK8 for the blink planner tests. I 
would rather first look into running the tests serially. The blink planner 
tests currently need ~30m. I believe it would not be an issue to double that 
time.


> TableEnvironmentITCase is crashing on Travis
> 
>
> Key: FLINK-16636
> URL: https://issues.apache.org/jira/browse/FLINK-16636
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.11.0
> Reporter: Jark Wu
> Assignee: Caizhi Weng
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Here is the instance and exception stack: 
> https://api.travis-ci.org/v3/job/663408376/log.txt
> But there is not too much helpful information there, maybe an accidental maven 
> problem.
> {code}
> 09:55:07.703 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test 
> (integration-tests) on project flink-table-planner-blink_2.11: There are test 
> failures.
> 09:55:07.703 [ERROR] 
> 09:55:07.703 [ERROR] Please refer to 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports
>  for the individual test results.
> 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, 
> [date]-jvmRun[N].dump and [date].dumpstream.
> 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without 
> properly saying goodbye. VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 

[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-23 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090646#comment-17090646
 ] 

Kurt Young edited comment on FLINK-16636 at 4/23/20, 2:13 PM:
--

[~rmetzger] Is it possible to use JDK9 for executing these cases?


was (Author: ykt836):
[~rmetzger] Is it possible to use JDK9 to executing these cases?

> TableEnvironmentITCase is crashing on Travis
> 
>
> Key: FLINK-16636
> URL: https://issues.apache.org/jira/browse/FLINK-16636
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.11.0
> Reporter: Jark Wu
> Assignee: Caizhi Weng
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Here is the instance and exception stack: 
> https://api.travis-ci.org/v3/job/663408376/log.txt
> But there is not too much helpful information there, maybe an accidental maven 
> problem.
> {code}
> 09:55:07.703 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test 
> (integration-tests) on project flink-table-planner-blink_2.11: There are test 
> failures.
> 09:55:07.703 [ERROR] 
> 09:55:07.703 [ERROR] Please refer to 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports
>  for the individual test results.
> 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, 
> [date]-jvmRun[N].dump and [date].dumpstream.
> 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without 
> properly saying goodbye. VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:382)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> 09:55:07.704 [ERROR] at 
> 

[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-23 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090348#comment-17090348
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/23/20, 7:39 AM:
---

To solve just this issue, we can force a full GC after large IT cases such as 
{{HashAggITCase}}. But I'm afraid other test cases will eventually trigger 
similar issues as development goes on.

[~rmetzger] is it possible to log current memory usage (including heap, 
total_vm, rss, etc.) after each test case (maybe add some functionality into 
{{TestLogger}})? I see that in the daily cron tests the blink planner tests run 
with both jdk8 and jdk11. The memory leak of {{ProcessBuilder}} is fixed in 
jdk9 according to the [bug 
report|https://bugs.openjdk.java.net/browse/JDK-8054841], so if the memory usage 
varies greatly between the two JDKs, that leak might be the cause. With this we 
can also discover which test case is using a lot of native memory.
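
As a rough idea of what such logging could look like, here is a minimal sketch. The class {{MemoryUsageLogger}} and its methods are hypothetical and not part of Flink; it assumes a Linux host where {{/proc/self/status}} is readable, and takes {{VmSize}} as the counterpart of total_vm. Wiring it into {{TestLogger}} is left out.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class MemoryUsageLogger {

    /** Returns the value of a "Key:  value" line, or "n/a" if the key is absent. */
    static String statusField(List<String> lines, String key) {
        for (String line : lines) {
            if (line.startsWith(key + ":")) {
                return line.substring(key.length() + 1).trim();
            }
        }
        return "n/a";
    }

    /** One-line summary: heap usage from the JVM, VmSize/VmRSS from the kernel. */
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        String vmSize = "n/a";
        String vmRss = "n/a";
        // Assumption: Linux procfs; on other systems both values stay "n/a".
        Path status = Paths.get("/proc/self/status");
        if (Files.isReadable(status)) {
            try {
                List<String> lines = Files.readAllLines(status, StandardCharsets.UTF_8);
                vmSize = statusField(lines, "VmSize");
                vmRss = statusField(lines, "VmRSS");
            } catch (IOException e) {
                // Logging must never fail a test, so keep "n/a".
            }
        }
        return String.format(
                "heap used=%dKB committed=%dKB, VmSize=%s, VmRSS=%s",
                heap.getUsed() / 1024, heap.getCommitted() / 1024, vmSize, vmRss);
    }

    public static void main(String[] args) {
        // In a test base class this would be called after each test case.
        System.out.println(snapshot());
    }
}
```

Comparing the gap between the heap figures and {{VmRSS}} across test cases, and across jdk8 and jdk11 runs, should show where the native memory goes.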


was (Author: tsreaper):
For just solving this issue we can force a full GC after large IT cases such as 
{{HashAggITCase}}. But I'm afraid other test cases will eventually trigger 
similar issues with the developing going on.

[~rmetzger] is it possible to log current memory usage (including heap, 
total_vm, rss, etc.) after each test case (maybe add some functionality into 
{{TestLogger}})? I see that in daily cron tests blink planner tests will run 
with both jdk8 and jdk11. The memory leak of {{ProcessBuilder}} is fixed in 
jdk9 according to the [bug 
report|https://bugs.openjdk.java.net/browse/JDK-8054841], if the memory usage 
with different jdk varies greatly then it might be the case.


[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-23 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090334#comment-17090334
 ] 

Robert Metzger edited comment on FLINK-16636 at 4/23/20, 7:17 AM:
--

Thanks a lot for looking so deeply into this. Sadly, I don't know enough about 
JVM memory management. Maybe [~azagrebin] has an idea, as he has worked on 
Flink's memory management a lot recently. 


was (Author: rmetzger):
Thanks a lot for looking to deeply into this. Sadly, I don't know enough about 
JVM memory management. Maybe [~azagrebin] has an idea, as he has worked on 
Flink's memory management a lot recently. 


[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-23 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/23/20, 6:38 AM:
---

I've tried to run the IT cases on my local machine and read some surefire 
source code. Although I'm not 100% sure about the cause of this problem, the 
following is my guess:

Surefire plugin needs to periodically check whether the maven process (its 
parent process) is still running so that it can stop itself when the maven 
process unexpectedly exits (see {{processCheckerJob}} in [this 
code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]).
 To achieve this, it periodically calls 
[{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112]
 and finally goes into 
[{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163]
 method. In this method, it builds up a shell command with a {{ProcessBuilder}} 
and executes it with {{UNIXProcess.forkAndExec}} (see stack in [[~rmetzger]'s 
post|https://issues.apache.org/jira/browse/FLINK-16636?focusedCommentId=17087385=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17087385]).
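
To illustrate the mechanism, here is a simplified sketch of such a liveness check. It is not the surefire implementation: the class {{ParentAliveCheck}} and the exact {{ps}} invocation are assumptions made for illustration. It only shows that every check starts a short-lived child process through {{ProcessBuilder}}.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ParentAliveCheck {

    /** Assumed command: prints the elapsed time of the process, nothing if it is gone. */
    static List<String> checkCommand(long pid) {
        return Arrays.asList("/bin/sh", "-c", "ps -o etime= -p " + pid);
    }

    /** Runs the command in a child process; output plus exit code 0 means "alive". */
    static boolean isAlive(long pid) throws IOException, InterruptedException {
        // Each call forks a new process from the (possibly large) test JVM.
        Process process = new ProcessBuilder(checkCommand(pid)).start();
        String firstLine;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            firstLine = reader.readLine();
        }
        int exitCode = process.waitFor();
        return exitCode == 0 && firstLine != null && !firstLine.trim().isEmpty();
    }

    public static void main(String[] args) throws Exception {
        // Pass the pid to check as the first argument; pid 1 is only a default.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : 1L;
        System.out.println(checkCommand(pid));
        System.out.println("alive: " + isAlive(pid));
    }
}
```

A periodic job that calls something like {{isAlive}} with the parent pid therefore forks from the test JVM at a fixed interval for the whole test run.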

-It's true that we've limited each JVM with 2GB heap size (a total of 4GB heap 
size). But as the shell command is executed by forking a new process, when this 
surefire check happens, it is possible that the memory usage doubles, thus 
reaching the hard 8GB limit of the container.-

-There are some IT cases which uses a real lot of heap memory (especially 
{{HashAggITCase}}), but according to my local {{top}} command, currently the 
memory usage of all tests seems to be precisely a bit lower than what is needed 
to trigger a full GC (memory usage of each JVM goes up to about 2000MB). As 
{{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two 
test cases to run, the memory failure is most likely to happen in their turn.-

-So it seems that the only thing that goes wrong is the heap limit we apply to 
the testing JVMs. I think lowering the heap limit a bit can solve this issue.-


> TableEnvironmentITCase is crashing on Travis
> 
>
> Key: FLINK-16636
> URL: https://issues.apache.org/jira/browse/FLINK-16636
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>

[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-21 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 9:39 AM:
---

I ran the IT cases on my local machine and read some of the surefire source 
code. Although I'm not 100% sure about the cause of this problem, here is my 
guess:

The Surefire plugin needs to periodically check whether the Maven process (its 
parent process) is still running so that it can stop itself when the Maven 
process unexpectedly exits (see {{processCheckerJob}} in [this 
code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]).
 To achieve this, it periodically calls 
[{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112]
 and finally goes into 
[{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163]
 method. In this method, it builds up a shell command with a {{ProcessBuilder}} 
and executes it with {{UNIXProcess.forkAndExec}} (see stack in [[~rmetzger]'s 
post|https://issues.apache.org/jira/browse/FLINK-16636?focusedCommentId=17087385=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17087385]).

It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). 
But because the shell command is executed by forking a new process, the memory 
usage may double while this surefire check runs, thus reaching the hard 8GB 
limit of the container.
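
As a rough sanity check of the numbers in this hypothesis, the arithmetic can be sketched as follows. This is only a back-of-the-envelope illustration: the class and method names are hypothetical, the duplication factor of 2 is the pessimistic assumption from the paragraph above rather than measured behaviour, and the 1536MB value is an arbitrary example of a lowered heap limit.

```java
/** Back-of-the-envelope check of the "memory doubles on fork" hypothesis. */
final class ForkMemoryBudget {
    static final long GIB = 1024L * 1024 * 1024;

    /** Sum of the configured heap limits across all test JVMs. */
    static long totalHeapBytes(int jvmCount, long heapLimitPerJvm) {
        return jvmCount * heapLimitPerJvm;
    }

    /** Pessimistic footprint, assuming every heap is fully duplicated (an assumption). */
    static long doubledFootprintBytes(int jvmCount, long heapLimitPerJvm) {
        return 2 * totalHeapBytes(jvmCount, heapLimitPerJvm);
    }

    /** True if the footprint reaches or exceeds the container's hard limit. */
    static boolean reachesLimit(long footprintBytes, long containerLimitBytes) {
        return footprintBytes >= containerLimitBytes;
    }

    public static void main(String[] args) {
        long container = 8 * GIB;          // hard limit of the container
        long currentHeap = 2 * GIB;        // current per-JVM heap limit
        long loweredHeap = 1536L * 1024 * 1024; // hypothetical lowered limit

        System.out.println("current limit reaches 8GB: "
                + reachesLimit(doubledFootprintBytes(2, currentHeap), container));
        System.out.println("lowered limit reaches 8GB: "
                + reachesLimit(doubledFootprintBytes(2, loweredHeap), container));
    }
}
```

Under these assumptions, two JVMs at 2GB each sit exactly at the 8GB limit once doubled, while a lower per-JVM limit leaves some headroom. Non-heap memory is ignored here, so the real margin would be smaller.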

There are some IT cases which use a lot of heap memory (especially 
{{HashAggITCase}}), but according to my local {{top}} command, the memory 
usage of all tests currently seems to stay just below what is needed to 
trigger a full GC (memory usage of each JVM goes up to about 2000MB). As 
{{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two 
test cases to run, the memory failure is most likely to happen during them.

So it seems that the only thing that goes wrong is the heap limit we apply to 
the testing JVMs. I think lowering the heap limit a bit can solve this issue.



[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-21 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 9:19 AM:
---

I ran the IT cases on my local machine and read some of the surefire source 
code. Although I'm not 100% sure about the cause of this problem, here is my 
guess:

The Surefire plugin needs to periodically check whether the Maven process (its 
parent process) is still running so that it can stop itself when the Maven 
process unexpectedly exits (see {{processCheckerJob}} in [this 
code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]).
 To achieve this, it periodically calls 
[{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112]
 and finally goes into 
[{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163]
 method. In this method, it builds up a shell command with a {{ProcessBuilder}} 
and executes it with {{UNIXProcess.forkAndExec}}.

It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). 
But because the shell command is executed by forking a new process, the memory 
usage may double while this surefire check runs, thus reaching the hard 8GB 
limit of the container.

There are some IT cases which use a lot of heap memory (especially 
{{HashAggITCase}}), but according to my local {{top}} command, the memory 
usage of all tests currently seems to stay just below what is needed to 
trigger a full GC (memory usage of each JVM goes up to about 2000MB). As 
{{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two 
test cases to run, the memory failure is most likely to happen during them.

So it seems that the only thing that goes wrong is the heap limit we apply to 
the testing JVMs. I think lowering the heap limit a bit can solve this issue.



[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-21 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088467#comment-17088467
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 9:18 AM:
---

I ran the IT cases on my local machine and read some of the surefire source 
code. Although I'm not 100% sure about the cause of this problem, here is my 
guess:

The Surefire plugin needs to periodically check whether the Maven process (its 
parent process) is still running so that it can stop itself when the Maven 
process unexpectedly exits (see {{processCheckerJob}} in [this 
code|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/ForkedBooter.java]).
 To achieve this, it periodically calls 
[{{PpidChecker#isProcessAlive}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L112]
 and finally goes into 
[{{unix}}|https://github.com/apache/maven-surefire/blob/master/surefire-booter/src/main/java/org/apache/maven/surefire/booter/PpidChecker.java#L163]
 method. In this method, it builds up a shell command with a {{ProcessBuilder}} 
and executes it with {{UNIXProcess.forkAndExec}}.

It's true that we've limited each JVM to a 2GB heap (4GB of heap in total). 
But because the shell command is executed by forking a new process, the memory 
usage may double while this surefire check runs, thus reaching the hard 8GB 
limit of the container.

There are some IT cases which use a lot of heap memory (especially 
{{HashAggITCase}}), but according to my local {{top}} command, the memory 
usage of all tests currently seems to stay just below what is needed to 
trigger a full GC (memory usage of each JVM goes up to about 2200MB). As 
{{TableEnvironmentITCase}} and {{TableUtilsStreamingITCase}} are the last two 
test cases to run, the memory failure is most likely to happen during them.

So it seems that the only thing that goes wrong is the heap limit we apply to 
the testing JVMs. I think lowering the heap limit a bit can solve this issue.



[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-21 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088330#comment-17088330
 ] 

Robert Metzger edited comment on FLINK-16636 at 4/21/20, 6:17 AM:
--

bq. About the memory usage, what's the total memory size for each of the 
testing Linux machines? What's the memory size each JVM requires and how many 
JVMs will run in parallel when testing?

On travis, 8 GB of main memory.

By default, surefire is configured with a 2GB heap limit, and it runs 2 forks per CPU.
See: https://github.com/apache/flink/blob/master/pom.xml#L1486 / 
https://github.com/apache/flink/blob/master/pom.xml#L99 / 
https://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html#forkCount



[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-21 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088308#comment-17088308
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 6:06 AM:
---

Hi [~rmetzger] thanks for reporting. Is the heap dump available now? I didn't 
find heap dump files in the .tar.gz.

From my current investigation it seems to be caused by memory leaks. According 
to 
[this|https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed]
 and 
[this|https://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run]
 stack overflow posts, maven will fork another JVM process to run tests.

So if the current JVM grabs too much memory, the forked JVM will also require 
this much memory. If the OS cannot allocate this much memory, 
"java.io.IOException: error=12, Cannot allocate memory" will occur. I'm going 
to investigate why the memory usage is high. It would be helpful if a heap 
dump from jmap or a similar tool were available.
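
To make the failure mode concrete, the sketch below shows where such an {{IOException}} would surface when a JVM starts a child process. It is a minimal illustration and not surefire's code: {{ChildProcessProbe}} and {{runShell}} are hypothetical names, and it assumes a Unix-like system where {{/bin/sh}} exists.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that launches a shell command from the JVM. */
final class ChildProcessProbe {

    /** Returns the command's trimmed output, or null if the process could not be run. */
    static String runShell(String command) {
        ProcessBuilder builder = new ProcessBuilder("/bin/sh", "-c", command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        try {
            // start() asks the OS for a new process; if the OS refuses, the
            // failure is reported as an IOException (for example error=12).
            Process process = builder.start();
            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            process.waitFor();
            return output.toString().trim();
        } catch (IOException e) {
            System.err.println("Could not run child process: " + e.getMessage());
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(runShell("echo child process started"));
    }
}
```

The point of the sketch is only that the error is raised in the parent JVM at process creation time, so the child command itself never gets to run.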

About the memory usage, what's the total memory size for each of the testing 
Linux machines? What's the memory size each JVM requires and how many JVMs will 
run in parallel when testing?



[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-20 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088308#comment-17088308
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/21/20, 5:58 AM:
---

Hi [~rmetzger] thanks for reporting. Is the heap dump available now? I didn't 
find heap dump files in the .tar.gz.

From my current investigation it seems to be caused by memory leaks. According 
to 
[this|https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed]
 and 
[this|https://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run]
 stack overflow posts, maven will fork another JVM process to run tests.

So if the current JVM grabs too much memory, the forked JVM will also require 
this much memory. If the OS cannot allocate this much memory, 
"java.io.IOException: error=12, Cannot allocate memory" will occur. I'm going 
to investigate why the memory usage is high. It would be helpful if a heap 
dump from jmap or a similar tool were available.


> TableEnvironmentITCase is crashing on Travis
> 
>
> Key: FLINK-16636
> URL: https://issues.apache.org/jira/browse/FLINK-16636
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Jark Wu
>Assignee: Caizhi Weng
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Here is the instance and exception stack: 
> https://api.travis-ci.org/v3/job/663408376/log.txt
> But there is not too much helpful information there, maybe a accidental maven 
> problem.
> {code}
> 09:55:07.703 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test 
> (integration-tests) on project flink-table-planner-blink_2.11: There are test 
> failures.
> 09:55:07.703 [ERROR] 
> 09:55:07.703 [ERROR] Please refer to 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports
>  for the individual test results.
> 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, 
> [date]-jvmRun[N].dump and [date].dumpstream.
> 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without 
> properly saying goodbye. VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:382)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
> 

[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-19 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087385#comment-17087385
 ] 

Robert Metzger edited comment on FLINK-16636 at 4/20/20, 5:51 AM:
--

Instance in master build on Travis: 
https://travis-ci.org/github/apache/flink/jobs/676881597
The debugging files 
(https://s3.amazonaws.com/flink-logs-us/travis-artifacts/apache/flink/43406/43406.5.tar.gz)
 contain this:
{code}
# Created at 2020-04-19T23:07:44.872
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.Reader.read(Reader.java:100)
at java.util.Scanner.readInput(Scanner.java:804)
at java.util.Scanner.findWithinHorizon(Scanner.java:1685)
at java.util.Scanner.hasNextLine(Scanner.java:1500)
at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


# Created at 2020-04-19T23:07:44.879
System.exit() or native command error interrupted process checker.
java.lang.IllegalStateException: error [STOPPED] to read process 14686
at org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


# Created at 2020-04-19T23:20:36.789
java.io.IOException: Cannot run program "/bin/sh": error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:351)
at org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
at org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
at org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=12, Cannot allocate memory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 11 more


# 

[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-17 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085786#comment-17085786
 ] 

Yu Li edited comment on FLINK-16636 at 4/17/20, 2:01 PM:
-

Another instance in the release-1.10 cron build: 
https://api.travis-ci.org/v3/job/675870831/log.txt


was (Author: carp84):
Another instance in the release-1.10 cron job: 
https://api.travis-ci.org/v3/job/675870831/log.txt

> TableEnvironmentITCase is crashing on Travis
> 
>
> Key: FLINK-16636
> URL: https://issues.apache.org/jira/browse/FLINK-16636
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Jark Wu
>Assignee: Caizhi Weng
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Here is the instance and exception stack: 
> https://api.travis-ci.org/v3/job/663408376/log.txt
> But there is not much helpful information there; it may be an accidental Maven 
> problem.
> {code}
> 09:55:07.703 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test 
> (integration-tests) on project flink-table-planner-blink_2.11: There are test 
> failures.
> 09:55:07.703 [ERROR] 
> 09:55:07.703 [ERROR] Please refer to 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire-reports
>  for the individual test results.
> 09:55:07.703 [ERROR] Please refer to dump files (if any exist) [date].dump, 
> [date]-jvmRun[N].dump and [date].dumpstream.
> 09:55:07.703 [ERROR] ExecutionException The forked VM terminated without 
> properly saying goodbye. VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> 09:55:07.703 [ERROR] Command was /bin/sh -c cd 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target 
> && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m 
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire/surefirebooter714252487017838305.jar
>  
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/target/surefire
>  2020-03-17T09-34-41_826-jvmRun1 surefire4625103637332937565tmp 
> surefire_43192129054983363633tmp
> 09:55:07.703 [ERROR] Error occurred in starting fork, check output in log
> 09:55:07.703 [ERROR] Process Exit Code: 137
> 09:55:07.703 [ERROR] Crashed tests:
> 09:55:07.703 [ERROR] org.apache.flink.table.api.TableEnvironmentITCase
> 09:55:07.703 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:382)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> 09:55:07.704 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> 09:55:07.704 [ERROR] at 
> 

[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-09 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079109#comment-17079109
 ] 

Caizhi Weng edited comment on FLINK-16636 at 4/9/20, 9:40 AM:
--

[~ykt836] The NPE is caused by {{ScalarQueryITCase#testScalarSubQueryException}}. 
This is a test case in which one node is intended to fail. If the node containing the 
sink is up but not yet initialized when the job fails, {{Utils.CollectHelper#open}} 
is never called but {{Utils.CollectHelper#close}} is still called shortly afterwards, 
which causes the NPE.
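
To illustrate the lifecycle problem described above, here is a minimal sketch of a helper whose {{close}} tolerates {{open}} never having run. {{LifecycleSketch}} and {{GuardedCollector}} are hypothetical names made up for this illustration; this is not the actual {{Utils.CollectHelper}} code and not necessarily how the fix is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleSketch {

    /** Hypothetical stand-in for a sink helper; NOT Flink's Utils.CollectHelper. */
    static class GuardedCollector {
        // Allocated in open(), so it stays null if open() is never called.
        private List<String> buffer;
        private boolean closed;

        void open() {
            buffer = new ArrayList<>();
        }

        void collect(String record) {
            buffer.add(record);
        }

        /** Returns the number of buffered records; tolerates a missing open(). */
        int close() {
            closed = true;
            if (buffer == null) {
                // open() never ran (e.g. the job failed during startup),
                // so there is nothing to flush and nothing to dereference.
                return 0;
            }
            int flushed = buffer.size();
            buffer = null;
            return flushed;
        }

        boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) {
        // Failure path: close() without open() must not throw.
        GuardedCollector neverOpened = new GuardedCollector();
        System.out.println("flushed without open(): " + neverOpened.close());

        // Normal path: open(), collect two records, close().
        GuardedCollector normal = new GuardedCollector();
        normal.open();
        normal.collect("a");
        normal.collect("b");
        System.out.println("flushed after open(): " + normal.close());
    }
}
```

Without the null check in {{close}}, the first call in {{main}} would dereference the unallocated buffer, which is the same shape of failure as the NPE seen in the test logs.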

But this does not seem to be related to the crash of 
{{TableEnvironmentITCase}} (the NPE also occurs when the blink_planner test 
module passes all its tests). Currently I suspect that 
{{TableUtilsStreamingITCase}} is just accidental. I'm going to fix the NPE 
problem first and then make {{TableEnvironmentITCase}}, 
{{BatchAbstractTestBase}} and {{TableUtilsStreamingITCase}} extend the {{TestLogger}} 
class, so that we get a more detailed log the next time this problem 
arises.

[~rmetzger] Did you notice anything abnormal with the JVM heap?


was (Author: tsreaper):
[~ykt836] NPE is caused by {{ScalarQueryITCase#testScalarSubQueryException}}. 
This is a test case with one node intended to fail. If the node containing the 
sink is up but not initialized when the job fails, {{Utils.CollectHelper#open}} 
will not be called but {{Utils.CollectHelper#close}} will be called soon, thus 
causing the NPE.

But this does not seem to be related with the crashing of 
{{TableEnvironmentITCase}}. Currently I suspect that 
{{TableUtilsStreamingITCase}} is just accidental. I'm going to fix the NPE 
problem first and then extend {{TableEnvironmentITCase}}, 
{{BatchAbstractTestBase}} and {{TableUtilsStreamingITCase}} with {{TestLogger}} 
class so that when this problem arises next time, we can have a more detailed 
log.

[~rmetzger] Did you notice anything abnormal with the JVM heap?


[jira] [Comment Edited] (FLINK-16636) TableEnvironmentITCase is crashing on Travis

2020-04-07 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077172#comment-17077172
 ] 

Robert Metzger edited comment on FLINK-16636 at 4/7/20, 12:35 PM:
--

The change is merged. Builds coming in tonight might reveal additional 
information.

Another failure: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7139&view=logs&j=e25d5e7e-2a9c-5589-4940-0b638d75a414&t=294c2388-20e6-57a2-5721-91db544b1e69

https://travis-ci.org/github/apache/flink/jobs/671996427?utm_medium=notification&utm_source=slack


was (Author: rmetzger):
The change is merged. Builds coming in tonight might reveal additional 
information.

Another failure: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7139&view=logs&j=e25d5e7e-2a9c-5589-4940-0b638d75a414&t=294c2388-20e6-57a2-5721-91db544b1e69
