[ 
https://issues.apache.org/jira/browse/FLINK-25480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510448#comment-17510448
 ] 

Aitozi edited comment on FLINK-25480 at 3/22/22, 12:16 PM:
-----------------------------------------------------------

FYI, I encountered the same problem with 1.14.4 when running the tests in a 
container. I tested in a 16C 32G container, and the {{mvn -Dflink.forkCount=2 
-Dflink.forkCountTestPackage=2 -Dmaven.test.failure.ignore=true verify}} 
command eventually exited with code 137.
At the same time, I opened another screen and ran {{vsar --cpu --mem -l}} to 
monitor memory usage, but I still could not catch the memory spike. 
I hope this is helpful. I'm curious about the root cause, because it 
is blocking us from building a stable CI pipeline.
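
For readers unfamiliar with the exit code mentioned above: by shell convention, a status above 128 usually means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL), which is what the Linux OOM killer sends. The sketch below only decodes that convention; the helper name {{describe_exit_status}} is hypothetical, it assumes a POSIX system, and a program can also return 137 on its own, so treat the result as a hint rather than proof of an OOM kill.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status (0-255) using the 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform: fall back to the raw number.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
```

Checking the kernel log (for example with {{dmesg}}) on the host is the usual way to confirm whether the OOM killer was actually involved.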


was (Author: aitozi):
FYI, I encounter the same problem with 1.14.4 when running test in container. I 
test in 16C 32G container, and {{mvn -Dflink.forkCount=2 
-Dflink.forkCountTestPackage=2 -Dmaven.test.failure.ignore=true verify}} 
command exit 137 finally. At the meantime, I opened another screen to run  
{{vsar --cpu --mem -l}} to monitor the memory usage. But I still not catch the 
memory stroke. Hope to be helpful to your guys. I'm curious about it, because 
it stop me from building our stable CI pipeline.

> Create dashboard/monitoring to see resource usage per E2E test
> --------------------------------------------------------------
>
>                 Key: FLINK-25480
>                 URL: https://issues.apache.org/jira/browse/FLINK-25480
>             Project: Flink
>          Issue Type: Improvement
>          Components: Test Infrastructure
>    Affects Versions: 1.15.0, 1.13.6, 1.14.3
>            Reporter: Martijn Visser
>            Priority: Critical
>              Labels: test-stability
>
> Over the past couple of weeks, we've encountered multiple problems with tests 
> failing due to out-of-memory errors and/or exit code 137. These failures 
> occur both on Alibaba CI machines and on Azure hosted agents. For 
> the Alibaba CI machines, we've mitigated the problem by reducing the number 
> of workers per CI machine from 7 to 5. These workers can spin up multiple 
> Docker containers, especially with Testcontainers being used more and more. 
> If we can get insight into the resource usage per end-to-end test, it will 
> also help in debugging test infrastructure problems more quickly. 
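
As a rough illustration of the per-test measurement the description asks for, the sketch below runs one command and records the peak resident set size reported for child processes. It is not how Flink's CI measures anything: the function name {{run_and_measure}} is hypothetical, it relies on Python's standard {{resource}} module (Unix only), and it reports the largest single waited-for child seen so far by the calling process, so it does not cover memory used inside Docker containers started by the test. Those would need container-level metrics instead.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd to completion and return (return code, peak child RSS).

    ru_maxrss is in kilobytes on Linux and in bytes on macOS. With
    RUSAGE_CHILDREN it is the maximum over all children this process has
    waited for so far, so use a fresh process per measurement.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # subprocess reports a negative return code (e.g. -9) when the child
    # was terminated by a signal such as SIGKILL.
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    code, peak = run_and_measure([sys.executable, "-c", "print('hello')"])
    print(f"exit={code} peak_rss={peak}")
```

Logging one such line per test command would give a first, coarse view of which tests approach the memory limit of a CI worker.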



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
