[ https://issues.apache.org/jira/browse/FLINK-19436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228947#comment-17228947 ]
Leonard Xu edited comment on FLINK-19436 at 11/10/20, 3:27 AM:
---------------------------------------------------------------

Hi [~rmetzger], I'm pretty sure that all the failed logs show more than one TM, but the TPC-DS test only starts 1 TM.
!image-2020-11-10-11-08-53-199.png!

And in my PR, you can see that only one TM is being shut down.
!image-2020-11-10-11-09-20-534.png!

If an e2e bash script doesn't call `clean_up()` for the JM and TMs it started, the still-running JM and TMs are shut down later in `test-runner-common.sh#shutdown_all()`. `shutdown_all()` stops the processes with `kill -9` without checking the result, so the shutdown may fail. I think this is why the test is unstable.

was (Author: leonard xu):
Hi, [~rmetzger] I'm pretty sure that, all failed log showed there're more than one TMS, but the TPC-DS test only start 1 TM. !image-2020-11-10-11-08-53-199.png! And from my PR, you can see the only one TM is closing. !image-2020-11-10-11-09-20-534.png! If e2e bash script doesn't call clean up the JM and TMs it starts, the running JM and TMs will close in test-runner-common.sh#shutdown_all(), test-runner-common.sh#shutdown_all() using `kill -9 ` without check to stop the process which may fail, I think this is why the test is unstable.
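A minimal sketch of the kind of check the comment says is missing from a bare `kill -9`: stop a process, then verify it is actually gone and report failure otherwise. `stop_and_verify` is a hypothetical helper name, not a function in Flink's `test-runner-common.sh`, and the SIGTERM-then-SIGKILL escalation and the 10-second default grace period are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of Flink's test-runner-common.sh): stop a
# process and verify that it is really gone, instead of relying on a bare
# `kill -9` whose outcome is never checked.
# Usage: stop_and_verify <pid> [timeout_seconds]
stop_and_verify() {
    local pid="$1"
    local timeout="${2:-10}"   # assumed default grace period, in seconds
    local waited=0

    # `kill -0` sends no signal; it only tests whether the PID can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        return 0   # already gone, nothing to do
    fi

    # Ask the process to shut down cleanly first (SIGTERM).
    kill "$pid" 2>/dev/null

    # Poll once per second until the process exits or the grace period ends.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # Escalate to SIGKILL only if the process survived SIGTERM.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
        sleep 1
    fi

    # Fail loudly instead of silently assuming the kill worked.
    if kill -0 "$pid" 2>/dev/null; then
        echo "Failed to stop process $pid" >&2
        return 1
    fi
    return 0
}
```

A caller could then write, for example, `stop_and_verify "$tm_pid" 10 || exit 1` (where `tm_pid` is a hypothetical variable holding a TaskManager PID), so a TM that refuses to die fails the test at that point instead of surviving into a later test's shutdown. One caveat: `kill -0` also succeeds on a zombie, so the sketch assumes the process's parent reaps it, as bash does for its own background jobs.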
> TPC-DS end-to-end test (Blink planner) failed during shutdown
> -------------------------------------------------------------
>
> Key: FLINK-19436
> URL: https://issues.apache.org/jira/browse/FLINK-19436
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner, Tests
> Affects Versions: 1.11.0, 1.12.0
> Reporter: Dian Fu
> Assignee: Leonard Xu
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
> Attachments: image-2020-11-10-11-08-53-199.png, image-2020-11-10-11-09-20-534.png
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=7009&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=2b7514ee-e706-5046-657b-3430666e7bd9
> {code}
> 2020-09-27T22:37:53.2236467Z Stopping taskexecutor daemon (pid: 2992) on host fv-az655.
> 2020-09-27T22:37:53.4450715Z Stopping standalonesession daemon (pid: 2699) on host fv-az655.
> 2020-09-27T22:37:53.8014537Z Skipping taskexecutor daemon (pid: 11173), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8019740Z Skipping taskexecutor daemon (pid: 11561), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8022857Z Skipping taskexecutor daemon (pid: 11849), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8023616Z Skipping taskexecutor daemon (pid: 12180), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8024327Z Skipping taskexecutor daemon (pid: 12950), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8025027Z Skipping taskexecutor daemon (pid: 13472), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8025727Z Skipping taskexecutor daemon (pid: 16577), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8026417Z Skipping taskexecutor daemon (pid: 16959), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8027086Z Skipping taskexecutor daemon (pid: 17250), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8027770Z Skipping taskexecutor daemon (pid: 17601), because it is not running anymore on fv-az655.
> 2020-09-27T22:37:53.8028400Z Stopping taskexecutor daemon (pid: 18438) on host fv-az655.
> 2020-09-27T22:37:53.8029314Z /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/bin/taskmanager.sh: line 99: 18438 Terminated "${FLINK_BIN_DIR}"/flink-daemon.sh $STARTSTOP $ENTRYPOINT "${ARGS[@]}"
> 2020-09-27T22:37:53.8029895Z [FAIL] Test script contains errors.
> 2020-09-27T22:37:53.8032092Z Checking for errors...
> 2020-09-27T22:37:55.3713368Z No errors in log files.
> 2020-09-27T22:37:55.3713935Z Checking for exceptions...
> 2020-09-27T22:37:56.9046391Z No exceptions in log files.
> 2020-09-27T22:37:56.9047333Z Checking for non-empty .out files...
> 2020-09-27T22:37:56.9064402Z No non-empty .out files.
> 2020-09-27T22:37:56.9064859Z
> 2020-09-27T22:37:56.9065588Z [FAIL] 'TPC-DS end-to-end test (Blink planner)' failed after 16 minutes and 54 seconds! Test exited with exit code 1
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)