[ https://issues.apache.org/jira/browse/MESOS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958884#comment-16958884 ]
Andrei Sekretenko edited comment on MESOS-9483 at 10/24/19 1:15 PM: -------------------------------------------------------------------- Observed a similar failure again in the same environment: {noformat} [ RUN ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors I1023 17:37:33.530347 384742848 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 54301 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd9500bad50 flags=0 2019-10-23 17:37:33,531:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301] 2019-10-23 17:37:33,533:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90000, negotiated timeout=10000 I1023 17:37:33.533334 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper I1023 17:37:33.533380 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I1023 17:37:33.533393 179191808 group.cpp:395] Authenticating with ZooKeeper using digest 2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms 2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1702: Socket [127.0.0.1:54301] zk retcode=-7, errno=60(Operation timed out): connection to 127.0.0.1:54301 timed out (exceeded timeout by 2139ms) 2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms I1023 17:37:42.337549 179191808 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ... 2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 8816ms 2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301] 2019-10-23 17:37:45,680:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1800: Socket [127.0.0.1:54301] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x16dfb327ed90000 has expired. I1023 17:37:45.680534 181874688 group.cpp:511] ZooKeeper session expired 2019-10-23 17:37:45,680:26960(0x70000ab67000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x16dfb327ed90000 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd94fc98990 flags=0 2019-10-23 17:37:45,681:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301] 2019-10-23 17:37:45,682:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90001, negotiated timeout=10000 I1023 17:37:45.682644 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper I1023 17:37:45.682677 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I1023 17:37:45.682689 179191808 group.cpp:395] Authenticating with ZooKeeper using digest ../../src/tests/master_contender_detector_tests.cpp:409: Failure Failed to wait 15secs for group1.join("data") 2019-10-23 17:37:49,793:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 778ms 2019-10-23 17:37:49,832:26960(0x70000b717000):ZOO_INFO@auth_completion_func@1327: Authentication scheme digest succeeded I1023 17:37:49.832679 179191808 group.cpp:419] Trying to create path '/mesos' in ZooKeeper 2019-10-23 17:37:49,836:26960(0x116eeb5c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x16dfb327ed90001 to [127.0.0.1:54301] I1023 17:37:49.836292 384742848 zookeeper_test_server.cpp:116] Shutting down ZooKeeperTestServer on port 54301 [ FAILED ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (16311 ms) {noformat} was (Author: asekretenko): Observed a similar failure again: {noformat} [ RUN ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors I1023 17:37:33.530347 384742848 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 54301 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build 2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd9500bad50 flags=0 2019-10-23 17:37:33,531:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301] 2019-10-23 17:37:33,533:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90000, negotiated timeout=10000 I1023 17:37:33.533334 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper I1023 17:37:33.533380 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I1023 17:37:33.533393 179191808 group.cpp:395] Authenticating with ZooKeeper using digest 2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms 2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1702: Socket [127.0.0.1:54301] zk retcode=-7, errno=60(Operation timed out): connection to 127.0.0.1:54301 timed out (exceeded timeout by 2139ms) 2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms I1023 17:37:42.337549 179191808 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ... 2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 8816ms 2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301] 2019-10-23 17:37:45,680:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1800: Socket [127.0.0.1:54301] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x16dfb327ed90000 has expired. I1023 17:37:45.680534 181874688 group.cpp:511] ZooKeeper session expired 2019-10-23 17:37:45,680:26960(0x70000ab67000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x16dfb327ed90000 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build 2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd94fc98990 flags=0 2019-10-23 17:37:45,681:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301] 2019-10-23 17:37:45,682:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90001, negotiated timeout=10000 I1023 17:37:45.682644 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper I1023 17:37:45.682677 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I1023 17:37:45.682689 179191808 group.cpp:395] Authenticating with ZooKeeper using digest ../../src/tests/master_contender_detector_tests.cpp:409: Failure Failed to wait 15secs for group1.join("data") 2019-10-23 17:37:49,793:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 778ms 2019-10-23 17:37:49,832:26960(0x70000b717000):ZOO_INFO@auth_completion_func@1327: Authentication scheme digest succeeded I1023 17:37:49.832679 179191808 group.cpp:419] Trying to create path '/mesos' in ZooKeeper 2019-10-23 17:37:49,836:26960(0x116eeb5c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x16dfb327ed90001 to [127.0.0.1:54301] I1023 17:37:49.836292 384742848 zookeeper_test_server.cpp:116] Shutting down ZooKeeperTestServer on port 54301 [ FAILED ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (16311 ms) {noformat} > ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors is flaky > ---------------------------------------------------------------- > > Key: MESOS-9483 > URL: https://issues.apache.org/jira/browse/MESOS-9483 > Project: Mesos > Issue Type: Bug > Environment: Mac OSX w/ libevent > Reporter: Benno Evers > Priority: Major > Labels: flaky-test, foundations > > Observed a failure with the following error: > {noformat} > ../../src/tests/master_contender_detector_tests.cpp:409: Failure > Failed to wait 15secs for group1.join("data") > {noformat} > Full log: > {noformat} > [ RUN ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors > I1214 15:03:56.036525 398710208 zookeeper_test_server.cpp:156] Started > ZooKeeperTestServer on port 50199 > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@753: Client > environment:zookeeper.version=zookeeper C client 3.4.8 > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@757: Client > environment:host.name=Jenkinss-Mac-mini.local > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@764: Client > environment:os.name=Darwin > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@765: Client > environment:os.arch=18.2.0 > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@766: Client > environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST > 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@774: Client > environment:user.name=jenkins > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@782: Client > environment:user.home=/Users/jenkins > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@794: Client > environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build > 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@zookeeper_init@827: > Initiating client connection, host=127.0.0.1:50199 sessionTimeout=10000 > watcher=0x11a65f9a0 sessionId=0 sessionPasswd=<null> context=0x7fcd061125a0 > flags=0 > 2018-12-14 15:03:56,037:69505(0x700004415000):ZOO_INFO@check_events@1764: > initiated connection to server [127.0.0.1:50199] > 2018-12-14 15:03:56,039:69505(0x700004415000):ZOO_INFO@check_events@1811: > session establishment complete on server [127.0.0.1:50199], > sessionId=0x167aef64b830000, negotiated timeout=10000 > I1214 15:03:56.039242 60207104 group.cpp:341] Group process > (zookeeper-group(14)@10.0.49.4:49309) connected to ZooKeeper > I1214 15:03:56.039286 60207104 group.cpp:831] Syncing group operations: queue > size (joins, cancels, datas) = (1, 0, 0) > I1214 15:03:56.039309 60207104 group.cpp:395] Authenticating with ZooKeeper > using digest > 2018-12-14 > 15:04:05,989:69505(0x700004415000):ZOO_WARN@zookeeper_interest@1597: Exceeded > deadline by 6619ms > 2018-12-14 > 15:04:05,989:69505(0x700004415000):ZOO_ERROR@handle_socket_error_msg@1702: > Socket [127.0.0.1:50199] zk retcode=-7, errno=60(Operation timed out): > connection to 127.0.0.1:50199 timed out (exceeded timeout by 3284ms) > 2018-12-14 > 15:04:05,989:69505(0x700004415000):ZOO_WARN@zookeeper_interest@1597: Exceeded > deadline by 6619ms > I1214 15:04:05.990031 60207104 group.cpp:452] Lost connection to ZooKeeper, > attempting to reconnect ... > 2018-12-14 > 15:04:09,332:69505(0x700004415000):ZOO_WARN@zookeeper_interest@1597: Exceeded > deadline by 9963ms > 2018-12-14 15:04:09,332:69505(0x700004415000):ZOO_INFO@check_events@1764: > initiated connection to server [127.0.0.1:50199] > 2018-12-14 > 15:04:09,333:69505(0x700004415000):ZOO_ERROR@handle_socket_error_msg@1800: > Socket [127.0.0.1:50199] zk retcode=-112, errno=70(Stale NFS file handle): > sessionId=0x167aef64b830000 has expired. > I1214 15:04:09.333552 59670528 group.cpp:511] ZooKeeper session expired > 2018-12-14 15:04:09,333:69505(0x7000038e8000):ZOO_INFO@zookeeper_close@2579: > Freeing zookeeper resources for sessionId=0x167aef64b830000 > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@753: Client > environment:zookeeper.version=zookeeper C client 3.4.8 > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@757: Client > environment:host.name=Jenkinss-Mac-mini.local > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@764: Client > environment:os.name=Darwin > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@765: Client > environment:os.arch=18.2.0 > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@766: Client > environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST > 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@774: Client > environment:user.name=jenkins > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@782: Client > environment:user.home=/Users/jenkins > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@794: Client > environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build > 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@zookeeper_init@827: > Initiating client connection, host=127.0.0.1:50199 sessionTimeout=10000 > watcher=0x11a65f9a0 sessionId=0 sessionPasswd=<null> context=0x7fcd061125a0 > flags=0 > 2018-12-14 15:04:09,334:69505(0x700004415000):ZOO_INFO@check_events@1764: > initiated connection to server [127.0.0.1:50199] > 2018-12-14 15:04:09,335:69505(0x700004415000):ZOO_INFO@check_events@1811: > session establishment complete on server [127.0.0.1:50199], > sessionId=0x167aef64b830001, negotiated timeout=10000 > I1214 15:04:09.335834 58060800 group.cpp:341] Group process > (zookeeper-group(14)@10.0.49.4:49309) connected to ZooKeeper > I1214 15:04:09.335880 58060800 group.cpp:831] Syncing group operations: queue > size (joins, cancels, datas) = (1, 0, 0) > I1214 15:04:09.335896 58060800 group.cpp:395] Authenticating with ZooKeeper > using digest > ../../src/tests/master_contender_detector_tests.cpp:409: Failure > Failed to wait 15secs for group1.join("data") > 2018-12-14 > 15:04:12,784:69505(0x700004415000):ZOO_INFO@auth_completion_func@1327: > Authentication scheme digest succeeded > I1214 15:04:12.784991 58060800 group.cpp:419] Trying to create path '/mesos' > in ZooKeeper > 2018-12-14 15:04:12,788:69505(0x117c3d5c0):ZOO_INFO@zookeeper_close@2562: > Closing zookeeper sessionId=0x167aef64b830001 to [127.0.0.1:50199] > I1214 15:04:12.789038 398710208 zookeeper_test_server.cpp:116] Shutting down > ZooKeeperTestServer on port 50199 > [ FAILED ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (16756 > ms) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)