[ https://issues.apache.org/jira/browse/MESOS-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026938#comment-15026938 ]
Till Toenshoff edited comment on MESOS-3975 at 11/25/15 3:38 PM:
-----------------------------------------------------------------

I can still see tests failing using the above Vagrantfile generator on both VMware Fusion and VirtualBox, hosted on OS X and Linux.

Just ran the test suite again with a repeat counter enabled, and it stopped on the first iteration:

{noformat}
[ RUN ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem
2015-11-25 15:26:33,873:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:26:37,209:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:26:40,546:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:26:43,883:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
+ /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/
+ grep -E /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_FrPTNg/.+ /proc/self/mountinfo
+ grep -v 722234da-f06d-4c9c-95d9-9be998e69d5c
+ cut '-d ' -f5
+ xargs --no-run-if-empty umount -l
Changing root to /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_FrPTNg/provisioner/containers/722234da-f06d-4c9c-95d9-9be998e69d5c/backends/copy/rootfses/928eb0dc-228b-4e9a-80d4-de8fb86ff6ea
2015-11-25 15:26:47,221:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
[ OK ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem (16903 ms)
[ RUN ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
2015-11-25 15:26:50,558:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:26:53,894:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
+ /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/
+ grep -E /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_nEk9PC/.+ /proc/self/mountinfo
+ grep -v 39ddf64a-d74e-44c9-a237-2d130c95e72d
+ cut '-d ' -f5
+ xargs --no-run-if-empty umount -l
+ mount -n --rbind /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_nEk9PC/provisioner/containers/39ddf64a-d74e-44c9-a237-2d130c95e72d/backends/copy/rootfses/4eac79ca-c89f-4a1d-b190-9e11cb43ca15 /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_nEk9PC/slaves/f3615745-e347-4ffe-ba44-30cb0c245d76-S0/frameworks/f3615745-e347-4ffe-ba44-30cb0c245d76-0000/executors/226484c0-8df5-43fd-a62f-39b3b7bc4824/runs/39ddf64a-d74e-44c9-a237-2d130c95e72d/.rootfs
Could not load cert file
2015-11-25 15:26:57,231:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
Value of: statusRunning.get().state()
Actual: TASK_FAILED
Expected: TASK_RUNNING
2015-11-25 15:27:00,568:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:27:03,906:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:27:07,243:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:27:10,580:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2015-11-25 15:27:13,916:22205(0x7f3435870700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37707] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
Failed to wait 15secs for statusFinished
../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(&driver, _))...
Expected: to be called twice
Actual: called once - unsatisfied and active
*** Aborted at 1448465234 (unix time) try "date -d @1448465234" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 22205 (TID 0x7f347d3ab8c0) from PID 0; stack trace: ***
@ 0x7f346a15bfbb (unknown)
@ 0x7f346a160341 (unknown)
@ 0x7f3477a5b130 (unknown)
{noformat}

> SSL build of mesos causes flaky testsuite.
> ------------------------------------------
>
> Key: MESOS-3975
> URL: https://issues.apache.org/jira/browse/MESOS-3975
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.26.0
> Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 4.8.3, Docker 1.9
> Reporter: Till Toenshoff
> Assignee: Joris Van Remoortere
> Labels: mesosphere
>
> When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious test failures that are, so far, not reproducible.
> The following tests failed for me in complete runs but seemed fine when run individually, in repetition.
> {noformat}
> DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}
> {noformat}
> ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> {noformat}
> {noformat}
> [ RUN ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
> 2015-11-20 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/
> + grep -E /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+ /proc/self/mountinfo
> + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
> + cut '-d ' -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-0000/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
> Could not load cert file
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
> Value of: statusRunning.get().state()
> Actual: TASK_FAILED
> Expected: TASK_RUNNING
> 2015-11-20 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-11-20 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-11-20 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-11-20 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
> Failed to wait 15secs for statusFinished
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(&driver, _))...
> Expected: to be called twice
> Actual: called once - unsatisfied and active
> 2015-11-20 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are using GNU date ***
> PC: @ 0x0 (unknown)
> *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; stack trace: ***
> @ 0x7fa141796fbb (unknown)
> @ 0x7fa14179b341 (unknown)
> @ 0x7fa14f096130 (unknown)
> {noformat}
> Vagrantfile generator:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
>     vb.memory = 16384
>     vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
>     vb.memory = 9216
>     vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>     sudo yum -y update systemd
>     sudo yum install -y tar wget
>     sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
>     sudo yum groupinstall -y "Development Tools"
>     sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
>     sudo yum install -y libevent-devel
>     sudo yum install -y git
>     sudo yum install -y docker
>     sudo service docker start
>     sudo docker info
>     #sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> vagrant up
> vagrant reload
> vagrant ssh -c "
> git clone https://github.com/apache/mesos.git mesos
> cd mesos
> git checkout -b 0.26.0-rc1 0.26.0-rc1
> ./bootstrap
> mkdir build
> cd build
> ../configure --enable-libevent --enable-ssl
> GTEST_FILTER="" make check
> sudo ./bin/mesos-tests.sh
> "
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)