[ https://issues.apache.org/jira/browse/MESOS-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031847#comment-15031847 ]
Jan Schlicht edited comment on MESOS-3975 at 11/30/15 2:35 PM:
---------------------------------------------------------------

I could reproduce this under Fedora 23, running {{./bin/mesos-tests.sh --gtest_repeat=-1}} if compiled with {{--enable-libevent --enable-ssl}}, though for me DiskQuotaTest.DiskUsageExceedsQuota fails. It seems that the RegistryClientTest fixture sets some {{SSL_}} env variables but doesn't unset them. This enables SSL for subsequent fixtures and lets libprocess initialization fail with "Could not load cert file".
I don't know how GTest orders/shuffles its tests, but the ordering is important here: {{./bin/mesos-tests.sh --gtest_repeat=2 --gtest_filter="RegistryClientTest.*:*.DiskUsageExceedsQuota" --gtest_break_on_failure}} can be used to reproduce this here.

was (Author: nfnt):
I could reproduce this under Fedora 23, running {{./bin/mesos-tests.sh --gtest_repeat=-1}} if compiled with {{--enable-libevent --enable-ssl}}, though for me DiskQuotaTest.DiskUsageExceedsQuota fails. It seems that the RegistryClientTest fixture sets some {{SSL_}} env variables but doesn't unset them. This enables SSL for subsequent fixtures and lets libprocess initialization fail with "Could not load cert file".
I don't know how GTest orders/shuffles its tests, but the ordering is important here: Using {{./bin/mesos-tests.sh --gtest_repeat=2 --gtest_filter="RegistryClientTest.*:*.DiskUsageExceedsQuota" --gtest_break_on_failure}} can be used to reproduce this here.

> SSL build of mesos causes flaky testsuite.
> ------------------------------------------
>
>                 Key: MESOS-3975
>                 URL: https://issues.apache.org/jira/browse/MESOS-3975
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.26.0
>         Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 4.8.3, Docker 1.9
>            Reporter: Till Toenshoff
>            Assignee: Joseph Wu
>              Labels: mesosphere
>
> When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious test failures that are, so far, not reproducible.
> The following tests did fail for me in complete runs but did seem fine when running them individually, in repetition.
> {noformat}
> DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}
> {noformat}
> ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> {noformat}
> {noformat}
> [ RUN      ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
> 2015-11-20 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/
> + grep -E /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+ /proc/self/mountinfo
> + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
> + cut '-d ' -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-0000/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
> Could not load cert file
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_RUNNING
> 2015-11-20 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-11-20 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-11-20 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-11-20 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
> Failed to wait 15secs for statusFinished
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(&driver, _))...
>          Expected: to be called twice
>            Actual: called once - unsatisfied and active
> 2015-11-20 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are using GNU date ***
> PC: @ 0x0 (unknown)
> *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; stack trace: ***
>     @ 0x7fa141796fbb (unknown)
>     @ 0x7fa14179b341 (unknown)
>     @ 0x7fa14f096130 (unknown)
> {noformat}
> Vagrantfile generator:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-"
>
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
>     vb.memory = 16384
>     vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
>     vb.memory = 9216
>     vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>     sudo yum -y update systemd
>     sudo yum install -y tar wget
>     sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
>     sudo yum groupinstall -y "Development Tools"
>     sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
>     sudo yum install libevent-devel
>     sudo yum install -y git
>     sudo yum install -y docker
>     sudo service docker start
>     sudo docker info
>     #sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> vagrant up
> vagrant reload
> vagrant ssh -c "
> git clone https://github.com/apache/mesos.git mesos
> cd mesos
> git checkout -b 0.26.0-rc1 0.26.0-rc1
> ./bootstrap
> mkdir build
> cd build
> ../configure --enable-libevent --enable-ssl
> GTEST_FILTER="" make check
> sudo ./bin/mesos-tests.sh
> "
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
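
The comment above attributes the failures to a fixture that sets {{SSL_}} environment variables without unsetting them. The following is a minimal sketch of one way a fixture could avoid leaking such state: an RAII guard that restores the previous value on destruction. {{ScopedEnvVar}} and {{currentValue}} are hypothetical names introduced here for illustration; they are not part of Mesos, libprocess, or GTest, and this is not necessarily how the issue was fixed. The sketch assumes a POSIX platform ({{setenv}}/{{unsetenv}}).

```cpp
#include <stdlib.h>  // setenv, unsetenv (POSIX)

#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Mesos or GTest): sets one environment
// variable for the lifetime of the object and restores the previous state
// (old value, or "unset") when the object is destroyed.
class ScopedEnvVar
{
public:
  ScopedEnvVar(const std::string& name, const std::string& value)
    : name_(name), hadValue_(false)
  {
    assert(!name.empty());

    // Remember whether the variable was set before, and to what.
    const char* previous = std::getenv(name_.c_str());
    if (previous != nullptr) {
      hadValue_ = true;
      previous_ = previous;
    }

    ::setenv(name_.c_str(), value.c_str(), 1);
  }

  ~ScopedEnvVar()
  {
    if (hadValue_) {
      ::setenv(name_.c_str(), previous_.c_str(), 1);
    } else {
      ::unsetenv(name_.c_str());
    }
  }

  // Copying would restore the variable twice, so forbid it.
  ScopedEnvVar(const ScopedEnvVar&) = delete;
  ScopedEnvVar& operator=(const ScopedEnvVar&) = delete;

private:
  std::string name_;
  bool hadValue_;
  std::string previous_;
};

// Returns the current value of `name`, or an empty string if it is unset.
std::string currentValue(const std::string& name)
{
  const char* value = std::getenv(name.c_str());
  return value != nullptr ? std::string(value) : std::string();
}
```

A fixture could hold such guards as members (for example one constructed as {{ScopedEnvVar("SSL_ENABLED", "true")}}, where the variable name is only an illustrative assumption), so that the environment is restored when the fixture is destroyed, regardless of the order in which GTest runs the tests.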