[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics

2015-12-01 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034263#comment-15034263
 ] 

Jan Schlicht commented on MESOS-3586:
-

I have to reopen this, as I've seen the same behavior with 0.26.0-rc2 on
CentOS 7.1. I noticed some flakiness while running {{sudo ./bin/mesos-tests.sh}}
and could reproduce it by running {{sudo ./bin/mesos-tests.sh -
--gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_Statistics"
--gtest_repeat=-1 --gtest_break_on_failure}}, which repeats the test until it
fails.
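
The effect of those two gtest flags can be pictured as a loop that reruns a test until its first failure. The following is only a minimal sketch of that idea, not gtest's implementation; {{firstFailingRun}} and the run cap are hypothetical names introduced for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of gtest or Mesos).
// Runs `test` up to `maxRuns` times and returns the 1-based index of the
// first failing run, or 0 if every run passed.
std::size_t firstFailingRun(
    const std::function<bool()>& test, std::size_t maxRuns)
{
  assert(static_cast<bool>(test));  // A callable must be supplied.

  for (std::size_t run = 1; run <= maxRuns; ++run) {
    if (!test()) {
      return run;  // Stop at the first failure, like break-on-failure.
    }
  }

  return 0;  // No failure observed within the cap.
}
```

A flaky test typically needs many iterations before it fails, which is why an unbounded repeat count is convenient when hunting for it.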

Here's the verbose output of a failing run:
{noformat}
[ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
I1201 18:07:51.136508 18883 cgroups.cpp:2429] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
I1201 18:07:51.144594 18886 cgroups.cpp:1411] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
 after 7.076864ms
I1201 18:07:51.151480 18882 cgroups.cpp:2447] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
I1201 18:07:51.162557 18886 cgroups.cpp:1440] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
 after 11.026944ms
I1201 18:07:51.172379 18887 cgroups.cpp:2429] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb
I1201 18:07:51.183791 18881 cgroups.cpp:1411] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 
7.8272ms
I1201 18:07:51.192354 18887 cgroups.cpp:2447] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb
I1201 18:07:51.199439 18885 cgroups.cpp:1440] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 
7.028224ms
I1201 18:07:51.332849 18866 leveldb.cpp:176] Opened db in 6.74674ms
I1201 18:07:51.335450 18866 leveldb.cpp:183] Compacted db in 2.554513ms
I1201 18:07:51.335539 18866 leveldb.cpp:198] Created db iterator in 53851ns
I1201 18:07:51.335556 18866 leveldb.cpp:204] Seeked to beginning of db in 3455ns
I1201 18:07:51.335561 18866 leveldb.cpp:273] Iterated through 0 keys in the db 
in 107ns
I1201 18:07:51.335666 18866 replica.cpp:780] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1201 18:07:51.337374 18881 recover.cpp:449] Starting replica recovery
I1201 18:07:51.338235 18881 recover.cpp:475] Replica is in EMPTY status
I1201 18:07:51.340142 18880 replica.cpp:676] Replica in EMPTY status received a 
broadcasted recover request from (14)@127.0.0.1:57652
I1201 18:07:51.340749 18882 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1201 18:07:51.340975 18885 master.cpp:367] Master 
2f17d97c-de40-491e-9706-bf83a9ffd08c (centos71) started on 127.0.0.1:57652
I1201 18:07:51.341475 18884 recover.cpp:566] Updating replica status to STARTING
I1201 18:07:51.341152 18885 master.cpp:369] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/ap4rPt/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ap4rPt/master" 
--zk_session_timeout="10secs"
W1201 18:07:51.341752 18885 master.cpp:372]
**
Master bound to loopback interface! Cannot communicate with remote schedulers 
or slaves. You might want to set '--ip' flag to a routable IP address.
**
I1201 18:07:51.341794 18885 master.cpp:414] Master only allowing authenticated 
frameworks to register
I1201 18:07:51.341804 18885 master.cpp:419] Master only allowing authenticated 
slaves to register
I1201 18:07:51.341879 18885 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/ap4rPt/credentials'
I1201 18:07:51.345211 18885 master.cpp:458] Using default 'crammd5' 
authenticator
I1201 18:07:51.345268 18882 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 3.5302ms
I1201 18:07:51.345289 18882 replica.cpp:323] Persisted replica status to 
STARTING
I1201 18:07:51.345350 18885 authenticator.cpp:520] Initializing server SASL
I1201 

[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics

2015-12-01 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034269#comment-15034269
 ] 

Jan Schlicht commented on MESOS-3586:
-

I used the following script to generate a Vagrantfile and set up a CentOS 7.1 virtual environment:
{noformat}
cat << EOF > Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure(2) do |config|
  # Disable shared folder to prevent certain kernel module dependencies.
  config.vm.synced_folder ".", "/vagrant", disabled: true

  config.vm.hostname = "centos71"

  config.vm.box = "bento/centos-7.1"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = 8192
    vb.cpus = 8
  end

  config.vm.provision "shell", inline: <<-SHELL
 yum -y update systemd

 yum install -y tar wget
 wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo

 yum groupinstall -y "Development Tools"
 yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel

 yum install -y libevent-devel

 yum install -y perf nmap-ncat

 yum install -y git

 yum install -y docker
 systemctl start docker
 systemctl enable docker
 docker info

 #wget -qO- https://get.docker.com/ | sh

  SHELL
end
EOF

vagrant up
vagrant reload

vagrant ssh -c "
git clone  https://github.com/apache/mesos.git mesos
cd mesos
git checkout -b 0.26.0-rc2 0.26.0-rc2

./bootstrap
mkdir build
cd build

../configure
GTEST_FILTER='' make check
sudo ./bin/mesos-tests.sh
"
{noformat}

> Installing Mesos 0.24.0 on multiple systems. Failed test on 
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ---
>
> Key: MESOS-3586
> URL: https://issues.apache.org/jira/browse/MESOS-3586
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.24.0
> Environment: Ubuntu 14.04, 3.13.0-32 generic
> Reporter: Miguel Bernadin
>
> I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and
> software configurations.
> After running ../configure, make, and make check, some servers completed
> successfully and others failed on the test [ RUN  ]
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics.
> Is there something I should check in this test?
> PERFORMED MAKE CHECK NODE-001
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0
> I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave 
> 20151005-143735-2393768202-35106-27900-S0
> Registered executor on svdidac038.techlabs.accenture.com
> Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0
> Forked command at 38510
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> PERFORMED MAKE CHECK NODE-002
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0
> I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave 
> 20151005-143857-2360213770-50427-26325-S0
> Registered executor on svdidac039.techlabs.accenture.com
> Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 37028
> ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure
> Expected: (usage.get().mem_medium_pressure_counter()) >= 
> (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6
> 2015-10-05 
> 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics

2015-12-01 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034339#comment-15034339
 ] 

Jan Schlicht commented on MESOS-3586:
-

This looks like a timing problem in the test: it assumes that
{{os::sleep}} sleeps for exactly the duration it is given.
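
To illustrate why that assumption is fragile: a sleep call generally guarantees only a minimum duration, and on a loaded machine or inside a VM it can overshoot noticeably. The sketch below is not the Mesos test and not a proposed patch. It uses {{std::this_thread::sleep_for}} as a stand-in for {{os::sleep}}, and {{oversleep}} and {{waitUntil}} are hypothetical helpers. The first measures the overshoot; the second shows the usual alternative of polling for a condition with a deadline instead of relying on one fixed sleep.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: returns how much longer than `requested` a sleep
// actually took. The overshoot should not be negative, but it has no
// upper bound on a busy or virtualized host.
std::chrono::nanoseconds oversleep(std::chrono::milliseconds requested)
{
  const Clock::time_point start = Clock::now();
  std::this_thread::sleep_for(requested);
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
      (Clock::now() - start) - requested);
}

// Hypothetical helper: polls `condition` every `interval` until it holds
// or `timeout` has elapsed. Returns true if the condition was observed.
bool waitUntil(
    const std::function<bool()>& condition,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval)
{
  assert(interval.count() > 0);  // A zero interval would busy-spin.

  const Clock::time_point deadline = Clock::now() + timeout;
  while (true) {
    if (condition()) {
      return true;
    }
    if (Clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(interval);
  }
}
```

A test written around something like {{waitUntil}} depends on the condition eventually becoming true rather than on how long any individual sleep takes.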



[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics

2015-10-05 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944276#comment-14944276
 ] 

Marco Massenzio commented on MESOS-3586:


We would not recommend installing Mesos on a cluster by building it on *every*
node: you would probably be better off using the pre-packaged binaries, or
building it on a build machine and then distributing the binaries with
whatever deployment manager you prefer.

Having said that, I'm guessing you were running {{make check}} as {{root}}
on your system? (The {{*ROOT}} tests only run when the user is the
superuser.)

Can you please provide more details on OS/distribution/environment for the 
failure?

Finally, the actual error seems to be related to connecting to ZooKeeper (the
tests launch a local instance of ZK and then try to connect to it; if that
fails for whatever reason, the tests fail too).
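
For context, the {{errno=111(Connection refused)}} in the issue description is what a client sees on Linux when nothing is listening on the target port. A rough, POSIX-only sketch of such a check follows; {{probeLoopbackPort}} is a hypothetical helper and is not part of Mesos or the ZooKeeper client:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical helper: tries to open a TCP connection to 127.0.0.1:`port`.
// Returns 0 on success, otherwise the errno reported by socket() or
// connect() (for example ECONNREFUSED, which is 111 on Linux).
int probeLoopbackPort(std::uint16_t port)
{
  const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return errno;
  }

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  const int result =
    ::connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));

  // Capture errno before close() can overwrite it.
  const int error = (result == 0) ? 0 : errno;
  ::close(fd);
  return error;
}
```

Calling it with the port from the log would distinguish "server not up yet" from other failures, under the assumption that the server binds to the loopback address.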
