[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708001#comment-16708001 ]

Till Toenshoff commented on MESOS-4646:
---

We should try this on a more recent Kernel -- [~ipronin] suggested using a 4.9 or 4.14 -- will give that a spin as soon as possible.

> PortMappingIsolatorTests get kernel stuck.
> --
>
> Key: MESOS-4646
> URL: https://issues.apache.org/jira/browse/MESOS-4646
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.8.0
> Environment: Linux Kernel 3.19.9-49-generic,
> libnl-3.2.27
> Reporter: Till Toenshoff
> Priority: Major
> Labels: flaky, flaky-test
>
> {noformat}
> $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*"
> Source directory: /home/till/scratchpad/mesos
> Build directory: /home/till/scratchpad/mesos/build
> -
> We cannot run any cgroups tests that require mounting
> hierarchies because you have the following hierarchies mounted:
> /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct,
> /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer,
> /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls,
> /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd
> We'll disable the CgroupsNoHierarchyTest test fixture for now.
> -
> WARNING: perf not found for kernel 3.19.0-49
> You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
> You may also want to install one of the following packages to keep up to
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> No 'perf' command found so no 'perf' tests will be run
> -
> WARNING: perf not found for kernel 3.19.0-49
> You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
> You may also want to install one of the following packages to keep up to
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> The 'perf' command wasn't found so tests using it
> to sample the 'cycles' hardware event will not be run.
> -
> /bin/nc
> /usr/local/bin/curl
> Note: Google Test filter =
>
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707993#comment-16707993 ]

Till Toenshoff commented on MESOS-4646:
---

[~ipronin] do you have any cycles for looking into this?
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702567#comment-16702567 ]

Till Toenshoff commented on MESOS-4646:
---

[~wangcong] we would love to get this solved as otherwise our CI keeps coming back red. Would be terrific if you could try the given setup on this test.
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682116#comment-16682116 ]

Till Toenshoff commented on MESOS-4646:
---

Just observed this again on Ubuntu 16.04 (4.4 kernel) within our internal CI.
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143334#comment-15143334 ]

Cong Wang commented on MESOS-4646:
---

OK, the stuck kernel is probably caused by another kernel bug, but without a kernel stack trace I have no idea which bug it is. Could you please try to set up kdump to capture the kernel crash/hang?

BTW, here at Twitter we use a 4.1 kernel plus the above fix. I just repeated the PortMappingIsolatorTest 30 times, and all runs passed. So maybe it is a new kernel bug I have never seen before.

> PortMappingIsolatorTests get kernel stuck.
> --
>
> Key: MESOS-4646
> URL: https://issues.apache.org/jira/browse/MESOS-4646
> Project: Mesos
> Issue Type: Bug
> Environment: Linux Kernel 3.19.9-49-generic,
> libnl-3.2.27
> Reporter: Till Toenshoff
> Assignee: Cong Wang
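For anyone wanting to reproduce the "repeated 30 times" run mentioned above, here is a minimal sketch that builds the command line. It assumes that `mesos-tests.sh` forwards extra arguments to the Google Test binary (the `--gtest_filter` invocation in the report suggests it does, but this is not confirmed here). `--gtest_repeat` and `--gtest_break_on_failure` are standard Google Test flags; the helper name `repeat_command` is hypothetical.

```python
import shlex
from typing import List


def repeat_command(filter_pattern: str, repeats: int,
                   stop_on_failure: bool = True) -> List[str]:
    """Build an argv list that reruns the filtered tests `repeats` times.

    Assumption: ./bin/mesos-tests.sh passes unknown flags through to gtest.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    argv = [
        "sudo",
        "./bin/mesos-tests.sh",
        f"--gtest_filter={filter_pattern}",
        f"--gtest_repeat={repeats}",
    ]
    if stop_on_failure:
        # Standard gtest flag; combined with --gtest_repeat it stops the
        # loop at the first failing iteration.
        argv.append("--gtest_break_on_failure")
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command instead of running it, since the test
    # needs root and may hang the machine on affected kernels.
    print(shlex.join(repeat_command("*PortMappingIsolatorTest*", 30)))
```

The script only prints the command; running it is left to the operator, ideally on a disposable machine with kdump configured.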
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142766#comment-15142766 ]

Till Toenshoff commented on MESOS-4646:
---

I now tried a 4.3 kernel. The results are just a bit better in that the kernel does not get stuck but the tests still fail utterly while getting stuck themselves.

{noformat}
[ RUN ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP
I0211 05:56:11.255408 90890 port_mapping_tests.cpp:224] Using eth0 as the public interface
I0211 05:56:11.255954 90890 port_mapping_tests.cpp:232] Using lo as the loopback interface
I0211 05:56:13.144747 90890 port_mapping.cpp:1255] Using eth0 as the public interface
I0211 05:56:13.145141 90890 port_mapping.cpp:1280] Using lo as the loopback interface
I0211 05:56:13.146286 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
I0211 05:56:13.146486 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
I0211 05:56:13.146747 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304'
I0211 05:56:13.147191 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_synack_retries = '5'
I0211 05:56:13.147518 90890 port_mapping.cpp:1567] /proc/sys/net/core/somaxconn = '128'
I0211 05:56:13.147707 90890 port_mapping.cpp:1567] /proc/sys/net/core/rmem_max = '212992'
I0211 05:56:13.147971 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456'
I0211 05:56:13.148393 90890 port_mapping.cpp:1567] /proc/sys/net/core/wmem_max = '212992'
I0211 05:56:13.148653 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
I0211 05:56:13.148808 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
I0211 05:56:13.148962 90890 port_mapping.cpp:1567] /proc/sys/net/core/netdev_max_backlog = '1000'
I0211 05:56:13.150074 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
I0211 05:56:13.150271 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
I0211 05:56:13.150394 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
I0211 05:56:13.150619 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_retries2 = '15'
I0211 05:56:17.074481 90890 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0211 05:56:17.078749 90909 port_mapping.cpp:2162] Using non-ephemeral ports {[31000,31500)} and ephemeral ports [30016,30032) for container container1 of executor ''
I0211 05:56:17.334048 90890 linux_launcher.cpp:363] Cloning child process with flags = CLONE_NEWNET | CLONE_NEWNS
../../src/tests/containerizer/port_mapping_tests.cpp:507: Failure
Failed to wait 15secs for isolator.get()->isolate(containerId1, pid.get())
I0211 05:56:34.901305 90907 port_mapping.cpp:2226] Bind mounted '/proc/90956/ns/net' to '/var/run/netns/90956' for container container1
[ FAILED ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP (29652 ms)
[ RUN ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerUDP
I0211 05:56:40.905812 90890 port_mapping_tests.cpp:224] Using eth0 as the public interface
I0211 05:56:40.906904 90890 port_mapping_tests.cpp:232] Using lo as the loopback interface
I0211 05:56:40.938251 90890 port_mapping.cpp:1255] Using eth0 as the public interface
I0211 05:56:40.938639 90890 port_mapping.cpp:1280] Using lo as the loopback interface
I0211 05:56:41.037220 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
I0211 05:56:41.037513 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
I0211 05:56:41.037768 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304'
I0211 05:56:41.038230 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_synack_retries = '5'
I0211 05:56:41.038434 90890 port_mapping.cpp:1567] /proc/sys/net/core/somaxconn = '128'
I0211 05:56:41.038596 90890 port_mapping.cpp:1567] /proc/sys/net/core/rmem_max = '212992'
I0211 05:56:41.051391 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456'
I0211 05:56:41.051430 90890 port_mapping.cpp:1567] /proc/sys/net/core/wmem_max = '212992'
I0211 05:56:41.051456 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
I0211 05:56:41.051482 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
I0211 05:56:41.051507 90890 port_mapping.cpp:1567] /proc/sys/net/core/netdev_max_backlog = '1000'
I0211 05:56:41.051534 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
I0211 05:56:41.051558 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
I0211 05:56:41.051583 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
I0211 05:56:41.051606 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_retries2 = '15'
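When comparing runs across kernels, it can help to reduce a log like the one above to its per-test verdicts. The following is a rough sketch, not part of Mesos: the names `GtestResult` and `summarize` are made up for illustration, and the regular expression assumes the usual Google Test result lines of the form `[ FAILED ] Suite.Name (N ms)` with any amount of padding inside the brackets.

```python
import re
from typing import List, NamedTuple


class GtestResult(NamedTuple):
    name: str      # fully qualified test name, e.g. Suite.TestName
    status: str    # "OK" or "FAILED"
    millis: int    # elapsed time reported by gtest


# Matches per-test result lines; the padding inside the brackets varies,
# so whitespace is matched loosely. Summary lines without a "(N ms)"
# suffix are intentionally ignored.
_RESULT_LINE = re.compile(r"\[\s*(OK|FAILED)\s*\]\s+(\S+)\s+\((\d+) ms\)")


def summarize(log: str) -> List[GtestResult]:
    """Extract (name, status, millis) for every finished test in `log`."""
    return [
        GtestResult(m.group(2), m.group(1), int(m.group(3)))
        for m in _RESULT_LINE.finditer(log)
    ]


if __name__ == "__main__":
    sample = (
        "[ RUN ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP\n"
        "[ FAILED ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP (29652 ms)\n"
    )
    for result in summarize(sample):
        print(result.status, result.name, result.millis)
```

Tests that hang never print a result line, so a `[ RUN ]` entry with no matching verdict is itself a useful signal in this context.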
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142782#comment-15142782 ]

Till Toenshoff commented on MESOS-4646:
---

Ow, after having left the machine in that state for a few minutes, at some point the kernel got stuck as well, even with 4.3.

> PortMappingIsolatorTests get kernel stuck.
> --
>
> Key: MESOS-4646
> URL: https://issues.apache.org/jira/browse/MESOS-4646
> Project: Mesos
> Issue Type: Bug
> Environment: Linux Kernel 3.19.9-49-generic,
> libnl-3.2.27
> Reporter: Till Toenshoff
> Assignee: Cong Wang
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142231#comment-15142231 ]

Cong Wang commented on MESOS-4646:
---

This might be the bug I have already fixed:

{noformat}
commit 6bd00b850635abb0044e06101761533c8beba79c
Author: WANG Cong
Date: Thu Oct 1 11:37:42 2015 -0700

    act_mirred: fix a race condition on mirred_list
{noformat}

Could you try to set up kdump to capture the kernel stack trace? Or try a newer kernel, 4.3 or above.