[jira] [Comment Edited] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682116#comment-16682116 ] Till Toenshoff edited comment on MESOS-4646 at 11/10/18 1:58 AM: - Just observed this again on Ubuntu16.04 (4.4 kernel) within our internal CI. The problems I see here manifest consistently as below: {noformat} 01:45:21 [ RUN ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP 01:45:21 I1110 01:45:21.625305 32279 port_mapping_tests.cpp:238] Using ens3 as the public interface 01:45:21 I1110 01:45:21.62 32279 port_mapping_tests.cpp:246] Using lo as the loopback interface 01:45:21 I1110 01:45:21.641434 32279 port_mapping.cpp:1570] Using ens3 as the public interface 01:45:21 I1110 01:45:21.641729 32279 port_mapping.cpp:1596] Using lo as the loopback interface 01:45:21 I1110 01:45:21.642727 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' 01:45:21 I1110 01:45:21.642773 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' 01:45:21 I1110 01:45:21.642808 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304' 01:45:21 I1110 01:45:21.642840 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_synack_retries = '5' 01:45:21 I1110 01:45:21.642879 32279 port_mapping.cpp:1895] /proc/sys/net/core/rmem_max = '212992' 01:45:21 I1110 01:45:21.642911 32279 port_mapping.cpp:1895] /proc/sys/net/core/somaxconn = '128' 01:45:21 I1110 01:45:21.642933 32279 port_mapping.cpp:1895] /proc/sys/net/core/wmem_max = '212992' 01:45:21 I1110 01:45:21.642961 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456' 01:45:21 I1110 01:45:21.642983 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' 01:45:21 I1110 01:45:21.643013 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' 01:45:21 I1110 01:45:21.643033 32279 port_mapping.cpp:1895] /proc/sys/net/core/netdev_max_backlog = '1000' 01:45:21 
I1110 01:45:21.643064 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' 01:45:21 I1110 01:45:21.643085 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' 01:45:21 I1110 01:45:21.643112 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' 01:45:21 I1110 01:45:21.643139 32279 port_mapping.cpp:1895] /proc/sys/net/ipv4/tcp_retries2 = '15' 01:45:21 I1110 01:45:21.646045 32279 linux_launcher.cpp:144] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher 01:45:21 I1110 01:45:21.646473 27877 port_mapping.cpp:2537] Using non-ephemeral ports {} and ephemeral ports [30016,30032) for container 0cb621ab-78d9-419a-9189-ff2856d4e559 of executor '' 01:45:21 I1110 01:45:21.647001 27878 linux_launcher.cpp:492] Launching container 0cb621ab-78d9-419a-9189-ff2856d4e559 and cloning with namespaces CLONE_NEWNS | CLONE_NEWNET 01:45:21 I1110 01:45:21.672200 27881 port_mapping.cpp:2602] Bind mounted '/proc/10192/ns/net' to '/run/netns/10192' for container 0cb621ab-78d9-419a-9189-ff2856d4e559 01:45:21 I1110 01:45:21.672251 27881 port_mapping.cpp:2618] Created network namespace handle symlink '/var/run/mesos/netns/0cb621ab-78d9-419a-9189-ff2856d4e559' -> '/run/netns/10192' 01:45:21 I1110 01:45:21.672857 27881 port_mapping.cpp:2678] Adding IP packet filters with ports [30016,30031] for container 0cb621ab-78d9-419a-9189-ff2856d4e559 01:45:21 Executing pre-exec command '{"shell":true,"value":"#!/bin/sh\nset -xe\nmount --make-rslave /run/netns\ntest -f /proc/sys/net/ipv6/conf/all/disable_ipv6 && echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6\nip link set lo address 12:3b:cf:2b:ed:2a mtu 9001 up\nethtool -K ens3 rx off\nip link set ens3 address 12:3b:cf:2b:ed:2a mtu 9001 up\nip addr add 172.16.10.54/24 dev ens3\nip route add default via 172.16.10.1\necho 30016 30031 > /proc/sys/net/ipv4/ip_local_port_range\necho 1 > /proc/sys/net/ipv4/conf/ens3/accept_local\necho 1 > 
/proc/sys/net/ipv4/conf/lo/accept_local\necho 1 > /proc/sys/net/ipv4/conf/lo/route_localnet\nif [ -f \"/proc/sys/net/ipv4/tcp_retries2\" ]; then\n echo '15' > /proc/sys/net/ipv4/tcp_retries2\nfi\nif [ -f \"/proc/sys/net/ipv4/tcp_max_syn_backlog\" ]; then\n echo '512' > /proc/sys/net/ipv4/tcp_max_syn_backlog\nfi\nif [ -f \"/proc/sys/net/ipv4/tcp_keepalive_probes\" ]; then\n echo '9' > /proc/sys/net/ipv4/tcp_keepalive_probes\nfi\nif [ -f \"/proc/sys/net/ipv4/tcp_synack_retries\" ]; then\n echo '5' > /proc/sys/net/ipv4/tcp_synack_retries\nfi\nif [ -f \"/proc/sys/net/ipv4/tcp_wmem\" ]; then\n echo '4096\t16384\t4194304' > /proc/sys/net/ipv4/tcp_wmem\nfi\nif [ -f \"/proc/sys/net/ipv4/neigh/default/gc_thresh1\" ]; then\n echo '128' > /proc/sys/net/ipv4/neigh/default/gc_thresh1\nfi\nif [ -f \"/proc/sys/net/ipv4/tcp_keepalive_intvl\" ]; then\n echo '75' >
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682116#comment-16682116 ] Till Toenshoff commented on MESOS-4646: --- Just observed this again on Ubuntu16.04 (4.4 kernel) within our internal CI. > PortMappingIsolatorTests get kernel stuck. > -- > > Key: MESOS-4646 > URL: https://issues.apache.org/jira/browse/MESOS-4646 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.8.0 > Environment: Linux Kernel 3.19.9-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff >Priority: Major > > {noformat} > $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*" > Source directory: /home/till/scratchpad/mesos > Build directory: /home/till/scratchpad/mesos/build > - > We cannot run any cgroups tests that require mounting > hierarchies because you have the following hierarchies mounted: > /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, > /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, > /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, > /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd > We'll disable the CgroupsNoHierarchyTest test fixture for now. 
> - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > No 'perf' command found so no 'perf' tests will be run > - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > The 'perf' command wasn't found so tests using it > to sample the 'cycles' hardware event will not be run. > - > /bin/nc > /usr/local/bin/curl > Note: Google Test filter = >
[jira] [Assigned] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff reassigned MESOS-4646: - Assignee: (was: Cong Wang) > PortMappingIsolatorTests get kernel stuck. > -- > > Key: MESOS-4646 > URL: https://issues.apache.org/jira/browse/MESOS-4646 > Project: Mesos > Issue Type: Bug > Environment: Linux Kernel 3.19.9-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff >Priority: Major > > {noformat} > $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*" > Source directory: /home/till/scratchpad/mesos > Build directory: /home/till/scratchpad/mesos/build > - > We cannot run any cgroups tests that require mounting > hierarchies because you have the following hierarchies mounted: > /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, > /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, > /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, > /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd > We'll disable the CgroupsNoHierarchyTest test fixture for now. > - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > No 'perf' command found so no 'perf' tests will be run > - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > The 'perf' command wasn't found so tests using it > to sample the 'cycles' hardware event will not be run. 
> - > /bin/nc > /usr/local/bin/curl > Note: Google Test filter = >
[jira] [Commented] (MESOS-9258) Consider making Mesos subscribers send heartbeats
[ https://issues.apache.org/jira/browse/MESOS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682078#comment-16682078 ]

Joseph Wu commented on MESOS-9258:
----------------------------------

Alternative proposal for bounding the max number of subscribers: https://reviews.apache.org/r/69307/

This one requires almost no client-side changes (as long as clients already retry when disconnected) and the code changes are also somewhat minimal from a backporting perspective.

> Consider making Mesos subscribers send heartbeats
> -------------------------------------------------
>
> Key: MESOS-9258
> URL: https://issues.apache.org/jira/browse/MESOS-9258
> Project: Mesos
> Issue Type: Improvement
> Components: HTTP API
> Reporter: Gastón Kleiman
> Assignee: Joseph Wu
> Priority: Critical
> Labels: mesosphere
>
> Some reverse proxies (e.g., ELB using an HTTP listener) won't close the
> upstream connection to Mesos when they detect that their client is
> disconnected.
> This can make Mesos leak subscribers, which generates unnecessary
> authorization requests and affects performance.
> We should evaluate methods (e.g., heartbeats) to enable Mesos to detect that
> a subscriber is gone, even if the TCP connection is still open.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
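To make the idea of bounding the number of subscribers concrete, here is a minimal sketch. It is not taken from the linked review: the class name `BoundedSubscribers`, the `close` callback, and the policy of dropping the oldest subscriber once the cap is reached are all assumptions made for illustration. The sketch only shows why such a scheme needs little from clients beyond reconnecting after a disconnect.

```python
from collections import OrderedDict


class BoundedSubscribers:
    """Hypothetical registry that keeps at most `max_subscribers` entries."""

    def __init__(self, max_subscribers):
        if max_subscribers < 1:
            raise ValueError("max_subscribers must be at least 1")
        self._max = max_subscribers
        # Insertion order doubles as age: the first item is the oldest.
        self._subscribers = OrderedDict()

    def add(self, subscriber_id, close):
        """Register a subscriber; return the ids that were evicted."""
        evicted = []
        # Re-adding a known id replaces the old entry without closing it.
        self._subscribers.pop(subscriber_id, None)
        # Assumed policy: drop the oldest subscribers until there is room.
        while len(self._subscribers) >= self._max:
            old_id, old_close = self._subscribers.popitem(last=False)
            old_close()  # A live client is expected to reconnect.
            evicted.append(old_id)
        self._subscribers[subscriber_id] = close
        return evicted

    def __len__(self):
        return len(self._subscribers)
```

Under this assumed policy a leaked subscriber (one whose client is gone but whose TCP connection is still open) is eventually pushed out by newer ones, while a live client that gets dropped simply subscribes again.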
[jira] [Assigned] (MESOS-9382) mesos-gtest-runner doesn't work on systems without ulimit binary
[ https://issues.apache.org/jira/browse/MESOS-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier reassigned MESOS-9382:
---------------------------------------

Assignee: Benjamin Bannier

> mesos-gtest-runner doesn't work on systems without ulimit binary
> ----------------------------------------------------------------
>
> Key: MESOS-9382
> URL: https://issues.apache.org/jira/browse/MESOS-9382
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Ilya Pronin
> Assignee: Benjamin Bannier
> Priority: Major
>
> {{mesos-gtest-runner.py}} fails on systems without a separate ulimit binary
> (e.g. CentOS 7).
> {noformat}
> /home/ipronin/mesos/build/../support/mesos-gtest-runner.py --sequential=*ROOT_* ./mesos-tests
> Could not check compatibility of ulimit settings: [Errno 2] No such file or directory: 'ulimit'
> {noformat}
> The problem arises in [this
> call|https://github.com/apache/mesos/blob/630d8938462381e8e7b0f44fa6434e47460fb178/support/mesos-gtest-runner.py#L209].
> It seems that this can be fixed by passing a {{shell=True}} argument to
> {{subprocess.check_output()}}.
> Another problem is the {{ROOT_*}} tests, which should be run as root. For root,
> {{ulimit -u}} will most likely return "unlimited", which will again crash the
> runner.
> {noformat}
> Could not check compatibility of ulimit settings: invalid literal for int() with base 10: b'unlimited\n'
> {noformat}
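The `shell=True` suggestion in the description can be sketched as follows. `ulimit` is usually a shell builtin rather than a standalone program, so it has to be run through a shell. This is a sketch of the idea only, not the code that was committed; the helper name `read_ulimit` is hypothetical, and which flags are accepted depends on the shell behind `/bin/sh`.

```python
import subprocess


def read_ulimit(flag="-u"):
    """Return the text printed by `ulimit <flag>` (hypothetical helper).

    With shell=True the command string is handed to the system shell,
    which resolves `ulimit` as a builtin instead of looking for a binary.
    """
    output = subprocess.check_output("ulimit {}".format(flag), shell=True)
    # check_output returns bytes; decode and drop the trailing newline.
    return output.decode().strip()
```

A call such as `read_ulimit("-n")` returns either a number as text or the word `unlimited`, so the caller still has to handle both cases before converting the result to an integer.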
[jira] [Commented] (MESOS-9382) mesos-gtest-runner doesn't work on systems without ulimit binary
[ https://issues.apache.org/jira/browse/MESOS-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682021#comment-16682021 ]

Benjamin Bannier commented on MESOS-9382:
-----------------------------------------

{noformat}
commit 5367851e0e399d718df8f529ad977b6afd74bf99
Author: Benjamin Bannier
Date: Fri Nov 9 21:50:42 2018 +0100

    Fixed parallel test runner ulimit check for setups without limits.

    This patch adds special handling for the case where no process rlimit
    is in effect. For that we explicitly handle an output `unlimited` which
    can be present in e.g., `C` locales.

    Review: https://reviews.apache.org/r/69306

commit 9df27e3657aa44677668baaee70e46b7963b06e4
Author: Chun-Hung Hsiao
Date: Fri Nov 9 11:47:01 2018 -0800

    Fixed the ulimit validation in the parallel test runner.

    This patch fixes the following error:

    ```
    $ support/mesos-gtest-runner.py build/bin/mesos-tests.sh
    Could not check compatibility of ulimit settings: \
    [Errno 2] No such file or directory: 'ulimit': 'ulimit'
    ```

    Review: https://reviews.apache.org/r/69301/
{noformat}

> mesos-gtest-runner doesn't work on systems without ulimit binary
> ----------------------------------------------------------------
>
> Key: MESOS-9382
> URL: https://issues.apache.org/jira/browse/MESOS-9382
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Ilya Pronin
> Assignee: Benjamin Bannier
> Priority: Major
>
> {{mesos-gtest-runner.py}} fails on systems without a separate ulimit binary
> (e.g. CentOS 7).
> {noformat}
> /home/ipronin/mesos/build/../support/mesos-gtest-runner.py --sequential=*ROOT_* ./mesos-tests
> Could not check compatibility of ulimit settings: [Errno 2] No such file or directory: 'ulimit'
> {noformat}
> The problem arises in [this
> call|https://github.com/apache/mesos/blob/630d8938462381e8e7b0f44fa6434e47460fb178/support/mesos-gtest-runner.py#L209].
> It seems that this can be fixed by passing a {{shell=True}} argument to
> {{subprocess.check_output()}}.
> Another problem is the {{ROOT_*}} tests, which should be run as root. For root,
> {{ulimit -u}} will most likely return "unlimited", which will again crash the
> runner.
> {noformat}
> Could not check compatibility of ulimit settings: invalid literal for int() with base 10: b'unlimited\n'
> {noformat}
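The special handling of an `unlimited` output described in the first commit message can be illustrated with a small parser. This is a sketch under assumptions, not the committed patch: the function names `parse_limit` and `limit_allows` are hypothetical, and representing "no limit" as `None` is a design choice made here for illustration.

```python
def parse_limit(raw):
    """Convert `ulimit` output to an int, or None when there is no limit.

    Accepts bytes (as returned by subprocess.check_output) or str.
    """
    text = raw.decode() if isinstance(raw, bytes) else raw
    text = text.strip()
    if text == "unlimited":
        # Calling int() on this value is what raised the reported error.
        return None
    return int(text)


def limit_allows(limit, needed):
    """Return True if `needed` processes fit within `limit` (None = no cap)."""
    return limit is None or needed <= limit
```

Returning `None` for the unlimited case keeps the numeric comparison in one place, so callers do not have to pick an arbitrary large sentinel value.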
[jira] [Created] (MESOS-9382) mesos-gtest-runner doesn't work on systems without ulimit binary
Ilya Pronin created MESOS-9382:
----------------------------------

Summary: mesos-gtest-runner doesn't work on systems without ulimit binary
Key: MESOS-9382
URL: https://issues.apache.org/jira/browse/MESOS-9382
Project: Mesos
Issue Type: Bug
Components: test
Reporter: Ilya Pronin

{{mesos-gtest-runner.py}} fails on systems without a separate ulimit binary (e.g. CentOS 7).
{noformat}
/home/ipronin/mesos/build/../support/mesos-gtest-runner.py --sequential=*ROOT_* ./mesos-tests
Could not check compatibility of ulimit settings: [Errno 2] No such file or directory: 'ulimit'
{noformat}
The problem arises in [this call|https://github.com/apache/mesos/blob/630d8938462381e8e7b0f44fa6434e47460fb178/support/mesos-gtest-runner.py#L209]. It seems that this can be fixed by passing a {{shell=True}} argument to {{subprocess.check_output()}}.

Another problem is the {{ROOT_*}} tests, which should be run as root. For root, {{ulimit -u}} will most likely return "unlimited", which will again crash the runner.
{noformat}
Could not check compatibility of ulimit settings: invalid literal for int() with base 10: b'unlimited\n'
{noformat}