[jira] [Commented] (MESOS-8994) Ensure that the cmake build knows about all source files in the autotools build
[ https://issues.apache.org/jira/browse/MESOS-8994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529490#comment-16529490 ] Benjamin Bannier commented on MESOS-8994:
---

{noformat}
commit c0488797eaacbf6c07cb79235d1174718d933a2c
Author: Benjamin Bannier
Date: Tue Jun 26 15:10:02 2018 -0700

    Added a support script to check for files missing in CMake.

    This compares the sources listed in the Autotools and CMake build
    files, and emits the difference. We use this to check if the builds
    have diverged, and how to reconcile that divergence.

    Review: https://reviews.apache.org/r/67707/
{noformat}

> Ensure that the cmake build knows about all source files in the autotools build
> ---
>
> Key: MESOS-8994
> URL: https://issues.apache.org/jira/browse/MESOS-8994
> Project: Mesos
> Issue Type: Improvement
> Components: build, cmake
> Reporter: Benjamin Bannier
> Assignee: Benjamin Bannier
> Priority: Critical
>
> We currently maintain two build systems in parallel, with autotools still being used by the larger part of contributors and cmake catching up in terms of coverage and features.
>
> This has led to situations where certain features were added only to the autotools build, while updating the cmake build was either implicitly deferred (without creating a ticket) or forgotten. Such missing coverage makes it harder to gauge where the two build systems stand in terms of feature parity and how much work is left before autotools can be retired.
> We should update the cmake build setup to explicitly check whether any source files (headers and sources) unknown to it exist in the tree. Until full parity is reached we would likely need to maintain a whitelist of files known to be missing from the cmake build (this whitelist would at the same time serve as a {{TODO}} list). The LLVM project uses the following function to perform closely related work:
> https://github.com/llvm-mirror/llvm/blob/master/cmake/modules/LLVMProcessSources.cmake#L70-L111
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
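The support script itself is not included in the notification above. As a rough idea of what such a comparison does, here is a minimal Python sketch; the file paths and the regex are simplified assumptions for illustration, not the actual script (see the review link for that).

{code}
# Minimal sketch of an Autotools-vs-CMake source-list comparison, assuming
# the source lists can be scraped with a regex. The real support script
# (https://reviews.apache.org/r/67707/) is more robust than this.
import re

def extract_sources(build_file):
    # Collect anything that looks like a C++ source or header path.
    with open(build_file) as f:
        return set(re.findall(r"[\w/.-]+\.(?:cpp|hpp|h)\b", f.read()))

# Hypothetical input files, for illustration only.
autotools_sources = extract_sources("src/Makefile.am")
cmake_sources = extract_sources("src/CMakeLists.txt")

# Files the Autotools build knows about but the CMake build does not;
# these would go on the whitelist / TODO list mentioned in the ticket.
for path in sorted(autotools_sources - cmake_sources):
    print(path)
{code}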
[jira] [Created] (MESOS-9044) DefaultExecutorTest.ROOT_ContainerStatusForTask can segfault
Jan Schlicht created MESOS-9044:
---

Summary: DefaultExecutorTest.ROOT_ContainerStatusForTask can segfault
Key: MESOS-9044
URL: https://issues.apache.org/jira/browse/MESOS-9044
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 1.5.1
Environment: Ubuntu 16.04
Reporter: Jan Schlicht

The following segfault occurred when testing the {{1.5.x}} branch (SHA {{64341865d}}) on Ubuntu 16.04:

{noformat}
[ RUN ] MesosContainerizer/DefaultExecutorTest.ROOT_ContainerStatusForTask/0
I0702 08:32:25.241318 17172 cluster.cpp:172] Creating default 'local' authorizer
I0702 08:32:25.242328 6510 master.cpp:457] Master be25b90e-f63d-4935-aaf3-cacfc7faacbf (ip-172-16-10-86.ec2.internal) started on 172.16.10.86:32891
I0702 08:32:25.242413 6510 master.cpp:459] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/I9TI6h/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/I9TI6h/master" --zk_session_timeout="10secs"
I0702 08:32:25.242554 6510 master.cpp:508] Master only allowing authenticated frameworks to register
I0702 08:32:25.242564 6510 master.cpp:514] Master only allowing authenticated agents to register
I0702 08:32:25.242570 6510 master.cpp:520] Master only allowing authenticated HTTP frameworks to register
I0702 08:32:25.242575 6510 credentials.hpp:37] Loading credentials for authentication from '/tmp/I9TI6h/credentials'
I0702 08:32:25.242677 6510 master.cpp:564] Using default 'crammd5' authenticator
I0702 08:32:25.242728 6510 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0702 08:32:25.242780 6510 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0702 08:32:25.242830 6510 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0702 08:32:25.242864 6510 master.cpp:643] Authorization enabled
I0702 08:32:25.243048 6507 hierarchical.cpp:175] Initialized hierarchical allocator process
I0702 08:32:25.243223 6507 whitelist_watcher.cpp:77] No whitelist given
I0702 08:32:25.243743 6510 master.cpp:2210] Elected as the leading master!
I0702 08:32:25.243768 6510 master.cpp:1690] Recovering from registrar
I0702 08:32:25.243832 6511 registrar.cpp:347] Recovering registrar
I0702 08:32:25.244055 6511 registrar.cpp:391] Successfully fetched the registry (0B) in 124928ns
I0702 08:32:25.244096 6511 registrar.cpp:495] Applied 1 operations in 8690ns; attempting to update the registry
I0702 08:32:25.244261 6511 registrar.cpp:552] Successfully updated the registry in 146944ns
I0702 08:32:25.244302 6511 registrar.cpp:424] Successfully recovered registrar
I0702 08:32:25.244416 6511 master.cpp:1803] Recovered 0 agents from the registry (172B); allowing 10mins for agents to re-register
I0702 08:32:25.244556 6505 hierarchical.cpp:213] Skipping recovery of hierarchical allocator: nothing to recover
W0702 08:32:25.246150 17172 process.cpp:2759] Attempted to spawn already running process files@172.16.10.86:32891
I0702 08:32:25.246560 17172 containerizer.cpp:304] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0702 08:32:25.250222 17172 linux_launcher.cpp:146] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0702 08:32:25.250689 17172 provisioner.cpp:299] Using default backend 'overlay'
I0702 08:32:25.251200 17172 cluster.cpp:460] Creating default 'local' authorizer
I0702 08:32:25.251788 6509 slave.cpp:262] Mesos agent started on (996)@172.16.10.86:32891
I0702 08:32:25.251878 6509 slave.cpp:263] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/
[jira] [Created] (MESOS-9045) LogZooKeeperTest.WriteRead can segfault
Jan Schlicht created MESOS-9045:
---

Summary: LogZooKeeperTest.WriteRead can segfault
Key: MESOS-9045
URL: https://issues.apache.org/jira/browse/MESOS-9045
Project: Mesos
Issue Type: Bug
Affects Versions: 1.5.1
Environment: macOS
Reporter: Jan Schlicht

The following segfault occurred when testing the {{1.5.x}} branch (SHA {{64341865d}}) on macOS:

{noformat}
[ RUN ] LogZooKeeperTest.WriteRead
I0702 00:49:46.259831 2560127808 jvm.cpp:590] Looking up method (Ljava/lang/String;)V
I0702 00:49:46.260002 2560127808 jvm.cpp:590] Looking up method deleteOnExit()V
I0702 00:49:46.260550 2560127808 jvm.cpp:590] Looking up method (Ljava/io/File;Ljava/io/File;)V
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.persistence.FileTxnSnapLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
I0702 00:49:46.305560 2560127808 jvm.cpp:590] Looking up method ()V
I0702 00:49:46.306149 2560127808 jvm.cpp:590] Looking up method (Lorg/apache/zookeeper/server/persistence/FileTxnSnapLog;Lorg/apache/zookeeper/server/ZooKeeperServer$DataTreeBuilder;)V
I0702 00:49:46.07 2560127808 jvm.cpp:590] Looking up method ()V
I0702 00:49:46.343977 2560127808 jvm.cpp:590] Looking up method (I)V
I0702 00:49:46.344200 2560127808 jvm.cpp:590] Looking up method configure(Ljava/net/InetSocketAddress;I)V
I0702 00:49:46.357642 2560127808 jvm.cpp:590] Looking up method startup(Lorg/apache/zookeeper/server/ZooKeeperServer;)V
I0702 00:49:46.437831 2560127808 jvm.cpp:590] Looking up method getClientPort()I
I0702 00:49:46.437893 2560127808 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 54057
I0702 00:49:46.438153 2560127808 log_tests.cpp:2468] Using temporary directory '/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/LogZooKeeperTest_WriteRead_AKZArL'
I0702 00:49:46.440680 2560127808 leveldb.cpp:174] Opened db in 2.415822ms
I0702 00:49:46.441301 2560127808 leveldb.cpp:181] Compacted db in 584251ns
I0702 00:49:46.441349 2560127808 leveldb.cpp:196] Created db iterator in 20482ns
I0702 00:49:46.441380 2560127808 leveldb.cpp:202] Seeked to beginning of db in 14577ns
I0702 00:49:46.441407 2560127808 leveldb.cpp:277] Iterated through 0 keys in the db in 16622ns
I0702 00:49:46.441447 2560127808 replica.cpp:795] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0702 00:49:46.441737 207974400 leveldb.cpp:310] Persisting metadata (8 bytes) to leveldb took 157037ns
I0702 00:49:46.441764 207974400 replica.cpp:322] Persisted replica status to VOTING
I0702 00:49:46.443361 2560127808 leveldb.cpp:174] Opened db in 1.305425ms
I0702 00:49:46.443821 2560127808 leveldb.cpp:181] Compacted db in 448477ns
I0702 00:49:46.443871 2560127808 leveldb.cpp:196] Created db iterator in 12681ns
I0702 00:49:46.443889 2560127808 leveldb.cpp:202] Seeked to beginning of db in 13291ns
I0702 00:49:46.443914 2560127808 leveldb.cpp:277] Iterated through 0 keys in the db in 14460ns
I0702 00:49:46.443944 2560127808 replica.cpp:795] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0702 00:49:46.444277 206901248 leveldb.cpp:310] Persisting metadata (8 bytes) to leveldb took 234740ns
I0702 00:49:46.444317 206901248 replica.cpp:322] Persisted replica status to VOTING
I0702 00:49:46.445854 2560127808 leveldb.cpp:174] Opened db in 1.253613ms
I0702 00:49:46.446967 2560127808 leveldb.cpp:181] Compacted db in 1.096521ms
I0702 00:49:46.447022 2560127808 leveldb.cpp:196] Created db iterator in 14312ns
I0702 00:49:46.447048 2560127808 leveldb.cpp:202] Seeked to beginning of db in 16620ns
I0702 00:49:46.447077 2560127808 leveldb.cpp:277] Iterated through 1 keys in the db in 21267ns
I0702 00:49:46.447113 2560127808 replica.cpp:795] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
2018-07-02 00:49:46,447:85946(0x7c6da000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2018-07-02 00:49:46,447:85946(0x7c6da000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2018-07-02 00:49:46,447:85946(0x7c657000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2018-07-02 00:49:46,447:85946(0x7c657000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2018-07-02 00:49:46,447:85946(0x7c6da000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2018-07-02 00:49:46,447:85946(0x7c6da000):ZOO_INFO@log_env@765: Client environment:os.arch=17.4.0
2018-07-02 00:49:46,447:85946(0x7c657000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
I0702 00:49:46.447453 206901248 log.cpp:108] Attempting to join replica to ZooKeeper group
2018-07-02 00:49:46,447:85946(0x7c6da000):ZOO_INFO@log_env@766: Client envi
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529915#comment-16529915 ] Qian Zhang commented on MESOS-9031:
---

[~Kirill P] Can you please elaborate a bit more on the reproduction steps?
{quote}2 services running on the same mesos-slave using unified containerizer in different tasks and communicating via host ip and host port{quote}
Did you mean that you launched two Mesos tasks via the unified containerizer, one task listened and served on a host IP & port, and the other task failed to communicate with that IP & port due to a timeout? Did these two tasks join the bridge network {{mesos-cni0}} or the host network? Can you provide the json of these two tasks?

On the other hand, the {{CNI-XXX}} chain is not created by the {{mesos-cni-port-mapper}} plugin; it is actually created by the CNI bridge plugin, see [here|https://github.com/containernetworking/plugins/blob/v0.2.0/plugins/main/bridge/bridge.go#L223:L229] for details.

> Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
> ---
>
> Key: MESOS-9031
> URL: https://issues.apache.org/jira/browse/MESOS-9031
> Project: Mesos
> Issue Type: Bug
> Components: cni, containerization
> Affects Versions: 1.6.0
> Reporter: Kirill Plyashkevich
> Priority: Major
>
> Using `mesos-cni-port-mapper` with the following config:
> {noformat}
> {
>   "name": "dcos",
>   "type": "mesos-cni-port-mapper",
>   "excludeDevices": [],
>   "chain": "MESOS-CNI0-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "hairpinMode": true,
>     "ipam": {
>       "type": "host-local",
>       "ranges": [
>         [{"subnet": "172.26.0.0/16"}]
>       ],
>       "routes": [
>         {"dst": "0.0.0.0/0"}
>       ]
>     }
>   }
> }
> {noformat}
> - 2 services running on the same mesos-slave using the unified containerizer in different tasks and communicating via host ip and host port
> - connection timeouts due to the iptables rules in the per-container CNI-XXX chain
> - the timeouts are actually caused by
> {noformat}
> Chain CNI-XXX (1 references)
> num  target      prot opt source    destination
> 1    ACCEPT      all  --  anywhere  172.26.0.0/16             /* name: "dcos" id: "" */
> 2    MASQUERADE  all  --  anywhere  !base-address.mcast.net/4 /* name: "dcos" id: "" */
> {noformat}
> rule #1 is executed and no masquerading happens.
> There are multiple solutions:
> - the simplest and fastest one is not to add that ACCEPT rule
> - perhaps there's a better change to the iptables rules that can fix it
> - the proper one (imho) is to finally implement CNI spec 0.3.x in order to be able to chain plugins, use CNI's `bridge` and `portmap` plugins in a chain, and eventually get rid of mesos-cni-port-mapper completely.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
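To see why rule #1 prevents masquerading: iptables chains are first-match, so once a packet hits the ACCEPT rule, the MASQUERADE rule is never consulted. A tiny illustrative model of the CNI-XXX chain quoted above (plain Python, not iptables itself):

{code}
# First-match model of the CNI-XXX chain above; illustrative only.
import ipaddress

BRIDGE_SUBNET = ipaddress.ip_network("172.26.0.0/16")
MULTICAST = ipaddress.ip_network("224.0.0.0/4")  # base-address.mcast.net/4

def postrouting_verdict(dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    if dst in BRIDGE_SUBNET:      # rule 1: ACCEPT
        return "ACCEPT"           # chain ends; no masquerading happens
    if dst not in MULTICAST:      # rule 2: MASQUERADE
        return "MASQUERADE"
    return "RETURN"

print(postrouting_verdict("172.26.0.3"))     # ACCEPT -> source not rewritten
print(postrouting_verdict("192.168.1.123"))  # MASQUERADE
{code}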
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530017#comment-16530017 ] Kirill Plyashkevich commented on MESOS-9031:
---

[~qianzhang], yes, of course: 2 standalone containers/tasks are launched on the same slave, joining the same `mesos-cni0` bridge network. The tasks themselves are services of an akka cluster, so they communicate with the other services/nodes of the cluster and between each other using the host's ips and ports. Both tasks fail to communicate with each other due to a timeout (with `excludeDevices` set to a list including `mesos-cni0`, the connection just gets refused). Unfortunately, a stripped-down json won't be a lot of help here; a short interaction can be described as e.g.: node1@172.26.0.2:2552 tries to reach node2@192.168.1.123:31303 (host ip and the other service's port, which is effectively node2@172.26.0.3:2552).

I've been digging into `bridge` recently as well, and the ACCEPT rule is added [here|https://github.com/containernetworking/plugins/blob/master/pkg/ip/ipmasq_linux.go#L63]. That said, it's related to the `cni/bridge` plugin.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530038#comment-16530038 ] Qian Zhang commented on MESOS-9031:
---

[~Kirill P] For the two tasks, do they have port mapping enabled (i.e., do they specify port mapping info in their {{ContainerInfo.network_infos}})? Or did they just join the {{mesos-cni0}} bridge network and cannot communicate, via the Mesos agent host IP & port, with the other akka service nodes running on other Mesos agent hosts? And what about the other akka service nodes? Do they join the {{mesos-cni0}} bridge network with port mapping enabled, or just the host network?
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530058#comment-16530058 ] Kirill Plyashkevich commented on MESOS-9031:
---

[~qianzhang], the services are part of a Marathon pod and get their ports mapped properly (port mapping is enabled). Communication with external nodes launched in the host network on other agents is ok. The problem only occurs for services using port mapping on the same bridge network `mesos-cni0`. The more I think about it, the more it looks like a problem in `mesos-cni-port-mapper`. If you take a look at [cni portmap|https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap], it does MASQUERADE/SNAT on its own, so regardless of the rules set by the `bridge` plugin, the traffic still gets masqueraded, which is exactly what is needed.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530120#comment-16530120 ] Qian Zhang commented on MESOS-9031:
---

[~Kirill P] So there are two service nodes (i.e., two Mesos tasks) joining the bridge network {{mesos-cni0}} on the same Mesos agent host, both with port mapping enabled, but they cannot communicate with each other via the Mesos agent host IP & mapped port, right?
{quote}If you take a look at cni portmap, it does MASQUERADE/SNAT on its own, so regardless of the rules set by the `bridge` plugin, the traffic still gets masqueraded, which is exactly what is needed.{quote}
So you think the timeout issue is not caused by rule #1 in the CNI-XXX chain set by the {{bridge}} plugin? But one of your proposed solutions is not to add that rule.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530135#comment-16530135 ] Kirill Plyashkevich commented on MESOS-9031:
---

[~qianzhang],
{quote}So there are two service nodes (i.e., two Mesos tasks) joining the bridge network mesos-cni0 on the same Mesos agent host, both with port mapping enabled, but they cannot communicate with each other via the Mesos agent host IP & mapped port, right?{quote}
Yes, that's correct.
{quote}So you think the timeout issue is not caused by rule #1 in the CNI-XXX chain set by the bridge plugin? But one of your proposed solutions is not to add that rule.{quote}
That was my initial assumption, but deeper investigation shows that my proposal #1 is not actually a solution here. The timeout is caused by the missing SNAT/masquerade. `cni/portmap` has a proper implementation with SNAT/masquerade, so if `mesos-cni-port-mapper` does something similar and performs the SNAT/masquerade, the issue will be solved. That said, IMHO, solution #2 (adding logic like `cni/portmap`'s) and solution #3 are the only ones left.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
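To make the proposed fix concrete: `cni/portmap` handles this hairpin case by SNAT-ing connections that originate from the container subnet and, after DNAT, target that same subnet, so replies flow back through the bridge gateway and conntrack can reverse the DNAT. A sketch of that rule follows (Python building the iptables invocation; the chain name and subnet are assumptions taken from the config in this ticket, not what `mesos-cni-port-mapper` currently installs):

{code}
# Sketch of the hairpin SNAT/masquerade rule that mesos-cni-port-mapper
# would need, modeled on cni/portmap. Values below are assumptions taken
# from the config quoted in this ticket.
CONTAINER_SUBNET = "172.26.0.0/16"    # ipam range of the bridge network
CHAIN = "MESOS-CNI0-PORT-MAPPER"      # "chain" from the port-mapper config

def hairpin_masquerade_rule(subnet, chain):
    # Traffic from a container that, after DNAT, targets a container on
    # the same subnet must be masqueraded; otherwise the reply bypasses
    # the host and the connection times out.
    return ["iptables", "-t", "nat", "-A", chain,
            "-s", subnet, "-d", subnet, "-j", "MASQUERADE"]

print(" ".join(hairpin_masquerade_rule(CONTAINER_SUBNET, CHAIN)))
{code}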
[jira] [Commented] (MESOS-8935) Quota limit "chopping" can lead to cpu-only and memory-only offers.
[ https://issues.apache.org/jira/browse/MESOS-8935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530369#comment-16530369 ] Greg Mann commented on MESOS-8935:
---

Backports:

1.6.x:
{code}
commit 0587245b66ad3f2209c66a67211d987d2abdd371
Author: Meng Zhu
Date: Wed Jun 20 17:00:03 2018 -0700

    Added a master flag to configure minimum allocatable resources.

    This patch adds a new master flag `min_allocatable_resources`. It
    specifies one or more resource quantities that define the minimum
    allocatable resources for the allocator. The allocator will only
    offer resources that contain at least one of the specified resource
    quantities. For example, the setting `disk:1000|cpus:1;mem:32` means
    that the allocator will only allocate resources when they contain
    1000MB of disk, or when they contain both 1 cpu and 32MB of memory.
    The default value for this new flag is such that it maintains
    previous default behavior.

    Also fixed all related tests and updated documentation.

    Review: https://reviews.apache.org/r/67513/

commit ccb24bf3f7e098723179ee4b595aa99e3a0869e4
Author: Meng Zhu
Date: Thu Jun 28 08:33:00 2018 -0700

    Added a resource utility `isScalarQuantity`.

    `isScalarQuantity()` checks if a `Resources` object is "pure" scalar
    quantity i.e. its `Resource`(s) only has name, type (set to scalar)
    and scalar fields set.

    Also added tests.

    Review: https://reviews.apache.org/r/67516/

commit a615f36d9f10c92eaa4be95978987976dfc085e8
Author: Meng Zhu
Date: Thu Jun 28 08:32:47 2018 -0700

    Fixed a bug in `createStrippedScalarQuantity()`.

    This patch fixes `createStrippedScalarQuantity()` by stripping the
    revocable field in resources.

    Also added a test.

    Review: https://reviews.apache.org/r/67510/
{code}

1.5.x:
{code}
commit 2e16cdb16ee9cc4162fad8b3957d69b9af7dbd8b
Author: Meng Zhu
Date: Wed Jun 20 17:00:03 2018 -0700

    Added a master flag to configure minimum allocatable resources.

    This patch adds a new master flag `min_allocatable_resources`. It
    specifies one or more resource quantities that define the minimum
    allocatable resources for the allocator. The allocator will only
    offer resources that contain at least one of the specified resource
    quantities. For example, the setting `disk:1000|cpus:1;mem:32` means
    that the allocator will only allocate resources when they contain
    1000MB of disk, or when they contain both 1 cpu and 32MB of memory.
    The default value for this new flag is such that it maintains
    previous default behavior.

    Also fixed all related tests and updated documentation.

    Review: https://reviews.apache.org/r/67513/

commit be077099b4dfcc1f82fe7f5ed222567eeb0c082c
Author: Meng Zhu
Date: Wed Jun 20 16:59:58 2018 -0700

    Added a resource utility `isScalarQuantity`.

    `isScalarQuantity()` checks if a `Resources` object is a "pure"
    scalar quantity; i.e., its resources only have name, type (set to
    scalar) and scalar fields set.

    Also added tests.

    Review: https://reviews.apache.org/r/67516/

commit 7a19d085c8aead7693c5d6212dbba7db771e60f6
Author: Meng Zhu
Date: Wed Jun 20 16:59:54 2018 -0700

    Fixed a bug in `createStrippedScalarQuantity()`.

    This patch fixes `createStrippedScalarQuantity()` by stripping the
    revocable field in resources.

    Also added a test.

    Review: https://reviews.apache.org/r/67510/

commit c9efa4048be540f6cee47c012a7637ffed5e203e
Author: Benjamin Mahler
Date: Mon Feb 5 13:32:37 2018 -0800

    Introduced a CHECK_NOTERROR macro.

    Review: https://reviews.apache.org/r/65514
{code}

1.4.x:
{code}
commit 9cba3aa6dd571f8b92b46261d2e1256b0c47e338 (1.4.x-allocatable)
Author: Meng Zhu
Date: Wed Jun 20 17:00:03 2018 -0700

    Added a master flag to configure minimum allocatable resources.

    This patch adds a new master flag `min_allocatable_resources`. It
    specifies one or more resource quantities that define the minimum
    allocatable resources for the allocator. The allocator will only
    offer resources that contain at least one of the specified resource
    quantities. For example, the setting `disk:1000|cpus:1;mem:32` means
    that the allocator will only allocate resources when they contain
    1000MB of disk, or when they contain both 1 cpu and 32MB of memory.
    The default value for this new flag is such that it maintains
    previous default behavior.

    Also fixed all related tests and updated documentation.

    Review: https://reviews.apache.org/r/67513/

commit c45b4bdd50ee2fc5db7e3c2274ef2938a8999c22
Author: Meng Zhu
Date: Wed Jun 20 16:59:58 2018 -0700

    Added a resource utility `isScalarQuantity`.

    `isScalarQuantity()` checks if a `Resources` object is a "pure"
    scalar quantity; i.e., its resources only have name, type (set to
    scalar) and scalar
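The semantics of the new flag can be modeled compactly. The following is an illustrative Python rendering of the `disk:1000|cpus:1;mem:32` example from the commit message above, not the C++ allocator code:

{code}
# Illustrative model of --min_allocatable_resources semantics:
# '|' separates alternatives, ';' separates quantities within one
# alternative; resources are allocatable if at least one alternative
# is fully satisfied. Not the actual Mesos implementation.
def parse_min_allocatable(flag):
    alternatives = []
    for group in flag.split("|"):
        required = {}
        for item in group.split(";"):
            name, amount = item.split(":")
            required[name] = float(amount)
        alternatives.append(required)
    return alternatives

def is_allocatable(resources, alternatives):
    return any(all(resources.get(name, 0.0) >= amount
                   for name, amount in required.items())
               for required in alternatives)

min_alloc = parse_min_allocatable("disk:1000|cpus:1;mem:32")
print(is_allocatable({"disk": 2000.0}, min_alloc))            # True
print(is_allocatable({"cpus": 1.0, "mem": 64.0}, min_alloc))  # True
print(is_allocatable({"mem": 16.0}, min_alloc))               # False
{code}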
[jira] [Created] (MESOS-9046) Agent restart may fail on checkpointed resources.
Till Toenshoff created MESOS-9046:
---

Summary: Agent restart may fail on checkpointed resources.
Key: MESOS-9046
URL: https://issues.apache.org/jira/browse/MESOS-9046
Project: Mesos
Issue Type: Improvement
Affects Versions: 1.6.0
Reporter: Till Toenshoff

When the user changes the agent's resources, the resulting error message does little to help resolve the problem. Consider a user who added or changed a mounted volume, then restarted the agent after only erasing {{${MESOS_WORK_DIR}/meta/slaves/latest}} - the result may look as follows:

{noformat}
E0702 11:44:53.00 2278 slave.cpp:7305] EXIT with status 1: Failed to perform recovery: Checkpointed resources [...] [MOUNT:/dcos/volume1,5b0ca558-7e1f-463a-87ab-4c52899c4727:name-data]:5851 are incompatible with agent resources [...]
{noformat}

This error message, while certainly correct, may not be as helpful as it could be. We should consider offering advice on how to work around or fix this very common issue. We may want to tell the user to:
1. {{rm -rf ${MESOS_WORK_DIR}/meta/slaves/latest}}
2. {{rm -rf ${MESOS_WORK_DIR}/meta/resources}}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (MESOS-9047) ProvisionerDockerLocalStoreTest.MissingLayer is flaky.
Gilbert Song created MESOS-9047:
---

Summary: ProvisionerDockerLocalStoreTest.MissingLayer is flaky.
Key: MESOS-9047
URL: https://issues.apache.org/jira/browse/MESOS-9047
Project: Mesos
Issue Type: Bug
Components: containerization
Environment: mesos-ec2-ubuntu-14.04-SSL
Reporter: Gilbert Song

{noformat}
../../src/tests/containerizer/provisioner_docker_tests.cpp:284
(imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, /tmp/nQt3Eu/store/staging/D1LuiF/123/layer.tar, -C, /tmp/nQt3Eu/store/staging/D1LuiF/123/rootfs' failed:
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
{noformat}

{noformat}
agent log to be added ...
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (MESOS-9048) Build and persist quota headroom info across allocation cycle.
Meng Zhu created MESOS-9048:
---

Summary: Build and persist quota headroom info across allocation cycle.
Key: MESOS-9048
URL: https://issues.apache.org/jira/browse/MESOS-9048
Project: Mesos
Issue Type: Improvement
Reporter: Meng Zhu
Assignee: Meng Zhu

Currently, in the allocator, quota headroom info is built up from scratch at the beginning of each allocation iteration. This hurts performance and increases code complexity. We should be able to track and persist this info as we make new allocations.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
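The direction the ticket proposes can be sketched abstractly: instead of rebuilding headroom each cycle, keep a running quantity and adjust it on every allocation and recovery. An illustrative Python model follows (the real allocator is C++ and tracks considerably more state):

{code}
# Illustrative model of persisting quota headroom incrementally rather
# than rebuilding it at the start of every allocation cycle.
class HeadroomTracker:
    def __init__(self, initial_headroom):
        self.headroom = dict(initial_headroom)  # e.g. {"cpus": 8, "mem": 1024}

    def on_allocate(self, resources):
        # Allocations consume headroom as they happen.
        for name, amount in resources.items():
            self.headroom[name] = self.headroom.get(name, 0) - amount

    def on_recover(self, resources):
        # Recovered resources restore headroom.
        for name, amount in resources.items():
            self.headroom[name] = self.headroom.get(name, 0) + amount

tracker = HeadroomTracker({"cpus": 8, "mem": 1024})
tracker.on_allocate({"cpus": 2, "mem": 256})
print(tracker.headroom)  # {'cpus': 6, 'mem': 768}
{code}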
[jira] [Assigned] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Zhang reassigned MESOS-9031:
---

Assignee: Qian Zhang
Sprint: Mesosphere Sprint 2018-23

> Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
> ---
>
> Key: MESOS-9031
> URL: https://issues.apache.org/jira/browse/MESOS-9031
> Project: Mesos
> Issue Type: Bug
> Components: cni, containerization
> Affects Versions: 1.6.0
> Reporter: Kirill Plyashkevich
> Assignee: Qian Zhang
> Priority: Major
>
> Using `mesos-cni-port-mapper` with the following config:
> {noformat}
> {
>   "name": "dcos",
>   "type": "mesos-cni-port-mapper",
>   "excludeDevices": [],
>   "chain": "MESOS-CNI0-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "hairpinMode": true,
>     "ipam": {
>       "type": "host-local",
>       "ranges": [
>         [{"subnet": "172.26.0.0/16"}]
>       ],
>       "routes": [
>         {"dst": "0.0.0.0/0"}
>       ]
>     }
>   }
> }
> {noformat}
> - 2 services running on the same mesos-slave using the unified containerizer in different tasks and communicating via host ip and host port
> - connection timeouts due to the iptables rules in the per-container CNI-XXX chain
> - the timeouts are actually caused by
> {noformat}
> Chain CNI-XXX (1 references)
> num  target      prot opt source    destination
> 1    ACCEPT      all  --  anywhere  172.26.0.0/16             /* name: "dcos" id: "" */
> 2    MASQUERADE  all  --  anywhere  !base-address.mcast.net/4 /* name: "dcos" id: "" */
> {noformat}
> rule #1 is executed and no masquerading happens.
> There are multiple solutions:
> - -the simplest and fastest one is not to add that ACCEPT rule- - NOT A SOLUTION: it's happening in the `bridge` plugin, and `cni/portmap` shows that SNAT/masquerade should be done during port mapping as well.
> - perhaps there's a better change to the iptables rules that can fix it
> - the proper one (imho) is to finally implement CNI spec 0.3.x in order to be able to chain plugins, use CNI's `bridge` and `portmap` plugins in a chain, and eventually get rid of mesos-cni-port-mapper completely.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9039) CNI isolator recovery should wait until unknown orphan cleanup is done
[ https://issues.apache.org/jira/browse/MESOS-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530649#comment-16530649 ] Qian Zhang commented on MESOS-9039:
---

The main purpose of this fix is to ensure that the test {{CniIsolatorTest.ROOT_SlaveRecovery}}, which we updated in [https://reviews.apache.org/r/67737/], can catch the regression described in MESOS-9025. I do not think this ticket causes any actual issues in a real environment, so we do not need to backport the fix.

> CNI isolator recovery should wait until unknown orphan cleanup is done
> ---
>
> Key: MESOS-9039
> URL: https://issues.apache.org/jira/browse/MESOS-9039
> Project: Mesos
> Issue Type: Bug
> Components: cni
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
> Fix For: 1.7.0
>
> Currently, the CNI isolator cleans up unknown orphaned containers in an asynchronous way (see [here|https://github.com/apache/mesos/blob/1.6.0/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L439] for details) during recovery. That means agent recovery can finish while the cleanup of unknown orphaned containers is still ongoing, which is not ideal. So we need to make CNI isolator recovery wait until the unknown orphan cleanup is done.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
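The shape of the fix is to treat the orphan cleanups as futures and complete recovery only after all of them resolve. A Python sketch of that pattern (the actual code uses libprocess futures in C++; names here are illustrative):

{code}
# Sketch of "recovery waits for unknown orphan cleanup": start all
# cleanups concurrently, but only report recovery complete once every
# cleanup has finished.
from concurrent.futures import ThreadPoolExecutor, wait

def cleanup_orphan(container_id):
    # Placeholder for the per-container CNI network cleanup.
    print("cleaning up unknown orphan", container_id)

def recover(unknown_orphans):
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(cleanup_orphan, cid)
                   for cid in unknown_orphans]
        wait(futures)            # block until every cleanup resolves
    print("recovery complete")   # only reached after all cleanups are done

recover(["orphan-1", "orphan-2"])
{code}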