[jira] [Comment Edited] (MESOS-9518) CNI_NETNS should not be set for orphan containers that do not have network namespace
    [ https://issues.apache.org/jira/browse/MESOS-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739761#comment-16739761 ]

Jie Yu edited comment on MESOS-9518 at 1/11/19 6:19 AM:
--------------------------------------------------------

https://reviews.apache.org/r/69706/
https://reviews.apache.org/r/69710/
https://reviews.apache.org/r/69711/
https://reviews.apache.org/r/69712/
https://reviews.apache.org/r/69713/
https://reviews.apache.org/r/69714/
https://reviews.apache.org/r/69715/

was (Author: jieyu):
    Fix: https://reviews.apache.org/r/69706/

Adding tests now.

> CNI_NETNS should not be set for orphan containers that do not have network namespace
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-9518
>                 URL: https://issues.apache.org/jira/browse/MESOS-9518
>             Project: Mesos
>          Issue Type: Bug
>          Components: cni
>    Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>            Priority: Major
>
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be persisted across reboot. This lets some CNI plugins clean up the IPs allocated to containers after a sudden reboot of the host (not all CNI plugins need this).
> It's important to unset the `CNI_NETNS` environment variable after reboot when invoking the CNI plugin "DEL" command so that it conforms to the spec:
> {noformat}
> When CNI_NETNS and/or prevResult are not provided, the plugin should clean up as many resources as possible (e.g. releasing IPAM allocations) and return a successful response.
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
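A minimal sketch of the behaviour described in the issue, assuming a hypothetical helper that builds the environment for a CNI plugin "DEL" call. The environment variable names come from the CNI spec; the function name, its parameters, and the example paths are illustrative assumptions, not the Mesos implementation.

```python
from typing import Dict, Optional


def cni_del_environment(
    container_id: str,
    ifname: str,
    plugin_path: str,
    netns_path: Optional[str],
) -> Dict[str, str]:
    """Build the environment for a CNI plugin "DEL" invocation.

    `netns_path` is None for an orphan container whose network namespace
    no longer exists, for example after a host reboot (hypothetical
    convention for this sketch).
    """
    env = {
        "CNI_COMMAND": "DEL",
        "CNI_CONTAINERID": container_id,
        "CNI_IFNAME": ifname,
        "CNI_PATH": plugin_path,
    }
    # Leave CNI_NETNS unset instead of pointing it at a stale path, so the
    # plugin falls back to best-effort cleanup as the spec describes.
    if netns_path is not None:
        env["CNI_NETNS"] = netns_path
    return env


# Example values are placeholders, not paths used by Mesos.
orphan = cni_del_environment("c1", "eth0", "/opt/cni/bin", None)
live = cni_del_environment("c2", "eth0", "/opt/cni/bin", "/run/netns/c2")
```

The key design point is that the variable is omitted entirely rather than set to an empty string, since the spec text quoted above talks about `CNI_NETNS` not being provided.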
[jira] [Created] (MESOS-9519) Unable to build Mesos with CMake on Ubuntu 14.04.
Chun-Hung Hsiao created MESOS-9519:
--------------------------------------

             Summary: Unable to build Mesos with CMake on Ubuntu 14.04.
                 Key: MESOS-9519
                 URL: https://issues.apache.org/jira/browse/MESOS-9519
             Project: Mesos
          Issue Type: Bug
          Components: build
    Affects Versions: 1.7.0
            Reporter: Chun-Hung Hsiao
            Assignee: Chun-Hung Hsiao


Running the following command to build Mesos on Ubuntu 14.04 leads to the errors shown below:
{noformat}
OS=ubuntu:14.04 BUILDTOOL=cmake COMPILER=gcc CONFIGURATION='--verbose --enable-libevent --enable-ssl' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1' JOBS=48 nice support/docker-build.sh
{noformat}
{noformat}
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc: In function 'tsi_result ssl_handshaker_extract_peer(tsi_handshaker*, tsi_peer*)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1011:71: error: 'SSL_get0_alpn_selected' was not declared in this scope
   SSL_get0_alpn_selected(impl->ssl, &alpn_selected, &alpn_selected_len);
   ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc: In function 'tsi_result tsi_create_ssl_client_handshaker_factory(const tsi_ssl_pem_key_cert_pair*, const char*, const char*, const char**, uint16_t, tsi_ssl_client_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1417:73: error: 'SSL_CTX_set_alpn_protos' was not declared in this scope
   static_cast(impl->alpn_protocol_list_length))) {
   ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc: In function 'tsi_result tsi_create_ssl_server_handshaker_factory_ex(const tsi_ssl_pem_key_cert_pair*, size_t, const char*, tsi_client_certificate_request_type, const char*, const char**, uint16_t, tsi_ssl_server_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1557:79: error: 'SSL_CTX_set_alpn_select_cb' was not declared in this scope
   server_handshaker_factory_alpn_callback, impl);
   ^
make[7]: *** [CMakeFiles/grpc.dir/src/core/tsi/ssl_transport_security.cc.o] Error 1
make[7]: Leaving directory `/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[6]: *** [CMakeFiles/grpc.dir/all] Error 2
make[6]: Leaving directory `/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[5]: *** [CMakeFiles/grpc.dir/rule] Error 2
make[5]: Leaving directory `/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[4]: *** [grpc] Error 2
make[4]: Leaving directory `/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[3]: *** [3rdparty/grpc-1.10.0/src/grpc-1.10.0-stamp/grpc-1.10.0-build] Error 2
make[3]: Leaving directory `/mesos/build'
make[2]: *** [3rdparty/CMakeFiles/grpc-1.10.0.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs
{noformat}
The reason is that gRPC's CMake rules do not disable ALPN on systems with OpenSSL 1.0.1, which does not declare the ALPN functions named in the errors above.
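To illustrate the version gate the build is missing, here is a hedged sketch of the check. It relies on the fact that OpenSSL added ALPN support in 1.0.2; the helper name `supports_alpn` is hypothetical, and this is not gRPC's or Mesos's actual CMake logic.

```python
import ssl
from typing import Tuple

# ALPN support was introduced in OpenSSL 1.0.2.
ALPN_MIN_VERSION = (1, 0, 2)


def supports_alpn(openssl_version: Tuple[int, ...]) -> bool:
    """Return True if the given (major, minor, fix, ...) OpenSSL version
    is new enough to declare the ALPN functions."""
    return tuple(openssl_version[:3]) >= ALPN_MIN_VERSION


# OpenSSL 1.0.1, as in the failing build, is too old for ALPN.
print(supports_alpn((1, 0, 1)))

# For comparison, the OpenSSL version this Python interpreter is linked
# against, next to the standard library's own ALPN capability flag.
print(supports_alpn(ssl.OPENSSL_VERSION_INFO), ssl.HAS_ALPN)
```

A build rule following the same idea would compile the ALPN code paths only when this predicate holds for the detected OpenSSL version.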
[jira] [Assigned] (MESOS-9509) Benchmark command health checks in default executor
    [ https://issues.apache.org/jira/browse/MESOS-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann reassigned MESOS-9509:
--------------------------------

        Shepherd: Vinod Kone
        Assignee: Greg Mann
          Sprint: Mesos Foundations R9 Sprint 37
    Story Points: 5
          Labels: default-executor foundations mesosphere perfomance  (was: default-executor foundations perfomance)

> Benchmark command health checks in default executor
> ---------------------------------------------------
>
>                 Key: MESOS-9509
>                 URL: https://issues.apache.org/jira/browse/MESOS-9509
>             Project: Mesos
>          Issue Type: Task
>          Components: executor
>            Reporter: Vinod Kone
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: default-executor, foundations, mesosphere, perfomance
>
> TCP/HTTP health checks were extensively scale tested as part of https://mesosphere.com/blog/introducing-mesos-native-health-checks-apache-mesos-part-2/.
> We should do the same for command checks by default executor because it uses a very different mechanism (agent fork/execs the check command as a nested container) and will have very different scalability characteristics.
> We should also use these benchmarks as an opportunity to produce perf traces of the Mesos agent (both with and without process inheritance) so that a thorough analysis of the performance can be done as part of MESOS-9513.
[jira] [Commented] (MESOS-9518) CNI_NETNS should not be set for orphan containers that do not have network namespace
    [ https://issues.apache.org/jira/browse/MESOS-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739761#comment-16739761 ]

Jie Yu commented on MESOS-9518:
-------------------------------

Fix: https://reviews.apache.org/r/69706/

Adding tests now.

> CNI_NETNS should not be set for orphan containers that do not have network namespace
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-9518
>                 URL: https://issues.apache.org/jira/browse/MESOS-9518
>             Project: Mesos
>          Issue Type: Bug
>          Components: cni
>    Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>            Priority: Major
>
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be persisted across reboot. This lets some CNI plugins clean up the IPs allocated to containers after a sudden reboot of the host (not all CNI plugins need this).
> It's important to unset the `CNI_NETNS` environment variable after reboot when invoking the CNI plugin "DEL" command so that it conforms to the spec:
> {noformat}
> When CNI_NETNS and/or prevResult are not provided, the plugin should clean up as many resources as possible (e.g. releasing IPAM allocations) and return a successful response.
> {noformat}
[jira] [Commented] (MESOS-9224) De-duplicate read-only requests to master based on principal.
    [ https://issues.apache.org/jira/browse/MESOS-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739688#comment-16739688 ]

Greg Mann commented on MESOS-9224:
----------------------------------

Review chain ends here: https://reviews.apache.org/r/69064/

> De-duplicate read-only requests to master based on principal.
> -------------------------------------------------------------
>
>                 Key: MESOS-9224
>                 URL: https://issues.apache.org/jira/browse/MESOS-9224
>             Project: Mesos
>          Issue Type: Improvement
>          Components: HTTP API
>            Reporter: Alexander Rukletsov
>            Assignee: Benno Evers
>            Priority: Major
>              Labels: performance
>
> "Identical" read-only requests can be batched and answered together. With batching available (MESOS-9158), we can now deduplicate requests based on principal.
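The de-duplication idea in the issue can be sketched roughly as follows. This is an illustration only: the request shape `(principal, endpoint)`, the function `answer_batch`, and the `render` handler are hypothetical names, and the real master works on its own request and authorization types. The sketch assumes that two requests count as "identical" when both the principal and the endpoint match.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical request shape for this sketch: (principal, endpoint).
Request = Tuple[str, str]


def answer_batch(
    requests: List[Request],
    handler: Callable[[str, str], str],
) -> List[str]:
    """Answer a batch of read-only requests, computing each distinct
    (principal, endpoint) response only once."""
    cache: Dict[Request, str] = {}
    responses = []
    for principal, endpoint in requests:
        key = (principal, endpoint)
        if key not in cache:
            cache[key] = handler(principal, endpoint)
        # Responses are returned in request order, one per request.
        responses.append(cache[key])
    return responses


calls = []


def render(principal: str, endpoint: str) -> str:
    # Stand-in for the expensive serialization of master state.
    calls.append((principal, endpoint))
    return f"{endpoint} as seen by {principal}"


batch = [("alice", "/state"), ("bob", "/state"), ("alice", "/state")]
responses = answer_batch(batch, render)
```

Keying on the principal as well as the endpoint keeps one principal from receiving a response that was rendered for another.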
[jira] [Comment Edited] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
    [ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739635#comment-16739635 ]

Andrei Budnik edited comment on MESOS-7971 at 1/10/19 5:40 PM:
---------------------------------------------------------------

This is something different from previous ones.
{code:java}
E0110 17:13:09.326659 13916 master.cpp:8586] Failed to find the operation '' (uuid: 825f65eb-3ba1-4dfa-bdfa-8eb29194ace3) for an operator API call on agent ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b-S0
{code}
Full log:
{code:java}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I0110 17:12:59.303460 13893 cluster.cpp:174] Creating default 'local' authorizer
I0110 17:12:59.304430 13912 master.cpp:416] Master ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b (ip-172-16-10-92.ec2.internal) started on 172.16.10.92:42320
I0110 17:12:59.304451 13912 master.cpp:419] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1000secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/PfFTwT/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_operator_event_stream_subscribers="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PfFTwT/master" --zk_session_timeout="10secs"
I0110 17:12:59.304585 13912 master.cpp:468] Master only allowing authenticated frameworks to register
I0110 17:12:59.304595 13912 master.cpp:474] Master only allowing authenticated agents to register
I0110 17:12:59.304603 13912 master.cpp:480] Master only allowing authenticated HTTP frameworks to register
I0110 17:12:59.304615 13912 credentials.hpp:37] Loading credentials for authentication from '/tmp/PfFTwT/credentials'
I0110 17:12:59.304684 13912 master.cpp:524] Using default 'crammd5' authenticator
I0110 17:12:59.304744 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0110 17:12:59.304831 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0110 17:12:59.304889 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0110 17:12:59.304941 13912 master.cpp:605] Authorization enabled
W0110 17:12:59.304967 13912 master.cpp:668] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
I0110 17:12:59.305047 13919 hierarchical.cpp:176] Initialized hierarchical allocator process
I0110 17:12:59.305128 13918 whitelist_watcher.cpp:77] No whitelist given
I0110 17:12:59.305600 13914 master.cpp:2085] Elected as the leading master!
I0110 17:12:59.305622 13914 master.cpp:1640] Recovering from registrar
I0110 17:12:59.305698 13913 registrar.cpp:339] Recovering registrar
I0110 17:12:59.305853 13912 registrar.cpp:383] Successfully fetched the registry (0B) in 118016ns
I0110 17:12:59.305899 13912 registrar.cpp:487] Applied 1 operations in 8238ns; attempting to update the registry
I0110 17:12:59.306036 13912 registrar.cpp:544] Successfully updated the registry in 112128ns
I0110 17:12:59.306092 13912 registrar.cpp:416] Successfully recovered registrar
I0110 17:12:59.306217 13916 master.cpp:1754] Recovered 0 agents from the registry (172B); allowing 10mins for agents to reregister
I0110 17:12:59.306258 13919 hierarchical.cpp:216] Skipping recovery of hierarchical allocator: nothing to recover
W0110 17:12:59.307780 13893 process.cpp:2829] Attempted to spawn already running process files@172.16.10.92:42320
I0110 17:12:59.308149 13893 containerizer.cpp:305] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0110 17:12:59.310348 13893 linux_launcher.cpp:144] Using /sys/fs/cgroup/freezer as the freezer hierarchy for t
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
    [ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739635#comment-16739635 ]

Andrei Budnik commented on MESOS-7971:
--------------------------------------

This is something different from previous ones.
{code:java}
E0110 17:13:09.326659 13916 master.cpp:8586] Failed to find the operation '' (uuid: 825f65eb-3ba1-4dfa-bdfa-8eb29194ace3) for an operator API call on agent ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b-S0
{code}
Full log:
{code:java}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I0110 17:12:59.303460 13893 cluster.cpp:174] Creating default 'local' authorizer
I0110 17:12:59.304430 13912 master.cpp:416] Master ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b (ip-172-16-10-92.ec2.internal) started on 172.16.10.92:42320
I0110 17:12:59.304451 13912 master.cpp:419] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1000secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/PfFTwT/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_operator_event_stream_subscribers="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PfFTwT/master" --zk_session_timeout="10secs"
I0110 17:12:59.304585 13912 master.cpp:468] Master only allowing authenticated frameworks to register
I0110 17:12:59.304595 13912 master.cpp:474] Master only allowing authenticated agents to register
I0110 17:12:59.304603 13912 master.cpp:480] Master only allowing authenticated HTTP frameworks to register
I0110 17:12:59.304615 13912 credentials.hpp:37] Loading credentials for authentication from '/tmp/PfFTwT/credentials'
I0110 17:12:59.304684 13912 master.cpp:524] Using default 'crammd5' authenticator
I0110 17:12:59.304744 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0110 17:12:59.304831 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0110 17:12:59.304889 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0110 17:12:59.304941 13912 master.cpp:605] Authorization enabled
W0110 17:12:59.304967 13912 master.cpp:668] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
I0110 17:12:59.305047 13919 hierarchical.cpp:176] Initialized hierarchical allocator process
I0110 17:12:59.305128 13918 whitelist_watcher.cpp:77] No whitelist given
I0110 17:12:59.305600 13914 master.cpp:2085] Elected as the leading master!
I0110 17:12:59.305622 13914 master.cpp:1640] Recovering from registrar
I0110 17:12:59.305698 13913 registrar.cpp:339] Recovering registrar
I0110 17:12:59.305853 13912 registrar.cpp:383] Successfully fetched the registry (0B) in 118016ns
I0110 17:12:59.305899 13912 registrar.cpp:487] Applied 1 operations in 8238ns; attempting to update the registry
I0110 17:12:59.306036 13912 registrar.cpp:544] Successfully updated the registry in 112128ns
I0110 17:12:59.306092 13912 registrar.cpp:416] Successfully recovered registrar
I0110 17:12:59.306217 13916 master.cpp:1754] Recovered 0 agents from the registry (172B); allowing 10mins for agents to reregister
I0110 17:12:59.306258 13919 hierarchical.cpp:216] Skipping recovery of hierarchical allocator: nothing to recover
W0110 17:12:59.307780 13893 process.cpp:2829] Attempted to spawn already running process files@172.16.10.92:42320
I0110 17:12:59.308149 13893 containerizer.cpp:305] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0110 17:12:59.310348 13893 linux_launcher.cpp:144] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0110 17:12:59.310752 13893 pro
[jira] [Assigned] (MESOS-9518) CNI_NETNS should not be set for orphan containers that do not have network namespace
    [ https://issues.apache.org/jira/browse/MESOS-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu reassigned MESOS-9518:
-----------------------------

    Assignee: Jie Yu

> CNI_NETNS should not be set for orphan containers that do not have network namespace
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-9518
>                 URL: https://issues.apache.org/jira/browse/MESOS-9518
>             Project: Mesos
>          Issue Type: Bug
>          Components: cni
>    Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>            Priority: Major
>
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be persisted across reboot. This lets some CNI plugins clean up the IPs allocated to containers after a sudden reboot of the host (not all CNI plugins need this).
> It's important to unset the `CNI_NETNS` environment variable after reboot when invoking the CNI plugin "DEL" command so that it conforms to the spec:
> {noformat}
> When CNI_NETNS and/or prevResult are not provided, the plugin should clean up as many resources as possible (e.g. releasing IPAM allocations) and return a successful response.
> {noformat}