[jira] [Commented] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well
[ https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931972#comment-16931972 ] Qian Zhang commented on MESOS-9966: --- [~nfnt] If the flag is false, then I don't think we hit [the code that you mentioned|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966], since that path is only taken when `--gc_non_executor_container_sandboxes` is true. > Agent crashes when trying to destroy orphaned nested container if root > container is orphaned as well > > > Key: MESOS-9966 > URL: https://issues.apache.org/jira/browse/MESOS-9966 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.7.3 >Reporter: Jan Schlicht >Assignee: Qian Zhang >Priority: Major > > Noticed an agent crash-looping when trying to recover. It recognized a > container and its nested container as orphaned. When trying to destroy the > nested container, the agent crashes. Probably when trying to [get the sandbox > path of the root > container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966].
> {noformat} > 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] > Recovering Linux launcher > 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not > recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] > Recovered container > a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 > 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not > recovering cgroup > mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] > Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] > Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] > Recovered container > 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 > 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not > recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not > recovering cgroup > mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] > 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is > a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] > a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] > 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] > 
a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is > a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] > Recovering isolators > 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started > listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started > listening on 'low' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started > listening on 'medium' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started > listening on 'critical' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started > listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started > listening on 'low' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started > listening on 'medium' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started > listening on 'critical' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386694
[jira] [Commented] (MESOS-9798) How to reduce compile time after had changed/improved source code?
[ https://issues.apache.org/jira/browse/MESOS-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931672#comment-16931672 ] Benjamin Bannier commented on MESOS-9798: - [~rchatsiri], you should take advantage of parallel processing when invoking {{make}}, i.e., your step (3) could be (assuming your dev machine has 12 cores) {noformat} $ make -j 12 {noformat} This assumes that your {{MAKEFLAGS}} environment variable does not already contain {{-j 12}} or similar. With that, {{make}} runs up to 12 compilation processes in parallel; steps like linking (e.g., of {{libmesos}}) remain mostly sequential and become the bottleneck in highly parallelized builds (linking {{libmesos}} can take up to a minute, depending on your hardware, linker, and flags). > How to reduce compile time after had changed/improved source code? > -- > > Key: MESOS-9798 > URL: https://issues.apache.org/jira/browse/MESOS-9798 > Project: Mesos > Issue Type: Improvement > Components: cmake >Affects Versions: 1.8.0 > Environment: Linux firework-vm01 4.9.0-9-amd64 #1 SMP Debian > 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux >Reporter: chatsiri >Priority: Minor > Labels: newbie > > Hello all, > I have finished changing variables in the src/ directory, but the compiler > takes a long time to finish the build steps. How can I reduce the compile time per > component or source directory? For example, with the simple steps below: > # I add a new member function to class Docker in docker.hpp. This class > is declared in a file in the docker directory. > # Compile the source again from the build directory. This directory is created in the > base source directory, alongside src/, bin/ and include/. > # Go to the build path with > ## $cd build > ## $../configure --disable-python --disable-java --enable-debug > --enable-fast-install > ## $make > ## $sudo make install. > In step 3, the compiler takes a long time to compile the source code. How can we > reduce the compile time for only the source directory that we changed?
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (MESOS-9969) Agent crashes when trying to clean up volue
[ https://issues.apache.org/jira/browse/MESOS-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931645#comment-16931645 ] Andrei Budnik commented on MESOS-9969: -- Could you please provide steps to reproduce this bug? > Agent crashes when trying to clean up volue > --- > > Key: MESOS-9969 > URL: https://issues.apache.org/jira/browse/MESOS-9969 > Project: Mesos > Issue Type: Bug > Components: agent >Affects Versions: 1.8.2 >Reporter: Tomas Barton >Priority: Major > > {code} > Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081748 21828 > linux_launcher.cpp:650] Destroying cgroup > '/sys/fs/cgroup/systemd/mesos/370ed262-4041-4180-a7e1-9ea78070e3a6' > Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081876 21832 > containerizer.cpp:2907] Checkpointing termination state to nested container's > runtime directory > '/var/run/mesos/containers/8e3997e7-c53a-4043-9a7e-26a2e436a041/containers/ae0bdc6d-c738-4352-b5d4-7572182671d5/termination' > Sep 17 13:49:26 w03 mesos-agent[21803]: mesos-agent: > /pkg/src/mesos/3rdparty/stout/include/stout/option.hpp:120: T& > Option::get() & [with T = std::basic_string]: Assertion `isSome()' > failed. 
> Sep 17 13:49:26 w03 mesos-agent[21803]: *** Aborted at 1568728166 (unix time) > try "date -d @1568728166" if you are using GNU date *** > Sep 17 13:49:26 w03 mesos-agent[21803]: W0917 13:49:26.082281 21835 > disk.cpp:453] Ignoring cleanup for unknown container > a9ba6959-ea02-4543-b7d5-92a63940 > Sep 17 13:49:26 w03 mesos-agent[21803]: PC: @ 0x7f16a3867fff (unknown) > Sep 17 13:49:26 w03 mesos-agent[21803]: *** SIGABRT (@0x552b) received by PID > 21803 (TID 0x7f169e47d700) from PID 21803; stack trace: *** > Sep 17 13:49:26 w03 mesos-agent[21803]: E0917 13:49:26.082608 21835 > memory.cpp:501] Listening on OOM events failed for container > a9ba6959-ea02-4543-b7d5-92a63940: Event listener is terminating > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3be50e0 (unknown) > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3867fff (unknown) > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a386942a (unknown) > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860e67 (unknown) > Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.083741 21835 > linux.cpp:1074] Unmounting volume > '/var/lib/mesos/slave/slaves/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-S17/frameworks/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-0003/executors/es01__coordinator__8591ac8e-3d9d-45ac-bb68-bee379c8c4a4/runs/a9ba6959-ea02-4543-b7d5-92a63940/container-path' > for con > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860f12 (unknown) > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7654f13 > _ZNR6OptionISsE3getEv.part.152 > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7666b2f > mesos::internal::slave::MesosContainerizerProcess::__destroy() > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a861cb41 > process::ProcessBase::consume() > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a8633c9c > process::ProcessManager::resume() > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a86398a6 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv > Sep 17 13:49:26 
w03 mesos-agent[21803]: @ 0x7f16a43c6200 (unknown) > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3bdb4a4 start_thread > Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a391dd0f (unknown) > Sep 17 13:49:26 w03 systemd[1]: dcos-mesos-slave.service: Main process > exited, code=killed, status=6/ABRT > {code}
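The abort in the trace above is the `isSome()` assertion in stout's `Option::get()`, reached from `MesosContainerizerProcess::__destroy()`. As a generic illustration of the guarded-access pattern that avoids this kind of abort, here is a minimal C++ sketch. It is not the Mesos code and not a proposed fix: `ToyOption` is a toy stand-in for stout's `Option`, `childPath` is a hypothetical helper, and the log does not tell us which option was actually empty.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for stout's Option<T> (hypothetical, illustration only).
// It models just the three members used below.
template <typename T>
class ToyOption
{
public:
  ToyOption() : some_(false), value_() {}
  ToyOption(const T& value) : some_(true), value_(value) {}

  bool isSome() const { return some_; }
  bool isNone() const { return !some_; }

  // Same contract as the failing call in the trace: asserts that a value
  // is present, so calling it on an empty option aborts the process.
  const T& get() const
  {
    assert(isSome());
    return value_;
  }

private:
  bool some_;
  T value_;
};

// Hypothetical helper: build a path below `directory`. When `directory`
// is unset, return an empty option instead of calling get() on it.
ToyOption<std::string> childPath(
    const ToyOption<std::string>& directory,
    const std::string& name)
{
  if (directory.isNone()) {
    return ToyOption<std::string>();  // The caller decides how to react.
  }

  return ToyOption<std::string>(directory.get() + "/" + name);
}
```

A caller would check `isSome()` on the result and skip the step, or log a warning, rather than let the agent abort.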
[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess
[ https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931509#comment-16931509 ] Abel S commented on MESOS-5342: --- Could someone provide updates on this issue/feature? > CPU pinning/binding support for CgroupsCpushareIsolatorProcess > -- > > Key: MESOS-5342 > URL: https://issues.apache.org/jira/browse/MESOS-5342 > Project: Mesos > Issue Type: Improvement > Components: containerization >Affects Versions: 0.28.1 >Reporter: Chris >Priority: Major > Labels: cgroups, cpu, cpu-usage, gpu, isolation, isolator, mentor > > The cgroups isolator currently lacks support for binding (also called > pinning) containers to a set of cores. The GNU/Linux kernel is known to make > sub-optimal core assignments for processes and threads. Poor assignments > impact program performance, specifically in terms of cache locality. > Applications requiring GPU resources can benefit from this feature by getting > access to cores closest to the GPU hardware, which reduces CPU-GPU copy > latency. > Most cluster management systems from the HPC community (e.g., SLURM) provide both > cgroup isolation and CPU binding. This feature would provide similar > capabilities. The current interest in supporting Intel's Cache Allocation > Technology, and the advent of Intel's Knights-series processors, will require > making choices about where containers are going to run on the mesos-agent's > processor cores; this feature is a step toward developing a robust > solution. > The improvement in this JIRA ticket will handle hardware topology detection, > track container-to-core utilization in a histogram, and use a mathematical > optimization technique to select cores for container assignment based on > latency and the container-to-core utilization histogram. > For GPU tasks, the improvement will prioritize selection of cores based on > latency between the GPU and cores in an effort to minimize copy latency.
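To make the histogram idea concrete, here is a small C++ sketch that picks the least-utilized cores from a per-core container count. It is a greedy simplification, not the optimization technique proposed in the ticket: it ignores hardware topology and latency entirely, and `pickLeastLoadedCores` and `containersPerCore` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: return the indices of the `count` cores that
// currently host the fewest containers. `containersPerCore[i]` is the
// histogram entry for core i. Ties are broken by lower core index.
std::vector<std::size_t> pickLeastLoadedCores(
    const std::vector<int>& containersPerCore,
    std::size_t count)
{
  // Start with the core indices 0, 1, ..., n-1.
  std::vector<std::size_t> cores(containersPerCore.size());
  std::iota(cores.begin(), cores.end(), std::size_t(0));

  // Order the indices by load; stable_sort keeps the original index
  // order among cores with equal load.
  std::stable_sort(
      cores.begin(),
      cores.end(),
      [&containersPerCore](std::size_t a, std::size_t b) {
        return containersPerCore[a] < containersPerCore[b];
      });

  // Keep at most `count` cores.
  cores.resize(std::min(count, cores.size()));
  return cores;
}
```

With the counts `{3, 0, 2, 0}` and `count` set to 2, this selects cores 1 and 3. A real implementation would weigh this against latency (for example GPU-to-core distance) and update the histogram after each assignment.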
[jira] [Created] (MESOS-9970) Add Java example frameworks to CMake build.
Andrei Sekretenko created MESOS-9970: Summary: Add Java example frameworks to CMake build. Key: MESOS-9970 URL: https://issues.apache.org/jira/browse/MESOS-9970 Project: Mesos Issue Type: Improvement Reporter: Andrei Sekretenko Currently, the Java example frameworks are simply not built by CMake: https://github.com/apache/mesos/blob/master/src/examples/CMakeLists.txt As a result, tests based on these frameworks fail in the CMake build.
[jira] [Created] (MESOS-9969) Agent crashes when trying to clean up volue
Tomas Barton created MESOS-9969: --- Summary: Agent crashes when trying to clean up volue Key: MESOS-9969 URL: https://issues.apache.org/jira/browse/MESOS-9969 Project: Mesos Issue Type: Bug Components: agent Affects Versions: 1.8.2 Reporter: Tomas Barton {code} Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081748 21828 linux_launcher.cpp:650] Destroying cgroup '/sys/fs/cgroup/systemd/mesos/370ed262-4041-4180-a7e1-9ea78070e3a6' Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081876 21832 containerizer.cpp:2907] Checkpointing termination state to nested container's runtime directory '/var/run/mesos/containers/8e3997e7-c53a-4043-9a7e-26a2e436a041/containers/ae0bdc6d-c738-4352-b5d4-7572182671d5/termination' Sep 17 13:49:26 w03 mesos-agent[21803]: mesos-agent: /pkg/src/mesos/3rdparty/stout/include/stout/option.hpp:120: T& Option::get() & [with T = std::basic_string]: Assertion `isSome()' failed. Sep 17 13:49:26 w03 mesos-agent[21803]: *** Aborted at 1568728166 (unix time) try "date -d @1568728166" if you are using GNU date *** Sep 17 13:49:26 w03 mesos-agent[21803]: W0917 13:49:26.082281 21835 disk.cpp:453] Ignoring cleanup for unknown container a9ba6959-ea02-4543-b7d5-92a63940 Sep 17 13:49:26 w03 mesos-agent[21803]: PC: @ 0x7f16a3867fff (unknown) Sep 17 13:49:26 w03 mesos-agent[21803]: *** SIGABRT (@0x552b) received by PID 21803 (TID 0x7f169e47d700) from PID 21803; stack trace: *** Sep 17 13:49:26 w03 mesos-agent[21803]: E0917 13:49:26.082608 21835 memory.cpp:501] Listening on OOM events failed for container a9ba6959-ea02-4543-b7d5-92a63940: Event listener is terminating Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3be50e0 (unknown) Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3867fff (unknown) Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a386942a (unknown) Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860e67 (unknown) Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.083741 21835 linux.cpp:1074] Unmounting volume 
'/var/lib/mesos/slave/slaves/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-S17/frameworks/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-0003/executors/es01__coordinator__8591ac8e-3d9d-45ac-bb68-bee379c8c4a4/runs/a9ba6959-ea02-4543-b7d5-92a63940/container-path' for con Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860f12 (unknown) Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7654f13 _ZNR6OptionISsE3getEv.part.152 Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7666b2f mesos::internal::slave::MesosContainerizerProcess::__destroy() Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a861cb41 process::ProcessBase::consume() Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a8633c9c process::ProcessManager::resume() Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a86398a6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a43c6200 (unknown) Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3bdb4a4 start_thread Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a391dd0f (unknown) Sep 17 13:49:26 w03 systemd[1]: dcos-mesos-slave.service: Main process exited, code=killed, status=6/ABRT {code}
[jira] [Commented] (MESOS-9879) Create a unit test ensuring that a client certificate requests are properly ignored
[ https://issues.apache.org/jira/browse/MESOS-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931390#comment-16931390 ] Benno Evers commented on MESOS-9879: Given that the behaviour described here is mandated by the TLS spec, and that testing it would require writing a custom, buggy TLS implementation, I think it's safe to say the costs outweigh the benefits here. Closing this as "Won't Fix". > Create a unit test ensuring that a client certificate requests are properly > ignored > --- > > Key: MESOS-9879 > URL: https://issues.apache.org/jira/browse/MESOS-9879 > Project: Mesos > Issue Type: Improvement >Reporter: Benno Evers >Priority: Major > Labels: libprocess, ssl, tls > > When a TLS server sends a Client Certificate Request as part of the handshake > and the client does not have a certificate available, the TLS specification > mandates that the client shall attempt to continue the handshake by > sending a zero-length certificate. > We should write a unit test verifying that libprocess handles this correctly when > acting as a client, although it's not completely clear how this might be > implemented.
[jira] [Commented] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well
[ https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931187#comment-16931187 ] Jan Schlicht commented on MESOS-9966: - The flag wasn't set so it's at its default value which is {{false}}. > Agent crashes when trying to destroy orphaned nested container if root > container is orphaned as well > > > Key: MESOS-9966 > URL: https://issues.apache.org/jira/browse/MESOS-9966 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.7.3 >Reporter: Jan Schlicht >Assignee: Qian Zhang >Priority: Major > > Noticed an agent crash-looping when trying to recover. It recognized a > container and its nested container as orphaned. When trying to destroy the > nested container, the agent crashes. Probably when trying to [get the sandbox > path of the root > container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966]. > {noformat} > 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] > Recovering Linux launcher > 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not > recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] > Recovered container > a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 > 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not > recovering cgroup > mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] > Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] > Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] > Recovered container > 
2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 > 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not > recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not > recovering cgroup > mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] > 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is > a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] > a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] > 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] > a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is > a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] > Recovering isolators > 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started > listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started > listening on 'low' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started > listening on 'medium' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started > listening on 'critical' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started > listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091 > 
2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started > listening on 'low' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started > listening on 'medium' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started > listening on 'critical' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386694 89921 containerizer.cpp:1131] > Recovering provisioner > 2019-09-09 05:04:26: I0909 05:04:26.388226 90010 metadata_manager.cpp:286] > Successfully loaded 64 Docker images > 2019-09-09
[jira] [Commented] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well
[ https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931182#comment-16931182 ] Qian Zhang commented on MESOS-9966: --- [~nfnt] Can you please let me know if the agent flag `--gc_non_executor_container_sandboxes` was set to true or false when this issue occurred? > Agent crashes when trying to destroy orphaned nested container if root > container is orphaned as well > > > Key: MESOS-9966 > URL: https://issues.apache.org/jira/browse/MESOS-9966 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.7.3 >Reporter: Jan Schlicht >Assignee: Qian Zhang >Priority: Major > > Noticed an agent crash-looping when trying to recover. It recognized a > container and its nested container as orphaned. When trying to destroy the > nested container, the agent crashes. Probably when trying to [get the sandbox > path of the root > container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966]. 
> {noformat} > 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] > Recovering Linux launcher > 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not > recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] > Recovered container > a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 > 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not > recovering cgroup > mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] > Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] > Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] > Recovered container > 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 > 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not > recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not > recovering cgroup > mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos > 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] > 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is > a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] > a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] > 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] > 
a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is > a known orphaned container > 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] > Recovering isolators > 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started > listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started > listening on 'low' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started > listening on 'medium' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started > listening on 'critical' memory pressure events for container > a127917b-96fe-4100-b73d-5f876ce9ffc1 > 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started > listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started > listening on 'low' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started > listening on 'medium' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started > listening on 'critical' memory pressure events for container > 2ee154e2-3cc4-420a-99fb-065e740f3091 > 2019-09-09 05:04:26: I0909 05:04:26.386694 89921 containerizer.cpp:1131] > Recovering provisioner > 2019-09-09 05:04:26: I0909 05:04:26.388226 90010
[jira] [Created] (MESOS-9968) WWWAuthenticate header parsing fails when commas are in (quoted) realm
Jan Schlicht created MESOS-9968: --- Summary: WWWAuthenticate header parsing fails when commas are in (quoted) realm Key: MESOS-9968 URL: https://issues.apache.org/jira/browse/MESOS-9968 Project: Mesos Issue Type: Bug Components: HTTP API, libprocess Reporter: Jan Schlicht This was discovered when trying to launch the {{[nvcr.io/nvidia/tensorflow:19.08-py3|http://nvcr.io/nvidia/tensorflow:19.08-py3]}} image using the Mesos containerizer. The launch fails with {noformat} Failed to launch container: Failed to get WWW-Authenticate header: Unexpected auth-param format: 'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull' in 'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull,push;' {noformat} This happens because the [header tokenization in libprocess|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L640] can't handle commas in quoted realm values.
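One way to avoid this class of failure is to split the challenge on commas only when they fall outside double quotes. The following is a minimal C++ sketch of that idea, not the libprocess code and not a proposed patch: `splitAuthParams` and `trim` are hypothetical helper names, and the handling of backslash escapes inside quoted strings is an assumption about what a robust tokenizer would want.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing spaces and tabs.
std::string trim(const std::string& s)
{
  const char* whitespace = " \t";
  const std::size_t begin = s.find_first_not_of(whitespace);
  if (begin == std::string::npos) {
    return "";
  }
  const std::size_t end = s.find_last_not_of(whitespace);
  return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: split a list of auth-params on commas, ignoring
// commas that appear inside double-quoted values. Empty tokens are dropped.
std::vector<std::string> splitAuthParams(const std::string& input)
{
  std::vector<std::string> tokens;
  std::string current;
  bool inQuotes = false;

  // Store the trimmed token collected so far, if any, and start a new one.
  auto flush = [&tokens, &current]() {
    const std::string token = trim(current);
    if (!token.empty()) {
      tokens.push_back(token);
    }
    current.clear();
  };

  for (std::size_t i = 0; i < input.size(); ++i) {
    const char c = input[i];

    if (inQuotes && c == '\\' && i + 1 < input.size()) {
      // Escaped character inside a quoted value: copy both characters
      // verbatim so an escaped quote does not end the quoted value.
      current += c;
      current += input[++i];
    } else if (c == '"') {
      inQuotes = !inQuotes;
      current += c;
    } else if (c == ',' && !inQuotes) {
      flush();
    } else {
      current += c;
    }
  }

  flush();
  return tokens;
}
```

With this approach, a realm such as `realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull,push"` stays a single token, while `realm="a,b", service="registry"` still yields two.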