[jira] [Commented] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well

2019-09-17 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931972#comment-16931972
 ] 

Qian Zhang commented on MESOS-9966:
---

[~nfnt] If it's false, then I think we will not hit [the code that you 
mentioned|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966]
 since we only do that when `--gc_non_executor_container_sandboxes` is true.

> Agent crashes when trying to destroy orphaned nested container if root 
> container is orphaned as well
> 
>
> Key: MESOS-9966
> URL: https://issues.apache.org/jira/browse/MESOS-9966
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.7.3
> Reporter: Jan Schlicht
> Assignee: Qian Zhang
> Priority: Major
>
> Noticed an agent crash-looping when trying to recover. It recognized a 
> container and its nested container as orphaned. When trying to destroy the 
> nested container, the agent crashes. Probably when trying to [get the sandbox 
> path of the root 
> container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966].
> {noformat}
> 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] 
> Recovering Linux launcher
> 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] 
> Recovered container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] 
> Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] 
> Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] 
> Recovered container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] 
> Recovering isolators
> 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started 
> listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started 
> listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386694 

[jira] [Commented] (MESOS-9798) How to reduce compile time after had changed/improved source code?

2019-09-17 Thread Benjamin Bannier (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931672#comment-16931672
 ] 

Benjamin Bannier commented on MESOS-9798:
-

[~rchatsiri], you should take advantage of parallel processing when 
invoking {{make}}; e.g., your step (3) above could be (assuming your dev 
machine has 12 cores):
{noformat}
$ make -j 12
{noformat}

This assumes that your {{MAKEFLAGS}} environment variable does not already 
contain {{-j 12}} or similar. With that flag, {{make}} runs up to 12 
compilation processes in parallel. Steps like linking (e.g., of {{libmesos}}) 
are still mostly sequential and become a bottleneck for highly parallelized 
builds (linking {{libmesos}} can take up to a minute depending on your 
hardware, linker, and flags).
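
As a rough illustration of the advice above, the following Python sketch builds a {{make}} command line that adds {{-j}} only when {{MAKEFLAGS}} does not already request parallel jobs. The helper name {{make_command}} and the way it scans {{MAKEFLAGS}} are assumptions made for this example; it is not part of the Mesos build, and the token check is a simplification of how {{make}} itself parses its flags.

```python
import os
import shlex


def make_command(cores=None, makeflags=None):
    """Build a make argv, adding -j only if MAKEFLAGS does not already set it.

    Hypothetical helper for illustration; the -j detection is a simple
    token check, not a full parser for make's option syntax.
    """
    if makeflags is None:
        makeflags = os.environ.get("MAKEFLAGS", "")
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None

    tokens = shlex.split(makeflags)
    already_parallel = any(
        t.startswith("-j") or t.startswith("--jobs") for t in tokens
    )
    if already_parallel:
        return ["make"]
    return ["make", "-j", str(cores)]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(make_command()))
```

Using the machine's core count as the job count is only a starting point; as noted above, the link steps remain mostly sequential however many jobs are allowed.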

> How to reduce compile time after had changed/improved source code?
> --
>
> Key: MESOS-9798
> URL: https://issues.apache.org/jira/browse/MESOS-9798
> Project: Mesos
> Issue Type: Improvement
> Components: cmake
> Affects Versions: 1.8.0
> Environment: Linux firework-vm01 4.9.0-9-amd64 #1 SMP Debian 
> 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux
> Reporter: chatsiri
> Priority: Minor
> Labels: newbie
>
> Hello all, 
>      I have finished changing variables in the src/ directory, but the 
> compiler takes a long time to finish the build steps. How can I reduce 
> compile time per component or source directory? For example, consider the 
> simple steps below:
>  # I add a new member function to class Docker in docker.hpp. This class is 
> declared in a file in the docker directory.
>  # Compile the source again from the build directory. This directory is 
> created in the base source code directory, alongside src/, bin/ and include/.
>  # Go to the build path and run:
>  ## $cd build
>  ## $../configure --disable-python --disable-java --enable-debug 
> --enable-fast-install
>  ## $make
>  ## $sudo make install
> In step 3, the compiler takes a long time to compile the source code. How can 
> we reduce compile time for just the source directory that we changed?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (MESOS-9969) Agent crashes when trying to clean up volume

2019-09-17 Thread Andrei Budnik (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931645#comment-16931645
 ] 

Andrei Budnik commented on MESOS-9969:
--

Could you please provide steps to reproduce this bug?

> Agent crashes when trying to clean up volume
> ---
>
> Key: MESOS-9969
> URL: https://issues.apache.org/jira/browse/MESOS-9969
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.8.2
> Reporter: Tomas Barton
> Priority: Major
>
> {code}
> Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081748 21828 
> linux_launcher.cpp:650] Destroying cgroup 
> '/sys/fs/cgroup/systemd/mesos/370ed262-4041-4180-a7e1-9ea78070e3a6'
> Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081876 21832 
> containerizer.cpp:2907] Checkpointing termination state to nested container's 
> runtime directory 
> '/var/run/mesos/containers/8e3997e7-c53a-4043-9a7e-26a2e436a041/containers/ae0bdc6d-c738-4352-b5d4-7572182671d5/termination'
> Sep 17 13:49:26 w03 mesos-agent[21803]: mesos-agent: 
> /pkg/src/mesos/3rdparty/stout/include/stout/option.hpp:120: T& 
> Option::get() & [with T = std::basic_string]: Assertion `isSome()' 
> failed.
> Sep 17 13:49:26 w03 mesos-agent[21803]: *** Aborted at 1568728166 (unix time) 
> try "date -d @1568728166" if you are using GNU date ***
> Sep 17 13:49:26 w03 mesos-agent[21803]: W0917 13:49:26.082281 21835 
> disk.cpp:453] Ignoring cleanup for unknown container 
> a9ba6959-ea02-4543-b7d5-92a63940
> Sep 17 13:49:26 w03 mesos-agent[21803]: PC: @ 0x7f16a3867fff (unknown)
> Sep 17 13:49:26 w03 mesos-agent[21803]: *** SIGABRT (@0x552b) received by PID 
> 21803 (TID 0x7f169e47d700) from PID 21803; stack trace: ***
> Sep 17 13:49:26 w03 mesos-agent[21803]: E0917 13:49:26.082608 21835 
> memory.cpp:501] Listening on OOM events failed for container 
> a9ba6959-ea02-4543-b7d5-92a63940: Event listener is terminating
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3be50e0 (unknown)
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3867fff (unknown)
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a386942a (unknown)
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860e67 (unknown)
> Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.083741 21835 
> linux.cpp:1074] Unmounting volume 
> '/var/lib/mesos/slave/slaves/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-S17/frameworks/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-0003/executors/es01__coordinator__8591ac8e-3d9d-45ac-bb68-bee379c8c4a4/runs/a9ba6959-ea02-4543-b7d5-92a63940/container-path'
>  for con
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860f12 (unknown)
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7654f13 
> _ZNR6OptionISsE3getEv.part.152
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7666b2f 
> mesos::internal::slave::MesosContainerizerProcess::__destroy()
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a861cb41 
> process::ProcessBase::consume()
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a8633c9c 
> process::ProcessManager::resume()
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a86398a6 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a43c6200 (unknown)
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3bdb4a4 start_thread
> Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a391dd0f (unknown)
> Sep 17 13:49:26 w03 systemd[1]: dcos-mesos-slave.service: Main process 
> exited, code=killed, status=6/ABRT
> {code}





[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess

2019-09-17 Thread Abel S (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931509#comment-16931509
 ] 

Abel S commented on MESOS-5342:
---

Could someone provide updates on this issue/feature?

> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --
>
> Key: MESOS-5342
> URL: https://issues.apache.org/jira/browse/MESOS-5342
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Affects Versions: 0.28.1
> Reporter: Chris
> Priority: Major
> Labels: cgroups, cpu, cpu-usage, gpu, isolation, isolator, mentor
>
> The cgroups isolator currently lacks support for binding (also called 
> pinning) containers to a set of cores. The GNU/Linux kernel is known to make 
> sub-optimal core assignments for processes and threads. Poor assignments 
> impact program performance, specifically in terms of cache locality. 
> Applications requiring GPU resources can benefit from this feature by getting 
> access to cores closest to the GPU hardware, which reduces cpu-gpu copy 
> latency.
> Most cluster management systems from the HPC community (e.g., SLURM) provide 
> both cgroup isolation and CPU binding. This feature would provide similar 
> capabilities. The current interest in supporting Intel's Cache Allocation 
> Technology, and the advent of Intel's Knights-series processors, will require 
> making choices about where containers are going to run on the mesos-agent's 
> processor cores; this feature is a step toward developing a robust 
> solution.
> The improvement in this JIRA ticket will handle hardware topology detection, 
> track container-to-core utilization in a histogram, and use a mathematical 
> optimization technique to select cores for container assignment based on 
> latency and the container-to-core utilization histogram.
> For GPU tasks, the improvement will prioritize selection of cores based on 
> latency between the GPU and cores in an effort to minimize copy latency.
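
To make the idea of choosing cores from a utilization histogram and latency figures more concrete, here is a small Python sketch. It is not the optimization technique proposed in the ticket, which is not specified there. It substitutes a plain greedy ranking, and the function name {{pick_cores}}, the cost formula, and the {{latency_weight}} parameter are all assumptions made for this illustration.

```python
def pick_cores(utilization, latency, count, latency_weight=1.0):
    """Greedily pick `count` cores with the lowest combined cost.

    utilization: dict mapping core id -> number of containers already on it
                 (a stand-in for the container-to-core histogram).
    latency:     dict mapping core id -> relative latency to the device of
                 interest, e.g. a GPU; missing cores are treated as 0.
    Hypothetical cost model: utilization + latency_weight * latency.
    """
    if count > len(utilization):
        raise ValueError("not enough cores")
    cost = {
        core: utilization[core] + latency_weight * latency.get(core, 0.0)
        for core in utilization
    }
    # Sort by cost, breaking ties by core id for a deterministic result.
    return sorted(cost, key=lambda core: (cost[core], core))[:count]


if __name__ == "__main__":
    histogram = {0: 3, 1: 0, 2: 1, 3: 0}
    gpu_latency = {0: 0.0, 1: 5.0, 2: 0.0, 3: 1.0}
    print(pick_cores(histogram, gpu_latency, 2))
```

On Linux, a selection like this could then be applied to a process with {{os.sched_setaffinity}}; how Mesos would apply it to a container's cgroup is outside the scope of this sketch.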





[jira] [Created] (MESOS-9970) Add Java example frameworks to CMake build.

2019-09-17 Thread Andrei Sekretenko (Jira)
Andrei Sekretenko created MESOS-9970:


Summary: Add Java example frameworks to CMake build.
Key: MESOS-9970
URL: https://issues.apache.org/jira/browse/MESOS-9970
Project: Mesos
Issue Type: Improvement
Reporter: Andrei Sekretenko


Currently they are simply not built: 
https://github.com/apache/mesos/blob/master/src/examples/CMakeLists.txt

As a result, the CMake build fails the tests that are based on these frameworks.





[jira] [Created] (MESOS-9969) Agent crashes when trying to clean up volume

2019-09-17 Thread Tomas Barton (Jira)
Tomas Barton created MESOS-9969:
---

Summary: Agent crashes when trying to clean up volume
Key: MESOS-9969
URL: https://issues.apache.org/jira/browse/MESOS-9969
Project: Mesos
Issue Type: Bug
Components: agent
Affects Versions: 1.8.2
Reporter: Tomas Barton


{code}
Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081748 21828 
linux_launcher.cpp:650] Destroying cgroup 
'/sys/fs/cgroup/systemd/mesos/370ed262-4041-4180-a7e1-9ea78070e3a6'
Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.081876 21832 
containerizer.cpp:2907] Checkpointing termination state to nested container's 
runtime directory 
'/var/run/mesos/containers/8e3997e7-c53a-4043-9a7e-26a2e436a041/containers/ae0bdc6d-c738-4352-b5d4-7572182671d5/termination'
Sep 17 13:49:26 w03 mesos-agent[21803]: mesos-agent: 
/pkg/src/mesos/3rdparty/stout/include/stout/option.hpp:120: T& Option::get() 
& [with T = std::basic_string]: Assertion `isSome()' failed.
Sep 17 13:49:26 w03 mesos-agent[21803]: *** Aborted at 1568728166 (unix time) 
try "date -d @1568728166" if you are using GNU date ***
Sep 17 13:49:26 w03 mesos-agent[21803]: W0917 13:49:26.082281 21835 
disk.cpp:453] Ignoring cleanup for unknown container 
a9ba6959-ea02-4543-b7d5-92a63940
Sep 17 13:49:26 w03 mesos-agent[21803]: PC: @ 0x7f16a3867fff (unknown)
Sep 17 13:49:26 w03 mesos-agent[21803]: *** SIGABRT (@0x552b) received by PID 
21803 (TID 0x7f169e47d700) from PID 21803; stack trace: ***
Sep 17 13:49:26 w03 mesos-agent[21803]: E0917 13:49:26.082608 21835 
memory.cpp:501] Listening on OOM events failed for container 
a9ba6959-ea02-4543-b7d5-92a63940: Event listener is terminating
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3be50e0 (unknown)
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3867fff (unknown)
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a386942a (unknown)
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860e67 (unknown)
Sep 17 13:49:26 w03 mesos-agent[21803]: I0917 13:49:26.083741 21835 
linux.cpp:1074] Unmounting volume 
'/var/lib/mesos/slave/slaves/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-S17/frameworks/04e596b7-f03d-4cba-bbbc-fa9e0aebb5d2-0003/executors/es01__coordinator__8591ac8e-3d9d-45ac-bb68-bee379c8c4a4/runs/a9ba6959-ea02-4543-b7d5-92a63940/container-path'
 for con
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3860f12 (unknown)
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7654f13 
_ZNR6OptionISsE3getEv.part.152
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a7666b2f 
mesos::internal::slave::MesosContainerizerProcess::__destroy()
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a861cb41 
process::ProcessBase::consume()
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a8633c9c 
process::ProcessManager::resume()
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a86398a6 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a43c6200 (unknown)
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a3bdb4a4 start_thread
Sep 17 13:49:26 w03 mesos-agent[21803]: @ 0x7f16a391dd0f (unknown)
Sep 17 13:49:26 w03 systemd[1]: dcos-mesos-slave.service: Main process exited, 
code=killed, status=6/ABRT

{code}





[jira] [Commented] (MESOS-9879) Create a unit test ensuring that client certificate requests are properly ignored

2019-09-17 Thread Benno Evers (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931390#comment-16931390
 ] 

Benno Evers commented on MESOS-9879:


Given that the behaviour described here is mandated by the TLS spec, and that 
testing it would require implementing a custom, buggy TLS implementation, I 
think it's safe to say the costs outweigh the benefits here. Closing this as 
"Won't Fix".

> Create a unit test ensuring that client certificate requests are properly 
> ignored
> ---
>
> Key: MESOS-9879
> URL: https://issues.apache.org/jira/browse/MESOS-9879
> Project: Mesos
> Issue Type: Improvement
> Reporter: Benno Evers
> Priority: Major
> Labels: libprocess, ssl, tls
>
> When a TLS server sends a Client Certificate Request as part of the handshake 
> and the client does not have a certificate available, the TLS specification 
> mandates that the client shall continue the connection attempt by sending a 
> zero-length certificate.
> We should write a unit test verifying libprocess handles this correctly when 
> acting as a client, although it's not completely clear how this might be 
> implemented.





[jira] [Commented] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well

2019-09-17 Thread Jan Schlicht (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931187#comment-16931187
 ] 

Jan Schlicht commented on MESOS-9966:
-

The flag wasn't set, so it's at its default value, which is {{false}}.

> Agent crashes when trying to destroy orphaned nested container if root 
> container is orphaned as well
> 
>
> Key: MESOS-9966
> URL: https://issues.apache.org/jira/browse/MESOS-9966
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.7.3
> Reporter: Jan Schlicht
> Assignee: Qian Zhang
> Priority: Major
>
> Noticed an agent crash-looping when trying to recover. It recognized a 
> container and its nested container as orphaned. When trying to destroy the 
> nested container, the agent crashes. Probably when trying to [get the sandbox 
> path of the root 
> container|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L2966].
> {noformat}
> 2019-09-09 05:04:26: I0909 05:04:26.382326 89950 linux_launcher.cpp:286] 
> Recovering Linux launcher
> 2019-09-09 05:04:26: I0909 05:04:26.383162 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383199 89950 linux_launcher.cpp:343] 
> Recovered container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97
> 2019-09-09 05:04:26: I0909 05:04:26.383216 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/a127917b-96fe-4100-b73d-5f876ce9ffc1/mesos/9783e2bb-7c2e-4930-9d39-4225bb6f1b97/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383229 89950 linux_launcher.cpp:343] 
> Recovered container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.383237 89950 linux_launcher.cpp:343] 
> Recovered container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.383249 89950 linux_launcher.cpp:343] 
> Recovered container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436
> 2019-09-09 05:04:26: I0909 05:04:26.383260 89950 linux_launcher.cpp:331] Not 
> recovering cgroup mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383271 89950 linux_launcher.cpp:331] Not 
> recovering cgroup 
> mesos/2ee154e2-3cc4-420a-99fb-065e740f3091/mesos/49fe2bf9-17af-415f-92b6-92a4db619436/mesos
> 2019-09-09 05:04:26: I0909 05:04:26.383280 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091.49fe2bf9-17af-415f-92b6-92a4db619436 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383289 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383296 89950 linux_launcher.cpp:437] 
> 2ee154e2-3cc4-420a-99fb-065e740f3091 is a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383304 89950 linux_launcher.cpp:437] 
> a127917b-96fe-4100-b73d-5f876ce9ffc1.9783e2bb-7c2e-4930-9d39-4225bb6f1b97 is 
> a known orphaned container
> 2019-09-09 05:04:26: I0909 05:04:26.383414 89950 containerizer.cpp:1092] 
> Recovering isolators
> 2019-09-09 05:04:26: I0909 05:04:26.385931 89977 memory.cpp:478] Started 
> listening for OOM events for container a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386118 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386152 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386175 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> a127917b-96fe-4100-b73d-5f876ce9ffc1
> 2019-09-09 05:04:26: I0909 05:04:26.386227 89977 memory.cpp:478] Started 
> listening for OOM events for container 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386248 89977 memory.cpp:590] Started 
> listening on 'low' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386270 89977 memory.cpp:590] Started 
> listening on 'medium' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386376 89977 memory.cpp:590] Started 
> listening on 'critical' memory pressure events for container 
> 2ee154e2-3cc4-420a-99fb-065e740f3091
> 2019-09-09 05:04:26: I0909 05:04:26.386694 89921 containerizer.cpp:1131] 
> Recovering provisioner
> 2019-09-09 05:04:26: I0909 05:04:26.388226 90010 metadata_manager.cpp:286] 
> Successfully loaded 64 Docker images
> 2019-09-09 

[jira] [Commented] (MESOS-9966) Agent crashes when trying to destroy orphaned nested container if root container is orphaned as well

2019-09-17 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931182#comment-16931182
 ] 

Qian Zhang commented on MESOS-9966:
---

[~nfnt] Can you please let me know if the agent flag 
`--gc_non_executor_container_sandboxes` was set to true or false when this 
issue occurred?


[jira] [Created] (MESOS-9968) WWWAuthenticate header parsing fails when commas are in (quoted) realm

2019-09-17 Thread Jan Schlicht (Jira)
Jan Schlicht created MESOS-9968:
---

Summary: WWWAuthenticate header parsing fails when commas are in 
(quoted) realm
Key: MESOS-9968
URL: https://issues.apache.org/jira/browse/MESOS-9968
Project: Mesos
Issue Type: Bug
Components: HTTP API, libprocess
Reporter: Jan Schlicht


This was discovered when trying to launch the 
{{[nvcr.io/nvidia/tensorflow:19.08-py3|http://nvcr.io/nvidia/tensorflow:19.08-py3]}}
 image using the Mesos containerizer. This launch fails with
{noformat}
Failed to launch container: Failed to get WWW-Authenticate header: Unexpected 
auth-param format: 
'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull' in 
'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull,push;'
{noformat}
This is because the [header tokenization in 
libprocess|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L640]
 can't handle commas in quoted realm values.
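
The failure mode can be illustrated outside libprocess. The Python sketch below is not the libprocess code and not a proposed patch. It only contrasts a naive split on commas with a split that ignores commas inside double-quoted strings. The function name {{split_auth_params}} and the example header value (including its {{service}} parameter and the example.org URL) are made up for this illustration.

```python
def split_auth_params(value):
    """Split a header value on commas that are outside double-quoted strings.

    Hypothetical helper; it handles backslash escapes inside quotes but does
    not validate the auth-param grammar.
    """
    parts = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character following a backslash inside quotes: keep verbatim.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


if __name__ == "__main__":
    header = 'realm="https://example.org/auth?scope=repo:pull,push",service="registry"'
    print(header.split(","))          # naive split breaks the realm in two
    print(split_auth_params(header))  # quote-aware split keeps it whole
```

With the made-up value above, the naive split yields three fragments, the first of which is an unterminated {{realm="...}} string much like the one in the error message, while the quote-aware split yields the two intended parameters.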


