[jira] [Commented] (MESOS-6918) Prometheus exporter endpoints for metrics

2017-10-05 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194161#comment-16194161
 ] 

Yan Xu commented on MESOS-6918:
---

[~bmahler] let's chat about the reviews? [~jpe...@apache.org] and I have 
already discussed this offline and I have added comments to the design doc and 
the reviews. Here's the summary:

- I am not convinced about the newly introduced {{enum Semantics \{COUNTER, 
GAUGE\}}}. We already have metric *types* that are called {{Counter}} and 
{{Gauge}}, and I think people could confuse Counter the semantics with 
Counter the type, for example.
-- I understand that the semantics enum is supposed to help express:
bq. {{Timer}}'s value should be cumulative / monotonically increasing
(because it's more useful that way, as explained in the design doc) but this 
enum seems to try to suggest that all metric types (potentially future ones as 
well) can and should be classified into one of the two buckets. But are we sure 
this is the right/only criterion? (The examples cited in the design doc don't 
consistently define this, and none defines it as "semantics".) Could there 
be other dimensions / features to classify metrics? To me 
{{s/Semantics/Monotonicity/}} would have been clearer but I am not sure about 
the usefulness of that either.
-- The use of this enum right now is just to pass the metric type info down to 
the Prometheus formatter. We can just define {{enum Type \{COUNTER, GAUGE, 
TIMER\}}} and pass it down.
- I hope we confine the Prometheus logic to a 
{{metrics/formatters/prometheus.hpp|cpp}} file and keep the {{MetricsProcess}} 
logic generic.
- I think we can keep the meaning of the existing field {{Timer.value()}} (the 
last sampled value). We can add a new field {{sum}} in the {{TimeSeries}} 
alongside the new {{total}} (can we name it something like {{totalCount}}?) to 
provide Prometheus its required info.

> Prometheus exporter endpoints for metrics
> -
>
> Key: MESOS-6918
> URL: https://issues.apache.org/jira/browse/MESOS-6918
> Project: Mesos
>  Issue Type: Bug
>  Components: statistics
>Reporter: James Peach
>Assignee: James Peach
>
> There are a couple of [Prometheus|https://prometheus.io] metrics exporters 
> for Mesos, of varying quality. Since the Mesos stats system actually knows 
> about statistics data types and semantics, and Mesos has reasonable HTTP 
> support, we could add Prometheus metrics endpoints to directly expose 
> statistics in [Prometheus wire 
> format|https://prometheus.io/docs/instrumenting/exposition_formats/], 
> removing the need for operators to run separate exporter processes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7990) Support systemd named hierarchy (name=systemd) for Mesos Containerizer.

2017-10-05 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7990:
-

Assignee: Jie Yu

> Support systemd named hierarchy (name=systemd) for Mesos Containerizer.
> ---
>
> Key: MESOS-7990
> URL: https://issues.apache.org/jira/browse/MESOS-7990
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Similar to docker's cgroupfs cgroup driver, we should create cgroups under 
> /sys/fs/cgroup/systemd (if it exists), and move the container pid into the 
> corresponding cgroup (/sys/fs/cgroup/systemd/mesos/).
> This can give us a bunch of benefits:
> 1) systemd-cgls can list mesos containers
> 2) systemd-cgtop can show stats for mesos containers
> ...





[jira] [Commented] (MESOS-7990) Support systemd named hierarchy (name=systemd) for Mesos Containerizer.

2017-10-05 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194123#comment-16194123
 ] 

Jie Yu commented on MESOS-7990:
---

https://reviews.apache.org/r/62798/
https://reviews.apache.org/r/62800/

> Support systemd named hierarchy (name=systemd) for Mesos Containerizer.
> ---
>
> Key: MESOS-7990
> URL: https://issues.apache.org/jira/browse/MESOS-7990
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jie Yu
>
> Similar to docker's cgroupfs cgroup driver, we should create cgroups under 
> /sys/fs/cgroup/systemd (if it exists), and move the container pid into the 
> corresponding cgroup (/sys/fs/cgroup/systemd/mesos/).
> This can give us a bunch of benefits:
> 1) systemd-cgls can list mesos containers
> 2) systemd-cgtop can show stats for mesos containers
> ...





[jira] [Commented] (MESOS-8056) IOSwitchboard is not put into 'mesos_executor.slice'.

2017-10-05 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194122#comment-16194122
 ] 

Jie Yu commented on MESOS-8056:
---

https://reviews.apache.org/r/62799/

> IOSwitchboard is not put into 'mesos_executor.slice'.
> -
>
> Key: MESOS-8056
> URL: https://issues.apache.org/jira/browse/MESOS-8056
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.2, 1.3.1, 1.4.0
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> This might cause it to be killed by systemd if the agent's systemd KillMode 
> is set to 'cgroup'.
> We should do this consistently with other long-running helpers like the log 
> rotation process.





[jira] [Assigned] (MESOS-8056) IOSwitchboard is not put into 'mesos_executor.slice'.

2017-10-05 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-8056:
-

Assignee: Jie Yu

> IOSwitchboard is not put into 'mesos_executor.slice'.
> -
>
> Key: MESOS-8056
> URL: https://issues.apache.org/jira/browse/MESOS-8056
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.2, 1.3.1, 1.4.0
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> This might cause it to be killed by systemd if the agent's systemd KillMode 
> is set to 'cgroup'.
> We should do this consistently with other long-running helpers like the log 
> rotation process.





[jira] [Created] (MESOS-8056) IOSwitchboard is not put into 'mesos_executor.slice'.

2017-10-05 Thread Jie Yu (JIRA)
Jie Yu created MESOS-8056:
-

 Summary: IOSwitchboard is not put into 'mesos_executor.slice'.
 Key: MESOS-8056
 URL: https://issues.apache.org/jira/browse/MESOS-8056
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 1.4.0, 1.3.1, 1.2.2
Reporter: Jie Yu


This might cause it to be killed by systemd if the agent's systemd KillMode is 
set to 'cgroup'.

We should do this consistently with other long-running helpers like the log 
rotation process.





[jira] [Assigned] (MESOS-3440) Port mac_tests

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-3440:
---

Assignee: Jeff Coffler  (was: Andrew Schwartzmeyer)

> Port mac_tests
> --
>
> Key: MESOS-3440
> URL: https://issues.apache.org/jira/browse/MESOS-3440
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Jeff Coffler
>  Labels: mesosphere, stout
>
> Depends on abort.hpp and mac.hpp. Will probably "just work" if we can get 
> these two files ported.





[jira] [Commented] (MESOS-3442) Port path_tests to Windows

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193886#comment-16193886
 ] 

Andrew Schwartzmeyer commented on MESOS-3442:
-

This builds, but most of the tests are disabled for Windows and need porting.

> Port path_tests to Windows
> --
>
> Key: MESOS-3442
> URL: https://issues.apache.org/jira/browse/MESOS-3442
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: mesosphere, stout
>
> Will probably "just work" if we can get os.hpp working.





[jira] [Updated] (MESOS-3445) Port signals_tests to Windows

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-3445:

Priority: Minor  (was: Major)

> Port signals_tests to Windows
> -
>
> Key: MESOS-3445
> URL: https://issues.apache.org/jira/browse/MESOS-3445
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: mesosphere, stout
>
> Probably will mostly be no-ops, but in any event, it depends on os.hpp, which 
> will be challenging to port.





[jira] [Updated] (MESOS-3447) Port svn_tests

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-3447:

Priority: Minor  (was: Major)

> Port svn_tests
> --
>
> Key: MESOS-3447
> URL: https://issues.apache.org/jira/browse/MESOS-3447
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: mesosphere, stout
>
> Should be trivial if we have libapr and libsvn building and linking correctly.





[jira] [Assigned] (MESOS-3644) Implement stout/os/windows/signals.hpp

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-3644:
---

Assignee: (was: Andrew Schwartzmeyer)

> Implement stout/os/windows/signals.hpp
> --
>
> Key: MESOS-3644
> URL: https://issues.apache.org/jira/browse/MESOS-3644
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Priority: Minor
>  Labels: mesosphere, windows
>






[jira] [Commented] (MESOS-3644) Implement stout/os/windows/signals.hpp

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193848#comment-16193848
 ] 

Andrew Schwartzmeyer commented on MESOS-3644:
-

In fact, the only use of it that I could find at a glance is:

{noformat}
src/logging/logging.cpp
100:os::signals::reset(signal);

3rdparty/libprocess/src/subprocess.cpp
148:if (os::signals::install(SIGTERM, ) != 0) {

src/slave/containerizer/mesos/launch.cpp
204:if (os::signals::install(i, signalHandler) != 0 && i < posixLimit) {

3rdparty/libprocess/src/tests/main.cpp
97:  os::signals::reset(SIGTERM);

src/slave/containerizer/mesos/io/switchboard_main.cpp
114:  if (os::signals::install(SIGTERM, sigtermHandler) != 0) {

3rdparty/stout/include/stout/os/signals.hpp
27:  if (os::signals::internal::Suppressor suppressor ## signal = \
28:  os::signals::internal::Suppressor(signal))

3rdparty/stout/include/stout/os/posix/signals.hpp
110:pending = signals::pending(signal);
115:  unblock = signals::block(signal);
128:if (!pending && signals::pending(signal)) {
167:  signals::unblock(signal);
{noformat}

The only signal that has a handler installed for it is {{SIGTERM}}... which is 
weird because you don't _generally_ want to handle this on Linux either 
({{SIGINT}} would be more appropriate). But neither of these signals is used 
on Windows.

While you can manually {{raise}} a signal on Windows, Ctrl-C will _not_ send 
{{SIGINT}}, and killing the process will _not_ send {{SIGTERM}}.

There does not appear to be a need to implement this on Windows.

> Implement stout/os/windows/signals.hpp
> --
>
> Key: MESOS-3644
> URL: https://issues.apache.org/jira/browse/MESOS-3644
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: mesosphere, windows
>






[jira] [Updated] (MESOS-3644) Implement stout/os/windows/signals.hpp

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-3644:

Priority: Minor  (was: Major)

> Implement stout/os/windows/signals.hpp
> --
>
> Key: MESOS-3644
> URL: https://issues.apache.org/jira/browse/MESOS-3644
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: mesosphere, windows
>






[jira] [Commented] (MESOS-3644) Implement stout/os/windows/signals.hpp

2017-10-05 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193829#comment-16193829
 ] 

Andrew Schwartzmeyer commented on MESOS-3644:
-

I'm moving the priority of this down. While Windows _supports_ signals, it's 
not a paradigm that's generally used. We would only implement signals if 
something in _Mesos_ requires signals for logic, and cannot be changed.

> Implement stout/os/windows/signals.hpp
> --
>
> Key: MESOS-3644
> URL: https://issues.apache.org/jira/browse/MESOS-3644
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: mesosphere, windows
>






[jira] [Updated] (MESOS-6894) Checkpoint 'ContainerConfig' in Mesos Containerizer.

2017-10-05 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6894:

Summary: Checkpoint 'ContainerConfig' in Mesos Containerizer.  (was: Track 
ContainerInfo for containers)

> Checkpoint 'ContainerConfig' in Mesos Containerizer.
> 
>
> Key: MESOS-6894
> URL: https://issues.apache.org/jira/browse/MESOS-6894
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> This information can be used for image GC in the Mesos Containerizer, as well as 
> other purposes.





[jira] [Created] (MESOS-8055) Design doc for offer operations feedback

2017-10-05 Thread JIRA
Gastón Kleiman created MESOS-8055:
-

 Summary: Design doc for offer operations feedback
 Key: MESOS-8055
 URL: https://issues.apache.org/jira/browse/MESOS-8055
 Project: Mesos
  Issue Type: Documentation
Reporter: Gastón Kleiman
Assignee: Gastón Kleiman








[jira] [Updated] (MESOS-8054) Feedback for offer operations

2017-10-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8054:
--
Summary: Feedback for offer operations  (was: Offer operations 
reconciliation)

> Feedback for offer operations
> -
>
> Key: MESOS-8054
> URL: https://issues.apache.org/jira/browse/MESOS-8054
> Project: Mesos
>  Issue Type: Epic
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>






[jira] [Updated] (MESOS-8054) Feedback for offer operations

2017-10-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8054:
--
Epic Name: Feedback for offer operations  (was: Offer operations 
reconciliation)

> Feedback for offer operations
> -
>
> Key: MESOS-8054
> URL: https://issues.apache.org/jira/browse/MESOS-8054
> Project: Mesos
>  Issue Type: Epic
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>






[jira] [Updated] (MESOS-8054) Feedback for offer operations

2017-10-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8054:
--
Description: Only LAUNCH operations provide feedback on success or failure. 
All operations should do so. RESERVE, UNRESERVE, CREATE, DESTROY, 
CREATE_VOLUME, and DESTROY_VOLUME should all provide feedback on success or 
failure.

> Feedback for offer operations
> -
>
> Key: MESOS-8054
> URL: https://issues.apache.org/jira/browse/MESOS-8054
> Project: Mesos
>  Issue Type: Epic
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> Only LAUNCH operations provide feedback on success or failure. All operations 
> should do so. RESERVE, UNRESERVE, CREATE, DESTROY, CREATE_VOLUME, and 
> DESTROY_VOLUME should all provide feedback on success or failure.





[jira] [Assigned] (MESOS-7578) Write a proposal to make the I/O Switchboards optional

2017-10-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman reassigned MESOS-7578:
-

Assignee: (was: Gastón Kleiman)

> Write a proposal to make the I/O Switchboards optional
> --
>
> Key: MESOS-7578
> URL: https://issues.apache.org/jira/browse/MESOS-7578
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gastón Kleiman
>  Labels: check, containerizer, health-check, mesosphere
>
> Right now DEBUG containers can only be started using the 
> LaunchNestedContainerSession API call. They will enter their parent’s 
> namespaces, inherit its environment variables, stream their I/O, and Mesos will tie 
> their life-cycle to the lifetime of the HTTP connection.
> Streaming the I/O of a container requires an I/O Switchboard and adds some 
> overhead and complexity:
> - Mesos will launch an extra process, called an I/O Switchboard, for each 
> nested container. These processes aren’t free: they take some time to 
> create/destroy and consume resources.
> - I/O Switchboards are managed by a complex isolator.
> - I/O Switchboards introduce new race conditions, and have been a source of 
> deadlocks in the past. 
> Some use cases require some of the features provided by DEBUG containers, but 
> don’t need the functionality provided by the I/O switchboard. For instance, 
> the Default Executor uses DEBUG containers to perform (health)checks, but it 
> doesn’t need to stream anything to/from the container. 





[jira] [Assigned] (MESOS-6494) Clean up the flags parsing in the executors.

2017-10-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman reassigned MESOS-6494:
-

Assignee: (was: Gastón Kleiman)

> Clean up the flags parsing in the executors.
> 
>
> Key: MESOS-6494
> URL: https://issues.apache.org/jira/browse/MESOS-6494
> Project: Mesos
>  Issue Type: Improvement
>  Components: executor
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The current executors and the executor libraries use a mix of {{stout::flags}} 
> and {{os::getenv}} to parse flags, leading to a lot of unnecessary and 
> sometimes duplicated code.
> This should be cleaned up, using only {{stout::flags}} to parse flags.
> Environment variables should be used for the flags that are common to ALL the 
> executors (listed in the Executor HTTP API doc).
> Command line parameters should be used for flags that apply only to 
> individual executors.





[jira] [Created] (MESOS-8054) Offer operations reconciliation

2017-10-05 Thread JIRA
Gastón Kleiman created MESOS-8054:
-

 Summary: Offer operations reconciliation
 Key: MESOS-8054
 URL: https://issues.apache.org/jira/browse/MESOS-8054
 Project: Mesos
  Issue Type: Epic
Reporter: Gastón Kleiman
Assignee: Gastón Kleiman








[jira] [Commented] (MESOS-7504) Parent's mount namespace cannot be determined when launching a nested container.

2017-10-05 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193100#comment-16193100
 ] 

Andrei Budnik commented on MESOS-7504:
--

The containerizer launcher spawns 
[pre-exec-hooks|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/launch.cpp#L384]
 before launching the given command (e.g. {{sleep 1000}}).
For the 
{{NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover}} 
test, we need to enter the {{"cgroups/cpu,filesystem/linux,namespaces/pid"}} 
namespaces, where the {{filesystem/linux}} and {{namespaces/pid}} isolators add 
two pre-exec hooks, as seen in the logs:
{code}
Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/abudnik\/mesos\/build\/src\/mesos-containerizer"}'
Executing pre-exec command '{"shell":true,"value":"mount -n -t proc proc \/proc 
-o nosuid,noexec,nodev"}'
{code}
After launching the parent container, we try to launch a nested container. The 
agent 
[calls|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/containerizer.cpp#L1758]
 the 
[getMountNamespaceTarget|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/utils.cpp#L59]
 function, which returns the "Cannot get target mount namespace from process" 
error in this test.
If you take a look at it, you'll find that there is a small delay between 
enumerating all child processes (which might still include running 
pre-exec-hook processes) and calling {{ns::getns}} for each child process. 
During this delay, any of the pre-exec-hook processes might exit, causing this 
error message.

> Parent's mount namespace cannot be determined when launching a nested 
> container.
> 
>
> Key: MESOS-7504
> URL: https://issues.apache.org/jira/browse/MESOS-7504
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an 
> example of such failure:
> {noformat}
> [ RUN  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework 
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
>  for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
>  
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o 
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
>  --pipe_read="29" --pipe_write="32" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
>  --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] 

[jira] [Comment Edited] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193021#comment-16193021
 ] 

Sai Teja Ranuva edited comment on MESOS-8038 at 10/5/17 4:03 PM:
-

[~bmahler] [~klueska]
Attached the logs for master and slave. 
For more context, when a large number of tasks (10-20) are launched, this 
problem always occurs.
When the framework launches a job with fewer tasks (1-6), it works fine or 
fails rarely. All the tasks are GPU-based.


was (Author: saitejar):
[~bmahler] Attached the logs for master and slave. 
For more context, when a large number of tasks (10-20) are launched, this 
problem occurs always.
When the framework launches job, with lesser number of tasks(1-6), it works 
fine or fails rarely. All the tasks are gpu based.

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO.log
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This 
> happens even before the job starts. A little search in the code base points 
> me to something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Comment Edited] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193021#comment-16193021
 ] 

Sai Teja Ranuva edited comment on MESOS-8038 at 10/5/17 4:03 PM:
-

[~bmahler] [~klueska]
Attached the logs for master and slave. 
For more context, when a large number of tasks (10-20) are launched, this 
problem always occurs.
When the framework launches a job with fewer tasks (1-6), it works fine or 
fails rarely. All the tasks are GPU-based.


was (Author: saitejar):
[~bmahler] [~klueska]
Attached the logs for master and slave. 
For more context, when a large number of tasks (10-20) are launched, this 
problem occurs always.
When the framework launches job, with lesser number of tasks(1-6), it works 
fine or fails rarely. All the tasks are gpu based.

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO.log
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This 
> happens even before the job starts. A little search in the code base points 
> me to something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Comment Edited] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193021#comment-16193021
 ] 

Sai Teja Ranuva edited comment on MESOS-8038 at 10/5/17 4:00 PM:
-

[~bmahler] Attached the logs for master and slave. 
For more context, when a large number of tasks (10-20) are launched, this 
problem always occurs.
When the framework launches a job with fewer tasks (1-6), it works fine or 
fails rarely. All the tasks are GPU-based.


was (Author: saitejar):
[~bmahler] Attached the logs for master and slave. 
For more context, when a large number of tasks (10-20) are launched, this 
problem occurs always.
When the framework launches job, with lesser number of tasks(1-6), it works 
fine or occurs very rarely. All the tasks are gpu based.

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO.log
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This 
> happens even before the job starts. A little search in the code base points 
> me to something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Commented] (MESOS-7757) Update master to handle updates to agent total resources

2017-10-05 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193059#comment-16193059
 ] 

Benjamin Bannier commented on MESOS-7757:
-

{noformat}
commit d38fe9d5a4db0a37b876c55c99b547d4c8fbd8dd
Author: Benjamin Bannier 
Date:   Thu Sep 7 16:09:10 2017 +0200

Rescinded offers possibly affected by updates to agent total resources.

When an agent changes its resources, the master should rescind any
offers affected by the change. We already performed the rescind for
updates to the agent's oversubscribed resources; this patch adds offer
rescinding when an update to an agent's total resources is processed.

While for updates to an agent's oversubscribed resources we currently
only rescind offers containing revocable resources (e.g., to reduce
offer churn), for updates to the total we currently rescind all
offers for resources on the agent.

As an optimization, this patch adds logic to ignore redundant updates
to agent resources.

Review: https://reviews.apache.org/r/62158
{noformat}
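The empty-versus-absent ambiguity noted in the issue description (an empty {{repeated Resource}} list serializes identically to an absent one in protobuf, so "total updated to the empty set" cannot be told apart from "total not updated") is why an additional tag field would be needed. A hypothetical sketch, with made-up field names and numbers that do not reflect the actual Mesos protobuf definitions:

```proto
// Sketch only; field names and numbers are hypothetical.
message UpdateSlaveMessage {
  // An empty 'repeated' field and an absent one contribute zero bytes
  // to the wire format, so 'total' alone cannot express "updated to an
  // empty set" versus "not updated". An explicit tag disambiguates
  // which section of the message carries the update.
  enum Type {
    OVERSUBSCRIBED = 1;
    TOTAL = 2;
  }
  optional Type type = 1;
  repeated Resource oversubscribed = 2;
  repeated Resource total = 3;
}
```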

> Update master to handle updates to agent total resources
> 
>
> Key: MESOS-7757
> URL: https://issues.apache.org/jira/browse/MESOS-7757
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere, storage
> Fix For: 1.4.0
>
>
> With MESOS-7755 we update the allocator interface to support updating the 
> total resources on an agent. These allocator invocations are driven by the 
> master when it receives an update to an agent's total resources.
> We could transport the updates from agents to the master either as an update to 
> {{UpdateSlaveMessage}}, e.g., by adding a {{repeated Resource total}} field; 
> in order to distinguish updates to {{oversubscribed}} from updates to {{total}} 
> we would need to introduce an additional tag field (an empty list of 
> {{Resource}} has the same representation as an absent list of {{Resource}}). 
> Alternatively we could introduce a new message transporting just the update 
> to {{total}}; it should be possible to reuse such a message for external 
> resource providers which we will likely add at a later point.





[jira] [Comment Edited] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193021#comment-16193021
 ] 

Sai Teja Ranuva edited comment on MESOS-8038 at 10/5/17 3:35 PM:
-

[~bmahler] Attached the logs for master and slave. 
For more context: when a large number of tasks (10-20) is launched, this 
problem always occurs.
When the framework launches a job with a smaller number of tasks (1-6), it 
works fine, or the problem occurs very rarely. All the tasks are GPU-based.


was (Author: saitejar):
[~bmahler] Attached the logs for master and slave. 

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO.log
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This happens 
> even before the job starts. A little search in the code base points me to 
> something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Commented] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193021#comment-16193021
 ] 

Sai Teja Ranuva commented on MESOS-8038:


[~bmahler] Attached the logs for master and slave. 

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO.log
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This happens 
> even before the job starts. A little search in the code base points me to 
> something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Updated] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Teja Ranuva updated MESOS-8038:
---
Attachment: mesos-slave.INFO.log

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO.log
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This happens 
> even before the job starts. A little search in the code base points me to 
> something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Updated] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Teja Ranuva updated MESOS-8038:
---
Attachment: (was: mesos-slave.INFO)

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO.log
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This happens 
> even before the job starts. A little search in the code base points me to 
> something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Updated] (MESOS-8038) Launching GPU task sporadically fails.

2017-10-05 Thread Sai Teja Ranuva (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Teja Ranuva updated MESOS-8038:
---
Attachment: mesos-master.log

> Launching GPU task sporadically fails.
> --
>
> Key: MESOS-8038
> URL: https://issues.apache.org/jira/browse/MESOS-8038
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, gpu
>Affects Versions: 1.4.0
>Reporter: Sai Teja Ranuva
>Priority: Critical
> Attachments: mesos-master.log, mesos-slave.INFO
>
>
> I was running a job which uses GPUs. It runs fine most of the time. 
> But occasionally I see the following message in the mesos log.
> "Collect failed: Requested 1 but only 0 available"
> Followed by the executor getting killed and the tasks getting lost. This happens 
> even before the job starts. A little search in the code base points me to 
> something related to the GPU resource being the probable cause.
> There is no deterministic way that this can be reproduced. It happens 
> occasionally.
> I have attached the slave log for the issue.
> Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.





[jira] [Commented] (MESOS-6390) Ensure Python support scripts are linted

2017-10-05 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192922#comment-16192922
 ] 

Kevin Klues commented on MESOS-6390:


{noformat}
commit 38af943ad6a2f8e7a47148ab6637692978545500
Author: Armand Grillet 
Date:   Thu Oct 5 16:03:51 2017 +0200

Added support/ to the list of the linted directories.

By adding the support directory to 'mesos-style.py', we make sure
that all our support scripts follow the same coding style that the
rest of our Python codebase uses.

We also added 'invalid-name' to 'pylint.config', as all the Python files
in support/ use dashes instead of underscores, and 'file-ignored', as we
do not lint 'support/post-reviews.py' yet.

Review: https://reviews.apache.org/r/62788/
{noformat}
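A minimal sketch of what such a configuration could look like; the section and option names follow pylint's rc-file format, and this is an assumption since the actual contents of 'pylint.config' are not shown here:

```ini
[MESSAGES CONTROL]
# invalid-name: support scripts use dashed file names (e.g.
# mesos-style.py), which trips pylint's module naming check.
# file-ignored: suppresses the informational message pylint emits for
# files skipped via a file-level disable (support/post-reviews.py).
disable=invalid-name,file-ignored
```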

> Ensure Python support scripts are linted
> 
>
> Key: MESOS-6390
> URL: https://issues.apache.org/jira/browse/MESOS-6390
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Assignee: Armand Grillet
>  Labels: newbie, python
> Fix For: 1.5.0
>
>
> Currently {{support/mesos-style.py}} does not lint files under {{support/}}. 
> This is mostly because these scripts are so inconsistent style-wise that they 
> wouldn't even pass the linter now.
> We should clean up all Python scripts under {{support/}} so they pass the 
> Python linter, and activate that directory in the linter for future 
> additions. 





[jira] [Comment Edited] (MESOS-2923) fetcher.cpp - problem with certificates..?

2017-10-05 Thread Jean-Baptiste (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192712#comment-16192712
 ] 

Jean-Baptiste edited comment on MESOS-2923 at 10/5/17 12:43 PM:


Hi guys, FYI I've opened a pull request to fix it on Debian:

https://github.com/mesosphere/mesos-deb-packaging/pull/111 

Maybe it can be reviewed by [~bernd-mesos] :D


was (Author: jibek):
Hi guys, FYI I've opened a pull request to fix it on Debian:

https://github.com/mesosphere/mesos-deb-packaging/pull/111 

> fetcher.cpp - problem with certificates..?
> --
>
> Key: MESOS-2923
> URL: https://issues.apache.org/jira/browse/MESOS-2923
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.22.1
> Environment: Ubuntu 14.04 (build + test)
>Reporter: Tomasz Mieszkowski
>Assignee: Bernd Mathiske
>  Labels: bugs, fetcher, https, mesosphere
>
> Mesos 0.22.0/0.22.1 built and installed from sources according to the 
> instructions given [here|http://mesos.apache.org/gettingstarted/] has some 
> problem with certificates.
> Every time I try to deploy something that requires downloading any resource 
> via HTTPS (with URI specified via Marathon), such deployment fails and I get 
> this message in failed app's sandbox:
> {code}
> E0617 09:58:44.339409 12380 fetcher.cpp:138] Error downloading resource: 
> Problem with the SSL CA cert (path? access rights?)
> {code}
> Trying to download the same resource on the same slave with {{curl}} or 
> {{wget}} works without problems.
> Moreover, when I install exactly the same version of Mesos from Mesosphere's 
> debs on identical machines (i.e., set up by the same Ansible scripts), 
> everything works fine as well.
> I guess it must be something related to the way Mesos is built - maybe 
> some missing switch for {{configure}} or {{make}}..?
> Any ideas..?





[jira] [Commented] (MESOS-7504) Parent's mount namespace cannot be determined when launching a nested container.

2017-10-05 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192797#comment-16192797
 ] 

Andrei Budnik commented on MESOS-7504:
--

Code modifications to reproduce the test failure:
1. Add {{::sleep(1);}} to 
https://github.com/apache/mesos/blob/657a930e173aaee7a168734bf59e8eb022d6668f/src/tests/containerizer/nested_mesos_containerizer_tests.cpp#L1144
2. Add {{launchInfo.add_pre_exec_commands()->set_value("sleep 2");}} to 
https://github.com/apache/mesos/blob/657a930e173aaee7a168734bf59e8eb022d6668f/src/slave/containerizer/mesos/isolators/namespaces/pid.cpp#L135
3. Add {{::sleep(3);}} to 
https://github.com/apache/mesos/blob/657a930e173aaee7a168734bf59e8eb022d6668f/src/slave/containerizer/mesos/utils.cpp#L73
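The general technique above, injecting sleeps to widen a race window so that a sporadic failure becomes deterministic, can be illustrated with a minimal, self-contained example (unrelated to the Mesos code paths linked above):

```python
import threading
import time

counter = {"value": 0}

def unsafe_increment(delay):
    # Read-modify-write with no lock; the injected sleep widens the race
    # window so the lost update happens every time instead of sporadically.
    current = counter["value"]
    time.sleep(delay)
    counter["value"] = current + 1

threads = [threading.Thread(target=unsafe_increment, args=(0.1,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads read 0 before either wrote, so one increment is lost.
print(counter["value"])  # 1, not 2
```

Without the sleep the same bug exists but only manifests under rare scheduling interleavings, which is exactly the kind of flakiness the three modifications above are designed to expose reliably.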

> Parent's mount namespace cannot be determined when launching a nested 
> container.
> 
>
> Key: MESOS-7504
> URL: https://issues.apache.org/jira/browse/MESOS-7504
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an 
> example of such failure:
> {noformat}
> [ RUN  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework 
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
>  for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
>  
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o 
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
>  --pipe_read="29" --pipe_write="32" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
>  --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] Checkpointing container's 
> forked pid 1873 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_Rdjw6M/meta/slaves/frameworks/executors/executor/runs/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/pids/forked.pid'
> I0509 21:53:25.515878 17190 fetcher.cpp:353] Starting to fetch URIs for 
> container: 21bc372c-0f2c-49f5-b8ab-8d32c232b95d, directory: 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr
> I0509 21:53:25.517715 17193 containerizer.cpp:1791] Starting nested container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.518569 17193 switchboard.cpp:545] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b"
>  --stderr_from_fd="36" --stderr_to_fd="2" --stdin_to_fd="32" 
> --stdout_from_fd="33" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 

[jira] [Commented] (MESOS-8053) CMake based project is poorly supported in IDEs

2017-10-05 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192748#comment-16192748
 ] 

Alexander Rukletsov commented on MESOS-8053:


A possible approach:
https://reviews.apache.org/r/62070/
https://reviews.apache.org/r/62071/

> CMake based project is poorly supported in IDEs
> ---
>
> Key: MESOS-8053
> URL: https://issues.apache.org/jira/browse/MESOS-8053
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>  Labels: build, cmake
>
> When a CMake project is generated for an IDE, only the files built on the 
> current platform are included. This impacts code navigation features 
> in IDEs. The CMake project should include all source code files regardless of 
> the current platform, plus all relevant files, e.g., .proto, .js.





[jira] [Created] (MESOS-8053) CMake based project is poorly supported in IDEs

2017-10-05 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8053:
--

 Summary: CMake based project is poorly supported in IDEs
 Key: MESOS-8053
 URL: https://issues.apache.org/jira/browse/MESOS-8053
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov


When a CMake project is generated for an IDE, only the files built on the 
current platform are included. This impacts code navigation features in 
IDEs. The CMake project should include all source code files regardless of the 
current platform, plus all relevant files, e.g., .proto, .js.
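One way such a change could look; this is a sketch under assumptions (the target name and file path are placeholders), not the approach from any particular review. CMake's HEADER_FILE_ONLY source property lets project generators index files without compiling them:

```cmake
# Sketch: expose platform-specific sources to IDE project generators
# without building them on the current platform. 'mesos' and the file
# path are placeholder names.
if (NOT WIN32)
  set(NON_BUILT_SOURCES src/slave/windows_launcher.cpp)
  target_sources(mesos PRIVATE ${NON_BUILT_SOURCES})
  set_source_files_properties(${NON_BUILT_SOURCES}
    PROPERTIES HEADER_FILE_ONLY TRUE)
endif ()
```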





[jira] [Commented] (MESOS-2923) fetcher.cpp - problem with certificates..?

2017-10-05 Thread Jean-Baptiste (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192712#comment-16192712
 ] 

Jean-Baptiste commented on MESOS-2923:
--

Hi guys, FYI I've opened a pull request to fix it on Debian:

https://github.com/mesosphere/mesos-deb-packaging/pull/111 

> fetcher.cpp - problem with certificates..?
> --
>
> Key: MESOS-2923
> URL: https://issues.apache.org/jira/browse/MESOS-2923
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.22.1
> Environment: Ubuntu 14.04 (build + test)
>Reporter: Tomasz Mieszkowski
>Assignee: Bernd Mathiske
>  Labels: bugs, fetcher, https, mesosphere
>
> Mesos 0.22.0/0.22.1 built and installed from sources according to the 
> instructions given [here|http://mesos.apache.org/gettingstarted/] has some 
> problem with certificates.
> Every time I try to deploy something that requires downloading any resource 
> via HTTPS (with URI specified via Marathon), such deployment fails and I get 
> this message in failed app's sandbox:
> {code}
> E0617 09:58:44.339409 12380 fetcher.cpp:138] Error downloading resource: 
> Problem with the SSL CA cert (path? access rights?)
> {code}
> Trying to download the same resource on the same slave with {{curl}} or 
> {{wget}} works without problems.
> Moreover, when I install exactly the same version of Mesos from Mesosphere's 
> debs on identical machines (i.e., set up by the same Ansible scripts), 
> everything works fine as well.
> I guess it must be something related to the way Mesos is built - maybe 
> some missing switch for {{configure}} or {{make}}..?
> Any ideas..?


