date:20150519


 [ 
https://issues.apache.org/jira/browse/MESOS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2731:
--
Labels: mesosphere  (was: )

 Allow frameworks to deploy storage drivers on demand.
 -

 Key: MESOS-2731
 URL: https://issues.apache.org/jira/browse/MESOS-2731
 Project: Mesos
  Issue Type: Epic
Reporter: Joerg Schad
  Labels: mesosphere

 Certain storage options require storage drivers to access them including HDFS 
 driver, Quobyte client, Database driver, and so on.
 When Tasks in Mesos require access to such storage they also need access to 
 the respective driver on the node where they were scheduled to.
 As it is not desirable to deploy the driver onto all nodes in the cluster, it 
 would be good to deploy the driver on demand.
 Use Cases:
 1. Fetcher Cache accessing resources from user-provided URIs
 2. Framework executors/tasks requiring access to HDFS/DFS
 3. Framework executors/tasks requiring Databases access (requiring drivers)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches


[ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550779#comment-14550779
 ] 

haosdent commented on MESOS-2588:
-

[~baotiao]Sorry for not update this issue quickly. I unassigned it now.

 Create pre-create hook before a Docker container launches
 -

 Key: MESOS-2588
 URL: https://issues.apache.org/jira/browse/MESOS-2588
 Project: Mesos
  Issue Type: Bug
  Components: docker
Reporter: Timothy Chen

 To be able to support custom actions to be called before launching a docker 
 contianer, we should create a hook that can be extensible and allow 
 module/hooks to be performed before a docker container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-2732) Expose Mount Tables

[
https://issues.apache.org/jira/browse/MESOS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Adam B updated MESOS-2732:
--
Labels: mesosphere (was: )

Expose Mount Tables
---

Key: MESOS-2732
URL: https://issues.apache.org/jira/browse/MESOS-2732
Project: Mesos
Issue Type: Epic
Reporter: Joerg Schad
Labels: mesosphere

When there are multiple distributed filesystems connected to a Mesos cluster,
clients (e.g. the Mesos fetcher, or a Mesos task) of those filesystems need a
clear way to distinguish between them and Mesos needs a way to direct
requests to the correct (distributed) filesystem.
#Use Cases:
- Multiple HDFS clusters on the same Mesos cluster
- Connecting HDFS, MapRFS, Ceph, Lustre, GlusterFS, S3, GCS, and other
SAN/NAS to a Mesos cluster
- The Mesos fetcher may want to pull from any of the above.
- An executor or task may want to read or write to multiple filesystems,
within the same process.
#Traditional Operating System Analogy:
Each line in Linux's fstab describes a different filesystem to mount into the
root filesystem:
1. The device name or remote filesystem to be mounted.
2. The mount point, where the data is to be attached to the root file system.
3. The file system type or algorithm used to interpret the file system.
4. Options to be used when mounting (e.g. Read-Only).
What we need for each filesystem in the Mesos ecosystem:
1. The metadata server or dfs/san entrypoint host:port
2. Mount point, where this filesystem fits into the universal
Mesos-accessible filesystem namespace.
3. The protocol to speak, perhaps acceptable URI prefixes.
4. Options, ACLs for which frameworks/principals can access a particular
filesystem, and how.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-2728) Introduce concept of cluster wide resources.

[
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Adam B updated MESOS-2728:
--
Description:
There are resources which are not provided by a single node. Consider for
example a external Network Bandwidth of a cluster. Being a limited resource it
makes sense for Mesos to manage it but still it is not a resource being offered
by a single node. A cluster-wide resource is still consumed by a task, and when
that task completes, the resources are then available to be allocated to
another framework/task.

Use Cases:
1. Network Bandwidth
2. IP Addresses
3. Global Service Ports
2. Distributed File System Storage
3. Software Licences

was:
There are resources which are not provided by a single node. Consider for
example a external Network Bandwidth of a cluster. Being a limited resource it
makes sense for Mesos to manage it but still it is not a resource being offered
by a single node.

Use Cases:
1. Network Bandwidth
2. IP Addresses
3. Global Service Ports
2. Distributed File System Storage
3. Software Licences

Introduce concept of cluster wide resources.

Key: MESOS-2728
URL: https://issues.apache.org/jira/browse/MESOS-2728
Project: Mesos
Issue Type: Epic
Reporter: Joerg Schad
Labels: mesosphere

There are resources which are not provided by a single node. Consider for
example a external Network Bandwidth of a cluster. Being a limited resource
it makes sense for Mesos to manage it but still it is not a resource being
offered by a single node. A cluster-wide resource is still consumed by a
task, and when that task completes, the resources are then available to be
allocated to another framework/task.
Use Cases:
1. Network Bandwidth
2. IP Addresses
3. Global Service Ports
2. Distributed File System Storage
3. Software Licences

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-2731) Allow frameworks to deploy storage drivers on demand.

[
https://issues.apache.org/jira/browse/MESOS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Adam B updated MESOS-2731:
--
Description:
Certain storage options require storage drivers to access them including HDFS
driver, Quobyte client, Database driver, and so on.
When Tasks in Mesos require access to such storage they also need access to the
respective driver on the node where they were scheduled to.
As it is not desirable to deploy the driver onto all nodes in the cluster, it
would be good to deploy the driver on demand.

Use Cases:
1. Fetcher Cache pulling resources from user-provided URIs
2. Framework executors/tasks requiring r/w access to HDFS/DFS
3. Framework executors/tasks requiring r/w Databases access (requiring drivers)

was:
Certain storage options require storage drivers to access them including HDFS
driver, Quobyte client, Database driver, and so on.
When Tasks in Mesos require access to such storage they also need access to the
respective driver on the node where they were scheduled to.
As it is not desirable to deploy the driver onto all nodes in the cluster, it
would be good to deploy the driver on demand.

Use Cases:
1. Fetcher Cache accessing resources from user-provided URIs
2. Framework executors/tasks requiring access to HDFS/DFS
3. Framework executors/tasks requiring Databases access (requiring drivers)

Allow frameworks to deploy storage drivers on demand.
-

Key: MESOS-2731
URL: https://issues.apache.org/jira/browse/MESOS-2731
Project: Mesos
Issue Type: Epic
Reporter: Joerg Schad
Labels: mesosphere

Certain storage options require storage drivers to access them including HDFS
driver, Quobyte client, Database driver, and so on.
When Tasks in Mesos require access to such storage they also need access to
the respective driver on the node where they were scheduled to.
As it is not desirable to deploy the driver onto all nodes in the cluster, it
would be good to deploy the driver on demand.
Use Cases:
1. Fetcher Cache pulling resources from user-provided URIs
2. Framework executors/tasks requiring r/w access to HDFS/DFS
3. Framework executors/tasks requiring r/w Databases access (requiring
drivers)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches

2015-05-19 Thread chenzongzhi (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550765#comment-14550765
 ] 

chenzongzhi commented on MESOS-2588:


Hey haosdent, Adam Avilla.  
Do you have any plan about this issue?
We really need this feature, so if you don't have time, maybe you can assign 
this feature to me.


 Create pre-create hook before a Docker container launches
 -

 Key: MESOS-2588
 URL: https://issues.apache.org/jira/browse/MESOS-2588
 Project: Mesos
  Issue Type: Bug
  Components: docker
Reporter: Timothy Chen
Assignee: haosdent

 To be able to support custom actions to be called before launching a docker 
 contianer, we should create a hook that can be extensible and allow 
 module/hooks to be performed before a docker container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-2588) Create pre-create hook before a Docker container launches


 [ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-2588:

Assignee: (was: haosdent)

 Create pre-create hook before a Docker container launches
 -

 Key: MESOS-2588
 URL: https://issues.apache.org/jira/browse/MESOS-2588
 Project: Mesos
  Issue Type: Bug
  Components: docker
Reporter: Timothy Chen

 To be able to support custom actions to be called before launching a docker 
 contianer, we should create a hook that can be extensible and allow 
 module/hooks to be performed before a docker container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-05-19 Thread Marco Massenzio (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550958#comment-14550958
 ] 

Marco Massenzio commented on MESOS-2340:


so, this is the simplest code I could came up with (this needs refining, 
obviously!) (in {{src/zookeeper/group.cpp}}:

{code}
  // if label is not None, this is the MasterInfo being serialized
  if (label.isSome()) {
// TODO: how do we serialize MasterInfo to JSON? we only have the
// raw serialized data here
string json = {\value\: \foobar\};
string loc = result + .json;
string jsonResult;
zk-create(
  loc,
  json,
  acl,
  ZOO_EPHEMERAL,
  jsonResult);
LOG(INFO)  Added JSON data to   jsonResult;
  }
{info}

If I now start the Master, I can see both nodes in the {{/test/json}} folder.
{noformat}
$ ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/json/test 
--work_dir=/tmp/mesos --quorum=1
{noformat}
{noformat}
[zk: localhost:2181(CONNECTED) 8] ls /json/test 
[log_replicas, info_10, info_10.json]

[zk: localhost:2181(CONNECTED) 9] get /json/test/info_10.json
{value: foobar}
cZxid = 0xe6
ctime = Tue May 19 11:24:55 PDT 2015
mZxid = 0xe6
mtime = Tue May 19 11:24:55 PDT 2015
pZxid = 0xe6
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14d496680460057
dataLength = 19
numChildren = 0
{noformat}

and there's no .json in {{log_replicas}}.
I would like to get suggestions as to where to inject the JSON, as in the 
Group class, we only get the serialized String, not the {{MasterInfo}} PB.
There are obviously way around this, but I'd like to come up with an extensible 
way.

 Publish JSON in ZK instead of serialized MasterInfo
 ---

 Key: MESOS-2340
 URL: https://issues.apache.org/jira/browse/MESOS-2340
 Project: Mesos
  Issue Type: Improvement
Reporter: Zameer Manji
Assignee: haosdent

 Currently to discover the master a client needs the ZK node location and 
 access to the MasterInfo protobuf so it can deserialize the binary blob in 
 the node.
 I think it would be nice to publish JSON (like Twitter's ServerSets) so 
 clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-05-19 Thread Marco Massenzio (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550958#comment-14550958
 ] 

Marco Massenzio edited comment on MESOS-2340 at 5/19/15 6:30 PM:
-

so, this is the simplest code I could came up with (this needs refining, 
obviously!) (in {{src/zookeeper/group.cpp}}:

{code}
  // if label is not None, this is the MasterInfo being serialized
  if (label.isSome()) {
// TODO: how do we serialize MasterInfo to JSON? we only have the
// raw serialized data here
string json = {\value\: \foobar\};
string loc = result + .json;
string jsonResult;
zk-create(
  loc,
  json,
  acl,
  ZOO_EPHEMERAL,
  jsonResult);
LOG(INFO)  Added JSON data to   jsonResult;
  }
{code}

If I now start the Master, I can see both nodes in the {{/test/json}} folder.
{noformat}
$ ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/json/test 
--work_dir=/tmp/mesos --quorum=1
{noformat}
{noformat}
[zk: localhost:2181(CONNECTED) 8] ls /json/test 
[log_replicas, info_10, info_10.json]

[zk: localhost:2181(CONNECTED) 9] get /json/test/info_10.json
{value: foobar}
cZxid = 0xe6
ctime = Tue May 19 11:24:55 PDT 2015
mZxid = 0xe6
mtime = Tue May 19 11:24:55 PDT 2015
pZxid = 0xe6
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14d496680460057
dataLength = 19
numChildren = 0
{noformat}

and there's no .json in {{log_replicas}}.
I would like to get suggestions as to where to inject the JSON, as in the 
Group class, we only get the serialized String, not the {{MasterInfo}} PB.
There are obviously way around this, but I'd like to come up with an extensible 
way.


was (Author: marco-mesos):
so, this is the simplest code I could came up with (this needs refining, 
obviously!) (in {{src/zookeeper/group.cpp}}:

{code}
  // if label is not None, this is the MasterInfo being serialized
  if (label.isSome()) {
// TODO: how do we serialize MasterInfo to JSON? we only have the
// raw serialized data here
string json = {\value\: \foobar\};
string loc = result + .json;
string jsonResult;
zk-create(
  loc,
  json,
  acl,
  ZOO_EPHEMERAL,
  jsonResult);
LOG(INFO)  Added JSON data to   jsonResult;
  }
{info}

If I now start the Master, I can see both nodes in the {{/test/json}} folder.
{noformat}
$ ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/json/test 
--work_dir=/tmp/mesos --quorum=1
{noformat}
{noformat}
[zk: localhost:2181(CONNECTED) 8] ls /json/test 
[log_replicas, info_10, info_10.json]

[zk: localhost:2181(CONNECTED) 9] get /json/test/info_10.json
{value: foobar}
cZxid = 0xe6
ctime = Tue May 19 11:24:55 PDT 2015
mZxid = 0xe6
mtime = Tue May 19 11:24:55 PDT 2015
pZxid = 0xe6
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14d496680460057
dataLength = 19
numChildren = 0
{noformat}

and there's no .json in {{log_replicas}}.
I would like to get suggestions as to where to inject the JSON, as in the 
Group class, we only get the serialized String, not the {{MasterInfo}} PB.
There are obviously way around this, but I'd like to come up with an extensible 
way.

 Publish JSON in ZK instead of serialized MasterInfo
 ---

 Key: MESOS-2340
 URL: https://issues.apache.org/jira/browse/MESOS-2340
 Project: Mesos
  Issue Type: Improvement
Reporter: Zameer Manji
Assignee: haosdent

 Currently to discover the master a client needs the ZK node location and 
 access to the MasterInfo protobuf so it can deserialize the binary blob in 
 the node.
 I think it would be nice to publish JSON (like Twitter's ServerSets) so 
 clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer

[
https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550964#comment-14550964
]

Ian Downes commented on MESOS-2717:
---

The containerizer interface was designed to support this and I'd be happy to
shepherd any efforts.

Some initial notes:
1) I agree, bridged networking would be simplest.
2) This could be done by the custom executor. Work is being started on making
IP addresses a global resource.
3) The fetcher should be used. Patches for caching objects will soon be
committed.
4) You could just run the VM inside cgroups/namespaces etc. and leverage the
existing code for managing them.
5) Be aware that you need to architect the code so the slave can be restarted
while VMs/containers are running, i.e., you'll need to re-establish said
connections during slave recovery.

Qemu/KVM containerizer
--

Key: MESOS-2717
URL: https://issues.apache.org/jira/browse/MESOS-2717
Project: Mesos
Issue Type: Wish
Components: containerization
Reporter: Pierre-Yves Ritschard

I think it would make sense for Mesos to have the ability to treat
hypervisors as containerizers and the most sensible one to start with would
probably be Qemu/KVM.
There are a few workloads that can require full-fledged VMs (the most obvious
one being Windows workloads).
The containerization code is well decoupled and seems simple enough, I can
definitely take a shot at it. VMs do bring some questions with them here is
my take on them:
1. Routing, network strategy
==
The simplest approach here might very well be to go for bridged networks
and leave the setup and inter slave routing up to the administrator
2. IP Address assignment

At first, it can be up to the Frameworks to deal with IP assignment.
The simplest way to address this could be to have an executor running
on slaves providing the qemu/kvm containerizer which would instrument a DHCP
server and collect IP + Mac address resources from slaves. While it may be up
to the frameworks to provide this, an example should most likely be provided.
3. VM Templates
==
VM templates should probably leverage the fetcher and could thus be copied
locally or fetch from HTTP(s) / HDFS.
4. Resource limiting

Mapping resouce constraints to the qemu command line is probably the easiest
part, Additional command line should also be fetchable. For Unix VMs, the
sandbox could show the output of the serial console
5. Libvirt / plain Qemu
=
I tend to favor limiting the amount of necessary hoops to jump through and
would thus investigate working directly with Qemu, maintaining an open
connection to the monitor to assert status.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (MESOS-2750) Extend qeueing discipline wrappers to expose network isolator statistics

Paul Brett created MESOS-2750:
-

 Summary: Extend qeueing discipline wrappers to expose network 
isolator statistics
 Key: MESOS-2750
 URL: https://issues.apache.org/jira/browse/MESOS-2750
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett


Export Traffic Control statistics in queueing library to enable reporting out 
impact of network bandwidth statistics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-2750) Extend queueing discipline wrappers to expose network isolator statistics


 [ 
https://issues.apache.org/jira/browse/MESOS-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2750:
--
Summary: Extend queueing discipline wrappers to expose network isolator 
statistics  (was: Extend qeueing discipline wrappers to expose network isolator 
statistics)

 Extend queueing discipline wrappers to expose network isolator statistics
 -

 Key: MESOS-2750
 URL: https://issues.apache.org/jira/browse/MESOS-2750
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett

 Export Traffic Control statistics in queueing library to enable reporting out 
 impact of network bandwidth statistics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2745) Add 'Path' to stout's user guide


[ 
https://issues.apache.org/jira/browse/MESOS-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550687#comment-14550687
 ] 

haosdent commented on MESOS-2745:
-

Review board: https://reviews.apache.org/r/34416/

 Add 'Path' to stout's user guide 
 -

 Key: MESOS-2745
 URL: https://issues.apache.org/jira/browse/MESOS-2745
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
  Labels: newbie

 stout's README does not yet include 'Path', we should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

2015-05-19 Thread Jie Yu (JIRA)

[
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550838#comment-14550838
]

Jie Yu commented on MESOS-2735:
---

{quote} One of the advantages that we had discussed in the past was that the
pull model enables us to move as fast as we possibly can, rather than just
getting a bunch of messages queued up in the slave that we have to process.
{quote}

I don't think there is a difference in terms of queueing messages. The pull
model also queues messages in the slave (e.g.,
'estimator-oversubscribed().then(defer(...))' also queues messages in slave's
queue).

{quote} Even if we want to collect more fine-grained resource estimations a
ResourceEstimator could do this and store this information until future polls.
{quote}

I think there's no fundamental difference between the pull and the push model.
The are only two subtle differences between the two: 1) the push model makes
less assumptions about the slave behavior. 2) the push model is safer in the
face of bad behaved resource estimator. Let me elaborate both of them below:

Regarding (1), let's use an example. Say we want to write a resource estimator
which sends constant number of cpus (say 2 cpus) every 10 seconds. If we use a
push model, we could just follow the
[NoopResourceEstimatorProcess|https://github.com/apache/mesos/blob/master/src/slave/resource_estimator.cpp#L52]
implementation in the code. Basically, we fork a libprocess and invoke the
registered callback every 10 seconds with 2 cpus.

Now, if we use a pull model, we first need to make an assumption that the slave
pull the resource estimator as fast as it can without any delay. If there's a
delay say 1 second, the resource estimator needs to adjust its internal delay
to be 9 seconds so that the total interval between two estimations is 10
seconds apart. When implementing the `FutureResources oversubscribed()`
interface, the module writer needs to make another assumption about the slave
that the slave will not invoke the interface again if the previous estimation
is still pending. This is important because otherwise, the module writer needs
to maintain a list of Promises (instead of just one). I just feels that
there're so many implicit assumptions that the module writer needs to make in a
pull model.

Regarding (2), as I already stated in this ticket, since the slave invoked the
interface ('oversubscribed()') in its context, the module writer needs to make
sure the implementation of the interface does not block, otherwise the slave
will hang. An alternative is to use 'async' while invoking the interface in the
slave. I just feel this is rather not necessary if we use a push model.

Change the interaction between the slave and the resource estimator from
polling to pushing

Key: MESOS-2735
URL: https://issues.apache.org/jira/browse/MESOS-2735
Project: Mesos
Issue Type: Bug
Reporter: Jie Yu
Assignee: Jie Yu
Labels: twitter

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MESOS-2748) /help generated links point to wrong URLs


 [ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Lunøe updated MESOS-2748:
-
Description: 
As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
MESOS-913 for background):

{quote}
In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, which 
is then converted to html through a javascript library 

All endpoints point to {{/help/...}}, they need to work dynamically for reverse 
proxy to do its thing. {{/mesos/help}} works, and displays the endpoints, but 
they each need to go to their respective {{/help/...}} endpoint. 

Note that this needs to work both for master, and for slaves. I think the route 
to slaves help is something like this: 
{{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
double check this.
{quote}

The fix appears to be not too complex (as it would require to simply manipulate 
the generated URL) but a quick skim of the code would suggest that something 
more substantial may be desirable too.

  was:
As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
MESOS-913 for background):

{quote}
In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, which 
is then converted to html through a javascript library 

All endpoints point to {{/help/...}}, they need to work dynamically for reverse 
proxy to do its thing. {{/mesos/help}} works, and displays the endpoints, but 
they each need to go to their respective {{/mesos/help/...}} endpoint. 

Note that this needs to work both for master, and for slaves. I think the route 
to slaves help is something like this: 
{{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
double check this.
{quote}

The fix appears to be not too complex (as it would require to simply manipulate 
the generated URL) but a quick skim of the code would suggest that something 
more substantial may be desirable too.


 /help generated links point to wrong URLs
 -

 Key: MESOS-2748
 URL: https://issues.apache.org/jira/browse/MESOS-2748
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Marco Massenzio
Priority: Minor

 As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
 MESOS-913 for background):
 {quote}
 In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
 which is then converted to html through a javascript library 
 All endpoints point to {{/help/...}}, they need to work dynamically for 
 reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
 endpoints, but they each need to go to their respective {{/help/...}} 
 endpoint. 
 Note that this needs to work both for master, and for slaves. I think the 
 route to slaves help is something like this: 
 {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
 double check this.
 {quote}
 The fix appears to be not too complex (as it would require to simply 
 manipulate the generated URL) but a quick skim of the code would suggest that 
 something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2748) /help generated links point to wrong URLs


[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550809#comment-14550809
 ] 

Michael Lunøe commented on MESOS-2748:
--

[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
/mesos/help works (showing a page with urls), but the urls listed are 
absolute paths, i.e. /help/metrics or /help/__processes__. If it were to 
use relative paths, so they would show correct paths: /mesos/help/metrics and 
 /mesos/help/__processes__ in stead. Does that answer your question?

 /help generated links point to wrong URLs
 -

 Key: MESOS-2748
 URL: https://issues.apache.org/jira/browse/MESOS-2748
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Marco Massenzio
Priority: Minor

 As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
 MESOS-913 for background):
 {quote}
 In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
 which is then converted to html through a javascript library 
 All endpoints point to {{/help/...}}, they need to work dynamically for 
 reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
 endpoints, but they each need to go to their respective {{/help/...}} 
 endpoint. 
 Note that this needs to work both for master, and for slaves. I think the 
 route to slaves help is something like this: 
 {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
 double check this.
 {quote}
 The fix appears to be not too complex (as it would require to simply 
 manipulate the generated URL) but a quick skim of the code would suggest that 
 something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (MESOS-2748) /help generated links point to wrong URLs


[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550809#comment-14550809
 ] 

Michael Lunøe edited comment on MESOS-2748 at 5/19/15 5:22 PM:
---

[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
{{/mesos/help}} works (showing a page with urls), but the urls listed are 
absolute paths, i.e. {{/help/metrics}} or {{/help/__processes__}}. If it were 
to use relative paths, so they would show correct paths: 
{{/mesos/help/metrics}} and  {{/mesos/help/__processes__}} in stead. Does that 
answer your question?


was (Author: mlunoe):
[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
/mesos/help works (showing a page with urls), but the urls listed are 
absolute paths, i.e. /help/metrics or /help/__processes__. If it were to 
use relative paths, so they would show correct paths: /mesos/help/metrics and 
 /mesos/help/__processes__ in stead. Does that answer your question?

 /help generated links point to wrong URLs
 -

 Key: MESOS-2748
 URL: https://issues.apache.org/jira/browse/MESOS-2748
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Marco Massenzio
Priority: Minor

 As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
 MESOS-913 for background):
 {quote}
 In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
 which is then converted to html through a javascript library 
 All endpoints point to {{/help/...}}, they need to work dynamically for 
 reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
 endpoints, but they each need to go to their respective {{/help/...}} 
 endpoint. 
 Note that this needs to work both for master, and for slaves. I think the 
 route to slaves help is something like this: 
 {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
 double check this.
 {quote}
 The fix appears to be not too complex (as it would require to simply 
 manipulate the generated URL) but a quick skim of the code would suggest that 
 something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (MESOS-2748) /help generated links point to wrong URLs


[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550809#comment-14550809
 ] 

Michael Lunøe edited comment on MESOS-2748 at 5/19/15 5:25 PM:
---

[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
{{/mesos/help}} works (showing a page with urls), but the urls listed are 
absolute paths, i.e. {{/help/metrics}} or {{/help/\_\_processess\_\_}}. If it 
were to use relative paths, so they would show correct paths: 
{{/mesos/help/metrics}} and  {{/mesos/help/\_\_processess\_\_}} in stead. Does 
that answer your question?


was (Author: mlunoe):
[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
{{/mesos/help}} works (showing a page with urls), but the urls listed are 
absolute paths, i.e. {{/help/metrics}} or {{/help/__processes__}}. If it were 
to use relative paths, so they would show correct paths: 
{{/mesos/help/metrics}} and  {{/mesos/help/__processes__}} in stead. Does that 
answer your question?

 /help generated links point to wrong URLs
 -

 Key: MESOS-2748
 URL: https://issues.apache.org/jira/browse/MESOS-2748
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Marco Massenzio
Priority: Minor

 As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
 MESOS-913 for background):
 {quote}
 In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
 which is then converted to html through a javascript library 
 All endpoints point to {{/help/...}}, they need to work dynamically for 
 reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
 endpoints, but they each need to go to their respective {{/help/...}} 
 endpoint. 
 Note that this needs to work both for master, and for slaves. I think the 
 route to slaves help is something like this: 
 {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
 double check this.
 {quote}
 The fix appears to be not too complex (as it would require to simply 
 manipulate the generated URL) but a quick skim of the code would suggest that 
 something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor


[ 
https://issues.apache.org/jira/browse/MESOS-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550825#comment-14550825
 ] 

haosdent commented on MESOS-2741:
-

And form implement this issue, how about change the interface in Container from 
{code}
virtual process::FutureResourceStatistics usage(
  const ContainerID containerId) = 0;
{code}

to

{code}
virtual process::FutureResourceMonitor::Usage usage(
  const ContainerID containerId) = 0;
{code}

 Exposing Resources along with ResourceStatistics from resource monitor
 --

 Key: MESOS-2741
 URL: https://issues.apache.org/jira/browse/MESOS-2741
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu
  Labels: mesosphere, twitter

 Right now, the resource monitor returns a Usage which contains ContainerId, 
 ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
 controller to calculate usage slack, or tell if a container is using 
 revokable resources or not, we need to expose the Resources that are 
 currently assigned to the container.
 This requires us the change the containerizer interface to get the Resources 
 as well while calling 'usage()'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2636) Segfault in inline TryIP getIP(const std::string hostname, int family)

2015-05-19 Thread Chi Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551289#comment-14551289
 ] 

Chi Zhang commented on MESOS-2636:
--

https://reviews.apache.org/r/34438/

did a round of grepping. getIP and hostname are the only 2 places use 
freeaddrinfo.

 Segfault in inline TryIP getIP(const std::string hostname, int family)
 -

 Key: MESOS-2636
 URL: https://issues.apache.org/jira/browse/MESOS-2636
 Project: Mesos
  Issue Type: Bug
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter
 Fix For: 0.23.0


 We saw a segfault in production. Attaching the coredump, we see:
 Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
 --resources=cpus:23;mem:70298;ports:[31'.
 Program terminated with signal 11, Segmentation fault.
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 (gdb) bt
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 #1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
 #2  0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at 
 ./3rdparty/stout/include/stout/net.hpp:201
 #3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
 expression opcode 0xf3
 ) at src/process.cpp:837
 #4  0x0042342f in main ()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2636) Segfault in inline TryIP getIP(const std::string hostname, int family)

2015-05-19 Thread Benjamin Mahler (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551337#comment-14551337
 ] 

Benjamin Mahler commented on MESOS-2636:


net::hostname fix was committed:

{noformat}
commit 08e11d372afbb66907130998b485c185687fae34
Author: Chi Zhang chzhc...@gmail.com
Date:   Tue May 19 15:03:23 2015 -0700

Removed bad call to freeaddrinfo in net::hostname.

Review: https://reviews.apache.org/r/34438
{noformat}

 Segfault in inline TryIP getIP(const std::string hostname, int family)
 -

 Key: MESOS-2636
 URL: https://issues.apache.org/jira/browse/MESOS-2636
 Project: Mesos
  Issue Type: Bug
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter
 Fix For: 0.23.0


 We saw a segfault in production. Attaching the coredump, we see:
 Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
 --resources=cpus:23;mem:70298;ports:[31'.
 Program terminated with signal 11, Segmentation fault.
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 (gdb) bt
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 #1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
 #2  0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at 
 ./3rdparty/stout/include/stout/net.hpp:201
 #3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
 expression opcode 0xf3
 ) at src/process.cpp:837
 #4  0x0042342f in main ()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources


[ 
https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551381#comment-14551381
 ] 

Ian Downes commented on MESOS-2652:
---

Borg does prod and non-prod as coarse prioritization bands but supports 
different priorities within each.

 Update Mesos containerizer to understand revocable cpu resources
 

 Key: MESOS-2652
 URL: https://issues.apache.org/jira/browse/MESOS-2652
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Ian Downes
  Labels: twitter

 The CPU isolator needs to properly set limits for revocable and non-revocable 
 containers.
 The proposed strategy is to use a two-way split of the cpu cgroup hierarchy 
 -- normal (non-revocable) and low priority (revocable) subtrees -- and to use 
 a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split 
 (TBD). Containers would be present in only one of the subtrees. CFS quotas 
 will *not* be set on subtree roots, only cpu.shares. Each container would set 
 CFS quota and shares as done currently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load


[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551142#comment-14551142
 ] 

Ian Downes edited comment on MESOS-2254 at 5/19/15 8:18 PM:


I presume he's referring to the slave flag {{--resource_monitoring_interval}} 
which currently defaults to {{RESOURCE_MONITORING_INTERVAL = Seconds(1)}} but 
which [~nnielsen] has marked as soon to be deprecated.
{noformat}
  // TODO(nnielsen): Deprecate resource_monitoring_interval flag after
  // Mesos 0.23.0.
  Duration resource_monitoring_interval;
{noformat}
In the meantime, if this is causing performance issues then you could set 
{{--resource_monitoring_interval}} to something longer than the default.


was (Author: idownes):
I presume he's referring to the slave flag {{--resource_monitoring_interval}} 
which currently defaults to {{RESOURCE_MONITORING_INTERVAL = Seconds(1)}} but 
which [~nnielsen] has marked as soon to be deprecated.
{noformat}
  // TODO(nnielsen): Deprecate resource_monitoring_interval flag after
  // Mesos 0.23.0.
  Duration resource_monitoring_interval;
{noformat}
In the meantime, if this is causing performance issues then you could set 
{{--resource_monitoring_internal}} to something longer than the default.

 Posix CPU isolator usage call introduce high cpu load
 -

 Key: MESOS-2254
 URL: https://issues.apache.org/jira/browse/MESOS-2254
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen

 With more than 20 executors running on a slave with the posix isolator, we 
 have seen an very high cpu load (over 200%).
 From profiling one thread (there were two, taking up all the cpu time. The 
 total CPU time was over 200%):
 {code}
 Running Time  SelfSymbol Name
 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
 27133.0ms   47.8% 0.0  thread_start
 27133.0ms   47.8% 0.0   _pthread_start
 27133.0ms   47.8% 0.0_pthread_body
 27133.0ms   47.8% 0.0 process::schedule(void*)
 27133.0ms   47.8% 2.0  
 process::ProcessManager::resume(process::ProcessBase*)
 27126.0ms   47.8% 1.0   
 process::ProcessBase::serve(process::Event const)
 27125.0ms   47.8% 0.0
 process::DispatchEvent::visit(process::EventVisitor*) const
 27125.0ms   47.8% 0.0 
 process::ProcessBase::visit(process::DispatchEvent const)
 27125.0ms   47.8% 0.0  std::__1::functionvoid 
 (process::ProcessBase*)::operator()(process::ProcessBase*) const
 27124.0ms   47.8% 0.0   
 std::__1::__function::__funcprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), 
 std::__1::allocatorprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), void 
 (process::ProcessBase*)::operator()(process::ProcessBase*)
 27124.0ms   47.8% 1.0
 process::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
  const
 27060.0ms   47.7% 1.0 
 mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
 const)
 27046.0ms   47.7% 2.0  
 mesos::internal::usage(int, bool, bool)
 27023.0ms   47.6% 2.0   os::pstree(Optionint)
 26748.0ms   47.1% 23.0   os::processes()
 24809.0ms   43.7% 349.0   os::process(int)
 8199.0ms   14.4%  47.0 os::sysctl::string() 
 const
 7562.0ms   13.3%  7562.0__sysctl
 {code}
 We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load


[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551142#comment-14551142
 ] 

Ian Downes commented on MESOS-2254:
---

I presume he's referring to the slave flag {{--resource_monitoring_interval}} 
which currently defaults to {{RESOURCE_MONITORING_INTERVAL = Seconds(1)}} but 
which [~nnielsen] has marked as soon to be deprecated.
{noformat}
  // TODO(nnielsen): Deprecate resource_monitoring_interval flag after
  // Mesos 0.23.0.
  Duration resource_monitoring_interval;
{noformat}
In the meantime, if this is causing performance issues then you could set 
{{--resource_monitoring_internal}} to something longer than the default.

 Posix CPU isolator usage call introduce high cpu load
 -

 Key: MESOS-2254
 URL: https://issues.apache.org/jira/browse/MESOS-2254
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen

 With more than 20 executors running on a slave with the posix isolator, we 
 have seen an very high cpu load (over 200%).
 From profiling one thread (there were two, taking up all the cpu time. The 
 total CPU time was over 200%):
 {code}
 Running Time  SelfSymbol Name
 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
 27133.0ms   47.8% 0.0  thread_start
 27133.0ms   47.8% 0.0   _pthread_start
 27133.0ms   47.8% 0.0_pthread_body
 27133.0ms   47.8% 0.0 process::schedule(void*)
 27133.0ms   47.8% 2.0  
 process::ProcessManager::resume(process::ProcessBase*)
 27126.0ms   47.8% 1.0   
 process::ProcessBase::serve(process::Event const)
 27125.0ms   47.8% 0.0
 process::DispatchEvent::visit(process::EventVisitor*) const
 27125.0ms   47.8% 0.0 
 process::ProcessBase::visit(process::DispatchEvent const)
 27125.0ms   47.8% 0.0  std::__1::functionvoid 
 (process::ProcessBase*)::operator()(process::ProcessBase*) const
 27124.0ms   47.8% 0.0   
 std::__1::__function::__funcprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), 
 std::__1::allocatorprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), void 
 (process::ProcessBase*)::operator()(process::ProcessBase*)
 27124.0ms   47.8% 1.0
 process::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
  const
 27060.0ms   47.7% 1.0 
 mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
 const)
 27046.0ms   47.7% 2.0  
 mesos::internal::usage(int, bool, bool)
 27023.0ms   47.6% 2.0   os::pstree(Optionint)
 26748.0ms   47.1% 23.0   os::processes()
 24809.0ms   43.7% 349.0   os::process(int)
 8199.0ms   14.4%  47.0 os::sysctl::string() 
 const
 7562.0ms   13.3%  7562.0__sysctl
 {code}
 We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2665) Fix queuing discipline wrapper in linux/routing/queueing


[ 
https://issues.apache.org/jira/browse/MESOS-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551270#comment-14551270
 ] 

Paul Brett commented on MESOS-2665:
---

Added:

https://reviews.apache.org/r/34426/

 Fix queuing discipline wrapper in linux/routing/queueing 
 -

 Key: MESOS-2665
 URL: https://issues.apache.org/jira/browse/MESOS-2665
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 qdisc search function is dependent on matching a single hard coded handle and 
 does not correctly test for interface, making the implementation fragile.  
 Additionally, the current setup scripts (using dynamically created shell 
 commands) do not match the hard coded handles.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (MESOS-2752) Add HTB queueing discipline wrapper class

Paul Brett created MESOS-2752:
-

 Summary: Add HTB queueing discipline wrapper class
 Key: MESOS-2752
 URL: https://issues.apache.org/jira/browse/MESOS-2752
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett


Network isolator uses a Hierarchical Token Bucket (HTB) traffic control 
discipline on the egress filter inside each container as the root for adding 
traffic filters.  A HTB wrapper is needed to access the network statistics for 
this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2044) Use one IP address per container for network isolation


[ 
https://issues.apache.org/jira/browse/MESOS-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551326#comment-14551326
 ] 

Ian Downes commented on MESOS-2044:
---

This JIRA is intended to address a single IP per container which is shared by 
the executor and all tasks within the container and is different to the host's. 
That's a very valid requirement though so please raise a separate ticket.

 Use one IP address per container for network isolation
 --

 Key: MESOS-2044
 URL: https://issues.apache.org/jira/browse/MESOS-2044
 Project: Mesos
  Issue Type: Epic
Reporter: Cong Wang

 If there are enough IP addresses, either IPv4 or IPv6, we should use one IP 
 address per container, instead of the ugly port range based solution. One 
 problem with this is the IP address management, usually it is managed by a 
 DHCP server, maybe we need to manage them in mesos master/slave.
 Also, maybe use macvlan instead of veth for better isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load

2015-05-19 Thread Daniel Nugent (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551127#comment-14551127
 ] 

Daniel Nugent commented on MESOS-2254:
--

[~idownes] In that case, do you know what the rate limiting is that [~nnielsen] 
referred to?

 Posix CPU isolator usage call introduce high cpu load
 -

 Key: MESOS-2254
 URL: https://issues.apache.org/jira/browse/MESOS-2254
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen

 With more than 20 executors running on a slave with the posix isolator, we 
 have seen an very high cpu load (over 200%).
 From profiling one thread (there were two, taking up all the cpu time. The 
 total CPU time was over 200%):
 {code}
 Running Time  SelfSymbol Name
 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
 27133.0ms   47.8% 0.0  thread_start
 27133.0ms   47.8% 0.0   _pthread_start
 27133.0ms   47.8% 0.0_pthread_body
 27133.0ms   47.8% 0.0 process::schedule(void*)
 27133.0ms   47.8% 2.0  
 process::ProcessManager::resume(process::ProcessBase*)
 27126.0ms   47.8% 1.0   
 process::ProcessBase::serve(process::Event const)
 27125.0ms   47.8% 0.0
 process::DispatchEvent::visit(process::EventVisitor*) const
 27125.0ms   47.8% 0.0 
 process::ProcessBase::visit(process::DispatchEvent const)
 27125.0ms   47.8% 0.0  std::__1::functionvoid 
 (process::ProcessBase*)::operator()(process::ProcessBase*) const
 27124.0ms   47.8% 0.0   
 std::__1::__function::__funcprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), 
 std::__1::allocatorprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), void 
 (process::ProcessBase*)::operator()(process::ProcessBase*)
 27124.0ms   47.8% 1.0
 process::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
  const
 27060.0ms   47.7% 1.0 
 mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
 const)
 27046.0ms   47.7% 2.0  
 mesos::internal::usage(int, bool, bool)
 27023.0ms   47.6% 2.0   os::pstree(Optionint)
 26748.0ms   47.1% 23.0   os::processes()
 24809.0ms   43.7% 349.0   os::process(int)
 8199.0ms   14.4%  47.0 os::sysctl::string() 
 const
 7562.0ms   13.3%  7562.0__sysctl
 {code}
 We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (MESOS-2753) Enforce revocable CPU invariant in Master

Ian Downes created MESOS-2753:
-

 Summary: Enforce revocable CPU invariant in Master
 Key: MESOS-2753
 URL: https://issues.apache.org/jira/browse/MESOS-2753
 Project: Mesos
  Issue Type: Task
  Components: isolation, master
Affects Versions: 0.23.0
Reporter: Ian Downes


Current implementation out for [review|https://reviews.apache.org/r/34310] only 
supports setting the priority of containers with revocable CPU if it's 
specified in the initial executor info resources. This should be enforced at 
the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load


[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551118#comment-14551118
 ] 

Ian Downes commented on MESOS-2254:
---

[~nugend] No, --perf_interval is just for the perf isolator which uses a 
perf_event cgroup to efficiently run perf against a container. Unrelated to 
this.

 Posix CPU isolator usage call introduce high cpu load
 -

 Key: MESOS-2254
 URL: https://issues.apache.org/jira/browse/MESOS-2254
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen

 With more than 20 executors running on a slave with the posix isolator, we 
 have seen an very high cpu load (over 200%).
 From profiling one thread (there were two, taking up all the cpu time. The 
 total CPU time was over 200%):
 {code}
 Running Time  SelfSymbol Name
 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
 27133.0ms   47.8% 0.0  thread_start
 27133.0ms   47.8% 0.0   _pthread_start
 27133.0ms   47.8% 0.0_pthread_body
 27133.0ms   47.8% 0.0 process::schedule(void*)
 27133.0ms   47.8% 2.0  
 process::ProcessManager::resume(process::ProcessBase*)
 27126.0ms   47.8% 1.0   
 process::ProcessBase::serve(process::Event const)
 27125.0ms   47.8% 0.0
 process::DispatchEvent::visit(process::EventVisitor*) const
 27125.0ms   47.8% 0.0 
 process::ProcessBase::visit(process::DispatchEvent const)
 27125.0ms   47.8% 0.0  std::__1::functionvoid 
 (process::ProcessBase*)::operator()(process::ProcessBase*) const
 27124.0ms   47.8% 0.0   
 std::__1::__function::__funcprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), 
 std::__1::allocatorprocess::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*), void 
 (process::ProcessBase*)::operator()(process::ProcessBase*)
 27124.0ms   47.8% 1.0
 process::Futuremesos::ResourceStatistics 
 process::dispatchmesos::ResourceStatistics, 
 mesos::internal::slave::IsolatorProcess, mesos::ContainerID const, 
 mesos::ContainerID(process::PIDmesos::internal::slave::IsolatorProcess 
 const, process::Futuremesos::ResourceStatistics 
 (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const), 
 mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
  const
 27060.0ms   47.7% 1.0 
 mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
 const)
 27046.0ms   47.7% 2.0  
 mesos::internal::usage(int, bool, bool)
 27023.0ms   47.6% 2.0   os::pstree(Optionint)
 26748.0ms   47.1% 23.0   os::processes()
 24809.0ms   43.7% 349.0   os::process(int)
 8199.0ms   14.4%  47.0 os::sysctl::string() 
 const
 7562.0ms   13.3%  7562.0__sysctl
 {code}
 We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2748) /help generated links point to wrong URLs


[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550642#comment-14550642
 ] 

haosdent commented on MESOS-2748:
-

Hi, [~marco-mesos]. I am sorry for could not got your idea here. Do you mean 
/help endpoint is a absolute path and could not work when user want to show 
as /mesos/help after reverse proxy? In nginx, could add a rewrite rule to 
solve this problem.  

 /help generated links point to wrong URLs
 -

 Key: MESOS-2748
 URL: https://issues.apache.org/jira/browse/MESOS-2748
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Marco Massenzio
Priority: Minor

 As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
 MESOS-913 for background):
 {quote}
 In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
 which is then converted to html through a javascript library 
 All endpoints point to {{/help/...}}, they need to work dynamically for 
 reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
 endpoints, but they each need to go to their respective {{/mesos/help/...}} 
 endpoint. 
 Note that this needs to work both for master, and for slaves. I think the 
 route to slaves help is something like this: 
 {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
 double check this.
 {quote}
 The fix appears to be not too complex (as it would require to simply 
 manipulate the generated URL) but a quick skim of the code would suggest that 
 something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-05-19 Thread Marco Massenzio (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550654#comment-14550654
 ] 

Marco Massenzio commented on MESOS-2340:


I'm not familiar with the {{multi}} operation, however, thinking a bit more 
about this, it turns out the solution should be simpler: post ephemeral node 
creation, create another mirror JSON-content znode, equally ephemeral, that 
will go away whenever the original PB-content znode does.

This seems a simple enough approach (and, as such, I'm sure I'm overlooking 
something!)

I'm looking into the code, and it seems to me that the 
{{GroupProcess::doJoin()}} is the place to do this (maybe?)

 Publish JSON in ZK instead of serialized MasterInfo
 ---

 Key: MESOS-2340
 URL: https://issues.apache.org/jira/browse/MESOS-2340
 Project: Mesos
  Issue Type: Improvement
Reporter: Zameer Manji
Assignee: haosdent

 Currently to discover the master a client needs the ZK node location and 
 access to the MasterInfo protobuf so it can deserialize the binary blob in 
 the node.
 I think it would be nice to publish JSON (like Twitter's ServerSets) so 
 clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (MESOS-2748) /help generated links point to wrong URLs


 [ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-2748:
---

Assignee: haosdent

 /help generated links point to wrong URLs
 -

 Key: MESOS-2748
 URL: https://issues.apache.org/jira/browse/MESOS-2748
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Marco Massenzio
Assignee: haosdent
Priority: Minor

 As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and 
 MESOS-913 for background):
 {quote}
 In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
 which is then converted to html through a javascript library 
 All endpoints point to {{/help/...}}, they need to work dynamically for 
 reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
 endpoints, but they each need to go to their respective {{/help/...}} 
 endpoint. 
 Note that this needs to work both for master, and for slaves. I think the 
 route to slaves help is something like this: 
 {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
 double check this.
 {quote}
 The fix appears to be not too complex (as it would require to simply 
 manipulate the generated URL) but a quick skim of the code would suggest that 
 something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2636) Segfault in inline TryIP getIP(const std::string hostname, int family)

2015-05-19 Thread Chi Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550913#comment-14550913
 ] 

Chi Zhang commented on MESOS-2636:
--

[~cnstar9988] thanks for reporting! I will submit a fix.

 Segfault in inline TryIP getIP(const std::string hostname, int family)
 -

 Key: MESOS-2636
 URL: https://issues.apache.org/jira/browse/MESOS-2636
 Project: Mesos
  Issue Type: Bug
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter
 Fix For: 0.23.0


 We saw a segfault in production. Attaching the coredump, we see:
 Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
 --resources=cpus:23;mem:70298;ports:[31'.
 Program terminated with signal 11, Segmentation fault.
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 (gdb) bt
 #0  0x7f639867c77e in free () from /lib64/libc.so.6
 #1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
 #2  0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at 
 ./3rdparty/stout/include/stout/net.hpp:201
 #3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
 expression opcode 0xf3
 ) at src/process.cpp:837
 #4  0x0042342f in main ()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo


[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550857#comment-14550857
 ] 

haosdent commented on MESOS-2340:
-

I think create the protobuf node and json node at the same time maybe more 
clear. And zookeeper have multi operation api. 
https://github.com/apache/zookeeper/blob/trunk/src/c/tests/TestMulti.cc#L282-L284
 But it is also OK for me if you decide add separate process to watch the node. 
I would try to implement this after you have a conclusion. :-)

 Publish JSON in ZK instead of serialized MasterInfo
 ---

 Key: MESOS-2340
 URL: https://issues.apache.org/jira/browse/MESOS-2340
 Project: Mesos
  Issue Type: Improvement
Reporter: Zameer Manji
Assignee: haosdent

 Currently to discover the master a client needs the ZK node location and 
 access to the MasterInfo protobuf so it can deserialize the binary blob in 
 the node.
 I think it would be nice to publish JSON (like Twitter's ServerSets) so 
 clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MESOS-2748) /help generated links point to wrong URLs