[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

2015-05-19 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551804#comment-14551804
 ] 

Vinod Kone commented on MESOS-2735:
---

Thanks, Ben, for the comments.

I think the main motivations for the push model were to 1) keep the slave 
logic that interfaces with the estimator simple and 2) keep the estimator 
modules themselves simple to write.

Originally, with the pull model, it looked like we would need two intervals 
within the slave: one for the slave sending estimates to the master and one 
for the slave getting estimates from the estimator. But if we assume that 
estimators will be well behaved, then we don't need an interval for the latter.

The other issue, as you discussed in your comment, was about DoS. It *looked* 
like both the push and pull models had the same scope for DoS on the slave, so 
we didn't find a compelling reason to go for pull, given that push was easier 
to implement on both sides of the interface. I said *looked* because, after 
processing your comments, I realized that the DoS behavior is different in 
push vs pull. In a push model a misbehaving estimator can cause head-of-line 
blocking of other messages enqueued on the slave's queue, whereas in the pull 
model head-of-line blocking is not possible because the next (deferred) pull 
will be enqueued behind all the other messages.
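
(For illustration, a minimal sketch of the pull loop being described, assuming 
libprocess-style {{defer}}; {{pollEstimator}} and {{forwardToMaster}} are 
illustrative names, not the actual interface.)

{code}
// Sketch: the slave pulls, then schedules the next pull via defer, so it
// is enqueued *behind* whatever else is already on the slave's queue.
// A slow estimator therefore cannot cause head-of-line blocking.
void Slave::pollEstimator()
{
  estimator->estimate()  // Future<Resources>
    .onAny(defer(self(), [=](const process::Future<Resources>& estimate) {
      if (estimate.isReady()) {
        forwardToMaster(estimate.get());  // illustrative
      }
      pollEstimator();  // the next pull waits its turn in the queue
    }));
}
{code}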

So, I'm ok going with pull for safety. Also, the composition argument can't be 
denied.

Btw, the inspiration for the push model came from the allocator (and, to a 
lesser extent, the Mesos class), which I think is very close to the estimator 
in terms of interactions. [~jieyu], ok with this?


> Change the interaction between the slave and the resource estimator from 
> polling to pushing 
> 
>
> Key: MESOS-2735
> URL: https://issues.apache.org/jira/browse/MESOS-2735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: twitter
>
> This will make the semantics clearer. The resource estimator can control 
> the rate at which it sends resource estimations to the slave.
> To avoid a cyclic dependency, the slave will register a callback with the 
> resource estimator and the resource estimator will simply invoke that 
> callback when there's a new estimation ready. The callback will be a defer 
> to the slave's main event queue.
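
(For illustration, a minimal sketch of the callback registration described 
above; {{initialize}}, {{handleEstimate}} and the callback signature are 
assumptions, not the actual module interface.)

{code}
// Sketch: the slave hands the estimator a callback that defers back onto
// the slave's own event queue, so the estimator never invokes slave code
// directly and no cyclic dependency is introduced.
estimator->initialize(
    defer(slave->self(), &Slave::handleEstimate, lambda::_1));

// Later, inside the estimator, whenever a new estimate is ready:
callback(estimate);  // enqueues Slave::handleEstimate on the slave's queue
{code}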



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-354) Oversubscribe resources

2015-05-19 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551732#comment-14551732
 ] 

Joris Van Remoortere commented on MESOS-354:


[~vinodkone] I think the allocator logic generally makes sense. I would just 
call out that we will likely want to treat revocable_available differently for 
resources coming from the resource estimator as opposed to optimistic offers. 
The reasons for that are:
1) resource estimator updates are a "rude edit", in that they simply overwrite 
the revocable resources;
2) resources from optimistic offers are increased / decreased based on 
allocation by the original owner of the resources.

Just as we expect the revocable resources to be flagged differently in the 
offer protobuf, I think we may want to either:
1) have separate pools of revocable resources available in the allocator for 
each source (lender?) of the resource, OR
2) ensure that all revocable resources are introduced into the allocator the 
same way (either as rude edits or as deltas).

In general, though, I think the behavior is common between them.

What do you think?
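
(Editorial sketch of the two update semantics being contrasted, using 
{{mesos::Resources}} arithmetic; the function and variable names are 
illustrative.)

{code}
#include <mesos/resources.hpp>

using mesos::Resources;

// 1) "Rude edit": an estimator update replaces the pool wholesale.
void onEstimatorUpdate(Resources& pool, const Resources& update)
{
  pool = update;  // overwrite, regardless of previous contents
}

// 2) Delta: optimistic-offer accounting adjusts the pool incrementally,
//    driven by allocation/recovery by the resources' original owner.
void onOfferDelta(
    Resources& pool,
    const Resources& lent,
    const Resources& reclaimed)
{
  pool += lent;
  pool -= reclaimed;
}
{code}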

> Oversubscribe resources
> ---
>
> Key: MESOS-354
> URL: https://issues.apache.org/jira/browse/MESOS-354
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation, master, slave
>Reporter: brian wickman
>Priority: Minor
>  Labels: mesosphere, twitter
> Attachments: mesos_virtual_offers.pdf
>
>
> This proposal is predicated upon offer revocation.
> The idea would be to add a new "revoked" status either by (1) piggybacking 
> off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a 
> new status update TASK_REVOKED.
> In order to augment an offer with metadata about revocability, there are 
> options:
>   1) Add a revocable boolean to the Offer and
> a) offer only one type of Offer per slave at a particular time
> b) offer both revocable and non-revocable resources at the same time but 
> require frameworks to understand that Offers can contain overlapping resources
>   2) Add a revocable_resources field on the Offer which is a superset of the 
> regular resources field.  By consuming more than resources but at most 
> revocable_resources in a launchTask, the Task becomes a revocable task.  If 
> launching a task with at most resources, the Task is non-revocable.
> The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) 
> and non-revocable tasks are online higher-SLA tasks (e.g. services.)
> Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk.  
> One of these resources is a rate (4 cpu seconds per second) and two of them 
> are fixed values (8GB and 20GB respectively, though disk resources can be 
> further broken down into spindles - fixed - and iops - a rate.)  In practice, 
> these are the maximum resources in the respective dimensions that this task 
> will use.  In reality, we provision tasks at some factor below peak, and only 
> hit peak resource consumption in rare circumstances or perhaps at a diurnal 
> peak.  
> In the meantime, we stand to gain from offering some constant factor of 
> the difference between (reserved - actual) of non-revocable tasks as 
> revocable resources, depending upon our tolerance for revocable task churn.  
> The main challenge is coming up with an accurate short / medium / long-term 
> prediction of resource consumption based upon current behavior.
> In many cases it would be OK to be sloppy:
>   * CPU / iops / network IO are rates (compressible) and can often be OK 
> below guarantees for brief periods of time while task revocation takes place
>   * Memory slack can be provided by enabling swap and dynamically setting 
> swap paging boundaries.  Should swap ever be activated, that would be a 
> signal to revoke.
> The master / allocator would piggyback on the slave heartbeat mechanism to 
> learn of the amount of revocable resources available at any point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2044) Use one IP address per container for network isolation

2015-05-19 Thread Swapnil Daingade (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551652#comment-14551652
 ] 

Swapnil Daingade commented on MESOS-2044:
-

We are trying to support network isolation between different YARN clusters 
running on Mesos as part of the Apache Myriad project. We tried using 
OpenVSwitch and Socketplane (Docker). See the design docs here:

https://github.com/mesos/myriad/issues/96
https://docs.google.com/document/d/1uV2V0cSTngVfWs-5pYm2b9gOCYF4WSNkyzj2dm3bRnw/pub


> Use one IP address per container for network isolation
> --
>
> Key: MESOS-2044
> URL: https://issues.apache.org/jira/browse/MESOS-2044
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cong Wang
>
> If there are enough IP addresses, either IPv4 or IPv6, we should use one IP 
> address per container, instead of the ugly port-range based solution. One 
> problem with this is IP address management: usually it is handled by a DHCP 
> server, so maybe we need to manage the addresses in the Mesos master/slave.
> Also, maybe use macvlan instead of veth for better isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-994) Add an Option<string> os::getenv() to stout

2015-05-19 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551599#comment-14551599
 ] 

Benjamin Mahler commented on MESOS-994:
---

[~tnachen] I saw you reviewed the first one, can you review the rest as well? :)

> Add an Option<string> os::getenv() to stout
> ---
>
> Key: MESOS-994
> URL: https://issues.apache.org/jira/browse/MESOS-994
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout, technical debt
>Reporter: Ian Downes
>Assignee: Greg Mann
>  Labels: newbie
>
> This would replace the common pattern of:
> Option<string> = os::hasenv() ? Option<string>(os::getenv()) : None()
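
(A minimal sketch of what such a helper could look like, mirroring the pattern 
above; an illustration, not the committed implementation.)

{code}
#include <cstdlib>
#include <string>

#include <stout/none.hpp>
#include <stout/option.hpp>

// Returns the value of the environment variable 'key', or None() if unset.
inline Option<std::string> getenv(const std::string& key)
{
  char* value = ::getenv(key.c_str());

  if (value == NULL) {
    return None();
  }

  return std::string(value);
}
{code}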



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

2015-05-19 Thread Benjamin Hindman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Hindman updated MESOS-2735:

Comment: was deleted

(was: There are definitely differences in message queue behavior, one of which 
is significantly safer than the other. There are two safety concerns that I can 
think of, one of which [~jieyu] has addressed here but I'll repeat to be sure I 
properly understood.

(1) Someone might write a ResourceEstimator that isn't asynchronous, causing 
the slave to "block" while the resource estimator estimates.

(2) The ResourceEstimator might cause a denial of service attack on the slave.

I understand the concern with (1) but I'm not too anxious about it. Why? It 
should be trivial to make a wrapper module which forces people to implement the 
ResourceEstimator to be asynchronous, either using `async` like you suggested 
or implementing a version of ResourceEstimator which wraps an actor (libprocess 
process). We'll only need to do this once and then other ResourceEstimator 
implementations can leverage this stuff.

On the other hand, I don't like the behavior of push because of (2). 
Fundamentally, if the slave can't keep up with the rate at which the 
ResourceEstimator is pushing then we could create a denial of service issue 
with the slave, i.e., it takes a long time to process non-ResourceEstimator 
messages because its queue is full of just ResourceEstimator messages. I'm 
more anxious about (2) than (1) because it's harder to find bugs in (2) than 
in (1): once you fix (1) it stays fixed forever, but any time you update the 
algorithm you re-open the potential to cause (2).

Now, I acknowledge that implementing this as a pull versus push will make the 
implementation in the ResourceEstimator slightly more complicated, but not 
really. In particular, it should be trivial to always use a `Queue` to achieve 
the push semantics in any ResourceEstimator implementation, while still 
providing the pull semantics externally. Make sense?

Finally, one of the advantages of the pull model is that it's easier to reason 
about, because we don't have "anonymous" lambdas that cause execution in some 
other random place in the code (i.e., you can easily see in the slave where the 
future that gets returned from `ResourceEstimator::estimate()` gets handled). 
In addition, the ResourceEstimator remains "functional" in the sense that it 
just has to return some value (or a future) from its functions versus invoking 
some callback that causes something to get run some other place (and in fact, 
may also block, so isn't it safer for the ResourceEstimator to invoke the 
callback in its own `async`?).

The invocation of `ResourceEstimator::estimate()` followed by the `.then` 
is a nice pattern that lets us compose with other things as well, which is 
harder to do with the lambda-style callbacks and why we've avoided them where 
we've been able to (in fact, I'm curious which place in the code you are 
imitating here?).)

> Change the interaction between the slave and the resource estimator from 
> polling to pushing 
> 
>
> Key: MESOS-2735
> URL: https://issues.apache.org/jira/browse/MESOS-2735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: twitter
>
> This will make the semantics more clear. The resource estimator can control 
> the speed of sending resources estimation to the slave.
> To avoid cyclic dependency, slave will register a callback with the resource 
> estimator and the resource estimator will simply invoke that callback when 
> there's a new estimation ready. The callback will be a defer to the slave's 
> main event queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

2015-05-19 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551569#comment-14551569
 ] 

Benjamin Hindman commented on MESOS-2735:
-

There are definitely differences in message queue behavior, one of which is 
significantly safer than the other. There are two safety concerns that I can 
think of, one of which [~jieyu] has addressed here but I'll repeat to be sure I 
properly understood.

(1) Someone might write a ResourceEstimator that isn't asynchronous, causing 
the slave to "block" while the resource estimator estimates.

(2) The ResourceEstimator might cause a denial of service attack on the slave.

I understand the concern with (1) but I'm not too anxious about it. Why? It 
should be trivial to make a wrapper module which forces people to implement the 
ResourceEstimator to be asynchronous, either using `async` like you suggested 
or implementing a version of ResourceEstimator which wraps an actor (libprocess 
process). We'll only need to do this once and then other ResourceEstimator 
implementations can leverage this stuff.

On the other hand, I don't like the behavior of push because of (2). 
Fundamentally, if the slave can't keep up with the rate at which the 
ResourceEstimator is pushing then we could create a denial of service issue 
with the slave, i.e., it takes a long time to process non-ResourceEstimator 
messages because its queue is full of just ResourceEstimator messages. I'm 
more anxious about (2) than (1) because it's harder to find bugs in (2) than 
in (1): once you fix (1) it stays fixed forever, but any time you update the 
algorithm you re-open the potential to cause (2).

Now, I acknowledge that implementing this as a pull versus push will make the 
implementation in the ResourceEstimator slightly more complicated, but not 
really. In particular, it should be trivial to always use a `Queue` to achieve 
the push semantics in any ResourceEstimator implementation, while still 
providing the pull semantics externally. Make sense?
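
(For illustration, a minimal sketch of the {{Queue}} approach just described, 
assuming {{process::Queue}} and an estimate type of {{Resources}}; the class 
and method names besides {{estimate()}} are illustrative.)

{code}
#include <mesos/resources.hpp>

#include <process/future.hpp>
#include <process/queue.hpp>

using mesos::Resources;

// Push internally, pull externally: the estimator puts estimates into a
// Queue at whatever rate it likes; the slave's pull simply returns the
// future for the next item.
class QueueingEstimator
{
public:
  // The pull side, called by the slave.
  process::Future<Resources> estimate()
  {
    return estimates.get();
  }

protected:
  // The push side, called internally when a new estimate is computed.
  void push(const Resources& resources)
  {
    estimates.put(resources);
  }

private:
  process::Queue<Resources> estimates;
};
{code}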

Finally, one of the advantages of the pull model is that it's easier to reason 
about, because we don't have "anonymous" lambdas that cause execution in some 
other random place in the code (i.e., you can easily see in the slave where the 
future that gets returned from `ResourceEstimator::estimate()` gets handled). 
In addition, the ResourceEstimator remains "functional" in the sense that it 
just has to return some value (or a future) from its functions versus invoking 
some callback that causes something to get run some other place (and in fact, 
may also block, so isn't it safer for the ResourceEstimator to invoke the 
callback in its own `async`?).

The invocation of `ResourceEstimator::estimate()` followed by the `.then` 
is a nice pattern that lets us compose with other things as well, which is 
harder to do with the lambda-style callbacks and why we've avoided them where 
we've been able to (in fact, I'm curious which place in the code you are 
imitating here?).

> Change the interaction between the slave and the resource estimator from 
> polling to pushing 
> 
>
> Key: MESOS-2735
> URL: https://issues.apache.org/jira/browse/MESOS-2735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: twitter
>
> This will make the semantics clearer. The resource estimator can control 
> the rate at which it sends resource estimations to the slave.
> To avoid a cyclic dependency, the slave will register a callback with the 
> resource estimator and the resource estimator will simply invoke that 
> callback when there's a new estimation ready. The callback will be a defer 
> to the slave's main event queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources

2015-05-19 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551462#comment-14551462
 ] 

Joris Van Remoortere commented on MESOS-2652:
-

Review for setting core affinity:
https://reviews.apache.org/r/34442

I will base the SCHED_OTHER over SCHED_IDLE preemption test on this.
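
(For reference, a minimal sketch of placing a process under SCHED_IDLE, which 
is what lets SCHED_OTHER tasks effectively always preempt it; Linux-specific, 
and {{setIdlePolicy}} is an illustrative name.)

{code}
#include <sched.h>
#include <sys/types.h>

// Returns 0 on success, -1 (with errno set) on failure.
int setIdlePolicy(pid_t pid)
{
  struct sched_param param;
  param.sched_priority = 0;  // must be 0 for SCHED_IDLE

  return sched_setscheduler(pid, SCHED_IDLE, &param);
}
{code}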

> Update Mesos containerizer to understand revocable cpu resources
> 
>
> Key: MESOS-2652
> URL: https://issues.apache.org/jira/browse/MESOS-2652
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Ian Downes
>  Labels: twitter
>
> The CPU isolator needs to properly set limits for revocable and non-revocable 
> containers.
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy 
> -- normal (non-revocable) and low priority (revocable) subtrees -- and to use 
> a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split 
> (TBD). Containers would be present in only one of the subtrees. CFS quotas 
> will *not* be set on subtree roots, only cpu.shares. Each container would set 
> CFS quota and shares as done currently.
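
(Editorial sketch of the proposed split; the cgroup paths and the 20:1 ratio 
are illustrative, per the TBD above.)

{code}
#include <stout/os.hpp>

// Two subtrees under the cpu cgroup hierarchy, with cpu.shares biased
// 20:1 in favor of the normal (non-revocable) subtree. No CFS quota on
// the subtree roots; each container sets its own quota and shares.
os::write("/sys/fs/cgroup/cpu/mesos/cpu.shares", "20480");           // 20x
os::write("/sys/fs/cgroup/cpu/mesos_revocable/cpu.shares", "1024");  //  1x
{code}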



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2753) Enforce revocable CPU invariant in Master

2015-05-19 Thread Ian Downes (JIRA)
Ian Downes created MESOS-2753:
-

 Summary: Enforce revocable CPU invariant in Master
 Key: MESOS-2753
 URL: https://issues.apache.org/jira/browse/MESOS-2753
 Project: Mesos
  Issue Type: Task
  Components: isolation, master
Affects Versions: 0.23.0
Reporter: Ian Downes


The current implementation out for [review|https://reviews.apache.org/r/34310] 
only supports setting the priority of containers with revocable CPU if it is 
specified in the initial executor info resources. This invariant should be 
enforced at the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources

2015-05-19 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551381#comment-14551381
 ] 

Ian Downes commented on MESOS-2652:
---

Borg does prod and non-prod as coarse prioritization bands but supports 
different priorities within each.

> Update Mesos containerizer to understand revocable cpu resources
> 
>
> Key: MESOS-2652
> URL: https://issues.apache.org/jira/browse/MESOS-2652
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Ian Downes
>  Labels: twitter
>
> The CPU isolator needs to properly set limits for revocable and non-revocable 
> containers.
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy 
> -- normal (non-revocable) and low priority (revocable) subtrees -- and to use 
> a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split 
> (TBD). Containers would be present in only one of the subtrees. CFS quotas 
> will *not* be set on subtree roots, only cpu.shares. Each container would set 
> CFS quota and shares as done currently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2636) Segfault in inline Try getIP(const std::string& hostname, int family)

2015-05-19 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551337#comment-14551337
 ] 

Benjamin Mahler commented on MESOS-2636:


The net::hostname fix was committed:

{noformat}
commit 08e11d372afbb66907130998b485c185687fae34
Author: Chi Zhang 
Date:   Tue May 19 15:03:23 2015 -0700

Removed bad call to freeaddrinfo in net::hostname.

Review: https://reviews.apache.org/r/34438
{noformat}
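
(For context, a sketch of the invariant behind the fix: {{freeaddrinfo}} may 
only be called on a list actually returned by a successful {{getaddrinfo}}; 
{{resolve}} is an illustrative wrapper.)

{code}
#include <netdb.h>
#include <string.h>

#include <stout/error.hpp>
#include <stout/nothing.hpp>
#include <stout/try.hpp>

Try<Nothing> resolve(const char* hostname)
{
  struct addrinfo hints;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;

  struct addrinfo* result = NULL;

  int error = getaddrinfo(hostname, NULL, &hints, &result);
  if (error != 0) {
    // 'result' was never allocated: calling freeaddrinfo() on this path
    // is undefined behavior.
    return Error(gai_strerror(error));
  }

  // ... use 'result' ...

  freeaddrinfo(result);  // exactly once, and only on success
  return Nothing();
}
{code}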

> Segfault in inline Try getIP(const std::string& hostname, int family)
> -
>
> Key: MESOS-2636
> URL: https://issues.apache.org/jira/browse/MESOS-2636
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chi Zhang
>Assignee: Chi Zhang
>  Labels: twitter
> Fix For: 0.23.0
>
>
> We saw a segfault in production. Attaching the coredump, we see:
> Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
> --resources=cpus:23;mem:70298;ports:[31'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x7f639867c77e in free () from /lib64/libc.so.6
> (gdb) bt
> #0  0x7f639867c77e in free () from /lib64/libc.so.6
> #1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
> #2  0x7f6399deeafa in net::getIP (hostname="", family=2) at 
> ./3rdparty/stout/include/stout/net.hpp:201
> #3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
> expression opcode 0xf3
> ) at src/process.cpp:837
> #4  0x0042342f in main ()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2044) Use one IP address per container for network isolation

2015-05-19 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551326#comment-14551326
 ] 

Ian Downes commented on MESOS-2044:
---

This JIRA is intended to address a single IP per container, shared by the 
executor and all tasks within the container, and different from the host's. 
Yours is a very valid requirement, though, so please raise a separate ticket.

> Use one IP address per container for network isolation
> --
>
> Key: MESOS-2044
> URL: https://issues.apache.org/jira/browse/MESOS-2044
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cong Wang
>
> If there are enough IP addresses, either IPv4 or IPv6, we should use one IP 
> address per container, instead of the ugly port-range based solution. One 
> problem with this is IP address management: usually it is handled by a DHCP 
> server, so maybe we need to manage the addresses in the Mesos master/slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2752) Add HTB queueing discipline wrapper class

2015-05-19 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2752:
-

 Summary: Add HTB queueing discipline wrapper class
 Key: MESOS-2752
 URL: https://issues.apache.org/jira/browse/MESOS-2752
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett


The network isolator uses a Hierarchical Token Bucket (HTB) traffic control 
discipline on the egress filter inside each container as the root for adding 
traffic filters.  An HTB wrapper is needed to access the network statistics 
for this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2636) Segfault in inline Try getIP(const std::string& hostname, int family)

2015-05-19 Thread Chi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551289#comment-14551289
 ] 

Chi Zhang commented on MESOS-2636:
--

https://reviews.apache.org/r/34438/

I did a round of grepping: getIP and hostname are the only two places that 
use freeaddrinfo.

> Segfault in inline Try getIP(const std::string& hostname, int family)
> -
>
> Key: MESOS-2636
> URL: https://issues.apache.org/jira/browse/MESOS-2636
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chi Zhang
>Assignee: Chi Zhang
>  Labels: twitter
> Fix For: 0.23.0
>
>
> We saw a segfault in production. Attaching the coredump, we see:
> Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
> --resources=cpus:23;mem:70298;ports:[31'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x7f639867c77e in free () from /lib64/libc.so.6
> (gdb) bt
> #0  0x7f639867c77e in free () from /lib64/libc.so.6
> #1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
> #2  0x7f6399deeafa in net::getIP (hostname="", family=2) at 
> ./3rdparty/stout/include/stout/net.hpp:201
> #3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
> expression opcode 0xf3
> ) at src/process.cpp:837
> #4  0x0042342f in main ()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2665) Fix queuing discipline wrapper in linux/routing/queueing

2015-05-19 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551270#comment-14551270
 ] 

Paul Brett commented on MESOS-2665:
---

Added:

https://reviews.apache.org/r/34426/

> Fix queuing discipline wrapper in linux/routing/queueing 
> -
>
> Key: MESOS-2665
> URL: https://issues.apache.org/jira/browse/MESOS-2665
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Paul Brett
>Assignee: Paul Brett
>Priority: Critical
>
> The qdisc search function depends on matching a single hard-coded handle 
> and does not correctly test for the interface, making the implementation 
> fragile.  Additionally, the current setup scripts (using dynamically created 
> shell commands) do not match the hard-coded handles.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load

2015-05-19 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551142#comment-14551142
 ] 

Ian Downes edited comment on MESOS-2254 at 5/19/15 8:18 PM:


I presume he's referring to the slave flag {{--resource_monitoring_interval}} 
which currently defaults to {{RESOURCE_MONITORING_INTERVAL = Seconds(1)}} but 
which [~nnielsen] has marked as soon to be deprecated.
{noformat}
  // TODO(nnielsen): Deprecate resource_monitoring_interval flag after
  // Mesos 0.23.0.
  Duration resource_monitoring_interval;
{noformat}
In the meantime, if this is causing performance issues then you could set 
{{--resource_monitoring_interval}} to something longer than the default.


was (Author: idownes):
I presume he's referring to the slave flag {{--resource_monitoring_interval}} 
which currently defaults to {{RESOURCE_MONITORING_INTERVAL = Seconds(1)}} but 
which [~nnielsen] has marked as soon to be deprecated.
{noformat}
  // TODO(nnielsen): Deprecate resource_monitoring_interval flag after
  // Mesos 0.23.0.
  Duration resource_monitoring_interval;
{noformat}
In the meantime, if this is causing performance issues then you could set 
{{--resource_monitoring_internal}} to something longer than the default.

> Posix CPU isolator usage call introduce high cpu load
> -
>
> Key: MESOS-2254
> URL: https://issues.apache.org/jira/browse/MESOS-2254
> Project: Mesos
>  Issue Type: Bug
>Reporter: Niklas Quarfot Nielsen
>
> With more than 20 executors running on a slave with the posix isolator, we 
> have seen a very high CPU load (over 200%).
> From profiling one thread (there were two taking up all the CPU time; the 
> total CPU time was over 200%):
> {code}
> Running Time  SelfSymbol Name
> 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
> 27133.0ms   47.8% 0.0  thread_start
> 27133.0ms   47.8% 0.0   _pthread_start
> 27133.0ms   47.8% 0.0_pthread_body
> 27133.0ms   47.8% 0.0 process::schedule(void*)
> 27133.0ms   47.8% 2.0  
> process::ProcessManager::resume(process::ProcessBase*)
> 27126.0ms   47.8% 1.0   
> process::ProcessBase::serve(process::Event const&)
> 27125.0ms   47.8% 0.0
> process::DispatchEvent::visit(process::EventVisitor*) const
> 27125.0ms   47.8% 0.0 
> process::ProcessBase::visit(process::DispatchEvent const&)
> 27125.0ms   47.8% 0.0  std::__1::function (process::ProcessBase*)>::operator()(process::ProcessBase*) const
> 27124.0ms   47.8% 0.0   
> std::__1::__function::__func 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*), 
> std::__1::allocator 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)>, void 
> (process::ProcessBase*)>::operator()(process::ProcessBase*&&)
> 27124.0ms   47.8% 1.0
> process::Future 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
>  const
> 27060.0ms   47.7% 1.0 
> mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
> const&)
> 27046.0ms   47.7% 2.0  
> mesos::internal::usage(int, bool, bool)
> 27023.0ms   47.6% 2.0   os::pstree(Option)
> 26748.0ms   47.1% 23.0   os::processes()
> 24809.0ms   43.7% 349.0   os::process(int)
> 8199.0ms   14.4%  47.0 os::sysctl::string() 
> const
> 7562.0ms   13.3%  7562.0__sysctl
> {code}
> We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load

2015-05-19 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551142#comment-14551142
 ] 

Ian Downes commented on MESOS-2254:
---

I presume he's referring to the slave flag {{--resource_monitoring_interval}} 
which currently defaults to {{RESOURCE_MONITORING_INTERVAL = Seconds(1)}} but 
which [~nnielsen] has marked as soon to be deprecated.
{noformat}
  // TODO(nnielsen): Deprecate resource_monitoring_interval flag after
  // Mesos 0.23.0.
  Duration resource_monitoring_interval;
{noformat}
In the meantime, if this is causing performance issues then you could set 
{{--resource_monitoring_interval}} to something longer than the default.
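
(For example, using the slave's duration syntax:)

{noformat}
mesos-slave --resource_monitoring_interval=30secs
{noformat}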

> Posix CPU isolator usage call introduce high cpu load
> -
>
> Key: MESOS-2254
> URL: https://issues.apache.org/jira/browse/MESOS-2254
> Project: Mesos
>  Issue Type: Bug
>Reporter: Niklas Quarfot Nielsen
>
> With more than 20 executors running on a slave with the posix isolator, we 
> have seen a very high CPU load (over 200%).
> From profiling one thread (there were two taking up all the CPU time; the 
> total CPU time was over 200%):
> {code}
> Running Time  SelfSymbol Name
> 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
> 27133.0ms   47.8% 0.0  thread_start
> 27133.0ms   47.8% 0.0   _pthread_start
> 27133.0ms   47.8% 0.0_pthread_body
> 27133.0ms   47.8% 0.0 process::schedule(void*)
> 27133.0ms   47.8% 2.0  
> process::ProcessManager::resume(process::ProcessBase*)
> 27126.0ms   47.8% 1.0   
> process::ProcessBase::serve(process::Event const&)
> 27125.0ms   47.8% 0.0
> process::DispatchEvent::visit(process::EventVisitor*) const
> 27125.0ms   47.8% 0.0 
> process::ProcessBase::visit(process::DispatchEvent const&)
> 27125.0ms   47.8% 0.0  std::__1::function (process::ProcessBase*)>::operator()(process::ProcessBase*) const
> 27124.0ms   47.8% 0.0   
> std::__1::__function::__func 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*), 
> std::__1::allocator 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)>, void 
> (process::ProcessBase*)>::operator()(process::ProcessBase*&&)
> 27124.0ms   47.8% 1.0
> process::Future 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
>  const
> 27060.0ms   47.7% 1.0 
> mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
> const&)
> 27046.0ms   47.7% 2.0  
> mesos::internal::usage(int, bool, bool)
> 27023.0ms   47.6% 2.0   os::pstree(Option)
> 26748.0ms   47.1% 23.0   os::processes()
> 24809.0ms   43.7% 349.0   os::process(int)
> 8199.0ms   14.4%  47.0 os::sysctl::string() 
> const
> 7562.0ms   13.3%  7562.0__sysctl
> {code}
> We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load

2015-05-19 Thread Daniel Nugent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551127#comment-14551127
 ] 

Daniel Nugent commented on MESOS-2254:
--

[~idownes] In that case, do you know what the rate limiting is that [~nnielsen] 
referred to?

> Posix CPU isolator usage call introduce high cpu load
> -
>
> Key: MESOS-2254
> URL: https://issues.apache.org/jira/browse/MESOS-2254
> Project: Mesos
>  Issue Type: Bug
>Reporter: Niklas Quarfot Nielsen
>
> With more than 20 executors running on a slave with the posix isolator, we 
> have seen a very high CPU load (over 200%).
> From profiling one thread (there were two taking up all the CPU time; the 
> total CPU time was over 200%):
> {code}
> Running Time  SelfSymbol Name
> 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
> 27133.0ms   47.8% 0.0  thread_start
> 27133.0ms   47.8% 0.0   _pthread_start
> 27133.0ms   47.8% 0.0_pthread_body
> 27133.0ms   47.8% 0.0 process::schedule(void*)
> 27133.0ms   47.8% 2.0  
> process::ProcessManager::resume(process::ProcessBase*)
> 27126.0ms   47.8% 1.0   
> process::ProcessBase::serve(process::Event const&)
> 27125.0ms   47.8% 0.0
> process::DispatchEvent::visit(process::EventVisitor*) const
> 27125.0ms   47.8% 0.0 
> process::ProcessBase::visit(process::DispatchEvent const&)
> 27125.0ms   47.8% 0.0  std::__1::function (process::ProcessBase*)>::operator()(process::ProcessBase*) const
> 27124.0ms   47.8% 0.0   
> std::__1::__function::__func 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*), 
> std::__1::allocator 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)>, void 
> (process::ProcessBase*)>::operator()(process::ProcessBase*&&)
> 27124.0ms   47.8% 1.0
> process::Future 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
>  const
> 27060.0ms   47.7% 1.0 
> mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
> const&)
> 27046.0ms   47.7% 2.0  
> mesos::internal::usage(int, bool, bool)
> 27023.0ms   47.6% 2.0   os::pstree(Option)
> 26748.0ms   47.1% 23.0   os::processes()
> 24809.0ms   43.7% 349.0   os::process(int)
> 8199.0ms   14.4%  47.0 os::sysctl::string() 
> const
> 7562.0ms   13.3%  7562.0__sysctl
> {code}
> We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2254) Posix CPU isolator usage call introduce high cpu load

2015-05-19 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551118#comment-14551118
 ] 

Ian Downes commented on MESOS-2254:
---

[~nugend] No, --perf_interval is just for the perf isolator which uses a 
perf_event cgroup to efficiently run perf against a container. Unrelated to 
this.

> Posix CPU isolator usage call introduce high cpu load
> -
>
> Key: MESOS-2254
> URL: https://issues.apache.org/jira/browse/MESOS-2254
> Project: Mesos
>  Issue Type: Bug
>Reporter: Niklas Quarfot Nielsen
>
> With more than 20 executors running on a slave with the posix isolator, we 
> have seen a very high CPU load (over 200%).
> From profiling one thread (there were two taking up all the CPU time; the 
> total CPU time was over 200%):
> {code}
> Running Time  SelfSymbol Name
> 27133.0ms   47.8% 0.0 _pthread_body  0x1adb50
> 27133.0ms   47.8% 0.0  thread_start
> 27133.0ms   47.8% 0.0   _pthread_start
> 27133.0ms   47.8% 0.0_pthread_body
> 27133.0ms   47.8% 0.0 process::schedule(void*)
> 27133.0ms   47.8% 2.0  
> process::ProcessManager::resume(process::ProcessBase*)
> 27126.0ms   47.8% 1.0   
> process::ProcessBase::serve(process::Event const&)
> 27125.0ms   47.8% 0.0
> process::DispatchEvent::visit(process::EventVisitor*) const
> 27125.0ms   47.8% 0.0 
> process::ProcessBase::visit(process::DispatchEvent const&)
> 27125.0ms   47.8% 0.0  std::__1::function (process::ProcessBase*)>::operator()(process::ProcessBase*) const
> 27124.0ms   47.8% 0.0   
> std::__1::__function::__func 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*), 
> std::__1::allocator 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)>, void 
> (process::ProcessBase*)>::operator()(process::ProcessBase*&&)
> 27124.0ms   47.8% 1.0
> process::Future 
> process::dispatch mesos::internal::slave::IsolatorProcess, mesos::ContainerID const&, 
> mesos::ContainerID>(process::PID 
> const&, process::Future 
> (mesos::internal::slave::IsolatorProcess::*)(mesos::ContainerID const&), 
> mesos::ContainerID)::'lambda'(process::ProcessBase*)::operator()(process::ProcessBase*)
>  const
> 27060.0ms   47.7% 1.0 
> mesos::internal::slave::PosixCpuIsolatorProcess::usage(mesos::ContainerID 
> const&)
> 27046.0ms   47.7% 2.0  
> mesos::internal::usage(int, bool, bool)
> 27023.0ms   47.6% 2.0   os::pstree(Option)
> 26748.0ms   47.1% 23.0   os::processes()
> 24809.0ms   43.7% 349.0   os::process(int)
> 8199.0ms   14.4%  47.0 os::sysctl::string() 
> const
> 7562.0ms   13.3%  7562.0__sysctl
> {code}
> We could see that usage() in usage/usage.cpp is causing this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-354) Oversubscribe resources

2015-05-19 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551042#comment-14551042
 ] 

Vinod Kone commented on MESOS-354:
--

This is the high-level idea of how the different components (described in the 
design doc) interact for oversubscription in the MVP.

--> The resource estimator sends an estimate of 'oversubscribable' resources 
to the slave.

--> The slave periodically checks if its cached value of 'revocable resources' 
(i.e., allocations of revocable containers + oversubscribable resources) has 
changed. If it has, the slave forwards the 'revocable resources' to the master.

--> The master rescinds outstanding revocable offers when it gets a new 
'revocable resources' estimate and updates the allocator.

--> On receiving a 'revocable resources' update, the allocator updates the 
'revocable_available' (revocable resources - revocable allocation) resources.

--> 'revocable_available' gets allocated to (and recovered from) frameworks in 
the same way as 'available' (regular resources).

--> When sending offers, the master sends separate offers for revocable and 
regular resources.
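
(Editorial sketch of the allocator bookkeeping described above; all names are 
illustrative.)

{code}
#include <mesos/resources.hpp>

#include <stout/hashmap.hpp>

using mesos::Resources;
using mesos::SlaveID;

struct RevocablePool
{
  Resources total;      // latest 'revocable resources' from the slave
  Resources allocated;  // currently allocated to frameworks
  Resources available;  // total - allocated
};

hashmap<SlaveID, RevocablePool> slaves;

// On a 'revocable resources' update: rude-edit the total, then recompute
// what remains available to offer.
void updateRevocable(const SlaveID& slaveId, const Resources& revocable)
{
  slaves[slaveId].total = revocable;
  slaves[slaveId].available = revocable - slaves[slaveId].allocated;
}
{code}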

Some salient features of this proposal:
--> Allocator changes are minimal.
--> The slave forwards estimates only when there is a change => low load on 
the master.
--> Split offers allow the master to rescind only revocable resources when 
necessary.

Thoughts?

> Oversubscribe resources
> ---
>
> Key: MESOS-354
> URL: https://issues.apache.org/jira/browse/MESOS-354
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation, master, slave
>Reporter: brian wickman
>Priority: Minor
>  Labels: mesosphere, twitter
> Attachments: mesos_virtual_offers.pdf
>
>
> This proposal is predicated upon offer revocation.
> The idea would be to add a new "revoked" status either by (1) piggybacking 
> off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a 
> new status update TASK_REVOKED.
> In order to augment an offer with metadata about revocability, there are 
> options:
>   1) Add a revocable boolean to the Offer and
> a) offer only one type of Offer per slave at a particular time
> b) offer both revocable and non-revocable resources at the same time but 
> require frameworks to understand that Offers can contain overlapping resources
>   2) Add a revocable_resources field on the Offer which is a superset of the 
> regular resources field.  By consuming more than resources but at most 
> revocable_resources in a launchTask, the Task becomes a revocable task.  If 
> launching a task with at most resources, the Task is non-revocable.
> The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) 
> and non-revocable tasks are online higher-SLA tasks (e.g. services.)
> Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk.  
> One of these resources is a rate (4 cpu seconds per second) and two of them 
> are fixed values (8GB and 20GB respectively, though disk resources can be 
> further broken down into spindles - fixed - and iops - a rate.)  In practice, 
> these are the maximum resources in the respective dimensions that this task 
> will use.  In reality, we provision tasks at some factor below peak, and only 
> hit peak resource consumption in rare circumstances or perhaps at a diurnal 
> peak.  
> In the meantime, we stand to gain from offering some constant factor of 
> the difference between (reserved - actual) of non-revocable tasks as 
> revocable resources, depending upon our tolerance for revocable task churn.  
> The main challenge is coming up with an accurate short / medium / long-term 
> prediction of resource consumption based upon current behavior.
> In many cases it would be OK to be sloppy:
>   * CPU / iops / network IO are rates (compressible) and can often be OK 
> below guarantees for brief periods of time while task revocation takes place
>   * Memory slack can be provided by enabling swap and dynamically setting 
> swap paging boundaries.  Should swap ever be activated, that would be a 
> signal to revoke.
> The master / allocator would piggyback on the slave heartbeat mechanism to 
> learn of the amount of revocable resources available at any point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2751) Stopping the scheduler driver w/o failover requires a sleep to ensure the UnregisterFrameworkMessage is delivered.

2015-05-19 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-2751:
--

 Summary: Stopping the scheduler driver w/o failover requires a 
sleep to ensure the UnregisterFrameworkMessage is delivered.
 Key: MESOS-2751
 URL: https://issues.apache.org/jira/browse/MESOS-2751
 Project: Mesos
  Issue Type: Bug
  Components: framework
Reporter: Benjamin Mahler
Priority: Minor


When the call to {{driver.stop(false)}} completes, the 
UnregisterFrameworkMessage will be sent asynchronously once the 
SchedulerProcess processes the dispatch event.

This requires schedulers to sleep to ensure the message is processed:
http://markmail.org/thread/yuzq5i3hkpttxc2s

We could block on a Future result from the dispatch, if safe. But this still 
doesn't ensure the message is flushed out of libprocess. And without 
acknowledgements, we don't know if the master has successfully unregistered the 
framework.
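
(The workaround in question, sketched; the one-second value is arbitrary and 
provides no real delivery guarantee.)

{code}
driver.stop(false);     // returns before UnregisterFrameworkMessage is sent
os::sleep(Seconds(1));  // hope the SchedulerProcess flushes the message
driver.join();
{code}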



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2750) Extend queueing discipline wrappers to expose network isolator statistics

2015-05-19 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2750:
--
Summary: Extend queueing discipline wrappers to expose network isolator 
statistics  (was: Extend qeueing discipline wrappers to expose network isolator 
statistics)

> Extend queueing discipline wrappers to expose network isolator statistics
> -
>
> Key: MESOS-2750
> URL: https://issues.apache.org/jira/browse/MESOS-2750
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Paul Brett
>Assignee: Paul Brett
>
> Export Traffic Control statistics in the queueing library to enable 
> reporting on the impact of network bandwidth isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2750) Extend qeueing discipline wrappers to expose network isolator statistics

2015-05-19 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2750:
-

 Summary: Extend qeueing discipline wrappers to expose network 
isolator statistics
 Key: MESOS-2750
 URL: https://issues.apache.org/jira/browse/MESOS-2750
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett


Export Traffic Control statistics in the queueing library to enable reporting 
on the impact of network bandwidth isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer

2015-05-19 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550964#comment-14550964
 ] 

Ian Downes commented on MESOS-2717:
---

The containerizer interface was designed to support this and I'd be happy to 
shepherd any efforts.

Some initial notes:
1) I agree, bridged networking would be simplest.
2) This could be done by the custom executor. Work is being started on making 
IP addresses a global resource.
3) The fetcher should be used. Patches for caching objects will soon be 
committed.
4) You could just run the VM inside cgroups/namespaces etc. and leverage the 
existing code for managing them.
5) Be aware that you need to architect the code so the slave can be restarted 
while VMs/containers are running, i.e., you'll need to re-establish said 
connections during slave recovery.

> Qemu/KVM containerizer
> --
>
> Key: MESOS-2717
> URL: https://issues.apache.org/jira/browse/MESOS-2717
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Pierre-Yves Ritschard
>
> I think it would make sense for Mesos to have the ability to treat 
> hypervisors as containerizers and the most sensible one to start with would 
> probably be Qemu/KVM.
> There are a few workloads that can require full-fledged VMs (the most obvious 
> one being Windows workloads).
> The containerization code is well decoupled and seems simple enough; I can 
> definitely take a shot at it. VMs do bring some questions with them; here is 
> my take on them:
> 1. Routing, network strategy
> ==
> The simplest approach here might very well be to go for bridged networks
> and leave the setup and inter slave routing up to the administrator
> 2. IP Address assignment
> 
> At first, it can be up to the Frameworks to deal with IP assignment.
> The simplest way to address this could be to have an executor running
> on slaves providing the qemu/kvm containerizer which would instrument a DHCP 
> server and collect IP + Mac address resources from slaves. While it may be up 
> to the frameworks to provide this, an example should most likely be provided.
> 3. VM Templates
> ==
> VM templates should probably leverage the fetcher and could thus be copied 
> locally or fetch from HTTP(s) / HDFS.
> 4. Resource limiting
> 
> Mapping resource constraints to the qemu command line is probably the 
> easiest part. Additional command-line arguments should also be fetchable. 
> For Unix VMs, the sandbox could show the output of the serial console.
> 5. Libvirt / plain Qemu
> =
> I tend to favor limiting the number of hoops to jump through, and would 
> thus investigate working directly with Qemu, maintaining an open connection 
> to the monitor to assert status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-05-19 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550958#comment-14550958
 ] 

Marco Massenzio edited comment on MESOS-2340 at 5/19/15 6:30 PM:
-

So, this is the simplest code I could come up with (this needs refining, 
obviously!) (in {{src/zookeeper/group.cpp}}):

{code}
  // if label is not None, this is the MasterInfo being serialized
  if (label.isSome()) {
// TODO: how do we serialize MasterInfo to JSON? we only have the
// raw serialized data here
string json = "{\"value\": \"foobar\"}";
string loc = result + ".json";
string jsonResult;
zk->create(
  loc,
  json,
  acl,
  ZOO_EPHEMERAL,
  &jsonResult);
LOG(INFO) << "Added JSON data to " << jsonResult;
  }
{code}

If I now start the Master, I can see both nodes in the {{/json/test}} folder.
{noformat}
$ ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/json/test 
--work_dir=/tmp/mesos --quorum=1
{noformat}
{noformat}
[zk: localhost:2181(CONNECTED) 8] ls /json/test 
[log_replicas, info_10, info_10.json]

[zk: localhost:2181(CONNECTED) 9] get /json/test/info_10.json
{"value": "foobar"}
cZxid = 0xe6
ctime = Tue May 19 11:24:55 PDT 2015
mZxid = 0xe6
mtime = Tue May 19 11:24:55 PDT 2015
pZxid = 0xe6
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14d496680460057
dataLength = 19
numChildren = 0
{noformat}

and there's no .json in {{log_replicas}}.
I would like to get suggestions as to where to "inject" the JSON: in the 
Group class, we only get the serialized String, not the {{MasterInfo}} PB. 
There are obviously ways around this, but I'd like to come up with an 
extensible way.


was (Author: marco-mesos):
So, this is the simplest code I could come up with (this needs refining, 
obviously!) (in {{src/zookeeper/group.cpp}}):

{code}
  // if label is not None, this is the MasterInfo being serialized
  if (label.isSome()) {
// TODO: how do we serialize MasterInfo to JSON? we only have the
// raw serialized data here
string json = "{\"value\": \"foobar\"}";
string loc = result + ".json";
string jsonResult;
zk->create(
  loc,
  json,
  acl,
  ZOO_EPHEMERAL,
  &jsonResult);
LOG(INFO) << "Added JSON data to " << jsonResult;
  }
{info}

If I now start the Master, I can see both nodes in the {{/json/test}} folder.
{noformat}
$ ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/json/test 
--work_dir=/tmp/mesos --quorum=1
{noformat}
{noformat}
[zk: localhost:2181(CONNECTED) 8] ls /json/test 
[log_replicas, info_10, info_10.json]

[zk: localhost:2181(CONNECTED) 9] get /json/test/info_10.json
{"value": "foobar"}
cZxid = 0xe6
ctime = Tue May 19 11:24:55 PDT 2015
mZxid = 0xe6
mtime = Tue May 19 11:24:55 PDT 2015
pZxid = 0xe6
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14d496680460057
dataLength = 19
numChildren = 0
{noformat}

and there's no {{.json}} in {{log_replicas}}.
I would like to get suggestions as to where to "inject" the JSON: in the 
Group class we only get the serialized string, not the {{MasterInfo}} PB.
There are obviously ways around this, but I'd like to come up with an extensible 
way.

> Publish JSON in ZK instead of serialized MasterInfo
> ---
>
> Key: MESOS-2340
> URL: https://issues.apache.org/jira/browse/MESOS-2340
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Zameer Manji
>Assignee: haosdent
>
> Currently to discover the master a client needs the ZK node location and 
> access to the MasterInfo protobuf so it can deserialize the binary blob in 
> the node.
> I think it would be nice to publish JSON (like Twitter's ServerSets) so 
> clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-05-19 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550958#comment-14550958
 ] 

Marco Massenzio commented on MESOS-2340:


So, this is the simplest code I could come up with (it needs refining, 
obviously!) in {{src/zookeeper/group.cpp}}:

{code}
  // If label is not None, this is the MasterInfo being serialized.
  if (label.isSome()) {
    // TODO: how do we serialize MasterInfo to JSON? We only have the
    // raw serialized data here.
    string json = "{\"value\": \"foobar\"}";
    string loc = result + ".json";
    string jsonResult;
    zk->create(
        loc,
        json,
        acl,
        ZOO_EPHEMERAL,
        &jsonResult);
    LOG(INFO) << "Added JSON data to " << jsonResult;
  }
{code}

If I now start the Master, I can see both nodes in the {{/json/test}} folder.
{noformat}
$ ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/json/test --work_dir=/tmp/mesos --quorum=1
{noformat}
{noformat}
[zk: localhost:2181(CONNECTED) 8] ls /json/test 
[log_replicas, info_10, info_10.json]

[zk: localhost:2181(CONNECTED) 9] get /json/test/info_10.json
{"value": "foobar"}
cZxid = 0xe6
ctime = Tue May 19 11:24:55 PDT 2015
mZxid = 0xe6
mtime = Tue May 19 11:24:55 PDT 2015
pZxid = 0xe6
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14d496680460057
dataLength = 19
numChildren = 0
{noformat}

and there's no {{.json}} in {{log_replicas}}.
I would like to get suggestions as to where to "inject" the JSON: in the 
Group class we only get the serialized string, not the {{MasterInfo}} PB.
There are obviously ways around this, but I'd like to come up with an extensible 
way.

> Publish JSON in ZK instead of serialized MasterInfo
> ---
>
> Key: MESOS-2340
> URL: https://issues.apache.org/jira/browse/MESOS-2340
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Zameer Manji
>Assignee: haosdent
>
> Currently to discover the master a client needs the ZK node location and 
> access to the MasterInfo protobuf so it can deserialize the binary blob in 
> the node.
> I think it would be nice to publish JSON (like Twitter's ServerSets) so 
> clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2636) Segfault in inline Try getIP(const std::string& hostname, int family)

2015-05-19 Thread Chi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550913#comment-14550913
 ] 

Chi Zhang commented on MESOS-2636:
--

[~cnstar9988] Thanks for reporting! I will submit a fix.
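
Not the actual patch, but the failure mode suggested by the backtrace in the 
description is roughly this: {{getaddrinfo()}} fails (e.g. for an empty 
hostname), and the error path then frees an {{addrinfo}} pointer that was never 
set. A hedged sketch of the safe shape, using a hypothetical helper rather than 
the real {{net::getIP}}:

{code}
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>

#include <stout/error.hpp>
#include <stout/try.hpp>

// Sketch only: free the addrinfo result on the success path only.
inline Try<in_addr> getFirstIPv4(const std::string& hostname)
{
  struct addrinfo hints = {};
  hints.ai_family = AF_INET;

  struct addrinfo* result = NULL;
  int error = getaddrinfo(hostname.c_str(), NULL, &hints, &result);
  if (error != 0) {
    // 'result' was never allocated here; calling freeaddrinfo() on it
    // is exactly the kind of crash seen in the backtrace above.
    return Error(gai_strerror(error));
  }

  in_addr ip = ((struct sockaddr_in*) result->ai_addr)->sin_addr;
  freeaddrinfo(result);  // Free only what getaddrinfo() allocated.
  return ip;
}
{code}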

> Segfault in inline Try getIP(const std::string& hostname, int family)
> -
>
> Key: MESOS-2636
> URL: https://issues.apache.org/jira/browse/MESOS-2636
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chi Zhang
>Assignee: Chi Zhang
>  Labels: twitter
> Fix For: 0.23.0
>
>
> We saw a segfault in production. Attaching the coredump, we see:
> Core was generated by `/usr/local/sbin/mesos-slave --port=5051 
> --resources=cpus:23;mem:70298;ports:[31'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x7f639867c77e in free () from /lib64/libc.so.6
> (gdb) bt
> #0  0x7f639867c77e in free () from /lib64/libc.so.6
> #1  0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6
> #2  0x7f6399deeafa in net::getIP (hostname="", family=2) at 
> ./3rdparty/stout/include/stout/net.hpp:201
> #3  0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf 
> expression opcode 0xf3
> ) at src/process.cpp:837
> #4  0x0042342f in main ()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2748) /help generated links point to wrong URLs

2015-05-19 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-2748:
---

Assignee: haosdent

> /help generated links point to wrong URLs
> -
>
> Key: MESOS-2748
> URL: https://issues.apache.org/jira/browse/MESOS-2748
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Marco Massenzio
>Assignee: haosdent
>Priority: Minor
>
> As reported by Michael Lunøe  (see also MESOS-329 and 
> MESOS-913 for background):
> {quote}
> In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
> which is then converted to html through a javascript library 
> All endpoints point to {{/help/...}}, they need to work dynamically for 
> reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
> endpoints, but they each need to go to their respective {{/help/...}} 
> endpoint. 
> Note that this needs to work both for master, and for slaves. I think the 
> route to slaves help is something like this: 
> {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
> double check this.
> {quote}
> The fix appears to be not too complex (as it would require to simply 
> manipulate the generated URL) but a quick skim of the code would suggest that 
> something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2748) /help generated links point to wrong URLs

2015-05-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550877#comment-14550877
 ] 

haosdent commented on MESOS-2748:
-

Yes, thank you for the explanation. Let me try to fix it.

> /help generated links point to wrong URLs
> -
>
> Key: MESOS-2748
> URL: https://issues.apache.org/jira/browse/MESOS-2748
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Marco Massenzio
>Priority: Minor
>
> As reported by Michael Lunøe  (see also MESOS-329 and 
> MESOS-913 for background):
> {quote}
> In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
> which is then converted to html through a javascript library 
> All endpoints point to {{/help/...}}, they need to work dynamically for 
> reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
> endpoints, but they each need to go to their respective {{/help/...}} 
> endpoint. 
> Note that this needs to work both for master, and for slaves. I think the 
> route to slaves help is something like this: 
> {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
> double check this.
> {quote}
> The fix appears to be not too complex (as it would require to simply 
> manipulate the generated URL) but a quick skim of the code would suggest that 
> something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-05-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550857#comment-14550857
 ] 

haosdent commented on MESOS-2340:
-

I think creating the protobuf node and the JSON node at the same time may be 
clearer. Also, ZooKeeper has a multi-operation API: 
https://github.com/apache/zookeeper/blob/trunk/src/c/tests/TestMulti.cc#L282-L284
But it is also OK for me if you decide to add a separate process to watch the 
node. I will try to implement this once you have reached a conclusion. :-)
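
For reference, a rough sketch of what the atomic variant could look like with 
the C client (paths and buffer sizes are illustrative; the real info nodes are 
sequence nodes, so the paths would not be fixed like this):

{code}
#include <string>

#include <zookeeper/zookeeper.h>

// Sketch only: create the protobuf znode and its JSON mirror in one
// all-or-nothing multi transaction.
int createBothAtomically(
    zhandle_t* zh, const std::string& pb, const std::string& json)
{
  zoo_op_t ops[2];
  zoo_op_result_t results[2];
  char buf1[256], buf2[256];

  zoo_create_op_init(
      &ops[0], "/mesos/info_0000000010",
      pb.data(), static_cast<int>(pb.size()),
      &ZOO_OPEN_ACL_UNSAFE, ZOO_EPHEMERAL, buf1, sizeof(buf1));

  zoo_create_op_init(
      &ops[1], "/mesos/info_0000000010.json",
      json.data(), static_cast<int>(json.size()),
      &ZOO_OPEN_ACL_UNSAFE, ZOO_EPHEMERAL, buf2, sizeof(buf2));

  // Either both znodes are created or neither is.
  return zoo_multi(zh, 2, ops, results);
}
{code}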

> Publish JSON in ZK instead of serialized MasterInfo
> ---
>
> Key: MESOS-2340
> URL: https://issues.apache.org/jira/browse/MESOS-2340
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Zameer Manji
>Assignee: haosdent
>
> Currently to discover the master a client needs the ZK node location and 
> access to the MasterInfo protobuf so it can deserialize the binary blob in 
> the node.
> I think it would be nice to publish JSON (like Twitter's ServerSets) so 
> clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

2015-05-19 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550838#comment-14550838
 ] 

Jie Yu commented on MESOS-2735:
---

{quote} One of the advantages that we had discussed in the past was that the 
pull model enables us to move as fast as we possibly can, rather than just 
getting a bunch of messages queued up in the slave that we have to process. 
{quote}

I don't think there is a difference in terms of queueing messages. The pull 
model also queues messages in the slave (e.g., 
'estimator->oversubscribed().then(defer(...))' also queues messages in the 
slave's queue).

{quote} Even if we want to collect more fine-grained resource estimations a 
ResourceEstimator could do this and store this information until future polls. 
{quote}

I think there's no fundamental difference between the pull and the push model. 
There are only two subtle differences between the two: 1) the push model makes 
fewer assumptions about the slave's behavior, and 2) the push model is safer in 
the face of a badly behaved resource estimator. Let me elaborate on both below:

Regarding (1), let's use an example. Say we want to write a resource estimator 
which sends a constant number of cpus (say 2 cpus) every 10 seconds. If we use 
a push model, we could just follow the 
[NoopResourceEstimatorProcess|https://github.com/apache/mesos/blob/master/src/slave/resource_estimator.cpp#L52]
 implementation in the code. Basically, we fork a libprocess and invoke the 
registered callback every 10 seconds with 2 cpus.

Now, if we use a pull model, we first need to assume that the slave polls the 
resource estimator as fast as it can without any delay. If there is a delay, 
say 1 second, the resource estimator needs to adjust its internal delay to be 
9 seconds so that two consecutive estimations are 10 seconds apart. When 
implementing the `Future<Resources> oversubscribed()` interface, the module 
writer needs to make another assumption about the slave: that the slave will 
not invoke the interface again while the previous estimation is still pending. 
This is important because otherwise the module writer needs to maintain a list 
of Promises (instead of just one). It just feels like there are many implicit 
assumptions that the module writer needs to make in a pull model.

Regarding (2), as I already stated in this ticket, since the slave invokes the 
interface ('oversubscribed()') in its own context, the module writer needs to 
make sure the implementation of the interface does not block, otherwise the 
slave will hang. An alternative is to use 'async' while invoking the interface 
in the slave. I just feel this is unnecessary if we use a push model.
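
To make the contrast concrete, here is a hedged sketch of the push-model 
estimator described in (1); the class name and callback type are illustrative, 
not the actual Mesos interface:

{code}
#include <mesos/resources.hpp>

#include <process/delay.hpp>
#include <process/process.hpp>

#include <stout/duration.hpp>
#include <stout/lambda.hpp>

// Sketch only: push a constant estimate of 2 cpus every 10 seconds.
class ConstantEstimatorProcess
  : public process::Process<ConstantEstimatorProcess>
{
public:
  explicit ConstantEstimatorProcess(
      const lambda::function<void(const mesos::Resources&)>& _callback)
    : callback(_callback) {}

protected:
  virtual void initialize()
  {
    estimate();
  }

private:
  void estimate()
  {
    // Push the estimate through the callback registered by the slave.
    callback(mesos::Resources::parse("cpus:2").get());

    // No assumption about the slave is needed; just schedule the next push.
    process::delay(Seconds(10), self(), &ConstantEstimatorProcess::estimate);
  }

  const lambda::function<void(const mesos::Resources&)> callback;
};
{code}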

> Change the interaction between the slave and the resource estimator from 
> polling to pushing 
> 
>
> Key: MESOS-2735
> URL: https://issues.apache.org/jira/browse/MESOS-2735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: twitter
>
> This will make the semantics more clear. The resource estimator can control 
> the speed of sending resources estimation to the slave.
> To avoid cyclic dependency, slave will register a callback with the resource 
> estimator and the resource estimator will simply invoke that callback when 
> there's a new estimation ready. The callback will be a defer to the slave's 
> main event queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor

2015-05-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550825#comment-14550825
 ] 

haosdent commented on MESOS-2741:
-

And for implementing this issue, how about changing the interface in 
Containerizer from
{code}
virtual process::Future<ResourceStatistics> usage(
  const ContainerID& containerId) = 0;
{code}

to

{code}
virtual process::Future<ResourceUsage> usage(
  const ContainerID& containerId) = 0;
{code}

> Exposing Resources along with ResourceStatistics from resource monitor
> --
>
> Key: MESOS-2741
> URL: https://issues.apache.org/jira/browse/MESOS-2741
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>  Labels: mesosphere, twitter
>
> Right now, the resource monitor returns a Usage which contains ContainerId, 
> ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
> controller to calculate usage slack, or tell if a container is using 
> revokable resources or not, we need to expose the Resources that are 
> currently assigned to the container.
> This requires us the change the containerizer interface to get the Resources 
> as well while calling 'usage()'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2748) /help generated links point to wrong URLs

2015-05-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550809#comment-14550809
 ] 

Michael Lunøe edited comment on MESOS-2748 at 5/19/15 5:25 PM:
---

[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
{{/mesos/help}} works (showing a page with urls), but the urls listed are 
absolute paths, i.e. {{/help/metrics}} or {{/help/\_\_processes\_\_}}. If it 
used relative paths instead, they would show the correct paths: 
{{/mesos/help/metrics}} and {{/mesos/help/\_\_processes\_\_}}. Does that 
answer your question?
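
A hedged sketch of what the relative form could look like when the help page 
is generated (illustrative only, not the actual {{help.cpp}} code):

{code}
#include <string>

// Sketch only: emit a relative markdown link instead of an absolute
// "/help/..." one. Resolved against a page requested as "/mesos/help"
// (no trailing slash), "help/metrics" becomes "/mesos/help/metrics",
// so the proxy prefix is preserved automatically.
std::string helpLink(const std::string& id)
{
  return "[" + id + "](help/" + id + ")";
}
{code}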


was (Author: mlunoe):
[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
{{/mesos/help}} works (showing a page with urls), but the urls listed are 
absolute paths, i.e. {{/help/metrics}} or {{/help/__processes__}}. If it used 
relative paths instead, they would show the correct paths: 
{{/mesos/help/metrics}} and {{/mesos/help/__processes__}}. Does that answer 
your question?

> /help generated links point to wrong URLs
> -
>
> Key: MESOS-2748
> URL: https://issues.apache.org/jira/browse/MESOS-2748
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Marco Massenzio
>Priority: Minor
>
> As reported by Michael Lunøe  (see also MESOS-329 and 
> MESOS-913 for background):
> {quote}
> In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
> which is then converted to html through a javascript library 
> All endpoints point to {{/help/...}}, they need to work dynamically for 
> reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
> endpoints, but they each need to go to their respective {{/help/...}} 
> endpoint. 
> Note that this needs to work both for master, and for slaves. I think the 
> route to slaves help is something like this: 
> {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
> double check this.
> {quote}
> The fix appears to be not too complex (as it would require to simply 
> manipulate the generated URL) but a quick skim of the code would suggest that 
> something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2748) /help generated links point to wrong URLs

2015-05-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550809#comment-14550809
 ] 

Michael Lunøe edited comment on MESOS-2748 at 5/19/15 5:22 PM:
---

[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
{{/mesos/help}} works (showing a page with urls), but the urls listed are 
absolute paths, i.e. {{/help/metrics}} or {{/help/__processes__}}. If it used 
relative paths instead, they would show the correct paths: 
{{/mesos/help/metrics}} and {{/mesos/help/__processes__}}. Does that answer 
your question?


was (Author: mlunoe):
[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
"/mesos/help" works (showing a page with urls), but the urls listed are 
absolute paths, i.e. "/help/metrics" or "/help/__processes__". If it used 
relative paths instead, they would show the correct paths: 
"/mesos/help/metrics" and "/mesos/help/__processes__". Does that answer your 
question?

> /help generated links point to wrong URLs
> -
>
> Key: MESOS-2748
> URL: https://issues.apache.org/jira/browse/MESOS-2748
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Marco Massenzio
>Priority: Minor
>
> As reported by Michael Lunøe  (see also MESOS-329 and 
> MESOS-913 for background):
> {quote}
> In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
> which is then converted to html through a javascript library 
> All endpoints point to {{/help/...}}, they need to work dynamically for 
> reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
> endpoints, but they each need to go to their respective {{/help/...}} 
> endpoint. 
> Note that this needs to work both for master, and for slaves. I think the 
> route to slaves help is something like this: 
> {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
> double check this.
> {quote}
> The fix appears to be not too complex (as it would require to simply 
> manipulate the generated URL) but a quick skim of the code would suggest that 
> something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2748) /help generated links point to wrong URLs

2015-05-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550809#comment-14550809
 ] 

Michael Lunøe commented on MESOS-2748:
--

[~haosd...@gmail.com] Yes, the problem is exactly the use of absolute paths. 
"/mesos/help" works (showing a page with urls), but the urls listed are 
absolute paths, i.e. "/help/metrics" or "/help/__processes__". If it used 
relative paths instead, they would show the correct paths: 
"/mesos/help/metrics" and "/mesos/help/__processes__". Does that answer your 
question?

> /help generated links point to wrong URLs
> -
>
> Key: MESOS-2748
> URL: https://issues.apache.org/jira/browse/MESOS-2748
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Marco Massenzio
>Priority: Minor
>
> As reported by Michael Lunøe  (see also MESOS-329 and 
> MESOS-913 for background):
> {quote}
> In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
> which is then converted to html through a javascript library 
> All endpoints point to {{/help/...}}, they need to work dynamically for 
> reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
> endpoints, but they each need to go to their respective {{/help/...}} 
> endpoint. 
> Note that this needs to work both for master, and for slaves. I think the 
> route to slaves help is something like this: 
> {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
> double check this.
> {quote}
> The fix appears to be not too complex (as it would require to simply 
> manipulate the generated URL) but a quick skim of the code would suggest that 
> something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2748) /help generated links point to wrong URLs

2015-05-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Lunøe updated MESOS-2748:
-
Description: 
As reported by Michael Lunøe  (see also MESOS-329 and 
MESOS-913 for background):

{quote}
In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, which 
is then converted to html through a javascript library.

All endpoints point to {{/help/...}}; they need to work dynamically for reverse 
proxy to do its thing. {{/mesos/help}} works, and displays the endpoints, but 
they each need to go to their respective {{/help/...}} endpoint. 

Note that this needs to work both for master, and for slaves. I think the route 
to slaves help is something like this: 
{{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
double check this.
{quote}

The fix appears to be not too complex (it would simply require manipulating 
the generated URL), but a quick skim of the code suggests that something more 
substantial may be desirable too.

  was:
As reported by Michael Lunøe  (see also MESOS-329 and 
MESOS-913 for background):

{quote}
In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, which 
is then converted to html through a javascript library 

All endpoints point to {{/help/...}}, they need to work dynamically for reverse 
proxy to do its thing. {{/mesos/help}} works, and displays the endpoints, but 
they each need to go to their respective {{/mesos/help/...}} endpoint. 

Note that this needs to work both for master, and for slaves. I think the route 
to slaves help is something like this: 
{{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
double check this.
{quote}

The fix appears to be not too complex (as it would require to simply manipulate 
the generated URL) but a quick skim of the code would suggest that something 
more substantial may be desirable too.


> /help generated links point to wrong URLs
> -
>
> Key: MESOS-2748
> URL: https://issues.apache.org/jira/browse/MESOS-2748
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Marco Massenzio
>Priority: Minor
>
> As reported by Michael Lunøe  (see also MESOS-329 and 
> MESOS-913 for background):
> {quote}
> In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
> which is then converted to html through a javascript library 
> All endpoints point to {{/help/...}}, they need to work dynamically for 
> reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
> endpoints, but they each need to go to their respective {{/help/...}} 
> endpoint. 
> Note that this needs to work both for master, and for slaves. I think the 
> route to slaves help is something like this: 
> {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
> double check this.
> {quote}
> The fix appears to be not too complex (as it would require to simply 
> manipulate the generated URL) but a quick skim of the code would suggest that 
> something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches

2015-05-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550779#comment-14550779
 ] 

haosdent commented on MESOS-2588:
-

[~baotiao] Sorry for not updating this issue more quickly. I have unassigned it now.

> Create pre-create hook before a Docker container launches
> -
>
> Key: MESOS-2588
> URL: https://issues.apache.org/jira/browse/MESOS-2588
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Timothy Chen
>
> To be able to support custom actions to be called before launching a Docker 
> container, we should create a hook that is extensible and allows 
> modules/hooks to be performed before a Docker container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2588) Create pre-create hook before a Docker container launches

2015-05-19 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-2588:

Assignee: (was: haosdent)

> Create pre-create hook before a Docker container launches
> -
>
> Key: MESOS-2588
> URL: https://issues.apache.org/jira/browse/MESOS-2588
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Timothy Chen
>
> To be able to support custom actions to be called before launching a Docker 
> container, we should create a hook that is extensible and allows 
> modules/hooks to be performed before a Docker container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2588) Create pre-create hook before a Docker container launches

2015-05-19 Thread chenzongzhi (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550765#comment-14550765
 ] 

chenzongzhi commented on MESOS-2588:


Hey haosdent, Adam Avilla.
Do you have any plans for this issue?
We really need this feature, so if you don't have time, maybe you can assign 
it to me.


> Create pre-create hook before a Docker container launches
> -
>
> Key: MESOS-2588
> URL: https://issues.apache.org/jira/browse/MESOS-2588
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Timothy Chen
>Assignee: haosdent
>
> To be able to support custom actions to be called before launching a Docker 
> container, we should create a hook that is extensible and allows 
> modules/hooks to be performed before a Docker container is launched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2731) Allow frameworks to deploy storage drivers on demand.

2015-05-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2731:
--
Description: 
Certain storage options require storage drivers to access them, including the 
HDFS driver, the Quobyte client, database drivers, and so on.
When Tasks in Mesos require access to such storage they also need access to the 
respective driver on the node where they were scheduled.
As it is not desirable to deploy the driver onto all nodes in the cluster, it 
would be good to deploy the driver on demand.

Use Cases:
1. Fetcher Cache pulling resources from user-provided URIs
2. Framework executors/tasks requiring r/w access to HDFS/DFS
3. Framework executors/tasks requiring r/w Databases access (requiring drivers)



  was:
Certain storage options require storage drivers to access them, including the 
HDFS driver, the Quobyte client, database drivers, and so on.
When Tasks in Mesos require access to such storage they also need access to the 
respective driver on the node where they were scheduled.
As it is not desirable to deploy the driver onto all nodes in the cluster, it 
would be good to deploy the driver on demand.

Use Cases:
1. Fetcher Cache accessing resources from user-provided URIs
2. Framework executors/tasks requiring access to HDFS/DFS
3. Framework executors/tasks requiring Databases access (requiring drivers)




> Allow frameworks to deploy storage drivers on demand.
> -
>
> Key: MESOS-2731
> URL: https://issues.apache.org/jira/browse/MESOS-2731
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>  Labels: mesosphere
>
> Certain storage options require storage drivers to access them, including the 
> HDFS driver, the Quobyte client, database drivers, and so on.
> When Tasks in Mesos require access to such storage they also need access to 
> the respective driver on the node where they were scheduled.
> As it is not desirable to deploy the driver onto all nodes in the cluster, it 
> would be good to deploy the driver on demand.
> Use Cases:
> 1. Fetcher Cache pulling resources from user-provided URIs
> 2. Framework executors/tasks requiring r/w access to HDFS/DFS
> 3. Framework executors/tasks requiring r/w Databases access (requiring 
> drivers)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2731) Allow frameworks to deploy storage drivers on demand.

2015-05-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2731:
--
Labels: mesosphere  (was: )

> Allow frameworks to deploy storage drivers on demand.
> -
>
> Key: MESOS-2731
> URL: https://issues.apache.org/jira/browse/MESOS-2731
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>  Labels: mesosphere
>
> Certain storage options require storage drivers to access them, including the 
> HDFS driver, the Quobyte client, database drivers, and so on.
> When Tasks in Mesos require access to such storage they also need access to 
> the respective driver on the node where they were scheduled.
> As it is not desirable to deploy the driver onto all nodes in the cluster, it 
> would be good to deploy the driver on demand.
> Use Cases:
> 1. Fetcher Cache accessing resources from user-provided URIs
> 2. Framework executors/tasks requiring access to HDFS/DFS
> 3. Framework executors/tasks requiring Databases access (requiring drivers)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2728) Introduce concept of cluster wide resources.

2015-05-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2728:
--
Description: 
There are resources which are not provided by a single node. Consider for 
example the external network bandwidth of a cluster. Being a limited resource it 
makes sense for Mesos to manage it, but still it is not a resource being offered 
by a single node. A cluster-wide resource is still consumed by a task, and when 
that task completes, the resources are then available to be allocated to 
another framework/task.

Use Cases:
1. Network Bandwidth
2. IP Addresses
3. Global Service Ports
4. Distributed File System Storage
5. Software Licences




  was:
There are resources which are not provided by a single node. Consider for 
example the external network bandwidth of a cluster. Being a limited resource it 
makes sense for Mesos to manage it, but still it is not a resource being offered 
by a single node.

Use Cases:
1. Network Bandwidth
2. IP Addresses
3. Global Service Ports
4. Distributed File System Storage
5. Software Licences





> Introduce concept of cluster wide resources.
> 
>
> Key: MESOS-2728
> URL: https://issues.apache.org/jira/browse/MESOS-2728
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>  Labels: mesosphere
>
> There are resources which are not provided by a single node. Consider for 
> example the external network bandwidth of a cluster. Being a limited resource 
> it makes sense for Mesos to manage it, but still it is not a resource being 
> offered by a single node. A cluster-wide resource is still consumed by a 
> task, and when that task completes, the resources are then available to be 
> allocated to another framework/task.
> Use Cases:
> 1. Network Bandwidth
> 2. IP Addresses
> 3. Global Service Ports
> 4. Distributed File System Storage
> 5. Software Licences



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2732) Expose Mount Tables

2015-05-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2732:
--
Description: 
When there are multiple distributed/network-attached filesystems connected to a 
Mesos cluster, clients (e.g. the Mesos fetcher, or a Mesos task) of those 
filesystems need a clear way to distinguish between them and Mesos needs a way 
to direct requests to the correct (distributed) filesystem.

_Use Cases_:
 - Multiple HDFS clusters on the same Mesos cluster
 - Connecting HDFS, MapRFS, Ceph, Lustre, GlusterFS, S3, GCS, and other SAN/NAS 
to a Mesos cluster
 - The Mesos fetcher may want to pull from any of the above.
 - An executor or task may want to read or write to multiple filesystems, 
within the same process.

_Traditional Operating System Analogy_:
Each line in Linux's fstab describes a different filesystem to mount into the 
root filesystem:

 1. The device name or remote filesystem to be mounted.
 2. The mount point, where the data is to be attached to the root file system.
 3. The file system type or algorithm used to interpret the file system.
 4. Options to be used when mounting (e.g. Read-Only).

_What we need for each filesystem in the Mesos ecosystem_:

 1. The metadata server or dfs/san entrypoint host:port
 2. Mount point, where this filesystem fits into the universal Mesos-accessible 
filesystem namespace.
 3. The protocol to speak, perhaps acceptable URI prefixes.
 4. Options, ACLs for which frameworks/principals can access a particular 
filesystem, and how.
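
A hypothetical sketch of one such mount-table entry (this struct does not 
exist in Mesos; the fields mirror the four items above):

{code}
#include <string>
#include <vector>

// Sketch only: one entry of a Mesos-wide mount table.
struct MountTableEntry
{
  std::string entrypoint;           // 1. metadata server / entrypoint host:port
  std::string mountPoint;           // 2. position in the Mesos-wide namespace
  std::string protocol;             // 3. protocol / accepted URI prefixes
  std::vector<std::string> options; // 4. options and per-principal ACLs
};
{code}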

  was:
When there are multiple distributed filesystems connected to a Mesos cluster, 
clients (e.g. the Mesos fetcher, or a Mesos task) of those filesystems need a 
clear way to distinguish between them and Mesos needs a way to direct requests 
to the correct (distributed) filesystem.

#Use Cases:
 - Multiple HDFS clusters on the same Mesos cluster
 - Connecting HDFS, MapRFS, Ceph, Lustre, GlusterFS, S3, GCS, and other SAN/NAS 
to a Mesos cluster
 - The Mesos fetcher may want to pull from any of the above.
 - An executor or task may want to read or write to multiple filesystems, 
within the same process.

#Traditional Operating System Analogy:
Each line in Linux's fstab describes a different filesystem to mount into the 
root filesystem:

1. The device name or remote filesystem to be mounted.
2. The mount point, where the data is to be attached to the root file system.
3. The file system type or algorithm used to interpret the file system.
4. Options to be used when mounting (e.g. Read-Only).

What we need for each filesystem in the Mesos ecosystem:

1. The metadata server or dfs/san entrypoint host:port
2. Mount point, where this filesystem fits into the universal Mesos-accessible 
filesystem namespace.
3. The protocol to speak, perhaps acceptable URI prefixes.
4. Options, ACLs for which frameworks/principals can access a particular 
filesystem, and how.


> Expose Mount Tables
> ---
>
> Key: MESOS-2732
> URL: https://issues.apache.org/jira/browse/MESOS-2732
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>  Labels: mesosphere
>
> When there are multiple distributed/network-attached filesystems connected to 
> a Mesos cluster, clients (e.g. the Mesos fetcher, or a Mesos task) of those 
> filesystems need a clear way to distinguish between them and Mesos needs a 
> way to direct requests to the correct (distributed) filesystem.
> _Use Cases_:
>  - Multiple HDFS clusters on the same Mesos cluster
>  - Connecting HDFS, MapRFS, Ceph, Lustre, GlusterFS, S3, GCS, and other 
> SAN/NAS to a Mesos cluster
>  - The Mesos fetcher may want to pull from any of the above.
>  - An executor or task may want to read or write to multiple filesystems, 
> within the same process.
> _Traditional Operating System Analogy_:
> Each line in Linux's fstab describes a different filesystem to mount into the 
> root filesystem:
>  1. The device name or remote filesystem to be mounted.
>  2. The mount point, where the data is to be attached to the root file system.
>  3. The file system type or algorithm used to interpret the file system.
>  4. Options to be used when mounting (e.g. Read-Only).
> _What we need for each filesystem in the Mesos ecosystem_:
>  1. The metadata server or dfs/san entrypoint host:port
>  2. Mount point, where this filesystem fits into the universal 
> Mesos-accessible filesystem namespace.
>  3. The protocol to speak, perhaps acceptable URI prefixes.
>  4. Options, ACLs for which frameworks/principals can access a particular 
> filesystem, and how.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2732) Expose Mount Tables

2015-05-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2732:
--
Labels: mesosphere  (was: )

> Expose Mount Tables
> ---
>
> Key: MESOS-2732
> URL: https://issues.apache.org/jira/browse/MESOS-2732
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>  Labels: mesosphere
>
> When there are multiple distributed filesystems connected to a Mesos cluster, 
> clients (e.g. the Mesos fetcher, or a Mesos task) of those filesystems need a 
> clear way to distinguish between them and Mesos needs a way to direct 
> requests to the correct (distributed) filesystem.
> #Use Cases:
>  - Multiple HDFS clusters on the same Mesos cluster
>  - Connecting HDFS, MapRFS, Ceph, Lustre, GlusterFS, S3, GCS, and other 
> SAN/NAS to a Mesos cluster
>  - The Mesos fetcher may want to pull from any of the above.
>  - An executor or task may want to read or write to multiple filesystems, 
> within the same process.
> #Traditional Operating System Analogy:
> Each line in Linux's fstab describes a different filesystem to mount into the 
> root filesystem:
> 1. The device name or remote filesystem to be mounted.
> 2. The mount point, where the data is to be attached to the root file system.
> 3. The file system type or algorithm used to interpret the file system.
> 4. Options to be used when mounting (e.g. Read-Only).
> What we need for each filesystem in the Mesos ecosystem:
> 1. The metadata server or dfs/san entrypoint host:port
> 2. Mount point, where this filesystem fits into the universal 
> Mesos-accessible filesystem namespace.
> 3. The protocol to speak, perhaps acceptable URI prefixes.
> 4. Options, ACLs for which frameworks/principals can access a particular 
> filesystem, and how.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2741) Exposing Resources along with ResourceStatistics from resource monitor

2015-05-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550708#comment-14550708
 ] 

haosdent commented on MESOS-2741:
-

`calculate usage slack` or `calculate usage stack`?

> Exposing Resources along with ResourceStatistics from resource monitor
> --
>
> Key: MESOS-2741
> URL: https://issues.apache.org/jira/browse/MESOS-2741
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>  Labels: mesosphere, twitter
>
> Right now, the resource monitor returns a Usage which contains ContainerId, 
> ExecutorInfo and ResourceStatistics. In order for resource estimator/qos 
> controller to calculate usage slack, or tell if a container is using 
> revokable resources or not, we need to expose the Resources that are 
> currently assigned to the container.
> This requires us the change the containerizer interface to get the Resources 
> as well while calling 'usage()'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2745) Add 'Path' to stout's user guide

2015-05-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550687#comment-14550687
 ] 

haosdent commented on MESOS-2745:
-

Review board: https://reviews.apache.org/r/34416/

> Add 'Path' to stout's user guide 
> -
>
> Key: MESOS-2745
> URL: https://issues.apache.org/jira/browse/MESOS-2745
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Till Toenshoff
>  Labels: newbie
>
> stout's README does not yet include 'Path', we should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo

2015-05-19 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550654#comment-14550654
 ] 

Marco Massenzio commented on MESOS-2340:


I'm not familiar with the {{multi}} operation; however, thinking a bit more 
about this, it turns out the solution should be simpler: after the ephemeral 
node creation, create another "mirror" JSON-content znode, equally ephemeral, 
that will go away whenever the original PB-content znode does.

This seems a simple enough approach (and, as such, I'm sure I'm overlooking 
something!)

I'm looking into the code, and it seems to me that {{GroupProcess::doJoin()}} 
is the place to do this (maybe?)

> Publish JSON in ZK instead of serialized MasterInfo
> ---
>
> Key: MESOS-2340
> URL: https://issues.apache.org/jira/browse/MESOS-2340
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Zameer Manji
>Assignee: haosdent
>
> Currently to discover the master a client needs the ZK node location and 
> access to the MasterInfo protobuf so it can deserialize the binary blob in 
> the node.
> I think it would be nice to publish JSON (like Twitter's ServerSets) so 
> clients are not tied to protobuf to do service discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2748) /help generated links point to wrong URLs

2015-05-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550642#comment-14550642
 ] 

haosdent commented on MESOS-2748:
-

Hi, [~marco-mesos]. I am sorry, I could not quite get your idea here. Do you 
mean that the "/help" endpoint is an absolute path and does not work when the 
user wants to expose it as "/mesos/help" behind a reverse proxy? In nginx, one 
could add a rewrite rule to solve this problem.

> /help generated links point to wrong URLs
> -
>
> Key: MESOS-2748
> URL: https://issues.apache.org/jira/browse/MESOS-2748
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.22.1
>Reporter: Marco Massenzio
>Priority: Minor
>
> As reported by Michael Lunøe  (see also MESOS-329 and 
> MESOS-913 for background):
> {quote}
> In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, 
> which is then converted to html through a javascript library 
> All endpoints point to {{/help/...}}, they need to work dynamically for 
> reverse proxy to do its thing. {{/mesos/help}} works, and displays the 
> endpoints, but they each need to go to their respective {{/mesos/help/...}} 
> endpoint. 
> Note that this needs to work both for master, and for slaves. I think the 
> route to slaves help is something like this: 
> {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please 
> double check this.
> {quote}
> The fix appears to be not too complex (as it would require to simply 
> manipulate the generated URL) but a quick skim of the code would suggest that 
> something more substantial may be desirable too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

2015-05-19 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550575#comment-14550575
 ] 

Benjamin Hindman commented on MESOS-2735:
-

I'd also like to better understand why we should go push instead of pull (poll). One 
of the advantages that we had discussed in the past was that the pull model 
enables us to move as fast as we possibly can, rather than just getting a bunch 
of messages queued up in the slave that we have to process. Even if we want to 
collect more fine-grained resource estimations a ResourceEstimator could do 
this and store this information until future polls.

> Change the interaction between the slave and the resource estimator from 
> polling to pushing 
> 
>
> Key: MESOS-2735
> URL: https://issues.apache.org/jira/browse/MESOS-2735
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: twitter
>
> This will make the semantics more clear. The resource estimator can control 
> the speed of sending resources estimation to the slave.
> To avoid cyclic dependency, slave will register a callback with the resource 
> estimator and the resource estimator will simply invoke that callback when 
> there's a new estimation ready. The callback will be a defer to the slave's 
> main event queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2747) Add "watch" to the state abstraction

2015-05-19 Thread Connor Doyle (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Connor Doyle updated MESOS-2747:

Description: 
Use case: Frameworks that intend to survive failover tend to implement leader 
election.  Adding the ability to listen for changes to a variable's value could 
be a first step towards reusable leader election libraries that don't depend on 
a particular backing store.

cc [~kozyraki]
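
A hypothetical shape for such an API (not an existing interface; the 
fetch/store signatures are simplified for the sketch):

{code}
#include <string>

#include <process/future.hpp>

// Sketch only: a state abstraction extended with watch().
namespace state {

class Variable;

class State
{
public:
  virtual ~State() {}

  virtual process::Future<Variable> fetch(const std::string& name) = 0;
  virtual process::Future<Variable> store(const Variable& variable) = 0;

  // Proposed: completes once 'variable' has been superseded by a newer
  // value, regardless of which backing store is underneath.
  virtual process::Future<Variable> watch(const Variable& variable) = 0;
};

} // namespace state {
{code}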

  was:
Use case: Frameworks that intend to survive failover tend to implement leader 
election.  Watchable storage could be a first step towards reusable leader 
election libraries that don't depend on a particular backing store.

cc [~kozyraki]


> Add "watch" to the state abstraction
> 
>
> Key: MESOS-2747
> URL: https://issues.apache.org/jira/browse/MESOS-2747
> Project: Mesos
>  Issue Type: Wish
>  Components: c++ api, java api
>Reporter: Connor Doyle
>Priority: Minor
>  Labels: mesosphere
>
> Use case: Frameworks that intend to survive failover tend to implement leader 
> election.  Adding the ability to listen for changes to a variable's value 
> could be a first step towards reusable leader election libraries that don't 
> depend on a particular backing store.
> cc [~kozyraki]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-708) Static files missing "Last-Modified" HTTP headers

2015-05-19 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320300#comment-14320300
 ] 

Alexander Rojas edited comment on MESOS-708 at 5/19/15 11:41 AM:
-

https://reviews.apache.org/r/34392/
https://reviews.apache.org/r/30032/


was (Author: arojas):
https://reviews.apache.org/r/30032/

> Static files missing "Last-Modified" HTTP headers
> -
>
> Key: MESOS-708
> URL: https://issues.apache.org/jira/browse/MESOS-708
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess, webui
>Affects Versions: 0.13.0
>Reporter: Ross Allen
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> Static assets served by the Mesos master don't return "Last-Modified" HTTP 
> headers. That means clients receive a 200 status code and re-download assets 
> on every page request even if the assets haven't changed. Because Angular JS 
> does most of the work, the downloading happens only when you navigate to 
> Mesos master in your browser or use the browser's refresh.
> Example header for "mesos.css":
> HTTP/1.1 200 OK
> Date: Thu, 26 Sep 2013 17:18:52 GMT
> Content-Length: 1670
> Content-Type: text/css
> Clients sometimes use the "Date" header for the same effect as 
> "Last-Modified", but the date is always the time of the response from the 
> server, i.e. it changes on every request and makes the assets look new every 
> time.
> The "Last-Modified" header should be added and should be the last modified 
> time of the file. On subsequent requests for the same files, the master 
> should return 304 responses with no content rather than 200 with the full 
> files. It could save clients a lot of download time since Mesos assets are 
> rather heavyweight.
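
For reference, the conditional-request flow being asked for looks roughly like 
this (timestamps illustrative):

{noformat}
GET /static/mesos.css HTTP/1.1

HTTP/1.1 200 OK
Last-Modified: Thu, 26 Sep 2013 12:00:00 GMT
Content-Type: text/css
Content-Length: 1670

GET /static/mesos.css HTTP/1.1
If-Modified-Since: Thu, 26 Sep 2013 12:00:00 GMT

HTTP/1.1 304 Not Modified
{noformat}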



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)