[jira] [Closed] (MESOS-1172) Update system check (libev)
[ https://issues.apache.org/jira/browse/MESOS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy St. Clair closed MESOS-1172. Update system check (libev) --- Key: MESOS-1172 URL: https://issues.apache.org/jira/browse/MESOS-1172 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Timothy St. Clair Fix For: 0.20.0 Clean up libev detection to follow https://issues.apache.org/jira/browse/MESOS-1071 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1585) Container level network isolation
[ https://issues.apache.org/jira/browse/MESOS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062159#comment-14062159 ] Timothy St. Clair commented on MESOS-1585: -- [~benjaminhindman] [~jieyu] Is there a map between the native work being done here, and possible iptables mods in Docker containers? Arguably you can easily fudge iptables of a Docker container to get similar behavior, and I believe this is the roadmap for their QoS tiers in Kubernetes. Container level network isolation - Key: MESOS-1585 URL: https://issues.apache.org/jira/browse/MESOS-1585 Project: Mesos Issue Type: Epic Components: isolation Reporter: Jie Yu The goal here is to provide network isolation between containers so that one container cannot saturate the entire network, affecting the performance of other containers. There are many options here. With the current network monitoring code (MESOS-1228, already committed), one option is to add a tc police action on the 'veth' of each container to drop packets when the traffic exceeds a certain limit. Other options include advanced shape control using tc classes (e.g., HTB, CBQ, etc.). We're gonna need to extend the current routing library to support that. -- This message was sent by Atlassian JIRA (v6.2#6252)
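For readers unfamiliar with policing: the "drop packets when the traffic exceeds a certain limit" option in the description is commonly modelled as a token bucket. The following is a minimal sketch of that idea only. It is not the kernel's tc implementation and not the Mesos routing library; the class name, parameters, and units are all assumptions made for illustration.

```cpp
#include <algorithm>

// Hypothetical token-bucket policer (illustration only). Tokens are bytes.
class Policer {
public:
  // rateBytesPerSec: sustained rate; burstBytes: bucket capacity.
  Policer(double rateBytesPerSec, double burstBytes)
    : rate(rateBytesPerSec), burst(burstBytes), tokens(burstBytes), last(0.0) {}

  // Returns true if a packet of `size` bytes arriving at time `nowSec`
  // (seconds, non-decreasing across calls) conforms and may pass.
  // Returns false if the packet would be dropped.
  bool admit(double nowSec, double size) {
    // Refill for the elapsed time, capped at the bucket capacity.
    tokens = std::min(burst, tokens + rate * (nowSec - last));
    last = nowSec;

    if (size > tokens) {
      return false;  // Over the limit: drop, and do not consume tokens.
    }

    tokens -= size;
    return true;
  }

private:
  double rate;
  double burst;
  double tokens;
  double last;
};
```

A shaper (the HTB/CBQ options mentioned in the description) would queue non-conforming packets instead of dropping them; the sketch covers only the drop case.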
[jira] [Updated] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-1593: Description: We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design was:We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MESOS-1466) Race between executor exited event and launch task can cause overcommit of resources
[ https://issues.apache.org/jira/browse/MESOS-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-1466: --- Labels: reliability (was: ) Race between executor exited event and launch task can cause overcommit of resources Key: MESOS-1466 URL: https://issues.apache.org/jira/browse/MESOS-1466 Project: Mesos Issue Type: Bug Components: allocation, master Reporter: Vinod Kone Labels: reliability The following sequence of events can cause an overcommit -- Launch task is called for a task whose executor is already running -- Executor's resources are not accounted for on the master -- Executor exits and the event is enqueued behind launch tasks on the master -- Master sends the task to the slave which needs to commit for resources for task and the (new) executor. -- Master processes the executor exited event and re-offers the executor's resources causing an overcommit of resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
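The sequence in the description can be replayed with a toy model. This is only a sketch under stated assumptions: one slave with 4 cpus, an executor using 1 cpu, a task using 3 cpus, and a master that handles events strictly in arrival order. None of the names or numbers come from the Mesos code base.

```cpp
#include <deque>
#include <string>

// Hypothetical result of the replay: what the master thinks is free versus
// what is really uncommitted on the slave.
struct Outcome {
  double masterFree;
  double slaveFree;
};

Outcome simulate() {
  const double total = 4, executorCpus = 1, taskCpus = 3;

  // Initially the executor is running and accounted for on both sides.
  double masterUsed = executorCpus;
  double slaveUsed = executorCpus;

  // The executor exits on the slave, but the master only sees that event
  // after the launch that is already in its queue.
  slaveUsed -= executorCpus;
  std::deque<std::string> masterQueue = {"launch", "executor_exited"};

  while (!masterQueue.empty()) {
    const std::string event = masterQueue.front();
    masterQueue.pop_front();

    if (event == "launch") {
      // The master believes the executor is still running, so it charges
      // only the task's resources.
      masterUsed += taskCpus;
      // The slave has to start a new executor, so it commits both.
      slaveUsed += executorCpus + taskCpus;
    } else {
      // The master releases the old executor's resources and re-offers them.
      masterUsed -= executorCpus;
    }
  }

  return {total - masterUsed, total - slaveUsed};
}
```

In this model the master ends up believing 1 cpu is free while the slave has none left, which is the overcommit the ticket describes.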
[jira] [Created] (MESOS-1600) IP classifiers in routing lib should ignore IP packets with IP options
Jie Yu created MESOS-1600: Summary: IP classifiers in routing lib should ignore IP packets with IP options Key: MESOS-1600 URL: https://issues.apache.org/jira/browse/MESOS-1600 Project: Mesos Issue Type: Task Reporter: Jie Yu Currently, the IP classifiers simply assume that no IP packet carries IP options. If an IP packet does carry IP options, the current behavior is undefined (e.g., the packet might be redirected/mirrored to a random link). We can solve that by adding one more rule to the filter requiring the IP header length (the IHL field) to be 5, i.e., five 32-bit words (20 bytes), the size of a header without options.
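The extra rule proposed above amounts to a check on the IPv4 header length field (IHL). IHL counts 32-bit words, so a value of 5 means a 20-byte header with no options, and the first header byte is then 0x45. A minimal sketch of the check follows; the function name is hypothetical and this is not the routing library's filter code, which would express the match as a kernel classifier rule rather than in user space.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if `packet` starts with an IPv4 header that carries no
// options: version == 4 and IHL == 5 (five 32-bit words, i.e. 20 bytes).
bool isIPv4WithoutOptions(const uint8_t* packet, size_t length) {
  if (packet == nullptr || length < 20) {
    return false;  // Too short to hold even a minimal IPv4 header.
  }

  const uint8_t version = packet[0] >> 4;  // High nibble of the first byte.
  const uint8_t ihl = packet[0] & 0x0f;    // Low nibble of the first byte.

  return version == 4 && ihl == 5;
}
```

With options present, the transport header no longer starts at byte 20, which is why classifiers that read ports at a fixed offset misbehave on such packets.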
[jira] [Created] (MESOS-1601) Add metrics for port mapping network isolator
Jie Yu created MESOS-1601: - Summary: Add metrics for port mapping network isolator Key: MESOS-1601 URL: https://issues.apache.org/jira/browse/MESOS-1601 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu We need to expose a few metrics so that we can monitor the behavior of the network isolator. For example, we need to know how many errors we have for adding/removing/updating various filters. -- This message was sent by Atlassian JIRA (v6.2#6252)
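As a rough illustration of the kind of metric asked for here, the sketch below keeps one error counter per filter operation. The class and the operation names are hypothetical; they are not the metric names or the metrics API used by Mesos.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-operation error counters for filter management.
class FilterMetrics {
public:
  // Record one failed operation, e.g. "adding", "removing", "updating".
  void recordError(const std::string& operation) { ++errors[operation]; }

  // Number of errors recorded so far for `operation` (0 if none).
  uint64_t errorCount(const std::string& operation) const {
    auto it = errors.find(operation);
    return it == errors.end() ? 0 : it->second;
  }

private:
  std::map<std::string, uint64_t> errors;
};
```

Exposing such counters through an endpoint is what would let an operator alert on a rising error rate; that part is outside this sketch.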
[jira] [Commented] (MESOS-1601) Add metrics for port mapping network isolator
[ https://issues.apache.org/jira/browse/MESOS-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062484#comment-14062484 ] Jie Yu commented on MESOS-1601: --- https://reviews.apache.org/r/23492/ Add metrics for port mapping network isolator - Key: MESOS-1601 URL: https://issues.apache.org/jira/browse/MESOS-1601 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu We need to expose a few metrics so that we can monitor the behavior of the network isolator. For example, we need to know how many errors we have for adding/removing/updating various filters. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1524) Implement Docker support in Mesos
[ https://issues.apache.org/jira/browse/MESOS-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062485#comment-14062485 ] Vinod Kone commented on MESOS-1524: Thanks for the doc [~tnachen]. Couldn't find a way to comment on the GitHub wiki, so commenting here. Is the plan to kill ContainerInfo from CommandInfo in favor of DockerInfo? What user is the Docker container launched as? If it's customizable, you need to have a 'user' field in there. Why do you need repeated string args? Why can't they just be part of the command string? What is the rationale for DockerInfo to be alongside CommandInfo instead of replacing ContainerInfo? Implement Docker support in Mesos Key: MESOS-1524 URL: https://issues.apache.org/jira/browse/MESOS-1524 Project: Mesos Issue Type: Epic Reporter: Tobi Knaup Assignee: Benjamin Hindman There have been two projects to add Docker support to Mesos, first via an executor, and more recently via an external containerizer written in Python, Deimos: https://github.com/mesosphere/deimos We've got a lot of feedback from folks who use Docker and Mesos, and the main wish was to make Docker a first-class citizen in Mesos instead of a plugin that needs to be installed separately. Mesos has been using Linux containers for a long time, first via LXC, then via cgroups, and now also via the external containerizer. For a long time it wasn't clear what the winning technology would be, but with Docker becoming the de facto standard for handling containers I think Mesos should make it a first-class citizen and part of core. Let's use this JIRA to track wishes/feedback on the implementation.
[jira] [Commented] (MESOS-1585) Container level network isolation
[ https://issues.apache.org/jira/browse/MESOS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062521#comment-14062521 ] Benjamin Hindman commented on MESOS-1585: - I haven't chatted with anyone (yet) about integrating this with the DockerContainerizer, but it's definitely possible and could be a nice win to get network isolation for Docker. Container level network isolation - Key: MESOS-1585 URL: https://issues.apache.org/jira/browse/MESOS-1585 Project: Mesos Issue Type: Epic Components: isolation Reporter: Jie Yu The goal here is to provide network isolation between containers so that one container cannot saturate the entire network, affecting the performance of other containers. There are many options here. With the current network monitoring code (MESOS-1228, already committed), one option is to add a tc police action on the 'veth' of each container to drop packets when the traffic exceeds a certain limit. Other options include advanced shape control using tc classes (e.g., HTB, CBQ, etc.). We're gonna need to extend the current routing library to support that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062535#comment-14062535 ] Benjamin Hindman commented on MESOS-1593: - Thanks for posting the document [~tnachen]! I'd love to get some feedback from folks about how we might capture multiple Docker containers for the same task or executor. For example, we could easily make a task or executor be represented by a 'repeated DockerInfo' field, with the contract being that each of the Docker containers represented by each DockerInfo will be started, killed, and cleaned up as a single atomic unit for that task/executor! (We should also consider the semantics if just one of those Docker containers exits, i.e., restart it? kill everything? etc.) I think this is worth a discussion up front (even if we don't implement anything now, it would be nice to be prepared to do something like that in the future). There might be a more generic pod like abstraction that we want to introduce for DockerInfo or CommandInfo, but motivating it from the Docker perspective seems valuable since this seems to be a theme for deploying Docker containers. Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1595) Provide a way to install libprocess
[ https://issues.apache.org/jira/browse/MESOS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062590#comment-14062590 ] Timothy St. Clair commented on MESOS-1595: -- I created a quick patch here: https://github.com/timothysc/mesos/tree/MESOS-1595 But I have 2 concerns: 1. libmesos contains the symbols, duping them into another library is kind of weird. 2. The headers leak the dep-graph. e.g. include glog/boost but they are not installed. Twitter should really consider using koji, given the level of CentOS usage. Provide a way to install libprocess --- Key: MESOS-1595 URL: https://issues.apache.org/jira/browse/MESOS-1595 Project: Mesos Issue Type: Story Reporter: Vinod Kone Assignee: Vinod Kone For C++ framework developers that want to use libprocess in their code base, it would be great if Mesos provides a way to easily get access to the headers. A first step in that direction would be to provide a install target in the libprocess Makefile for the same. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MESOS-1172) Update system check (libev)
[ https://issues.apache.org/jira/browse/MESOS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy St. Clair resolved MESOS-1172. -- Resolution: Fixed Update system check (libev) --- Key: MESOS-1172 URL: https://issues.apache.org/jira/browse/MESOS-1172 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Timothy St. Clair Fix For: 0.20.0 Clean up libev detection to follow https://issues.apache.org/jira/browse/MESOS-1071 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (MESOS-1173) Update system check (picojson)
[ https://issues.apache.org/jira/browse/MESOS-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy St. Clair reopened MESOS-1173: -- Update system check (picojson) -- Key: MESOS-1173 URL: https://issues.apache.org/jira/browse/MESOS-1173 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.19.0 Reporter: Timothy St. Clair Assignee: Timothy St. Clair Fix For: 0.20.0 Clean up picojson detection to follow https://issues.apache.org/jira/browse/MESOS-1071 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062596#comment-14062596 ] Tobi Knaup commented on MESOS-1593: --- Note that Dockers are typically launched as root (but they don't have to be). So defaulting to root is probably a good thing. Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MESOS-1602) Add compilation check for unbundled libev
Timothy St. Clair created MESOS-1602: Summary: Add compilation check for unbundled libev Key: MESOS-1602 URL: https://issues.apache.org/jira/browse/MESOS-1602 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.20.0 Reporter: Timothy St. Clair Assignee: Timothy St. Clair Per review breakout a check to ensure libev has been compiled with -DEV_CHILD_ENABLE=0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1330) Introduce connection/transport abstraction to stout
[ https://issues.apache.org/jira/browse/MESOS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062595#comment-14062595 ] Niklas Quarfot Nielsen commented on MESOS-1330: I have been working on an SSL port of libprocess lately and now have a couple of hard-earned lessons about interfacing with libev and building a middle-layer transport. TL;DR: I'd recommend looking into ways to replace libev altogether with an event system that supports SSL off the shelf, for example libuv or libevent. In the case of SSL, the SSL buffer state and the socket state are not necessarily tied together (and tying them together in a non-blocking fashion is hard), and this was very hard to make play nicely with libprocess. That would also be a good time to clean up and document a bunch of libprocess code. Let me know what you think. Introduce connection/transport abstraction to stout Key: MESOS-1330 URL: https://issues.apache.org/jira/browse/MESOS-1330 Project: Mesos Issue Type: Improvement Components: general, libprocess Reporter: Niklas Quarfot Nielsen Labels: libprocess, network I think it makes sense to think in terms of different low- or middle-layer transports (which can accommodate channels like SSL). We could capture connection life-cycles and network send/receive primitives in a much more explicit manner than libprocess currently does. I have a proof-of-concept transport/connection abstraction ready, which we can use to iterate on a design. Notably, there are opportunities to change the current SocketManager/Socket abstractions to explicit ConnectionManager/Connection abstractions, which allow several composable communication layers. I am proposing to own this ticket and am looking for a shepherd to (thoroughly) go over design considerations before jumping into an actual implementation.
[jira] [Commented] (MESOS-1573) Having Problems running Mesosphere-Docker
[ https://issues.apache.org/jira/browse/MESOS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062616#comment-14062616 ] Tobi Knaup commented on MESOS-1573: The first thing to check is the log output from Mesos or Deimos. If you followed the tutorial, the logs will be in syslog. The Docker integration code lives at https://github.com/mesosphere/deimos Please report Docker-related issues there. The best place for real-time help is #mesos. Having Problems running Mesosphere-Docker Key: MESOS-1573 URL: https://issues.apache.org/jira/browse/MESOS-1573 Project: Mesos Issue Type: Bug Components: containerization, ec2, general Reporter: Nayeem Syed Priority: Blocker Labels: newbie I am not sure this is the best place to ask, but I tried checking the IRC channel, which seemed quite empty, and didn't find anywhere else to ask general user questions/problems. I would appreciate some pointers and directions on how I can get it up. I tried following the instructions here: http://mesosphere.io/learn/run-docker-on-mesosphere/ Instead of using a local machine (I am on OS X), I set up an Ubuntu 14 m3.large instance on my AWS account and then followed the instructions. I connected an elastic IP and a subdomain to the instance and opened all the ports on the firewall. However, I am not getting any Docker containers running. Here are my Mesos and Marathon URLs: Marathon: mesos.cronycle.net:8080 Mesos: mesos.cronycle.net:5050 I just want to get a Rails application up on a Docker container, be able to scale it automatically based on resource consumption, and be able to use a Procfile for it, similar to Heroku. Is Mesos a good tool for this? I am currently looking at using Deis+CoreOS, but their statistics and monitoring tools seem to be nonexistent; there's no automated way of monitoring processes like there is with Marathon, for instance. So I would like to make this work if possible. Thanks in advance.
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062634#comment-14062634 ] Timothy St. Clair commented on MESOS-1593: +1 re: root default. Also, re: [~benjaminhindman]'s comment: how does this resolve in the pod-esque model (multiple containers, single executor)? https://github.com/GoogleCloudPlatform/kubernetes/blob/master/DESIGN.md#pods Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design
[jira] [Comment Edited] (MESOS-1595) Provide a way to install libprocess
[ https://issues.apache.org/jira/browse/MESOS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062590#comment-14062590 ] Timothy St. Clair edited comment on MESOS-1595 at 7/15/14 8:54 PM: --- I created a quick patch here: https://github.com/timothysc/mesos/tree/MESOS-1595 But I have 2 concerns: 1. libmesos contains the symbols, duping them into another library is kind of weird. 2. The headers leak the dep-graph. e.g. include glog/boost but they are not installed. Twitter should really consider using koji, given the level of CentOS usage. This was also one of the root reasons for all the un-bundling. was (Author: tstclair): I created a quick patch here: https://github.com/timothysc/mesos/tree/MESOS-1595 But I have 2 concerns: 1. libmesos contains the symbols, duping them into another library is kind of weird. 2. The headers leak the dep-graph. e.g. include glog/boost but they are not installed. Twitter should really consider using koji, given the level of CentOS usage. Provide a way to install libprocess --- Key: MESOS-1595 URL: https://issues.apache.org/jira/browse/MESOS-1595 Project: Mesos Issue Type: Story Reporter: Vinod Kone Assignee: Vinod Kone For C++ framework developers that want to use libprocess in their code base, it would be great if Mesos provides a way to easily get access to the headers. A first step in that direction would be to provide a install target in the libprocess Makefile for the same. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MESOS-1600) IP classifiers in routing lib should ignore IP packets with IP options
[ https://issues.apache.org/jira/browse/MESOS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-1600: Assignee: Jie Yu IP classifiers in routing lib should ignore IP packets with IP options Key: MESOS-1600 URL: https://issues.apache.org/jira/browse/MESOS-1600 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu Currently, the IP classifiers simply assume that no IP packet carries IP options. If an IP packet does carry IP options, the current behavior is undefined (e.g., the packet might be redirected/mirrored to a random link). We can solve that by adding one more rule to the filter requiring the IP header length (the IHL field) to be 5, i.e., five 32-bit words (20 bytes), the size of a header without options.
[jira] [Commented] (MESOS-1600) IP classifiers in routing lib should ignore IP packets with IP options
[ https://issues.apache.org/jira/browse/MESOS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062659#comment-14062659 ] Jie Yu commented on MESOS-1600: https://reviews.apache.org/r/23525/ IP classifiers in routing lib should ignore IP packets with IP options Key: MESOS-1600 URL: https://issues.apache.org/jira/browse/MESOS-1600 Project: Mesos Issue Type: Task Reporter: Jie Yu Currently, the IP classifiers simply assume that no IP packet carries IP options. If an IP packet does carry IP options, the current behavior is undefined (e.g., the packet might be redirected/mirrored to a random link). We can solve that by adding one more rule to the filter requiring the IP header length (the IHL field) to be 5, i.e., five 32-bit words (20 bytes), the size of a header without options.
[jira] [Commented] (MESOS-1517) Maintain a queue of messages that arrive before the master recovers.
[ https://issues.apache.org/jira/browse/MESOS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062666#comment-14062666 ] Timothy Chen commented on MESOS-1517: If no one is working on this I can look at this. Maintain a queue of messages that arrive before the master recovers. Key: MESOS-1517 URL: https://issues.apache.org/jira/browse/MESOS-1517 Project: Mesos Issue Type: Improvement Components: master Reporter: Benjamin Mahler Assignee: Timothy Chen Labels: reliability Currently when the master is recovering, we drop all incoming messages. If slaves and frameworks knew about the leading master only once it has recovered, then we would only expect to see messages after we've recovered. We previously considered enqueuing all messages through the recovery future, but this has the downside of forcing all messages to go through the master's queue twice:
{code}
// TODO(bmahler): Consider instead re-enqueing *all* messages
// through recover(). What are the performance implications of
// the additional queueing delay and the accumulated backlog
// of messages post-recovery?
if (!recovered.get().isReady()) {
  VLOG(1) << "Dropping '" << event.message->name << "' message since "
          << "not recovered yet";
  ++metrics.dropped_messages;
  return;
}
{code}
However, an easy solution to this problem is to maintain an explicit queue of incoming messages that gets flushed once we finish recovery. This ensures that all messages post-recovery are processed normally.
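The explicit queue proposed in the description could look roughly like the sketch below: messages that arrive before recovery are buffered and replayed in arrival order once recovery finishes, and later messages bypass the queue. This is a stand-alone illustration with hypothetical names (RecoveringReceiver, receive, finishRecovery), not the master's actual code, and it ignores concerns such as bounding the backlog.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the master's message handling.
class RecoveringReceiver {
public:
  explicit RecoveringReceiver(std::function<void(const std::string&)> handler)
    : handle(std::move(handler)), recovered(false) {}

  void receive(const std::string& message) {
    if (!recovered) {
      pending.push(message);  // Enqueue instead of dropping.
      return;
    }

    handle(message);  // Post-recovery messages are processed directly.
  }

  // Called once recovery completes: flush the backlog in arrival order.
  void finishRecovery() {
    recovered = true;

    while (!pending.empty()) {
      handle(pending.front());
      pending.pop();
    }
  }

private:
  std::function<void(const std::string&)> handle;
  bool recovered;
  std::queue<std::string> pending;
};
```

Because only pre-recovery messages pass through the extra queue, this avoids the double-queueing cost that the TODO in the quoted code worries about.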
[jira] [Assigned] (MESOS-1517) Maintain a queue of messages that arrive before the master recovers.
[ https://issues.apache.org/jira/browse/MESOS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen reassigned MESOS-1517: Assignee: Timothy Chen Maintain a queue of messages that arrive before the master recovers. Key: MESOS-1517 URL: https://issues.apache.org/jira/browse/MESOS-1517 Project: Mesos Issue Type: Improvement Components: master Reporter: Benjamin Mahler Assignee: Timothy Chen Labels: reliability Currently when the master is recovering, we drop all incoming messages. If slaves and frameworks knew about the leading master only once it has recovered, then we would only expect to see messages after we've recovered. We previously considered enqueuing all messages through the recovery future, but this has the downside of forcing all messages to go through the master's queue twice:
{code}
// TODO(bmahler): Consider instead re-enqueing *all* messages
// through recover(). What are the performance implications of
// the additional queueing delay and the accumulated backlog
// of messages post-recovery?
if (!recovered.get().isReady()) {
  VLOG(1) << "Dropping '" << event.message->name << "' message since "
          << "not recovered yet";
  ++metrics.dropped_messages;
  return;
}
{code}
However, an easy solution to this problem is to maintain an explicit queue of incoming messages that gets flushed once we finish recovery. This ensures that all messages post-recovery are processed normally.
[jira] [Updated] (MESOS-857) restructure mesos python namespace
[ https://issues.apache.org/jira/browse/MESOS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-857: --- Shepherd: Benjamin Hindman restructure mesos python namespace -- Key: MESOS-857 URL: https://issues.apache.org/jira/browse/MESOS-857 Project: Mesos Issue Type: Improvement Components: python api Reporter: brian wickman Assignee: Thomas Rampelberg Right now the mesos_pb2 and mesos dependencies are bundled together into the mesos egg. We have some tooling that uses just the compiled protobufs, but because they're lumped together with the mesos egg, we get all the dependency/platform nightmare that comes along with it, not to mention the bloat of including 20MB of .so files. This proposes splitting the mesos protobufs into a separate mesos_pb distribution that the mesos distribution should depend upon via install_requires (e.g. mesos_pb==0.15.0-rc4) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062684#comment-14062684 ] Timothy Chen commented on MESOS-1593: - Updated doc to make root as default user. Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MESOS-1600) IP classifiers in routing lib should ignore IP packets with IP options
[ https://issues.apache.org/jira/browse/MESOS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-1600: Sprint: Q3 Sprint 1 IP classifiers in routing lib should ignore IP packets with IP options Key: MESOS-1600 URL: https://issues.apache.org/jira/browse/MESOS-1600 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu Fix For: 0.20.0 Currently, the IP classifiers simply assume that no IP packet carries IP options. If an IP packet does carry IP options, the current behavior is undefined (e.g., the packet might be redirected/mirrored to a random link). We can solve that by adding one more rule to the filter requiring the IP header length (the IHL field) to be 5, i.e., five 32-bit words (20 bytes), the size of a header without options.
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062747#comment-14062747 ] Timothy Chen commented on MESOS-1593: - Actually there is another configuration in Docker that allows you to specify the user that is launched within the container: https://docs.docker.com/reference/run/#user If I'm not mistaken I believe we're just talking about the user that is used to launch the container itself right? If we want to distinguish the user override within the container and the user that launches the container, perhaps we should call the first container_user and second launch_user? Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062819#comment-14062819 ] Timothy Chen commented on MESOS-1593: - [~vi...@twitter.com] I believe you're only referring to the user being launched in the container right? https://docs.docker.com/reference/run/#user Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062819#comment-14062819 ] Timothy Chen edited comment on MESOS-1593 at 7/15/14 11:05 PM: --- [~vi...@twitter.com] I believe you're only referring to the user being launched in the container right? https://docs.docker.com/reference/run/#user Also if you don't specify the user for Docker run it already defaults to root so we don't need to have a default user setup. was (Author: tnachen): [~vi...@twitter.com] I believe you're only referring to the user being launched in the container right? https://docs.docker.com/reference/run/#user Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062880#comment-14062880 ] Vinod Kone commented on MESOS-1593: I meant the launch_user, not the container_user. You should also wire that up with the authorization code that we recently added in 0.20.0. Currently, the master authorizes CommandInfo.user(). You want to extend it to authorize DockerInfo.user(). Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker-related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design
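The extension suggested in the comment above could look roughly like the following. This is a minimal sketch in Python, not Mesos code: the ACL shape, the helper names, and the treatment of an unset user are all assumptions made for illustration. The only part taken from the thread is that the master authorizes CommandInfo.user() today and that the same check would be applied to a user carried in DockerInfo.

```python
from typing import Dict, Optional, Set

# Hypothetical ACL: framework principal -> set of users it may run tasks as.
RunAsAcl = Dict[str, Set[str]]


def authorize_run_as(acl: RunAsAcl, principal: str, user: Optional[str]) -> bool:
    """Return True if `principal` may run as `user`.

    Assumption for this sketch: an unset user means "nothing to check".
    """
    if user is None:
        return True
    return user in acl.get(principal, set())


def authorize_task(
    acl: RunAsAcl,
    principal: str,
    command_user: Optional[str],
    docker_user: Optional[str],
) -> bool:
    """Apply the same check to the command user and the (hypothetical) Docker user."""
    return authorize_run_as(acl, principal, command_user) and authorize_run_as(
        acl, principal, docker_user
    )
```

With this shape, a task is rejected as soon as either user falls outside what the principal is allowed to run as, so adding the Docker user does not weaken the existing check.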
[jira] [Comment Edited] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062918#comment-14062918 ] Benjamin Hindman edited comment on MESOS-1593 at 7/16/14 12:15 AM: IIUC, Docker forces us to launch containers as root (I'd be pleasantly surprised if there was another way). The Docker daemon runs as root (which it must, because it's doing things like manipulating cgroups) and I believe the process that it forks within the container is thus root by default. Assuming the above, the best we can do is use --user=foo, but an image must be set up to actually have that user! We can definitely do authz on that user, although it's a little different than a user running on the host and I'm not sure exactly what doing authz buys us? (Eventually I believe the hope is that containers will be safe enough that giving them root from within their container will be safe, even if it's not today.) was (Author: benjaminhindman): IIUC, Docker forces us to launch containers as root (I'd be pleasantly surprised if there was another way). The Docker daemon runs as root (which it must, because it's doing things like manipulating cgroups) and I believe the process that it forks within the container is thus root by default. So, the best we can do is use --user=foo, but an image must be set up to actually have that user! We can definitely do authz on that user, although it's a little different than a user running on the host and I'm not sure exactly what doing authz buys us. (Eventually I believe the hope is that containers will be safe enough that giving them root from within their container will be safe, even if it's not today.) Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker-related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design
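To make the --user=foo point concrete, here is a small sketch of how a launcher might assemble the docker run command line. The function name and parameters are hypothetical and not part of Mesos; the only Docker behavior relied on is that options such as --user must come before the image name. As noted in the comment, the image still has to contain the named user for this to work.

```python
from typing import List, Optional


def docker_run_argv(
    image: str, command: List[str], user: Optional[str] = None
) -> List[str]:
    """Build an argv list for `docker run`, adding --user only when requested.

    When `user` is None the flag is omitted, so the container process runs as
    whatever the image defaults to (root unless the image says otherwise).
    """
    argv = ["docker", "run"]
    if user is not None:
        # Options must precede the image name on the docker command line.
        argv.append(f"--user={user}")
    argv.append(image)
    argv.extend(command)
    return argv
```

The resulting list can be handed to something like subprocess.run without going through a shell, which avoids quoting problems in the command arguments.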
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062918#comment-14062918 ] Benjamin Hindman commented on MESOS-1593: IIUC, Docker forces us to launch containers as root (I'd be pleasantly surprised if there was another way). The Docker daemon runs as root (which it must, because it's doing things like manipulating cgroups) and I believe the process that it forks within the container is thus root by default. So, the best we can do is use --user=foo, but an image must be set up to actually have that user! We can definitely do authz on that user, although it's a little different than a user running on the host and I'm not sure exactly what doing authz buys us. (Eventually I believe the hope is that containers will be safe enough that giving them root from within their container will be safe, even if it's not today.) Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker-related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design
[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration
[ https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062976#comment-14062976 ] Tobi Knaup commented on MESOS-1593: You can actually launch as any user in the docker group: $ ll /var/run/docker.sock srw-rw 1 root docker 0 Jul 15 05:22 /var/run/docker.sock I don't expect this to be very common, though, so supporting just root in the first pass will be fine. Add DockerInfo Configuration Key: MESOS-1593 URL: https://issues.apache.org/jira/browse/MESOS-1593 Project: Mesos Issue Type: Task Reporter: Timothy Chen Assignee: Timothy Chen We want to add a new proto message to encapsulate all Docker-related configurations into DockerInfo. Here is the document that describes the design for DockerInfo: https://github.com/tnachen/mesos/wiki/DockerInfo-design
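The listing in the comment above shows a socket owned by root with group docker, which is why membership in that group is enough to talk to the daemon. A rough sketch of the underlying Unix permission check follows. The function name is hypothetical, and the check covers only the classic owner/group/other bits (no ACLs, no security modules), so treat it as an illustration rather than an exact model of the kernel's decision.

```python
import stat
from typing import Iterable


def can_use_socket(
    mode: int, owner_uid: int, owner_gid: int, uid: int, gids: Iterable[int]
) -> bool:
    """Return True if a process with `uid` and `gids` has read+write access.

    `mode`, `owner_uid`, and `owner_gid` correspond to st_mode, st_uid, and
    st_gid of the socket as returned by os.stat().
    """
    if not stat.S_ISSOCK(mode):
        return False
    if uid == 0:
        # root bypasses the permission bits.
        return True
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in set(gids):
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return (mode & needed) == needed
```

On a real host the inputs would come from os.stat("/var/run/docker.sock"), os.getuid(), and os.getgroups(); the socket path is the one shown in the comment.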
[jira] [Updated] (MESOS-1603) SlaveTest.TerminatingSlaveDoesNotReregister is flaky.
[ https://issues.apache.org/jira/browse/MESOS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-1603: --- Sprint: Q3 Sprint 1 SlaveTest.TerminatingSlaveDoesNotReregister is flaky. - Key: MESOS-1603 URL: https://issues.apache.org/jira/browse/MESOS-1603 Project: Mesos Issue Type: Bug Components: test Reporter: Benjamin Mahler Assignee: Benjamin Mahler {noformat} [ RUN ] SlaveTest.TerminatingSlaveDoesNotReregister Using temporary directory '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_6OCQiU' I0715 18:16:03.231495 5857 leveldb.cpp:176] Opened db in 27.552259ms I0715 18:16:03.240953 5857 leveldb.cpp:183] Compacted db in 8.801497ms I0715 18:16:03.241580 5857 leveldb.cpp:198] Created db iterator in 39823ns I0715 18:16:03.241945 5857 leveldb.cpp:204] Seeked to beginning of db in 15498ns I0715 18:16:03.242385 5857 leveldb.cpp:273] Iterated through 0 keys in the db in 15153ns I0715 18:16:03.242780 5857 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0715 18:16:03.243475 5882 recover.cpp:425] Starting replica recovery I0715 18:16:03.243540 5882 recover.cpp:451] Replica is in EMPTY status I0715 18:16:03.243862 5882 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0715 18:16:03.243919 5882 recover.cpp:188] Received a recover response from a replica in EMPTY status I0715 18:16:03.244112 5875 recover.cpp:542] Updating replica status to STARTING I0715 18:16:03.249405 5880 master.cpp:288] Master 20140715-181603-16842879-36514-5857 (trusty) started on 127.0.1.1:36514 I0715 18:16:03.249445 5880 master.cpp:325] Master only allowing authenticated frameworks to register I0715 18:16:03.249454 5880 master.cpp:330] Master only allowing authenticated slaves to register I0715 18:16:03.249480 5880 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_6OCQiU/credentials' I0715 18:16:03.250130 5880 
master.cpp:359] Authorization enabled I0715 18:16:03.250900 5880 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@127.0.1.1:36514 I0715 18:16:03.250951 5880 master.cpp:122] No whitelist given. Advertising offers for all slaves I0715 18:16:03.251145 5880 master.cpp:1128] The newly elected leader is master@127.0.1.1:36514 with id 20140715-181603-16842879-36514-5857 I0715 18:16:03.251164 5880 master.cpp:1141] Elected as the leading master! I0715 18:16:03.251173 5880 master.cpp:959] Recovering from registrar I0715 18:16:03.251225 5880 registrar.cpp:313] Recovering registrar I0715 18:16:03.254640 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.421369ms I0715 18:16:03.254683 5875 replica.cpp:320] Persisted replica status to STARTING I0715 18:16:03.254770 5875 recover.cpp:451] Replica is in STARTING status I0715 18:16:03.255097 5875 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0715 18:16:03.255166 5875 recover.cpp:188] Received a recover response from a replica in STARTING status I0715 18:16:03.255280 5875 recover.cpp:542] Updating replica status to VOTING I0715 18:16:03.263897 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.581313ms I0715 18:16:03.263944 5875 replica.cpp:320] Persisted replica status to VOTING I0715 18:16:03.264010 5875 recover.cpp:556] Successfully joined the Paxos group I0715 18:16:03.264085 5875 recover.cpp:440] Recover process terminated I0715 18:16:03.264227 5875 log.cpp:656] Attempting to start the writer I0715 18:16:03.264570 5875 replica.cpp:474] Replica received implicit promise request with proposal 1 I0715 18:16:03.322881 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 58.31469ms I0715 18:16:03.323349 5875 replica.cpp:342] Persisted promised to 1 I0715 18:16:03.328495 5876 coordinator.cpp:230] Coordinator attemping to fill missing position I0715 18:16:03.328910 5876 replica.cpp:375] 
Replica received explicit promise request for position 0 with proposal 2 I0715 18:16:03.338655 5876 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 9.73834ms I0715 18:16:03.338693 5876 replica.cpp:676] Persisted action at 0 I0715 18:16:03.338964 5876 replica.cpp:508] Replica received write request for position 0 I0715 18:16:03.338997 5876 leveldb.cpp:438] Reading position from leveldb took 21691ns I0715 18:16:03.349257 5876 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 10.25515ms I0715 18:16:03.349551 5876 replica.cpp:676] Persisted action at 0 I0715 18:16:03.354379 5877 replica.cpp:655] Replica received learned notice
[jira] [Resolved] (MESOS-1460) SlaveTest.TerminatingSlaveDoesNotRegister is flaky
[ https://issues.apache.org/jira/browse/MESOS-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler resolved MESOS-1460. Resolution: Fixed {noformat} commit ebee9afee7f6e5f04a5f259642c12eb0b99c35e0 Author: Yifan Gu yi...@mesosphere.io Date: Thu Jun 12 12:24:46 2014 -0700 Fixed a flaky test: SlaveTest.TerminatingSlaveDoesNotReregister. Review: https://reviews.apache.org/r/22472 {noformat} SlaveTest.TerminatingSlaveDoesNotRegister is flaky -- Key: MESOS-1460 URL: https://issues.apache.org/jira/browse/MESOS-1460 Project: Mesos Issue Type: Bug Reporter: Dominic Hamon Assignee: Yifan Gu [ RUN ] SlaveTest.TerminatingSlaveDoesNotReregister Using temporary directory '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_U2FkN5' I0605 11:04:21.890828 32082 leveldb.cpp:176] Opened db in 49.661187ms I0605 11:04:21.908869 32082 leveldb.cpp:183] Compacted db in 17.671793ms I0605 11:04:21.909230 32082 leveldb.cpp:198] Created db iterator in 26848ns I0605 11:04:21.909484 32082 leveldb.cpp:204] Seeked to beginning of db in 1705ns I0605 11:04:21.909740 32082 leveldb.cpp:273] Iterated through 0 keys in the db in 815ns I0605 11:04:21.910032 32082 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0605 11:04:21.910549 32105 recover.cpp:425] Starting replica recovery I0605 11:04:21.910626 32105 recover.cpp:451] Replica is in EMPTY status I0605 11:04:21.910951 32105 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0605 11:04:21.911013 32105 recover.cpp:188] Received a recover response from a replica in EMPTY status I0605 11:04:21.93 32105 recover.cpp:542] Updating replica status to STARTING I0605 11:04:21.914664 32109 master.cpp:272] Master 20140605-110421-16842879-56385-32082 (precise) started on 127.0.1.1:56385 I0605 11:04:21.914690 32109 master.cpp:309] Master only allowing authenticated frameworks to register I0605 11:04:21.914695 32109 master.cpp:314] Master only allowing 
authenticated slaves to register I0605 11:04:21.914702 32109 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_U2FkN5/credentials' I0605 11:04:21.914765 32109 master.cpp:340] Master enabling authorization I0605 11:04:21.915194 32109 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@127.0.1.1:56385 I0605 11:04:21.915230 32109 master.cpp:108] No whitelist given. Advertising offers for all slaves I0605 11:04:21.915393 32109 master.cpp:957] The newly elected leader is master@127.0.1.1:56385 with id 20140605-110421-16842879-56385-32082 I0605 11:04:21.915405 32109 master.cpp:970] Elected as the leading master! I0605 11:04:21.915410 32109 master.cpp:788] Recovering from registrar I0605 11:04:21.915458 32109 registrar.cpp:313] Recovering registrar I0605 11:04:21.931046 32105 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 19.869329ms I0605 11:04:21.931084 32105 replica.cpp:320] Persisted replica status to STARTING I0605 11:04:21.931169 32105 recover.cpp:451] Replica is in STARTING status I0605 11:04:21.931500 32105 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0605 11:04:21.931560 32105 recover.cpp:188] Received a recover response from a replica in STARTING status I0605 11:04:21.931656 32105 recover.cpp:542] Updating replica status to VOTING I0605 11:04:21.945734 32105 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 14.013731ms I0605 11:04:21.945791 32105 replica.cpp:320] Persisted replica status to VOTING I0605 11:04:21.945868 32105 recover.cpp:556] Successfully joined the Paxos group I0605 11:04:21.945930 32105 recover.cpp:440] Recover process terminated I0605 11:04:21.946071 32105 log.cpp:656] Attempting to start the writer I0605 11:04:21.946374 32105 replica.cpp:474] Replica received implicit promise request with proposal 1 I0605 11:04:21.960847 32105 leveldb.cpp:306] Persisting 
metadata (8 bytes) to leveldb took 14.444258ms I0605 11:04:21.961493 32105 replica.cpp:342] Persisted promised to 1 I0605 11:04:21.965292 32107 coordinator.cpp:230] Coordinator attemping to fill missing position I0605 11:04:21.965626 32107 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0605 11:04:21.982533 32107 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 16.8754ms I0605 11:04:21.982589 32107 replica.cpp:676] Persisted action at 0 I0605 11:04:21.982921 32107 replica.cpp:508] Replica received write request for position 0 I0605 11:04:21.982952 32107 leveldb.cpp:438] Reading position from leveldb took 16276ns I0605 11:04:21.999135 32107
[jira] [Created] (MESOS-1603) SlaveTest.TerminatingSlaveDoesNotReregister is flaky.
Benjamin Mahler created MESOS-1603: -- Summary: SlaveTest.TerminatingSlaveDoesNotReregister is flaky. Key: MESOS-1603 URL: https://issues.apache.org/jira/browse/MESOS-1603 Project: Mesos Issue Type: Bug Components: test Reporter: Benjamin Mahler Assignee: Benjamin Mahler {noformat} [ RUN ] SlaveTest.TerminatingSlaveDoesNotReregister Using temporary directory '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_6OCQiU' I0715 18:16:03.231495 5857 leveldb.cpp:176] Opened db in 27.552259ms I0715 18:16:03.240953 5857 leveldb.cpp:183] Compacted db in 8.801497ms I0715 18:16:03.241580 5857 leveldb.cpp:198] Created db iterator in 39823ns I0715 18:16:03.241945 5857 leveldb.cpp:204] Seeked to beginning of db in 15498ns I0715 18:16:03.242385 5857 leveldb.cpp:273] Iterated through 0 keys in the db in 15153ns I0715 18:16:03.242780 5857 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0715 18:16:03.243475 5882 recover.cpp:425] Starting replica recovery I0715 18:16:03.243540 5882 recover.cpp:451] Replica is in EMPTY status I0715 18:16:03.243862 5882 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0715 18:16:03.243919 5882 recover.cpp:188] Received a recover response from a replica in EMPTY status I0715 18:16:03.244112 5875 recover.cpp:542] Updating replica status to STARTING I0715 18:16:03.249405 5880 master.cpp:288] Master 20140715-181603-16842879-36514-5857 (trusty) started on 127.0.1.1:36514 I0715 18:16:03.249445 5880 master.cpp:325] Master only allowing authenticated frameworks to register I0715 18:16:03.249454 5880 master.cpp:330] Master only allowing authenticated slaves to register I0715 18:16:03.249480 5880 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_6OCQiU/credentials' I0715 18:16:03.250130 5880 master.cpp:359] Authorization enabled I0715 18:16:03.250900 5880 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator 
process with master : master@127.0.1.1:36514 I0715 18:16:03.250951 5880 master.cpp:122] No whitelist given. Advertising offers for all slaves I0715 18:16:03.251145 5880 master.cpp:1128] The newly elected leader is master@127.0.1.1:36514 with id 20140715-181603-16842879-36514-5857 I0715 18:16:03.251164 5880 master.cpp:1141] Elected as the leading master! I0715 18:16:03.251173 5880 master.cpp:959] Recovering from registrar I0715 18:16:03.251225 5880 registrar.cpp:313] Recovering registrar I0715 18:16:03.254640 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.421369ms I0715 18:16:03.254683 5875 replica.cpp:320] Persisted replica status to STARTING I0715 18:16:03.254770 5875 recover.cpp:451] Replica is in STARTING status I0715 18:16:03.255097 5875 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0715 18:16:03.255166 5875 recover.cpp:188] Received a recover response from a replica in STARTING status I0715 18:16:03.255280 5875 recover.cpp:542] Updating replica status to VOTING I0715 18:16:03.263897 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.581313ms I0715 18:16:03.263944 5875 replica.cpp:320] Persisted replica status to VOTING I0715 18:16:03.264010 5875 recover.cpp:556] Successfully joined the Paxos group I0715 18:16:03.264085 5875 recover.cpp:440] Recover process terminated I0715 18:16:03.264227 5875 log.cpp:656] Attempting to start the writer I0715 18:16:03.264570 5875 replica.cpp:474] Replica received implicit promise request with proposal 1 I0715 18:16:03.322881 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 58.31469ms I0715 18:16:03.323349 5875 replica.cpp:342] Persisted promised to 1 I0715 18:16:03.328495 5876 coordinator.cpp:230] Coordinator attemping to fill missing position I0715 18:16:03.328910 5876 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0715 18:16:03.338655 5876 leveldb.cpp:343] Persisting action (8 bytes) 
to leveldb took 9.73834ms I0715 18:16:03.338693 5876 replica.cpp:676] Persisted action at 0 I0715 18:16:03.338964 5876 replica.cpp:508] Replica received write request for position 0 I0715 18:16:03.338997 5876 leveldb.cpp:438] Reading position from leveldb took 21691ns I0715 18:16:03.349257 5876 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 10.25515ms I0715 18:16:03.349551 5876 replica.cpp:676] Persisted action at 0 I0715 18:16:03.354379 5877 replica.cpp:655] Replica received learned notice for position 0 I0715 18:16:03.367383 5877 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 12.99789ms I0715 18:16:03.367434 5877 replica.cpp:676] Persisted action at 0 I0715 18:16:03.367444 5877 replica.cpp:661] Replica learned NOP action
[jira] [Created] (MESOS-1604) LowLevelSchedulerLibprocess did not receive offers from Master
Zuyu Zhang created MESOS-1604: - Summary: LowLevelSchedulerLibprocess did not receive offers from Master Key: MESOS-1604 URL: https://issues.apache.org/jira/browse/MESOS-1604 Project: Mesos Issue Type: Bug Environment: cent os Reporter: Zuyu Zhang {noformat} [ RUN ] ExamplesTest.LowLevelSchedulerLibprocess Using temporary directory '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_MT54ja' Enabling checkpoint for the framework Enabling authentication for the scheduler I0715 18:51:48.456028 2019271440 scheduler.cpp:132] Version: 0.20.0 I0715 18:51:48.459760 2019271440 leveldb.cpp:176] Opened db in 1764us I0715 18:51:48.460237 2019271440 leveldb.cpp:183] Compacted db in 463us I0715 18:51:48.460283 2019271440 leveldb.cpp:198] Created db iterator in 24us I0715 18:51:48.460311 2019271440 leveldb.cpp:204] Seeked to beginning of db in 15us I0715 18:51:48.460337 2019271440 leveldb.cpp:273] Iterated through 0 keys in the db in 18us I0715 18:51:48.460381 2019271440 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0715 18:51:48.461017 222433280 recover.cpp:425] Starting replica recovery I0715 18:51:48.461197 222433280 recover.cpp:451] Replica is in EMPTY status I0715 18:51:48.461845 222433280 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0715 18:51:48.462077 220823552 recover.cpp:188] Received a recover response from a replica in EMPTY status I0715 18:51:48.462210 220823552 recover.cpp:542] Updating replica status to STARTING I0715 18:51:48.462533 222433280 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 288us I0715 18:51:48.462589 222433280 replica.cpp:320] Persisted replica status to STARTING I0715 18:51:48.462645 220823552 master.cpp:288] Master 20140715-185148-16777343-58730-32992 (localhost) started on 127.0.0.1:58730 I0715 18:51:48.462714 220823552 master.cpp:325] Master only allowing authenticated frameworks to register I0715 18:51:48.462739 220823552 master.cpp:332] Master allowing 
unauthenticated slaves to register I0715 18:51:48.462757 220823552 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_MT54ja/credentials' W0715 18:51:48.462885 220823552 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_MT54ja/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. I0715 18:51:48.462924 2019271440 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem I0715 18:51:48.462988 220823552 master.cpp:359] Authorization enabled I0715 18:51:48.463070 220286976 recover.cpp:451] Replica is in STARTING status I0715 18:51:48.463729 219750400 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0715 18:51:48.463878 222433280 slave.cpp:168] Slave started on 1)@127.0.0.1:58730 I0715 18:51:48.464135 221896704 recover.cpp:188] Received a recover response from a replica in STARTING status I0715 18:51:48.464149 222433280 slave.cpp:279] Slave resources: cpus(*):4; mem(*):7168; disk(*):470714; ports(*):[31000-32000] I0715 18:51:48.464359 2019271440 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem I0715 18:51:48.464381 222433280 slave.cpp:324] Slave hostname: localhost I0715 18:51:48.464397 222433280 slave.cpp:325] Slave checkpoint: false I0715 18:51:48.464488 219213824 recover.cpp:542] Updating replica status to VOTING I0715 18:51:48.464767 222969856 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 191us I0715 18:51:48.464807 222969856 replica.cpp:320] Persisted replica status to VOTING I0715 18:51:48.464859 222969856 recover.cpp:556] Successfully joined the Paxos group I0715 18:51:48.465044 219750400 state.cpp:33] Recovering state from '/Users/zuyuzhang/workspace/mesos/build-clang/mesos-FxDHkE/0/meta' I0715 18:51:48.465116 222969856 recover.cpp:440] Recover process terminated I0715 18:51:48.465334 220823552 status_update_manager.cpp:193] Recovering status 
update manager I0715 18:51:48.465350 221896704 master.cpp:1128] The newly elected leader is master@127.0.0.1:58730 with id 20140715-185148-16777343-58730-32992 I0715 18:51:48.465384 221896704 master.cpp:1141] Elected as the leading master! I0715 18:51:48.465416 221896704 master.cpp:959] Recovering from registrar I0715 18:51:48.465492 219750400 containerizer.cpp:287] Recovering containerizer I0715 18:51:48.465584 222433280 registrar.cpp:313] Recovering registrar I0715 18:51:48.465770 222969856 slave.cpp:168] Slave started on 2)@127.0.0.1:58730 I0715 18:51:48.465998 222969856 slave.cpp:279] Slave resources: cpus(*):4; mem(*):7168; disk(*):470714; ports(*):[31000-32000] I0715 18:51:48.466120 2019271440 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem I0715 18:51:48.466194 220823552 log.cpp:656] Attempting to start
[jira] [Commented] (MESOS-1525) Don't require slave id for reconciliation requests.
[ https://issues.apache.org/jira/browse/MESOS-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063057#comment-14063057 ] Benjamin Mahler commented on MESOS-1525: Review: https://reviews.apache.org/r/23542/ Don't require slave id for reconciliation requests. Key: MESOS-1525 URL: https://issues.apache.org/jira/browse/MESOS-1525 Project: Mesos Issue Type: Improvement Affects Versions: 0.19.0 Reporter: Benjamin Mahler Assignee: Benjamin Mahler Reconciliation requests currently specify a list of TaskStatuses. SlaveID is optional inside TaskStatus, but reconciliation requests are dropped when the SlaveID is not specified. We can answer reconciliation requests for a task so long as there are no transient slaves; this is what we should do when the SlaveID is not specified. There's an open question around whether we want the Reconcile Event to specify TaskID/SlaveID instead of TaskStatus, but I'll save that for later.
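One possible reading of the rule described in this issue, sketched in Python. Nothing here is taken from the Mesos source or from the linked review: the data structures, the function name, and the choice to report TASK_LOST for an unknown task are assumptions for illustration. The part that comes from the issue text is the condition itself: without a slave id, a request can only be answered while no slave is in a transient state.

```python
from typing import Dict, Optional, Set


def reconcile(
    task_id: str,
    slave_id: Optional[str],
    known_tasks: Dict[str, str],
    transient_slaves: Set[str],
) -> Optional[str]:
    """Return the state to report for `task_id`, or None if no answer is safe yet.

    known_tasks maps task id -> latest known state (hypothetical structure).
    transient_slaves holds ids of slaves whose status is still being resolved.
    """
    if task_id in known_tasks:
        # The task is known, so its latest state can be reported directly.
        return known_tasks[task_id]
    if slave_id is not None:
        # Slave specified: only that slave's status affects the answer.
        return None if slave_id in transient_slaves else "TASK_LOST"
    # No slave id: the task could be on any transient slave, so stay silent
    # unless there are none.
    return None if transient_slaves else "TASK_LOST"
```

Returning None models "do not answer now" rather than dropping the request for lack of a slave id, which is the behavior the issue wants to move away from.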
[jira] [Commented] (MESOS-1603) SlaveTest.TerminatingSlaveDoesNotReregister is flaky.
[ https://issues.apache.org/jira/browse/MESOS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063058#comment-14063058 ] Benjamin Mahler commented on MESOS-1603: Review: https://reviews.apache.org/r/23543/ SlaveTest.TerminatingSlaveDoesNotReregister is flaky. - Key: MESOS-1603 URL: https://issues.apache.org/jira/browse/MESOS-1603 Project: Mesos Issue Type: Bug Components: test Reporter: Benjamin Mahler Assignee: Benjamin Mahler {noformat} [ RUN ] SlaveTest.TerminatingSlaveDoesNotReregister Using temporary directory '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_6OCQiU' I0715 18:16:03.231495 5857 leveldb.cpp:176] Opened db in 27.552259ms I0715 18:16:03.240953 5857 leveldb.cpp:183] Compacted db in 8.801497ms I0715 18:16:03.241580 5857 leveldb.cpp:198] Created db iterator in 39823ns I0715 18:16:03.241945 5857 leveldb.cpp:204] Seeked to beginning of db in 15498ns I0715 18:16:03.242385 5857 leveldb.cpp:273] Iterated through 0 keys in the db in 15153ns I0715 18:16:03.242780 5857 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0715 18:16:03.243475 5882 recover.cpp:425] Starting replica recovery I0715 18:16:03.243540 5882 recover.cpp:451] Replica is in EMPTY status I0715 18:16:03.243862 5882 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0715 18:16:03.243919 5882 recover.cpp:188] Received a recover response from a replica in EMPTY status I0715 18:16:03.244112 5875 recover.cpp:542] Updating replica status to STARTING I0715 18:16:03.249405 5880 master.cpp:288] Master 20140715-181603-16842879-36514-5857 (trusty) started on 127.0.1.1:36514 I0715 18:16:03.249445 5880 master.cpp:325] Master only allowing authenticated frameworks to register I0715 18:16:03.249454 5880 master.cpp:330] Master only allowing authenticated slaves to register I0715 18:16:03.249480 5880 credentials.hpp:36] Loading credentials for authentication from 
'/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_6OCQiU/credentials' I0715 18:16:03.250130 5880 master.cpp:359] Authorization enabled I0715 18:16:03.250900 5880 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@127.0.1.1:36514 I0715 18:16:03.250951 5880 master.cpp:122] No whitelist given. Advertising offers for all slaves I0715 18:16:03.251145 5880 master.cpp:1128] The newly elected leader is master@127.0.1.1:36514 with id 20140715-181603-16842879-36514-5857 I0715 18:16:03.251164 5880 master.cpp:1141] Elected as the leading master! I0715 18:16:03.251173 5880 master.cpp:959] Recovering from registrar I0715 18:16:03.251225 5880 registrar.cpp:313] Recovering registrar I0715 18:16:03.254640 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.421369ms I0715 18:16:03.254683 5875 replica.cpp:320] Persisted replica status to STARTING I0715 18:16:03.254770 5875 recover.cpp:451] Replica is in STARTING status I0715 18:16:03.255097 5875 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0715 18:16:03.255166 5875 recover.cpp:188] Received a recover response from a replica in STARTING status I0715 18:16:03.255280 5875 recover.cpp:542] Updating replica status to VOTING I0715 18:16:03.263897 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.581313ms I0715 18:16:03.263944 5875 replica.cpp:320] Persisted replica status to VOTING I0715 18:16:03.264010 5875 recover.cpp:556] Successfully joined the Paxos group I0715 18:16:03.264085 5875 recover.cpp:440] Recover process terminated I0715 18:16:03.264227 5875 log.cpp:656] Attempting to start the writer I0715 18:16:03.264570 5875 replica.cpp:474] Replica received implicit promise request with proposal 1 I0715 18:16:03.322881 5875 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 58.31469ms I0715 18:16:03.323349 5875 replica.cpp:342] Persisted promised to 1 I0715 18:16:03.328495 5876 
coordinator.cpp:230] Coordinator attemping to fill missing position I0715 18:16:03.328910 5876 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0715 18:16:03.338655 5876 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 9.73834ms I0715 18:16:03.338693 5876 replica.cpp:676] Persisted action at 0 I0715 18:16:03.338964 5876 replica.cpp:508] Replica received write request for position 0 I0715 18:16:03.338997 5876 leveldb.cpp:438] Reading position from leveldb took 21691ns I0715 18:16:03.349257 5876 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 10.25515ms I0715 18:16:03.349551 5876 replica.cpp:676] Persisted action at 0
[jira] [Closed] (MESOS-1604) LowLevelSchedulerLibprocess did not receive offers from Master
[ https://issues.apache.org/jira/browse/MESOS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zuyu Zhang closed MESOS-1604. - Resolution: Not a Problem make check with disabling ckpt. LowLevelSchedulerLibprocess did not receive offers from Master -- Key: MESOS-1604 URL: https://issues.apache.org/jira/browse/MESOS-1604 Project: Mesos Issue Type: Bug Environment: cent os Reporter: Zuyu Zhang
{noformat}
[ RUN ] ExamplesTest.LowLevelSchedulerLibprocess
Using temporary directory '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_MT54ja'
Enabling checkpoint for the framework
Enabling authentication for the scheduler
I0715 18:51:48.456028 2019271440 scheduler.cpp:132] Version: 0.20.0
I0715 18:51:48.459760 2019271440 leveldb.cpp:176] Opened db in 1764us
I0715 18:51:48.460237 2019271440 leveldb.cpp:183] Compacted db in 463us
I0715 18:51:48.460283 2019271440 leveldb.cpp:198] Created db iterator in 24us
I0715 18:51:48.460311 2019271440 leveldb.cpp:204] Seeked to beginning of db in 15us
I0715 18:51:48.460337 2019271440 leveldb.cpp:273] Iterated through 0 keys in the db in 18us
I0715 18:51:48.460381 2019271440 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned
I0715 18:51:48.461017 222433280 recover.cpp:425] Starting replica recovery
I0715 18:51:48.461197 222433280 recover.cpp:451] Replica is in EMPTY status
I0715 18:51:48.461845 222433280 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
I0715 18:51:48.462077 220823552 recover.cpp:188] Received a recover response from a replica in EMPTY status
I0715 18:51:48.462210 220823552 recover.cpp:542] Updating replica status to STARTING
I0715 18:51:48.462533 222433280 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 288us
I0715 18:51:48.462589 222433280 replica.cpp:320] Persisted replica status to STARTING
I0715 18:51:48.462645 220823552 master.cpp:288] Master 20140715-185148-16777343-58730-32992 (localhost) started on 127.0.0.1:58730
I0715 18:51:48.462714 220823552 master.cpp:325] Master only allowing authenticated frameworks to register
I0715 18:51:48.462739 220823552 master.cpp:332] Master allowing unauthenticated slaves to register
I0715 18:51:48.462757 220823552 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_MT54ja/credentials'
W0715 18:51:48.462885 220823552 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_MT54ja/credentials' are too open. It is recommended that your credentials file is NOT accessible by others.
I0715 18:51:48.462924 2019271440 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem
I0715 18:51:48.462988 220823552 master.cpp:359] Authorization enabled
I0715 18:51:48.463070 220286976 recover.cpp:451] Replica is in STARTING status
I0715 18:51:48.463729 219750400 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
I0715 18:51:48.463878 222433280 slave.cpp:168] Slave started on 1)@127.0.0.1:58730
I0715 18:51:48.464135 221896704 recover.cpp:188] Received a recover response from a replica in STARTING status
I0715 18:51:48.464149 222433280 slave.cpp:279] Slave resources: cpus(*):4; mem(*):7168; disk(*):470714; ports(*):[31000-32000]
I0715 18:51:48.464359 2019271440 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem
I0715 18:51:48.464381 222433280 slave.cpp:324] Slave hostname: localhost
I0715 18:51:48.464397 222433280 slave.cpp:325] Slave checkpoint: false
I0715 18:51:48.464488 219213824 recover.cpp:542] Updating replica status to VOTING
I0715 18:51:48.464767 222969856 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 191us
I0715 18:51:48.464807 222969856 replica.cpp:320] Persisted replica status to VOTING
I0715 18:51:48.464859 222969856 recover.cpp:556] Successfully joined the Paxos group
I0715 18:51:48.465044 219750400 state.cpp:33] Recovering state from '/Users/zuyuzhang/workspace/mesos/build-clang/mesos-FxDHkE/0/meta'
I0715 18:51:48.465116 222969856 recover.cpp:440] Recover process terminated
I0715 18:51:48.465334 220823552 status_update_manager.cpp:193] Recovering status update manager
I0715 18:51:48.465350 221896704 master.cpp:1128] The newly elected leader is master@127.0.0.1:58730 with id 20140715-185148-16777343-58730-32992
I0715 18:51:48.465384 221896704 master.cpp:1141] Elected as the leading master!
I0715 18:51:48.465416 221896704 master.cpp:959] Recovering from registrar
I0715 18:51:48.465492 219750400 containerizer.cpp:287] Recovering containerizer
I0715 18:51:48.465584 222433280 registrar.cpp:313] Recovering registrar
I0715 18:51:48.465770 222969856 slave.cpp:168] Slave started
[jira] [Created] (MESOS-1605) Cleanup stout build setup
Vinod Kone created MESOS-1605: - Summary: Cleanup stout build setup Key: MESOS-1605 URL: https://issues.apache.org/jira/browse/MESOS-1605 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone While investigating the stout build setup to make it installable, I came across some discrepancies. stout's tests are included in libprocess's Makefile instead of stout's own Makefile. stout's third-party dependencies (e.g., picojson) live in libprocess's 3rdparty directory instead of in stout's (non-existent) 3rdparty directory. It would be nice to fix these issues before making stout installable.
[jira] [Assigned] (MESOS-1605) Cleanup stout build setup
[ https://issues.apache.org/jira/browse/MESOS-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone reassigned MESOS-1605: - Assignee: Vinod Kone Cleanup stout build setup - Key: MESOS-1605 URL: https://issues.apache.org/jira/browse/MESOS-1605 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone While investigating the stout build setup to make it installable, I came across some discrepancies. stout's tests are included in libprocess's Makefile instead of stout's own Makefile. stout's third-party dependencies (e.g., picojson) live in libprocess's 3rdparty directory instead of in stout's (non-existent) 3rdparty directory. It would be nice to fix these issues before making stout installable.
[jira] [Updated] (MESOS-1595) Provide a way to install libprocess
[ https://issues.apache.org/jira/browse/MESOS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-1595: -- Sprint: Q3 Sprint 1 Provide a way to install libprocess --- Key: MESOS-1595 URL: https://issues.apache.org/jira/browse/MESOS-1595 Project: Mesos Issue Type: Story Reporter: Vinod Kone Assignee: Vinod Kone For C++ framework developers who want to use libprocess in their code base, it would be great if Mesos provided an easy way to get access to the headers. A first step in that direction would be to provide an install target in the libprocess Makefile.
[jira] [Updated] (MESOS-1605) Cleanup stout build setup
[ https://issues.apache.org/jira/browse/MESOS-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-1605: -- Sprint: Q3 Sprint 1 Cleanup stout build setup - Key: MESOS-1605 URL: https://issues.apache.org/jira/browse/MESOS-1605 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone While investigating the stout build setup to make it installable, I came across some discrepancies. stout's tests are included in libprocess's Makefile instead of stout's own Makefile. stout's third-party dependencies (e.g., picojson) live in libprocess's 3rdparty directory instead of in stout's (non-existent) 3rdparty directory. It would be nice to fix these issues before making stout installable.