[VOTE] Release Apache Mesos 1.0.2 (rc2)
Hi all,

Please vote on releasing the following candidate as Apache Mesos 1.0.2.

This is a bug fix release.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.0.2-rc2

The candidate for Mesos 1.0.2 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz

The tag to be voted on is 1.0.2-rc2:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.0.2-rc2

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1164

Please vote on releasing this package as Apache Mesos 1.0.2!

The vote is open until Thu Nov 3 16:34:20 PDT 2016 and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 1.0.2
[ ] -1 Do not release this package because ...

Thanks,
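For voters who want to check the tarball against the published MD5 before casting a vote, the comparison can be sketched in Python as below. This is only a minimal sketch: the function names are made up for illustration, and it assumes you copy the hex digest out of the .md5 file by hand, since the exact layout of that file is not shown in this thread.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 digest of the file at `path` as lowercase hex."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large tarball is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize(expected):
    """Drop whitespace and case so 'AB CD ...' and 'abcd...' compare equal."""
    return "".join(expected.split()).lower()


def checksum_matches(path, expected):
    """True if the file's MD5 equals the digest copied from the .md5 file."""
    return md5_hex(path) == normalize(expected)
```

A call would look like `checksum_matches("mesos-1.0.2.tar.gz", "<digest copied from the .md5 file>")`. A matching checksum only shows that the download is intact; the .asc signature and the KEYS file still have to be checked separately.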
Re: outstanding offers
Right, I have written my own scheduler, and sometimes I end up in a state where Mesos believes there are outstanding offers for my framework but I don't seem to have received them. The normal Mesos trace does not show the IDs when it offers resources, only when they get declined or used. I'll look into using that trace.

Besides that, the question is how one can get back to a state where there are no outstanding offers. For tasks I can call "reconcileTasks" to check with Mesos on the task state. But there does not seem to be an equivalent for offers, which is odd given that offers don't time out by default. Thus I was wondering what happens if there are communication problems and Mesos sends out an offer that I never receive. And what happens if my framework gets reregistered with Mesos: are outstanding offers automatically reset or not?

On 31.10.2016 18:49, Vinod Kone wrote:
> Are you running a custom framework? Can you see in scheduler logs which offers you are receiving? Am I understanding your question correctly that Mesos thinks offers are being sent to your framework but (you think) your framework hasn't received them?
>
> Note that you can increase logging on the framework (driver) and Mesos master by setting GLOG_v=1 in the environment.
>
> On Mon, Oct 31, 2016 at 12:42 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote:
> > Hi,
> >
> > I have a Mesos 0.28.2 system and generally things seem to run fine. The "Outstanding Offers" page normally shows nothing, which I believe is normal. However, at some point my framework gets disconnected for some odd reason, possibly due to high load. A few seconds later I receive a reregistered call from Mesos. However, it looks like around this time offers start to get listed on the "Outstanding Offers" page. Even more strangely, no Mesos log file contains any information for the offer IDs shown. Unfortunately the default logging does not show which offer IDs are being sent out, while it does show the IDs that are being declined or accepted. So I don't know when these offers actually got sent out.
> >
> > How can I deal with such a situation? Should I:
> > Stop the SchedulerDriver when I get disconnected instead of waiting for a reregistered call?
> > Is it advised to set --offer_timeout to recover from such a situation?
> > Is there any way to reconcile offers like one can do for tasks?
> >
> > thanks,
> > Hendrik
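Since there is no offer equivalent of "reconcileTasks" mentioned in this thread, one way to at least see the mismatch from the framework side is to keep a local record of every offer the scheduler receives and answers. The sketch below is purely illustrative bookkeeping: `OfferLedger` and its methods are hypothetical names, not part of any Mesos API, and it assumes the list of offer IDs the master considers outstanding is obtained some other way (for example, copied from the "Outstanding Offers" page).

```python
import time


class OfferLedger:
    """Framework-side record of offers received but not yet answered."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._received_at = {}  # offer id -> time the offer arrived

    def received(self, offer_id):
        """Record an offer when the scheduler's offer callback fires."""
        self._received_at[offer_id] = self._clock()

    def answered(self, offer_id):
        """Forget an offer on accept, decline, or rescind; unknown ids are ignored."""
        self._received_at.pop(offer_id, None)

    def not_held(self, master_offer_ids):
        """Ids the master lists as outstanding that this framework is not holding."""
        return sorted(set(master_offer_ids) - set(self._received_at))

    def older_than(self, seconds):
        """Ids of offers this framework has held for longer than `seconds`."""
        now = self._clock()
        return sorted(o for o, t in self._received_at.items() if now - t > seconds)
```

Anything returned by `not_held` is an offer the master believes is outstanding but the scheduler either never saw or already answered, which is the situation described above. The ledger only helps diagnose the problem; it cannot recover the offers.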
Re: outstanding offers
Are you running a custom framework? Can you see in scheduler logs which offers you are receiving? Am I understanding your question correctly that Mesos thinks offers are being sent to your framework but (you think) your framework hasn't received them?

Note that you can increase logging on the framework (driver) and Mesos master by setting GLOG_v=1 in the environment.

On Mon, Oct 31, 2016 at 12:42 AM, Hendrik Haddorp wrote:
> Hi,
>
> I have a Mesos 0.28.2 system and generally things seem to run fine. The
> "Outstanding Offers" page normally shows nothing, which I believe is normal.
> However, at some point my framework gets disconnected for some odd reason,
> possibly due to high load. A few seconds later I receive a
> reregistered call from Mesos. However, it looks like around this time offers
> start to get listed on the "Outstanding Offers" page. Even more strangely, no
> Mesos log file contains any information for the offer IDs shown.
> Unfortunately the default logging does not show which offer IDs are being
> sent out, while it does show the IDs that are being declined or accepted. So
> I don't know when these offers actually got sent out.
>
> How can I deal with such a situation? Should I:
> Stop the SchedulerDriver when I get disconnected instead of waiting
> for a reregistered call?
> Is it advised to set --offer_timeout to recover from such a situation?
> Is there any way to reconcile offers like one can do for tasks?
>
> thanks,
> Hendrik
>
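If the scheduler is started from a Python wrapper, the GLOG_v=1 suggestion above can be applied by building the child environment explicitly. This is a small sketch under that assumption; `env_with_verbose_glog` is a hypothetical helper, and only the `GLOG_v` variable name comes from this thread.

```python
import os


def env_with_verbose_glog(level=1, base=None):
    """Copy an environment mapping and set GLOG_v to the given verbosity."""
    # Copy first so the caller's mapping (or os.environ) is left untouched.
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(level)
    return env
```

The result can be passed as `env=` to `subprocess.run` when launching the scheduler binary (whatever it is called in your setup), so the extra logging applies to that process only.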
Transition TASK_KILLING -> TASK_RUNNING
We've recently discovered a bug that may lead to a task being transitioned from the killing state back to the running state. More information is available in MESOS-6457 [1]. We plan to fix it in 1.2.0 and will backport the fix to all supported versions.

[1] https://issues.apache.org/jira/browse/MESOS-6457
[VOTE] Release Apache Mesos 1.1.0 (rc2)
Hi all,

Please vote on releasing the following candidate as Apache Mesos 1.1.0.

1.1.0 includes the following:

* [MESOS-2449] - **Experimental** support for launching a group of tasks via a new `LAUNCH_GROUP` Offer operation. Mesos will guarantee that either all tasks or none of the tasks in the group are delivered to the executor. Executors receive the task group via a new `LAUNCH_GROUP` event.

* [MESOS-2533] - **Experimental** support for HTTP and HTTPS health checks. Executors may now use the updated `HealthCheck` protobuf to implement HTTP(S) health checks. Both default executors (command and docker) leverage the `curl` binary for sending HTTP(S) requests and connect to `127.0.0.1`, hence a task must listen on all interfaces. On Linux, for BRIDGE and USER modes, the docker executor enters the task's network namespace.

* [MESOS-3421] - **Experimental** support for sharing of resources across containers. Currently persistent volumes are the only resources allowed to be shared.

* [MESOS-3567] - **Experimental** support for TCP health checks. Executors may now use the updated `HealthCheck` protobuf to implement TCP health checks. Both default executors (command and docker) connect to `127.0.0.1`, hence a task must listen on all interfaces. On Linux, for BRIDGE and USER modes, the docker executor enters the task's network namespace.

* [MESOS-4324] - Allow access to persistent volumes as read-only or read-write by tasks. Mesos doesn't allow persistent volumes to be created as read-only, but in 1.1 it starts allowing tasks to use the volumes as read-only. This is mainly motivated by shared persistent volumes but applies to regular persistent volumes as well.

* [MESOS-5275] - **Experimental** support for Linux capabilities. Frameworks or operators now have fine-grained control over the capabilities that a container may have. This allows a container to run as root, but not have all the privileges associated with the root user (e.g., CAP_SYS_ADMIN).

* [MESOS-5344] - **Experimental** support for partition-aware Mesos frameworks. In previous Mesos releases, when an agent is partitioned from the master and then reregisters with the cluster, all tasks running on the agent are terminated and the agent is shut down. In Mesos 1.1, partitioned agents will no longer be shut down when they reregister with the master. By default, tasks running on such agents will still be killed (for backward compatibility); however, frameworks can opt in to the new PARTITION_AWARE capability. If they do this, their tasks will not be killed when a partition is healed. This allows frameworks to define their own policies for how to handle partitioned tasks. Enabling the PARTITION_AWARE capability also introduces a new set of task states: TASK_UNREACHABLE, TASK_DROPPED, TASK_GONE, TASK_GONE_BY_OPERATOR, and TASK_UNKNOWN. These new states are intended to eventually replace the TASK_LOST state.

* [MESOS-6077] - **Experimental** A new default executor is introduced which frameworks can use to launch task groups as nested containers. All the nested containers share resources like cpu, memory, network and volumes.

* [MESOS-6014] - **Experimental** A new port-mapper CNI plugin, the `mesos-cni-port-mapper`, has been introduced. For Mesos containers, with the CNI port-mapper plugin, users can now expose container ports through host ports using DNAT. This is especially useful when Mesos containers are attached to isolated CNI networks such as private bridge networks, and the services running in the container need to be exposed outside these isolated networks.
The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.0-rc2

The candidate for Mesos 1.1.0 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/mesos-1.1.0.tar.gz

The tag to be voted on is 1.1.0-rc2:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.0-rc2

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/mesos-1.1.0.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/mesos-1.1.0.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1162

Please vote on releasing this package as Apache Mesos 1.1.0!

The vote is open until Thu Nov 3 14:46:55 CET 2016 and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 1.1.
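The MESOS-5344 item above says that partition-aware frameworks can define their own policies for partitioned tasks. As a rough illustration of what such a policy might look like, here is a sketch in Python. Everything except the task state names is an assumption made for the example: the `Action` enum, the `decide` function, the 300 second grace period, and the choice of action per state are one possible policy, not behavior prescribed by Mesos.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    RESCHEDULE = "reschedule"
    RECONCILE = "reconcile"


def decide(state, unreachable_for_s=0.0, grace_s=300.0):
    """Pick an action for a task state seen by a partition-aware framework."""
    if state == "TASK_UNREACHABLE":
        # Give the partition a chance to heal before replacing the task.
        return Action.WAIT if unreachable_for_s < grace_s else Action.RESCHEDULE
    if state in ("TASK_DROPPED", "TASK_GONE", "TASK_GONE_BY_OPERATOR"):
        # This example policy replaces the task right away.
        return Action.RESCHEDULE
    if state == "TASK_UNKNOWN":
        # This example policy asks again instead of guessing.
        return Action.RECONCILE
    raise ValueError(f"not a partition-aware state: {state}")
```

A framework that cannot tolerate two copies of a task running at once would likely choose a much longer grace period, or never reschedule on TASK_UNREACHABLE at all; that trade-off is exactly what the opt-in capability leaves to the framework.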
outstanding offers
Hi,

I have a Mesos 0.28.2 system and generally things seem to run fine. The "Outstanding Offers" page normally shows nothing, which I believe is normal. However, at some point my framework gets disconnected for some odd reason, possibly due to high load. A few seconds later I receive a reregistered call from Mesos. However, it looks like around this time offers start to get listed on the "Outstanding Offers" page. Even more strangely, no Mesos log file contains any information for the offer IDs shown. Unfortunately the default logging does not show which offer IDs are being sent out, while it does show the IDs that are being declined or accepted. So I don't know when these offers actually got sent out.

How can I deal with such a situation? Should I:
Stop the SchedulerDriver when I get disconnected instead of waiting for a reregistered call?
Is it advised to set --offer_timeout to recover from such a situation?
Is there any way to reconcile offers like one can do for tasks?

thanks,
Hendrik