Re: Launching tasks with reserved resources
Sounds good, will do.

Regards,
Gidon

From: Alex Rukletsov
To: user@mesos.apache.org
Date: 17/08/2015 05:30 PM
Subject: Re: Launching tasks with reserved resources

> if there were an api for splitting a resource object

I think it's a good idea; "resource math" is something that each framework re-implements. We were discussing the idea of providing a "framework kit", but AFAIK there has been no work done in this direction yet. Mind filing a JIRA ticket?

> sending the reserved and unreserved resources in two separate offers indeed helps here

I would say this one also deserves a ticket. I may not see some use cases where this is undesirable, but I will be happy to see the discussion around that documented in the ticket. Even if the ticket ends up as "won't fix", the discussion and reasoning can be helpful for posterity.

On Mon, Aug 17, 2015 at 3:46 PM, Gidon Gershinsky wrote:

Hi Alex,

Yep, this setup is using static reservations on agents. I haven't tried running a big task with two or more resources (reserved and unreserved), but I guess it is quite intuitive for a developer: a framework is offered two resource objects and launches a task specifying these objects, with no need to dive too deep into resource roles etc. If a framework hoards resources, it can "sum up" the offered objects, which again looks reasonable.

The problem I had is at the opposite end: when a framework needs to split the offered resources and run many smaller tasks. Eventually, I was able to bypass it by micro-managing the role assignment of each task's resources; cumbersome, but it works. So it's more of a usage issue: if there were an API for splitting a resource object (the opposite of the "+" API for summing/hoarding), things would be more intuitive.

Btw, sending the reserved and unreserved resources in two separate offers indeed helps here, since each offer comes with a single role. In any case, I agree it makes sense for a developer to be aware of the reservation policies.
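To make the "splitting" idea above concrete, here is a minimal sketch in plain Python. The Resource class and split() helper are illustrative stand-ins, not the real mesos_pb2 API or any existing Mesos helper; the sketch assumes scalar resources only. It shows the counterpart to the "+" (summing) operation: carving one offer's resources into a per-task chunk while preserving the role each chunk came from.

```python
# Toy model of Mesos scalar resources -- NOT the real mesos_pb2 API.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str      # e.g. "cpus"
    role: str      # "*" for unreserved, or a framework role like "role1"
    value: float   # scalar amount

def split(offered, demand):
    """Carve `demand` units of one resource type out of `offered`.

    Returns (chunks, remaining): the chunks, each tagged with the role
    it was drawn from, plus whatever is left of the offer."""
    chunks, remaining = [], []
    for res in offered:
        if demand > 0 and res.value > 0:
            take = min(res.value, demand)
            chunks.append(Resource(res.name, res.role, take))
            demand -= take
            leftover = res.value - take
            if leftover > 0:
                remaining.append(Resource(res.name, res.role, leftover))
        else:
            remaining.append(res)
    if demand > 0:
        raise ValueError("offer does not cover the demand")
    return chunks, remaining

# Example: an offer with 1.5 reserved and 1.0 unreserved cpus,
# from which a task needing 2 cpus is carved out. The task ends up
# with two resource objects, one per role, as discussed in the thread.
offer = [Resource("cpus", "role1", 1.5), Resource("cpus", "*", 1.0)]
task_res, offer = split(offer, 2.0)
```

A real "framework kit" version would also need to handle ranges and sets, but the per-role bookkeeping above is the part every framework currently re-implements by hand.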
Regards,
Gidon

From: Alex Rukletsov
To: user@mesos.apache.org
Date: 17/08/2015 01:02 PM
Subject: Re: Launching tasks with reserved resources

Hi Gidon,

just to make sure: you mean static reservations on Mesos agents (via the --resources flag) and not dynamic reservations, right?

Let me first try to explain why you get the TASK_ERROR message. The built-in allocator merges '*' and reserved resources, hinting the master to create a single offer. However, as you mentioned before, validation fails if you try to mix resources with different roles, because the function responsible for validation checks whether task resources are "contained" in offered resources, which obviously includes a role equality check. Here are some source code snippets:
https://github.com/apache/mesos/blob/master/src/master/validation.cpp#L449
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L598
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L244
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L197

Maybe we should split reserved and unreserved resources into two offers?

Now, to your second concern about whether we should disallow tasks using both '*' and 'role' resources. I see your point: if a framework is entitled to use reserved and unreserved resources, why not hoard them and launch a bigger task? I think it's fine, and you should actually be able to do it by explicitly specifying two different resource objects in the task launch message, one for '*' resources and one for your role.

Why can't you just use your framework's role for both? Different roles may have different guarantees (quota, MESOS-1791), and while reserved resources may still be available for your framework, '*' resources may become unavailable to you (in future Mesos releases or with custom allocators), leading to the whole task's termination. By requiring two different objects in the task launch message we motivate the framework (i.e., the framework writer) to be aware of the different policies that may be attached to different roles. Does it make sense?

-- Alex

On Thu, Aug 13, 2015 at 2:23 PM, Gidon Gershinsky wrote:

I have a simple setup where a framework runs with a role, and some resources are reserved in the cluster for that role. The resource offers arrive at the framework as a list of two resource sets: one general (cpus(*), etc.) and one specific to the role (cpus("role1"), etc.). So far so good. If two tasks are launched, each with one of the two resources, things work. But problems start when I need to launch multiple smaller tasks (with a total resource consumption equal to the offered). I run this by creating resource objects, and attaching them to task
Re: Launching tasks with reserved resources
Hi Alex,

Yep, this setup is using static reservations on agents. I haven't tried running a big task with two or more resources (reserved and unreserved), but I guess it is quite intuitive for a developer: a framework is offered two resource objects and launches a task specifying these objects, with no need to dive too deep into resource roles etc. If a framework hoards resources, it can "sum up" the offered objects, which again looks reasonable.

The problem I had is at the opposite end: when a framework needs to split the offered resources and run many smaller tasks. Eventually, I was able to bypass it by micro-managing the role assignment of each task's resources; cumbersome, but it works. So it's more of a usage issue: if there were an API for splitting a resource object (the opposite of the "+" API for summing/hoarding), things would be more intuitive.

Btw, sending the reserved and unreserved resources in two separate offers indeed helps here, since each offer comes with a single role. In any case, I agree it makes sense for a developer to be aware of the reservation policies.

Regards,
Gidon

From: Alex Rukletsov
To: user@mesos.apache.org
Date: 17/08/2015 01:02 PM
Subject: Re: Launching tasks with reserved resources

Hi Gidon,

just to make sure: you mean static reservations on Mesos agents (via the --resources flag) and not dynamic reservations, right?

Let me first try to explain why you get the TASK_ERROR message. The built-in allocator merges '*' and reserved resources, hinting the master to create a single offer. However, as you mentioned before, validation fails if you try to mix resources with different roles, because the function responsible for validation checks whether task resources are "contained" in offered resources, which obviously includes a role equality check.
Here are some source code snippets:
https://github.com/apache/mesos/blob/master/src/master/validation.cpp#L449
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L598
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L244
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L197

Maybe we should split reserved and unreserved resources into two offers?

Now, to your second concern about whether we should disallow tasks using both '*' and 'role' resources. I see your point: if a framework is entitled to use reserved and unreserved resources, why not hoard them and launch a bigger task? I think it's fine, and you should actually be able to do it by explicitly specifying two different resource objects in the task launch message, one for '*' resources and one for your role.

Why can't you just use your framework's role for both? Different roles may have different guarantees (quota, MESOS-1791), and while reserved resources may still be available for your framework, '*' resources may become unavailable to you (in future Mesos releases or with custom allocators), leading to the whole task's termination. By requiring two different objects in the task launch message we motivate the framework (i.e., the framework writer) to be aware of the different policies that may be attached to different roles. Does it make sense?

-- Alex

On Thu, Aug 13, 2015 at 2:23 PM, Gidon Gershinsky wrote:

I have a simple setup where a framework runs with a role, and some resources are reserved in the cluster for that role. The resource offers arrive at the framework as a list of two resource sets: one general (cpus(*), etc.) and one specific to the role (cpus("role1"), etc.). So far so good. If two tasks are launched, each with one of the two resources, things work. But problems start when I need to launch multiple smaller tasks (with a total resource consumption equal to the offered).
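The "contained" check that Alex points at in the linked C++ can be illustrated with a small Python sketch. This is not the actual Mesos implementation (the real one is Resources::contains in resources.cpp); it is a simplified rendering of the idea that task resources are summed per (name, role) pair, and each pair must fit inside the offer's matching pair, which is why declaring everything under one role fails even when the grand total matches.

```python
# Toy sketch of the master's per-role containment check -- a simplified
# Python rendering of the idea behind the linked C++, not the real code.
from collections import defaultdict

def totals(resources):
    """Sum scalar amounts keyed by (name, role)."""
    t = defaultdict(float)
    for name, role, value in resources:
        t[(name, role)] += value
    return t

def offer_contains(offer, task):
    """True iff every (name, role) bucket of `task` fits in `offer`."""
    o, t = totals(offer), totals(task)
    return all(o[key] >= amount for key, amount in t.items())

# Offer: 2 unreserved cpus and 2 cpus reserved for "role1".
offer = [("cpus", "*", 2.0), ("cpus", "role1", 2.0)]

# 4 cpus declared entirely under one role fail the per-role check,
# even though the offer's grand total is 4 cpus:
assert not offer_contains(offer, [("cpus", "*", 4.0)])
assert not offer_contains(offer, [("cpus", "role1", 4.0)])

# Two resource objects, one per role, pass:
assert offer_contains(offer, [("cpus", "*", 2.0), ("cpus", "role1", 2.0)])
```

This matches the behavior described in the thread: both the all-'*' and the all-'role1' variants of the launch are rejected, while the two-object form is accepted.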
I run this by creating resource objects and attaching them to tasks, using calls from the standard Mesos samples (Python):

  task = mesos_pb2.TaskInfo()
  cpus = task.resources.add()
  cpus.name = "cpus"
  cpus.scalar.value = TASK_CPUS

checking that the total doesn't surpass the offered resources. This starts fine, but soon I get TASK_ERROR messages, due to the master validator finding that more resources are requested by tasks than are available in the offer. This obviously happens because all task resources, as defined above, come with the (*) role, while the offer resources are split between "*" and "role1"!

Ok, then I assign a role to the task resources by adding

  cpus.role = "role1"

But this fails again, and for the same reason. Shouldn't this work differently? When a resource offer is received by a framework with "role1", why should it care which part is unreserved and which part is reserved for "role1"? When a task
Launching tasks with reserved resources
I have a simple setup where a framework runs with a role, and some resources are reserved in the cluster for that role. The resource offers arrive at the framework as a list of two resource sets: one general (cpus(*), etc.) and one specific to the role (cpus("role1"), etc.). So far so good. If two tasks are launched, each with one of the two resources, things work. But problems start when I need to launch multiple smaller tasks (with a total resource consumption equal to the offered).

I run this by creating resource objects and attaching them to tasks, using calls from the standard Mesos samples (Python):

  task = mesos_pb2.TaskInfo()
  cpus = task.resources.add()
  cpus.name = "cpus"
  cpus.scalar.value = TASK_CPUS

checking that the total doesn't surpass the offered resources. This starts fine, but soon I get TASK_ERROR messages, due to the master validator finding that more resources are requested by tasks than are available in the offer. This obviously happens because all task resources, as defined above, come with the (*) role, while the offer resources are split between "*" and "role1"!

Ok, then I assign a role to the task resources by adding

  cpus.role = "role1"

But this fails again, and for the same reason. Shouldn't this work differently? When a resource offer is received by a framework with "role1", why should it care which part is unreserved and which part is reserved for "role1"? When a task launch request is received by the master from a framework with a role, why can't it check only the total resource amount, instead of treating unreserved and reserved resources separately? They are reserved for this role anyway. Or am I missing something?

Regards, Gidon
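The workaround that later emerges in this thread (micro-managing the role assignment of each task's resources) can be sketched as follows. This is a plain-Python illustration, not the mesos_pb2 API; plan_tasks and the role-draining order are hypothetical names and choices. The idea is to track how much of the offer remains in each role bucket and tag each small task's resources with the role bucket it actually draws from, so the master's per-role validation passes.

```python
# Sketch of the per-task role bookkeeping workaround -- illustrative
# plain Python, not the real Mesos client API.
TASK_CPUS = 0.5

def plan_tasks(offer_buckets, n_tasks, task_cpus=TASK_CPUS):
    """offer_buckets: dict role -> cpus available in the offer.
    Returns one resource list per plannable task; each entry is a
    (name, role, value) tuple, mirroring cpus.name/role/scalar.value."""
    plans = []
    for _ in range(n_tasks):
        need, resources = task_cpus, []
        for role in ("role1", "*"):          # drain reserved cpus first
            avail = offer_buckets.get(role, 0.0)
            if need <= 0 or avail <= 0:
                continue
            take = min(avail, need)
            resources.append(("cpus", role, take))
            offer_buckets[role] = avail - take
            need -= take
        if need > 0:
            break                            # offer exhausted
        plans.append(resources)
    return plans

# An offer with 0.75 reserved and 1.25 unreserved cpus covers four
# 0.5-cpu tasks; the second task straddles both roles and therefore
# needs two resource objects, one per role.
plans = plan_tasks({"role1": 0.75, "*": 1.25}, n_tasks=4)
```

Cumbersome, as the thread says, but it keeps every task's declared resources contained in the offer's per-role buckets.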
Spark on Mesos
Hi all,

I have a few questions on how Spark is integrated with Mesos; I thought I'd ask here first :), before going to the Spark list. Any details, or pointers to a design document / relevant source, will be much appreciated.

I'm aware of this description:
https://github.com/apache/spark/blob/master/docs/running-on-mesos.md
But it's pretty high-level as far as the design is concerned, while I'm looking into the lower-level details of how Spark actually calls the Mesos APIs, how it launches the tasks, etc. Namely:

1. Does Spark create a Mesos Framework instance for each Spark application (SparkContext)?

2. Citing from the link above: "In 'fine-grained' mode (default), each Spark task runs as a separate Mesos task ... comes with an additional overhead in launching each task." Does it mean that the Mesos slave launches a Spark Executor for each task? (Unlikely..) Or does the slave host have a number of Spark Executors pre-launched (one per application), and send the task to its application's executor? What is the resource offer then? Is it a host's cpu slice offered to any framework (Spark app/context), which sends a task to run on it? Or is it a 'slice of app Executor' that got idle and is offered to its framework?

3. "The 'coarse-grained' mode will instead launch only one long-running Spark task on each Mesos machine, and dynamically schedule its own 'mini-tasks' within it." What is this special task? Is it the Spark app Executor? How are these mini-tasks different from 'regular' Spark tasks? How are the resources allocated/offered in this mode?

Regards, Gidon
Resource allocation module
We need to develop a new resource allocation module, replacing the off-the-shelf DRF. As I understand, the current mechanism
http://mesos.apache.org/documentation/latest/allocation-module/
is being replaced with a less intrusive module architecture:
https://issues.apache.org/jira/browse/MESOS-2160

The capabilities of the new mechanism have real advantages for us. However, it is not clear when it will be released; the JIRA has an 'In Progress' status. What is the current target / horizon for making this available to users?

Also, is there any documentation on the SPIs / technical interfaces of these modules (what info is passed from slaves, frameworks, and offers; what calls can be made by the modules; etc.)?

Regards, Gidon