Re: Updated agent resources with every offer.
I've thought about this carefully and come up with a workaround. It doesn't work yet, but it might if people can chime in.

I have a daemon thread that starts when the executor registers with the framework. The thread probes the free disk every n seconds and, if something has changed, sends the framework a status update with a task ID that is guaranteed to be unique. The framework parses status updates received with that task ID and then uses dynamic reservation to achieve the result I want.

executor.py contains this:

```
def demonThread():
    # Assumed to be module-level state holding the previous probe result
    global previousDiskUpdate
    # < probe disk state; sets remainingMesosUsableDisk >
    if remainingMesosUsableDisk == previousDiskUpdate:
        pass
    else:
        log.debug("Disk usage changed. Sending disk update.")
        previousDiskUpdate = remainingMesosUsableDisk
        # Send the status of the disk usage via a status update
        status = mesos_pb2.TaskStatus()
        status.task_id.value = '-1'
        status.message = str(remainingMesosUsableDisk)
        status.state = mesos_pb2.TASK_RUNNING
        driver.sendStatusUpdate(status)
```

And framework.py contains this:

```
def statusUpdate(self, driver, update):
    taskID = int(update.task_id.value)
    if taskID == -1:
        # Disk update from the executor's daemon thread
        nodeFreeDisk = int(update.message)
        self.offerUpdateReqd[update.slave_id.value] = nodeFreeDisk
        if not self.implicitAcknowledgements:
            driver.acknowledgeStatusUpdate(update)
        return None

def resourceOffers(self, driver, offers):
    for offer in offers:
        # .get() avoids a KeyError for agents that have not reported yet
        if self.offerUpdateReqd.get(offer.slave_id.value) is not None:
            operation = mesos_pb2.Offer.Operation()
            operation.type = mesos_pb2.Offer.Operation.RESERVE
            disk = operation.reserve.resources.add()
            disk.name = "disk"
            disk.type = mesos_pb2.Value.SCALAR
            disk.scalar.value = self.offerUpdateReqd[offer.slave_id.value]/1024/1024
            disk.role = "*"
            # Accept the offer with the reservation operation and continue
            driver.acceptOffers([offer.id], [operation])
```

So for example, the agent has 5 GB free when the executor registers with the framework. The first offer says there is 5 GB free.
The first job writes 1 GB to the disk, and the daemon thread sends the framework a message saying there is only 4 GB left on the system. The next offer that comes from the slave is accepted with a reserve operation specifying disk.scalar.value = 4 GB. According to the readme, if the reservation was successful, the next offer should have disk = 4 GB. However, the next offer shows the same 5 GB as the original. Either the reservation did not go through, or I'm missing something here.

Does anyone have any thoughts about this?

Arjun
Re: Updated agent resources with every offer.
Say your task asks for 1 CPU and 1 GB of disk. After the task terminates, Mesos immediately offers back the 1 CPU and 1 GB of disk. That makes sense for CPU but not so much for disk.

The Mesos slave overcommits the disk in that sense, mainly to allow task owners access to sandbox data after task termination. The asynchronous GC thread garbage collects the sandbox if there is disk space pressure on the host.

@vinodkone
Re: Updated agent resources with every offer.
That can be modified with the right values for gc_delay.

I'm running a very basic test where I accept a request, write a file to the sandbox, sleep for 100s, then exit. After exit, I probe the next offer.

Having not specified any value for disk_watch_interval, and assuming it is the default 60s, the new offer should have disk = (original value - size of the file I wrote to the sandbox), right? Am I missing something here?

Arjun

On Fri, Feb 12, 2016 at 5:05 PM, Chong Chen <chong.ch...@huawei.com> wrote:
> Hi,
>
> I think the garbage collector of the Mesos agent will remove the directory
> of the finished task.
>
> Thanks!
>
> From: Arkal Arjun Rao [mailto:aa...@ucsc.edu]
> Sent: Friday, February 12, 2016 4:22 PM
> To: user@mesos.apache.org
> Subject: Re: Updated agent resources with every offer.
>
> Hi Vinod,
>
> Thanks for the reply. I think I understand what you mean. Could you
> clarify these follow-up questions?
>
> 1. So if I did write to the sandbox, Mesos would know and send the correct
> offer?
>
> 2. And if so, and this might be hacky, if I bind mounted my docker folder
> (where all cached images are stored) into a sandbox directory, do you think
> Mesos will register the correct state of the disk in the offer? (Suppose I
> were to spawn a possibly persistent job that requests 0 cores, 0 memory and
> 0gb and use its sandbox)
>
> Thanks again,
> Arjun

--
Arjun Arkal Rao
PhD Student,
Haussler Lab,
UC Santa Cruz,
USA

aa...@ucsc.edu
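One way to check the expectation in this message is to measure free space on the agent directly and compare it with the disk value in the next offer. The following is only a sketch under assumptions: it relies on a Unix host (os.statvfs), it treats offer disk values as MB, and free_bytes and expected_offer_disk_mb are hypothetical helper names that are not part of Mesos.

```python
import os

MB = 1024 * 1024


def free_bytes(path="/"):
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def expected_offer_disk_mb(original_mb, written_bytes):
    """Disk (in MB) the next offer would show if it tracked bytes written.

    This encodes the expectation stated in the message, not how Mesos
    actually computes offers.
    """
    return original_mb - written_bytes / float(MB)
```

For instance, expected_offer_disk_mb(5120, 1024 ** 3) gives the 4 GB figure (in MB) that a 5 GB offer would drop to after a 1 GB write, which can then be compared with what the agent really offers.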
Re: Updated agent resources with every offer.
If your job is writing stuff outside the sandbox, it is up to your framework to do that resource accounting. It is really tricky for Mesos to do that. For example, the second job might be launched even before the first one finishes.

On Fri, Feb 12, 2016 at 3:46 PM, Arkal Arjun Rao <aa...@ucsc.edu> wrote:
> Hi All,
>
> I'm new to Mesos and I'm working on a framework that strongly considers
> the disk value in an offer before making a decision. My jobs don't run in
> the agent's sandbox and may use docker to pull images from my dockerhub and
> run containers on input data downloaded from S3.
>
> My jobs clean up after themselves but do not delete the cached docker
> images after they complete so a later job can use them directly without the
> delay of downloading the image again. I cannot predict how much a job will
> leave behind.
>
> Leaving behind files after the job means that the disk space available for
> the next job is less than the disk value the current job had when it
> started. However the offer made to the master does not appear to update the
> disk parameter before making the new offer. Is there any way to get the
> executor driver to update the value passed in the disk field of resource
> offers?
>
> Here's a Stack Overflow question with more details:
> http://stackoverflow.com/questions/35354841/setup-mesos-to-provide-up-to-date-disk-in-offers
>
> Thanks in advance,
> Arjun Arkal Rao
>
> PhD Candidate,
> Haussler Lab,
> UC Santa Cruz,
> USA
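As an illustration of what framework-side accounting could look like, here is a minimal sketch: keep the latest free-disk figure reported for each agent and treat the usable disk as the smaller of that figure and the disk in the offer. AgentDiskLedger and its methods are hypothetical names, and how the free-disk figure reaches the framework is left open.

```python
class AgentDiskLedger(object):
    """Tracks the last reported free disk (in MB) per agent ID."""

    def __init__(self):
        self._free_mb = {}

    def report(self, agent_id, free_mb):
        """Record the most recent free-disk measurement for an agent."""
        self._free_mb[agent_id] = float(free_mb)

    def usable_mb(self, agent_id, offered_mb):
        """Smaller of the offered disk and the last reported free disk.

        Falls back to the offered value if the agent never reported.
        """
        reported = self._free_mb.get(agent_id)
        if reported is None:
            return float(offered_mb)
        return min(float(offered_mb), reported)

    def fits(self, agent_id, offered_mb, needed_mb):
        """True if a job needing needed_mb fits in the usable disk."""
        return self.usable_mb(agent_id, offered_mb) >= needed_mb


ledger = AgentDiskLedger()
ledger.report("agent-1", 4096)           # agent reports 4 GB actually free
ok = ledger.fits("agent-1", 5120, 4500)  # offer says 5 GB, job needs ~4.4 GB
```

Here ok is False because the reported 4096 MB is below the 4500 MB the job needs, even though the offer advertises 5120 MB. The reported figure can be stale if another job is already running on the agent, which is the caveat raised above.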