Re: Updated agent resources with every offer.

2016-02-16 Thread Arkal Arjun Rao
I've thought about this carefully and come up with a workaround that
doesn't work yet, but might if people can chime in.

I have a daemon thread that starts when the executor registers with the
framework. The daemon thread probes the free disk every n seconds and, if
something has changed, sends the framework a status update with a task ID
that is guaranteed to be unique. The framework picks out the statuses that
carry this task ID and then uses dynamic reservation to get the result I
want.

executor.py contains this:
```
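import time
import logging
from mesos.interface import mesos_pb2

log = logging.getLogger(__name__)

def demonThread(driver, interval):
    # Runs in a background thread for as long as the executor is registered
    previousDiskUpdate = None
    while True:
        # < probe disk state >: probeDiskState() is a placeholder for the real
        # probe; it must return the Mesos-usable free disk in bytes
        remainingMesosUsableDisk = probeDiskState()
        if remainingMesosUsableDisk != previousDiskUpdate:
            log.debug("Disk usage changed.  Sending disk update.")
            previousDiskUpdate = remainingMesosUsableDisk
            # Send the status of the disk usage via a status update
            status = mesos_pb2.TaskStatus()
            status.task_id.value = '-1'
            status.message = str(remainingMesosUsableDisk)
            status.state = mesos_pb2.TASK_RUNNING
            driver.sendStatusUpdate(status)
        time.sleep(interval)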
```
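The `< probe disk state >` step is left unspecified above. One possible probe, assuming Python 3.3+ and that the agent's work directory and the Docker image cache sit on the same filesystem, reads the free bytes with the standard library's `shutil.disk_usage`. The helper names and the default path are hypothetical, not the original executor's code.

```python
import shutil

# Hypothetical probe, not the original code: point `path` at the filesystem
# that holds the agent work_dir and the Docker image cache.
def free_disk_bytes(path="/"):
    """Return the number of free bytes on the filesystem containing `path`."""
    return shutil.disk_usage(path).free

def bytes_to_mb(num_bytes):
    """Convert bytes to MB (Mesos expresses disk in MB), using float division."""
    return num_bytes / (1024.0 * 1024.0)

if __name__ == "__main__":
    print("free: %.1f MB" % bytes_to_mb(free_disk_bytes("/")))
```

The float division matters on Python 2, where `bytes/1024/1024` on integers silently floors the result.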

And framework.py contains this:
```
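def statusUpdate(self, driver, update):
    # Disk updates from the executor arrive on the sentinel task ID '-1'
    # (assumes every real task ID in this framework is an integer string)
    taskID = int(update.task_id.value)
    if taskID == -1:
        # The message carries the free disk in bytes, as sent by demonThread
        nodeFreeDisk = int(update.message)
        self.offerUpdateReqd[update.slave_id.value] = nodeFreeDisk
    if not self.implicitAcknowledgements:
        driver.acknowledgeStatusUpdate(update)

def resourceOffers(self, driver, offers):
    for offer in offers:
        # .get() avoids a KeyError for agents that never sent a disk update
        freeBytes = self.offerUpdateReqd.get(offer.slave_id.value)
        if freeBytes is not None:
            operation = mesos_pb2.Offer.Operation()
            operation.type = mesos_pb2.Offer.Operation.RESERVE
            disk = operation.reserve.resources.add()
            disk.name = "disk"
            disk.type = mesos_pb2.Value.SCALAR
            # bytes -> MB (Mesos expresses disk in MB); float division
            disk.scalar.value = freeBytes / 1024.0 / 1024.0
            # "*" cannot be dynamically reserved: use the framework's own role
            # and principal (self.role / self.principal are assumed attributes)
            disk.role = self.role
            disk.reservation.principal = self.principal
            # Accept the offer with the reservation operation and continue
            driver.acceptOffers([offer.id], [operation])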
```

For example, the agent has 5 GB free when the executor registers with the
framework, and the first offer says there is 5 GB free. The first job
writes 1 GB to the disk, and the daemon thread sends the framework a message
saying there is only 4 GB left on the system. The next offer that comes from
the slave is accepted with a reserve operation specifying
disk.scalar.value = 4 GB. According to the readme, if the reservation was
successful, the next offer should have disk = 4 GB.

However, the next offer shows the same 5 GB as the original. This means
either that the reservation did not go through, or that I'm missing
something here.
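One way to check whether the RESERVE took effect is to split the disk in each later offer by role. My understanding is that dynamically reserved resources keep being offered to frameworks in the reserving role, tagged with that role instead of `*`. The sketch below assumes offer resources expose `name`, `role` and `scalar.value` as `mesos_pb2.Resource` does. The `Resource`/`Scalar` namedtuples are stand-ins that only mimic those fields so the example runs without the Mesos bindings, and the role name `myrole` is hypothetical.

```python
from collections import namedtuple

# Stand-ins for the mesos_pb2.Resource fields used here (an assumption);
# real offers carry mesos_pb2.Resource messages.
Scalar = namedtuple("Scalar", "value")
Resource = namedtuple("Resource", "name role scalar")

def disk_by_role(resources):
    """Sum the disk (MB) in an offer's resources, keyed by role.

    Unreserved disk has role '*'; dynamically reserved disk carries the
    role it was reserved for.
    """
    totals = {}
    for r in resources:
        if r.name == "disk":
            totals[r.role] = totals.get(r.role, 0.0) + r.scalar.value
    return totals

# Hypothetical offer after reserving 4096 MB for the role "myrole".
offer_resources = [
    Resource("cpus", "*", Scalar(4.0)),
    Resource("disk", "myrole", Scalar(4096.0)),
    Resource("disk", "*", Scalar(1024.0)),
]
print(disk_by_role(offer_resources))
```

If all of the offered disk still comes back under `'*'`, the reservation was not applied.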

Does anyone have any thoughts about this?

Arjun


Re: Updated agent resources with every offer.

2016-02-12 Thread Vinod Kone
Say your task asks for 1cpu and 1gb disk. After the task terminates, Mesos
immediately offers back 1cpu and 1gb disk. That makes sense for cpu but not so
much for disk.

The Mesos slave overcommits the disk in that sense, mainly to allow task owners
access to sandbox data after task termination. The asynchronous gc thread
garbage-collects the sandbox if there is disk space pressure on the host.


@vinodkone
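The overcommit described above can be illustrated with a toy model. This is plain Python, not Mesos code, and the class name and numbers are made up for illustration. Offered disk is derived from what running tasks were allocated, so a finished task's disk is offered back at once even though its sandbox data stays on disk until garbage collection.

```python
class ToyAgent:
    """Toy model, not Mesos code: offers are computed from allocations."""

    def __init__(self, total_disk_mb):
        self.total_disk_mb = total_disk_mb
        self.allocated = {}    # task_id -> disk MB the task asked for
        self.leftover_mb = 0   # data still sitting in finished tasks' sandboxes

    def launch(self, task_id, disk_mb):
        self.allocated[task_id] = disk_mb

    def finish(self, task_id, written_mb):
        # The allocation is released at once, but the sandbox (and what the
        # task wrote into it) stays on disk until garbage collection.
        del self.allocated[task_id]
        self.leftover_mb += written_mb

    def offered_disk_mb(self):
        return self.total_disk_mb - sum(self.allocated.values())

    def actually_free_mb(self):
        # Ignores writes by still-running tasks; enough for this illustration.
        return self.total_disk_mb - self.leftover_mb


agent = ToyAgent(total_disk_mb=5120)
agent.launch("t1", disk_mb=1024)
print(agent.offered_disk_mb())   # 4096 while t1 runs
agent.finish("t1", written_mb=1024)
print(agent.offered_disk_mb())   # 5120: the 1 GB is offered back right away
print(agent.actually_free_mb())  # 4096: the sandbox still holds 1 GB
```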



Re: Updated agent resources with every offer.

2016-02-12 Thread Arkal Arjun Rao
That can be modified with the right values for gc_delay.

I'm running a very basic test where I accept a request, write a file to the
sandbox, sleep for 100s, then exit. After exit, I probe the next offer.

Having not specified any value for disk_watch_interval and assuming it is
the default 60s, the new offer should have disk = (original value - size of
the file I wrote to the sandbox), right? Am I missing something here?

Arjun
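For reference, the Mesos agent flag documentation describes `--disk_watch_interval` as how often the agent samples disk usage to decide how aggressively to garbage-collect old executor sandboxes. It is not described as recomputing the disk value in offers. The documented maximum sandbox age is `gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk usage)`. The sketch below evaluates that formula, assuming the documented defaults of `gc_delay = 1 week` and `gc_disk_headroom = 0.1`. Check the flags for your Mesos version.

```python
from datetime import timedelta

def max_sandbox_age(gc_delay, disk_usage, gc_disk_headroom=0.1):
    """Maximum age of executor sandboxes before the agent GC removes them.

    disk_usage is the fraction (0.0 to 1.0) of the agent's disk in use, as
    sampled every --disk_watch_interval.
    """
    factor = max(0.0, 1.0 - gc_disk_headroom - disk_usage)
    return gc_delay * factor

week = timedelta(weeks=1)                      # assumed default gc_delay
print(max_sandbox_age(week, disk_usage=0.5))   # sandboxes kept a few days
print(max_sandbox_age(week, disk_usage=0.95))  # 0:00:00, collected ASAP
```

So with a lightly used disk, a finished task's sandbox can legitimately stay around for days before it is reclaimed.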

On Fri, Feb 12, 2016 at 5:05 PM, Chong Chen <chong.ch...@huawei.com> wrote:

> Hi,
>
> I think the Mesos agent's garbage collector will remove the directory of
> the finished task.
>
> Thanks!
>
> *From:* Arkal Arjun Rao [mailto:aa...@ucsc.edu]
> *Sent:* Friday, February 12, 2016 4:22 PM
> *To:* user@mesos.apache.org
> *Subject:* Re: Updated agent resources with every offer.
>
> Hi Vinod,
>
> Thanks for the reply. I think I understand what you mean. Could you
> clarify these follow-up questions?
>
> 1. So if I did write to the sandbox, Mesos would know and send the correct
> offer?
>
> 2. And if so, and this might be hacky, if I bind-mounted my Docker folder
> (where all cached images are stored) into a sandbox directory, do you think
> Mesos would register the correct state of the disk in the offer? (Suppose I
> were to spawn a possibly persistent job that requests 0 cores, 0 memory and
> 0 GB and uses its sandbox.)
>
> Thanks again,
>
> Arjun



-- 
Arjun Arkal Rao

PhD Student,
Haussler Lab,
UC Santa Cruz,
USA

aa...@ucsc.edu


Re: Updated agent resources with every offer.

2016-02-12 Thread Vinod Kone
If your job is writing stuff outside the sandbox, it is up to your framework
to do that resource accounting. It is really tricky for Mesos to do that.
For example, the second job might be launched even before the first one
finishes.
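Framework-side accounting along these lines could be as simple as a per-agent ledger. The sketch below assumes the framework can learn how many MB each finished job left behind outside the sandbox, for example from the executor's disk updates. `DiskLedger` and its methods are hypothetical names, not a Mesos API.

```python
class DiskLedger:
    """Hypothetical framework-side record of disk Mesos does not account for."""

    def __init__(self):
        self.leftover_mb = {}  # agent_id -> MB left behind (e.g. cached images)

    def record_leftover(self, agent_id, mb):
        self.leftover_mb[agent_id] = self.leftover_mb.get(agent_id, 0.0) + mb

    def usable_disk_mb(self, agent_id, offered_disk_mb):
        """Offered disk minus what this framework knows is already used."""
        return max(0.0, offered_disk_mb - self.leftover_mb.get(agent_id, 0.0))


ledger = DiskLedger()
ledger.record_leftover("agent-1", 1024)
print(ledger.usable_disk_mb("agent-1", 5120))  # 4096.0
print(ledger.usable_disk_mb("agent-2", 5120))  # 5120.0
```

Because a second job may start before the first finishes, a real framework would also need to subtract an estimate for jobs that are still running, not just for finished ones.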

On Fri, Feb 12, 2016 at 3:46 PM, Arkal Arjun Rao  wrote:

> Hi All,
>
> I'm new to Mesos and I'm working on a framework that strongly considers
> the disk value in an offer before making a decision. My jobs don't run in
> the agent's sandbox and may use Docker to pull images from my Docker Hub
> and run containers on input data downloaded from S3.
>
> My jobs clean up after themselves but do not delete the cached Docker
> images after they complete, so a later job can use them directly without
> the delay of downloading the image again. I cannot predict how much a job
> will leave behind.
>
> Leaving files behind after a job means that the disk space available for
> the next job is less than the disk value the current job had when it
> started. However, the disk value does not appear to be updated before the
> next offer is made. Is there any way to get the executor driver to update
> the value passed in the disk field of resource offers?
>
> Here's a Stack Overflow question with more details:
> http://stackoverflow.com/questions/35354841/setup-mesos-to-provide-up-to-date-disk-in-offers
>
> Thanks in advance,
> Arjun Arkal Rao
>
> PhD Candidate,
> Haussler Lab,
> UC Santa Cruz,
> USA
>
>