Hi Tomek,

After changing the GC delay to 2hrs, the existing sandbox folders that are older than the "Max allowed age" are still not being deleted. Here are the logs.
Logs from before and after the change:

I1027 15:00:48.055465 12861 slave.cpp:4615] Current disk usage 21.63%. Max allowed age: 1.367499658088657days
I1027 15:02:37.720451 30693 slave.cpp:4615] Current disk usage 21.60%. Max allowed age: 1.368035520611667hrs

(A quick check of these numbers against the formula is at the bottom of this message.)

Executor info from the node:

[techops@kaiju-dcos-privateslave27 ~]$ date
Fri Oct 27 15:41:59 UTC 2017
[techops@kaiju-dcos-privateslave27 ~]$ ls -l /var/lib/mesos/slave/slaves/3fd7ffe9-945b-4ae1-968e-2c397585960c-S201/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/
total 452
drwxr-xr-x. 3 root root 4096 Oct 26 09:43 74861.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 09:56 74861.10.0
drwxr-xr-x. 3 root root 4096 Oct 26 10:31 74867.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 11:07 74871.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 11:45 74875.10.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:43 74886.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:45 74886.2.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:51 74886.8.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:56 74886.9.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:59 74887.1.0
drwxr-xr-x. 3 root root 4096 Oct 26 14:42 74895.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 14:58 74899.7.0
drwxr-xr-x. 3 root root 4096 Oct 26 14:59 74900.8.0
drwxr-xr-x. 3 root root 4096 Oct 26 15:12 74902.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:05 74938.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:10 74940.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:11 74942.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:30 74944.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.1.0
drwxr-xr-x. 3 root root 4096 Oct 26 18:07 74959.7.0
drwxr-xr-x. 3 root root 4096 Oct 26 18:47 74971.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 19:18 74981.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 20:14 74992.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 20:28 74995.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 20:59 75001.1.0

Thanks,
Venkat

> On Oct 27, 2017, at 6:42 AM, Tomek Janiszewski <jani...@gmail.com> wrote:
>
> A low GC delay means files will be deleted more often. I don't think there
> will be any performance problem, but a low GC delay means you will lose your
> sandboxes earlier, and they are useful for debugging purposes.
>
> On Fri, 27 Oct 2017 at 04:40, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
>
>> Hi Tomek,
>>
>> Thanks for the quick reply. After digging a bit into the Mesos code, we were
>> able to understand that the age is actually a threshold age: anything older
>> than the "age" would be GCed. We are going to try different settings,
>> starting with "--gc_disk_headroom=.2 --gc_delay=2hrs". Is there any downside
>> to going with a very low GC delay?
>>
>> Thanks,
>> Venkat
>>
>>> On Oct 26, 2017, at 4:28 PM, Tomek Janiszewski <jani...@gmail.com> wrote:
>>>
>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>
>>> *Example:*
>>> gc_delay = 7days
>>> gc_disk_headroom = 0.1
>>> disk_usage = 0.8
>>> 7 * max(0.0, 1 - 0.1 - 0.8) = 7 * max(0.0, 0.1) = 0.7 days = 16 h 48 min
>>>
>>> Can you show some logs containing information about GC?
>>>
>>> On Fri, 27 Oct 2017 at 00:43, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
>>>
>>>> Hello,
>>>> In our production env, we noticed that our disk filled up because one
>>>> framework had a lot of failed/completed executor folders lying around.
>>>> The folders eventually filled up the disk.
>>>>
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.1.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.2.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.1.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.2.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52336.1.0
>>>>
>>>> http://mesos.apache.org/documentation/latest/sandbox/#sandbox-lifecycle
>>>>
>>>> We have our lifecycle cleanup set to the default, which is 7 days, I
>>>> believe.
>>>>
>>>> We wanted to know: is this the proper way to clean up the
>>>> failed/completed executor folders for a running framework?
>>>> OR does the framework need to be inactive or completed for the garbage
>>>> collection to work?
>>>> OR does the framework itself need to deal with cleaning up its own
>>>> executors?
>>>>
>>>> Bonus question: How does "gc_disk_headroom" actually work? It seems this
>>>> equation will always return 0:
>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>>
>>>> Thanks,
>>>> Venkat
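
PS: Here is the quick check mentioned above. It is just a minimal sketch in plain Python (not Mesos code) of the formula Tomek quoted, and it assumes --gc_disk_headroom is still at its default of 0.1, since the flag value is not shown in these logs:

    # Assumed restatement of the formula from the thread:
    # max_allowed_age = gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)
    def max_allowed_age(gc_delay, disk_usage, gc_disk_headroom=0.1):
        return gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)

    # Second log line above: gc_delay set to 2hrs, disk usage 21.60%
    print(max_allowed_age(2.0, 0.2160))  # ~1.368, matching "Max allowed age: 1.368035...hrs"

So the logged "Max allowed age" looks consistent with the formula under that assumption; the open question remains why the Oct 26 sandbox directories, which are older than that age, are still on disk.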