Hi Tomek,

After changing the GC delay to 2hrs, the existing sandbox folders that are older than the "Max allowed age" are still not being deleted. Here are the logs.
Logs from before and after the change:

I1027 15:00:48.055465 12861 slave.cpp:4615] Current disk usage 21.63%. Max allowed age: 1.367499658088657days
I1027 15:02:37.720451 30693 slave.cpp:4615] Current disk usage 21.60%. Max allowed age: 1.368035520611667hrs

(A quick check of these numbers against the formula is at the bottom of this message.)

Executor info from the node:

[techops@kaiju-dcos-privateslave27 ~]$ date
Fri Oct 27 15:41:59 UTC 2017
[techops@kaiju-dcos-privateslave27 ~]$ ls -l /var/lib/mesos/slave/slaves/3fd7ffe9-945b-4ae1-968e-2c397585960c-S201/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/
total 452
drwxr-xr-x. 3 root root 4096 Oct 26 09:43 74861.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 09:56 74861.10.0
drwxr-xr-x. 3 root root 4096 Oct 26 10:31 74867.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 11:07 74871.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 11:45 74875.10.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:43 74886.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:45 74886.2.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:51 74886.8.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:56 74886.9.0
drwxr-xr-x. 3 root root 4096 Oct 26 13:59 74887.1.0
drwxr-xr-x. 3 root root 4096 Oct 26 14:42 74895.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 14:58 74899.7.0
drwxr-xr-x. 3 root root 4096 Oct 26 14:59 74900.8.0
drwxr-xr-x. 3 root root 4096 Oct 26 15:12 74902.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:05 74938.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:10 74940.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:11 74942.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:30 74944.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.1.0
drwxr-xr-x. 3 root root 4096 Oct 26 18:07 74959.7.0
drwxr-xr-x. 3 root root 4096 Oct 26 18:47 74971.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 19:18 74981.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 20:14 74992.0.0
drwxr-xr-x. 3 root root 4096 Oct 26 20:28 74995.3.0
drwxr-xr-x. 3 root root 4096 Oct 26 20:59 75001.1.0

Thanks,
Venkat

> On Oct 27, 2017, at 6:42 AM, Tomek Janiszewski <jani...@gmail.com> wrote:
>
> A low GC delay means files will be deleted more often. I don't think there
> will be any performance problem, but a low GC delay means you will lose your
> sandboxes earlier, and they are useful for debugging purposes.
>
> On Fri, 27 Oct 2017 at 04:40, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
>
>> Hi Tomek,
>>
>> Thanks for the quick reply. After digging a bit into the Mesos code, we were
>> able to understand that the age is actually a threshold age: anything older
>> than the "age" would be GCed. We are going to try different settings,
>> starting with "--gc_disk_headroom=.2 --gc_delay=2hrs". Is there any downside
>> to going with a very low GC delay?
>>
>> Thanks,
>> Venkat
>>
>>> On Oct 26, 2017, at 4:28 PM, Tomek Janiszewski <jani...@gmail.com> wrote:
>>>
>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>
>>> *Example:*
>>> gc_delay = 7days
>>> gc_disk_headroom = 0.1
>>> disk_usage = 0.8
>>> 7 * max(0.0, 1 - 0.1 - 0.8) = 7 * max(0.0, 0.1) = 0.7 days = 16 h 48 min
>>>
>>> Can you show some logs containing information about GC?
>>>
>>> On Fri, 27 Oct 2017 at 00:43, Venkat Morampudi <venkatmoramp...@gmail.com> wrote:
>>>
>>>> Hello,
>>>> In our production env, we noticed that our disk filled up because one
>>>> framework had a lot of failed/completed executor folders lying around.
>>>> The folders eventually filled up the disk.
>>>>
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.1.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.2.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.1.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.2.0
>>>> 228M /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52336.1.0
>>>>
>>>> http://mesos.apache.org/documentation/latest/sandbox/#sandbox-lifecycle
>>>>
>>>> We have our lifecycle cleanup set to the default, which is 7 days, I
>>>> believe.
>>>>
>>>> We wanted to know: is this the proper way to clean up the
>>>> failed/completed executor folders for a running framework?
>>>> OR does the framework need to be inactive or completed for the garbage
>>>> collection to work?
>>>> OR does the framework itself need to deal with cleaning up its own
>>>> executors?
>>>>
>>>> Bonus question: How does "gc_disk_headroom" actually work? It seems this
>>>> equation will always return 0:
>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>>
>>>> Thanks,
>>>> Venkat
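
PS: Here is the quick check mentioned above. It is just a minimal sketch in plain Python (not Mesos code) of the formula Tomek quoted, and it assumes --gc_disk_headroom is still at its default of 0.1, since the flag value is not shown in these logs:

    # Assumed restatement of the formula from the thread:
    # max_allowed_age = gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)
    def max_allowed_age(gc_delay, disk_usage, gc_disk_headroom=0.1):
        return gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)

    # Second log line above: gc_delay set to 2hrs, disk usage 21.60%
    print(max_allowed_age(2.0, 0.2160))  # ~1.368, matching "Max allowed age: 1.368035...hrs"

So the logged "Max allowed age" looks consistent with the formula under that assumption; the open question remains why the Oct 26 sandbox directories, which are older than that age, are still on disk.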