Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-24 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206413
---


Ship it!




Ship It!

- James Peach


On July 24, 2018, 6:19 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 24, 2018, 6:19 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. Checks
> are performed every "disk_watch_interval". This mechanism can be
> improved in the future if we introduce a way for the isolators to learn
> about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" and
> "containerizer/mesos/disk/project_ids_total" metrics.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/5/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-24 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/
---

(Updated July 24, 2018, 11:19 a.m.)


Review request for mesos and James Peach.


Bugs: MESOS-9007
https://issues.apache.org/jira/browse/MESOS-9007


Repository: mesos


Description
---

Currently upon container destruction its project ID is unallocated by
the isolator and removed from the container work directory. However due
to API limitations we can't unset project IDs on symlinks that may exist
inside the directory. Because of that the project may still exist until
the container directory is garbage collected. If the project ID is
reused for a new container, any lingering symlinks that still have that
project ID will contribute to disk usage of the new container. Typically
symlinks don't take much space, but still this leads to inaccuracy in
disk space usage accounting.

This patch postpones project ID reclaiming until sandbox GC time. The
isolator periodically checks if sandboxes of terminated containers still
exist and deallocates project IDs of the ones that were removed. Checks
are performed every "disk_watch_interval". This mechanism can be
improved in the future if we introduce a way for the isolators to learn
about disk GCs.

Current number of available project IDs can be tracked with the new
"containerizer/mesos/disk/project_ids_free" and
"containerizer/mesos/disk/project_ids_total" metrics.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
0891f7709aa4f98758a727856d58e6177d46adca 
  src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
25f52a43b34b141bdaf7c448817423cf4264e22a 
  src/tests/containerizer/xfs_quota_tests.cpp 
dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 


Diff: https://reviews.apache.org/r/67914/diff/5/

Changes: https://reviews.apache.org/r/67914/diff/4-5/


Testing
---

Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that project 
ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.


Thanks,

Ilya Pronin



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-23 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206368
---



PASS: Mesos patch 67914 was successfully built and tested.

Reviews applied: `['67914']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/1969/mesos-review-67914

- Mesos Reviewbot Windows


On July 23, 2018, 11:21 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 23, 2018, 11:21 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. Checks
> are performed every "disk_watch_interval". This mechanism can be
> improved in the future if we introduce a way for the isolators to learn
> about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" and
> "containerizer/mesos/disk/project_ids_total" metrics.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/4/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-23 Thread Ilya Pronin


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/flags.cpp
> > Lines 1338 (patched)
> > 
> >
> > Rather than introducing a new flag, lets use 
> > `container_disk_watch_interval` or `disk_watch_interval` ... probably the 
> > former?
> 
> Ilya Pronin wrote:
> I think if we do this then we should rather use the latter because disk 
> GC is driven by `disk_watch_interval` and `gc_delay` (maybe use which one of 
> those 2 is the smallest?). `container_disk_watch_interval` will most likely 
> be set to a smaller value to promptly terminate containers that have breached 
> their quota (that's true in our cluster). Running sandboxes check at that 
> frequently would be wasteful.
> 
> James Peach wrote:
> OK, `disk_watch_interval` works for me too.

Done.


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206203
---


On July 23, 2018, 4:21 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 23, 2018, 4:21 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. Checks
> are performed every "disk_watch_interval". This mechanism can be
> improved in the future if we introduce a way for the isolators to learn
> about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" and
> "containerizer/mesos/disk/project_ids_total" metrics.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/4/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-23 Thread Ilya Pronin


> On July 23, 2018, 3:22 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.hpp
> > Lines 24 (patched)
> > 
> >
> > Let's use a pull gauge here for sampling performance reasons.
> > 
> > We can decrement it in `recover` and `nextProjectId`, and increment it 
> > in `returnProjectId`.
> 
> James Peach wrote:
> *push gauge* :)

Done.


> On July 23, 2018, 3:22 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.hpp
> > Lines 124 (patched)
> > 
> >
> > Shall we add a `project_ids_total` as well?

Done.


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206351
---


On July 23, 2018, 4:21 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 23, 2018, 4:21 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. Checks
> are performed every "disk_watch_interval". This mechanism can be
> improved in the future if we introduce a way for the isolators to learn
> about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" and
> "containerizer/mesos/disk/project_ids_total" metrics.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/4/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-23 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/
---

(Updated July 23, 2018, 4:21 p.m.)


Review request for mesos and James Peach.


Changes
---

Addressed review comments.


Bugs: MESOS-9007
https://issues.apache.org/jira/browse/MESOS-9007


Repository: mesos


Description (updated)
---

Currently upon container destruction its project ID is unallocated by
the isolator and removed from the container work directory. However due
to API limitations we can't unset project IDs on symlinks that may exist
inside the directory. Because of that the project may still exist until
the container directory is garbage collected. If the project ID is
reused for a new container, any lingering symlinks that still have that
project ID will contribute to disk usage of the new container. Typically
symlinks don't take much space, but still this leads to inaccuracy in
disk space usage accounting.

This patch postpones project ID reclaiming until sandbox GC time. The
isolator periodically checks if sandboxes of terminated containers still
exist and deallocates project IDs of the ones that were removed. Checks
are performed every "disk_watch_interval". This mechanism can be
improved in the future if we introduce a way for the isolators to learn
about disk GCs.

Current number of available project IDs can be tracked with the new
"containerizer/mesos/disk/project_ids_free" and
"containerizer/mesos/disk/project_ids_total" metrics.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
0891f7709aa4f98758a727856d58e6177d46adca 
  src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
25f52a43b34b141bdaf7c448817423cf4264e22a 
  src/tests/containerizer/xfs_quota_tests.cpp 
dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 


Diff: https://reviews.apache.org/r/67914/diff/4/

Changes: https://reviews.apache.org/r/67914/diff/3-4/


Testing
---

Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that project 
ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.


Thanks,

Ilya Pronin



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-23 Thread James Peach


> On July 23, 2018, 10:22 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.hpp
> > Lines 24 (patched)
> > 
> >
> > Let's use a pull gauge here for sampling performance reasons.
> > 
> > We can decrement it in `recover` and `nextProjectId`, and increment it 
> > in `returnProjectId`.

*push gauge* :)


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206351
---


On July 19, 2018, 11:12 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 19, 2018, 11:12 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" metric.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/3/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-23 Thread James Peach


> On July 18, 2018, 8:51 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.cpp
> > Lines 549 (patched)
> > 
> >
> > If possible, I think this is simple enough to inline into the check loop
> 
> Ilya Pronin wrote:
> Definitely possible, but I would prefer to keep that logical separation 
> unless you feel strongly about that :) `loop()` schedules the periodic check 
> and `checkProjectIdUsage()` actually performs that check.

Yeh I live with that :)


> On July 18, 2018, 8:51 p.m., James Peach wrote:
> > src/slave/flags.cpp
> > Lines 1338 (patched)
> > 
> >
> > Rather than introducing a new flag, lets use 
> > `container_disk_watch_interval` or `disk_watch_interval` ... probably the 
> > former?
> 
> Ilya Pronin wrote:
> I think if we do this then we should rather use the latter because disk 
> GC is driven by `disk_watch_interval` and `gc_delay` (maybe use which one of 
> those 2 is the smallest?). `container_disk_watch_interval` will most likely 
> be set to a smaller value to promptly terminate containers that have breached 
> their quota (that's true in our cluster). Running sandboxes check at that 
> frequently would be wasteful.

OK, `disk_watch_interval` works for me too.


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206203
---


On July 19, 2018, 11:12 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 19, 2018, 11:12 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" metric.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/3/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-23 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206351
---




src/slave/containerizer/mesos/isolators/xfs/disk.hpp
Lines 24 (patched)


Let's use a pull gauge here for sampling performance reasons.

We can decrement it in `recover` and `nextProjectId`, and increment it in 
`returnProjectId`.



src/slave/containerizer/mesos/isolators/xfs/disk.hpp
Lines 124 (patched)


Shall we add a `project_ids_total` as well?


- James Peach


On July 19, 2018, 11:12 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 19, 2018, 11:12 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" metric.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/3/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-20 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206308
---



Patch looks great!

Reviews applied: [67915, 67914]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On July 19, 2018, 11:12 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 19, 2018, 11:12 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" metric.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/3/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-19 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206255
---



PASS: Mesos patch 67914 was successfully built and tested.

Reviews applied: `['67914']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/1958/mesos-review-67914

- Mesos Reviewbot Windows


On July 19, 2018, 11:12 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 19, 2018, 11:12 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" metric.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/3/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-19 Thread Ilya Pronin


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.cpp
> > Line 495 (original), 504 (patched)
> > 
> >
> > I'm a little concerned that we could exhaust the project IDs (we 
> > default to 5000 IDs). What do you think about adding a metric for the 
> > number of available project IDs?

Good idea. Added `containerizer/mesos/disk/project_ids_free` metric.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.cpp
> > Lines 547 (patched)
> > 
> >
> > Let's make this message symmetric with the one we emit when we assign:
> > 
> > ```
> > LOG(INFO) << "Reclaimed project " << stringify(projectId.get()) << " 
> > from '"
> >   << containerConfig.directory() << "'";
> > ```

Fixed.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.cpp
> > Lines 549 (patched)
> > 
> >
> > If possible, I think this is simple enough to inline into the check loop

Definitely possible, but I would prefer to keep that logical separation unless 
you feel strongly about that :) `loop()` schedules the periodic check and 
`checkProjectIdUsage()` actually performs that check.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/flags.cpp
> > Lines 1338 (patched)
> > 
> >
> > Rather than introducing a new flag, lets use 
> > `container_disk_watch_interval` or `disk_watch_interval` ... probably the 
> > former?

I think if we do this then we should rather use the latter because disk GC is 
driven by `disk_watch_interval` and `gc_delay` (maybe use which one of those 2 
is the smallest?). `container_disk_watch_interval` will most likely be set to a 
smaller value to promptly terminate containers that have breached their quota 
(that's true in our cluster). Running sandboxes check at that frequently would 
be wasteful.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1197 (patched)
> > 
> >
> > For consistency with other tests, could we call this 
> > `ProjectIdReclaiming`?
> > 
> > Can you please add a short comment explaining the purpose of the test?

Done.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1319 (patched)
> > 
> >
> > Is this re-using the futures an OK thing to do? If they are already 
> > ready from the previous task launch, won't they still be ready when we 
> > await them here?

Yes, it seems to be safe. The `Future` is reset in `FutureArg()` during 
expectations setup: 
https://github.com/apache/mesos/blob/9147283171d761a4d38710f24ba654f8a96e325c/3rdparty/libprocess/include/process/gmock.hpp#L82


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1330 (patched)
> > 
> >
> > This can be `EXPECT_EQ`.

Fixed.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1332 (patched)
> > 
> >
> > Is the link the `latest` link? Can you add an explanatory comment?

Done.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1333 (patched)
> > 
> >
> > This can me `EXPECT_SOME_EQ`.

Fixed.


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206203
---


On July 19, 2018, 4:12 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 19, 2018, 4:12 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may 

Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-19 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/
---

(Updated July 19, 2018, 4:12 p.m.)


Review request for mesos and James Peach.


Changes
---

Addressed review comments.


Bugs: MESOS-9007
https://issues.apache.org/jira/browse/MESOS-9007


Repository: mesos


Description (updated)
---

Currently upon container destruction its project ID is unallocated by
the isolator and removed from the container work directory. However due
to API limitations we can't unset project IDs on symlinks that may exist
inside the directory. Because of that the project may still exist until
the container directory is garbage collected. If the project ID is
reused for a new container, any lingering symlinks that still have that
project ID will contribute to disk usage of the new container. Typically
symlinks don't take much space, but still this leads to inaccuracy in
disk space usage accounting.

This patch postpones project ID reclaiming until sandbox GC time. The
isolator periodically checks if sandboxes of terminated containers still
exist and deallocates project IDs of the ones that were removed. This
mechanism can be improved in the future if we introduce a way for the
isolators to learn about disk GCs.

Current number of available project IDs can be tracked with the new
"containerizer/mesos/disk/project_ids_free" metric.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
0891f7709aa4f98758a727856d58e6177d46adca 
  src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
25f52a43b34b141bdaf7c448817423cf4264e22a 
  src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
  src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
  src/tests/containerizer/xfs_quota_tests.cpp 
dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a 


Diff: https://reviews.apache.org/r/67914/diff/3/

Changes: https://reviews.apache.org/r/67914/diff/2-3/


Testing
---

Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that project 
ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.


Thanks,

Ilya Pronin



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-18 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206203
---




src/slave/containerizer/mesos/isolators/xfs/disk.cpp
Line 495 (original), 504 (patched)


I'm a little concerned that we could exhaust the project IDs (we default to 
5000 IDs). What do you think about adding a metric for the number of available 
project IDs?



src/slave/containerizer/mesos/isolators/xfs/disk.cpp
Lines 547 (patched)


Let's make this message symmetric with the one we emit when we assign:

```
LOG(INFO) << "Reclaimed project " << stringify(projectId.get()) << " from '"
  << containerConfig.directory() << "'";
```



src/slave/containerizer/mesos/isolators/xfs/disk.cpp
Lines 549 (patched)


If possible, I think this is simple enough to inline into the check loop



src/slave/flags.cpp
Lines 1338 (patched)


Rather than introducing a new flag, lets use 
`container_disk_watch_interval` or `disk_watch_interval` ... probably the 
former?



src/tests/containerizer/xfs_quota_tests.cpp
Lines 1197 (patched)


For consistency with other tests, could we call this `ProjectIdReclaiming`?

Can you please add a short comment explaining the purpose of the test?



src/tests/containerizer/xfs_quota_tests.cpp
Lines 1319 (patched)


Is this re-using the futures an OK thing to do? If they are already ready 
from the previous task launch, won't they still be ready when we await them 
here?



src/tests/containerizer/xfs_quota_tests.cpp
Lines 1330 (patched)


This can be `EXPECT_EQ`.



src/tests/containerizer/xfs_quota_tests.cpp
Lines 1332 (patched)


Is the link the `latest` link? Can you add an explanatory comment?



src/tests/containerizer/xfs_quota_tests.cpp
Lines 1333 (patched)


This can me `EXPECT_SOME_EQ`.


- James Peach


On July 13, 2018, 9:59 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 13, 2018, 9:59 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> bee0c2db7adc1dc7d774f9152931130093fe5b50 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/2/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-13 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206079
---



PASS: Mesos patch 67914 was successfully built and tested.

Reviews applied: `['67915', '67914']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/1922/mesos-review-67914

- Mesos Reviewbot Windows


On July 13, 2018, 9:59 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> ---
> 
> (Updated July 13, 2018, 9:59 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
> https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
> 0891f7709aa4f98758a727856d58e6177d46adca 
>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
> 25f52a43b34b141bdaf7c448817423cf4264e22a 
>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp 
> bee0c2db7adc1dc7d774f9152931130093fe5b50 
> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/2/
> 
> 
> Testing
> ---
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that 
> project ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.

2018-07-13 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/
---

(Updated July 13, 2018, 2:59 p.m.)


Review request for mesos and James Peach.


Bugs: MESOS-9007
https://issues.apache.org/jira/browse/MESOS-9007


Repository: mesos


Description (updated)
---

Currently upon container destruction its project ID is unallocated by
the isolator and removed from the container work directory. However due
to API limitations we can't unset project IDs on symlinks that may exist
inside the directory. Because of that the project may still exist until
the container directory is garbage collected. If the project ID is
reused for a new container, any lingering symlinks that still have that
project ID will contribute to disk usage of the new container. Typically
symlinks don't take much space, but still this leads to inaccuracy in
disk space usage accounting.

This patch postpones project ID reclaiming until sandbox GC time. The
isolator periodically checks if sandboxes of terminated containers still
exist and deallocates project IDs of the ones that were removed. This
mechanism can be improved in the future if we introduce a way for the
isolators to learn about disk GCs.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/xfs/disk.hpp 
0891f7709aa4f98758a727856d58e6177d46adca 
  src/slave/containerizer/mesos/isolators/xfs/disk.cpp 
25f52a43b34b141bdaf7c448817423cf4264e22a 
  src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
  src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
  src/tests/containerizer/xfs_quota_tests.cpp 
bee0c2db7adc1dc7d774f9152931130093fe5b50 


Diff: https://reviews.apache.org/r/67914/diff/2/

Changes: https://reviews.apache.org/r/67914/diff/1-2/


Testing
---

Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that project 
ID is reclaimed and reused after sandbox GC. Ran `sudo make check`.


Thanks,

Ilya Pronin