----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29688/#review67244 -----------------------------------------------------------
src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111222> This is pretty aggressive and is likely to hit the kernel hard for traversing large directories... Note how high the cpu usage is, particulary system time. FS structures will be cached (bad for the page cache) but the traversal remains costly. [root@hostname tmp]# time du -k -s /usr 2703744 /usr real 0m0.256s user 0m0.049s sys 0m0.203s [root@hostname tmp]# time du -k -s /usr 2703744 /usr real 0m0.253s user 0m0.050s sys 0m0.199s [root@hostname tmp]# time du -k -s /usr 2703744 /usr real 0m0.252s user 0m0.060s sys 0m0.188s [root@hostname tmp]# time du -k -s /usr 2703744 /usr real 0m0.266s user 0m0.061s sys 0m0.196s src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111187> Wrap in CHECK_NOTNULL? src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111191> ditto src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111195> This is true in the current Mesos containerizer code but this behavior could be changed to be configurable so it should not be assumed in any isolator. src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111208> As above, the isolator should not assume the containerizer will kill the container after the Limitation so it should wait until cleanup() before stopping everything. src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111249> Can you comment somewhere that the `du` processes are run in the slave's cgroup and it will be that cgroup that is charged for (a) memory to cache the fs data structures, (b) disk io to read those structures, and (c) the cpu time to traverse. src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111210> wrap in a CHECK_NOTNULL? src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111247> So this means that the time between checks for a given container is dependent on other containers' filesystem usage? [root@hostname du_test]# for N in {1..100000}; do touch $N; done [root@hostname du_test]# time du -k -s 2072 . real 0m0.301s user 0m0.079s sys 0m0.220s [root@hostname du_test]# time du -k -s 2072 . real 0m0.295s user 0m0.077s sys 0m0.216s src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111250> more likely is a failure to exec? src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111231> What errors do you expect that are transient and imply we can retry? Can we verify correct functionality of du at initialization and then not reschedule on unexpected failure? src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111232> ditto src/slave/containerizer/isolators/disk_quota.cpp <https://reviews.apache.org/r/29688/#comment111233> ditto - Ian Downes On Jan. 7, 2015, 4:45 p.m., Jie Yu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29688/ > ----------------------------------------------------------- > > (Updated Jan. 7, 2015, 4:45 p.m.) > > > Review request for mesos, Benjamin Hindman, Ben Mahler, and Ian Downes. > > > Bugs: MESOS-1588 > https://issues.apache.org/jira/browse/MESOS-1588 > > > Repository: mesos-git > > > Description > ------- > > Added DiskQuotaIsolator to enforce disk quota. I created a DiskUsageChecker > to check disk usage by calling 'du'. The DiskUsageChecker is throttled (see > comments). The isolator uses DiskUsageChecker to enforce disk quota. > > > Diffs > ----- > > src/Makefile.am 0521f5849acc3237a8fa3970c983beab74441586 > src/slave/containerizer/isolators/disk_quota.hpp PRE-CREATION > src/slave/containerizer/isolators/disk_quota.cpp PRE-CREATION > src/slave/containerizer/mesos/containerizer.cpp > 5c014ebe360b9527b3edd505d47e57a4d5ce5c52 > > Diff: https://reviews.apache.org/r/29688/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Jie Yu > >