[ https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143301#comment-16143301 ]
Qian Zhang commented on MESOS-6162:
-----------------------------------

For the [performance issue|https://github.com/opencontainers/runc/issues/861] mentioned in the description of this ticket, after some experiments I found that it occurs only when the disk's IO scheduler is set to {{cfq}} and the filesystem is {{ext4}}/{{ext3}} mounted with the {{data=ordered}} option.
{code}
# pwd
/mnt
# mount | grep mnt
/dev/sdb on /mnt type ext4 (rw,relatime,data=ordered)
# cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.51425 s, 338 kB/s
# echo $$ > /sys/fs/cgroup/blkio/test/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 16.0301 s, 31.9 kB/s  <--- Performance degradation when we put the process into the "test" blkio cgroup
{code}
If we change the IO scheduler to {{deadline}}, we will not have this performance issue (see [this doc|https://www.kernel.org/doc/Documentation/block/switching-sched.txt] for how to switch the IO scheduler, and the [CFQ scheduler|https://www.kernel.org/doc/Documentation/block/cfq-iosched.txt] and [deadline scheduler|https://www.kernel.org/doc/Documentation/block/deadline-iosched.txt] docs for more info about the two schedulers).
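Since the degradation only shows up with this specific scheduler/filesystem combination, it can be detected up front. Below is a small shell sketch (the helper names {{active_scheduler}} and {{is_vulnerable}} are mine, not from any kernel tooling) that parses the bracketed entry in {{/sys/block/<dev>/queue/scheduler}} and checks the mount options:
{code}
#!/bin/sh
# Extract the active scheduler (the bracketed entry) from a
# /sys/block/<dev>/queue/scheduler line such as "noop deadline [cfq]".
active_scheduler() {
    echo "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Check one block device (e.g. "sdb"); returns 0 if it matches the
# problematic cfq + ext3/ext4 + data=ordered combination described above.
is_vulnerable() {
    dev="$1"
    sched=$(active_scheduler "$(cat /sys/block/$dev/queue/scheduler)")
    [ "$sched" = "cfq" ] || return 1
    # Is this device mounted as ext3/ext4 with data=ordered?
    mount | grep "^/dev/$dev " | grep -E 'type ext[34]' | grep -q 'data=ordered'
}

active_scheduler "noop deadline [cfq]"    # prints: cfq
{code}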
{code}
# echo deadline > /sys/block/sdb/queue/scheduler
# cat /sys/block/sdb/queue/scheduler
noop [deadline] cfq
# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.21094 s, 423 kB/s
# echo $$ > /sys/fs/cgroup/blkio/test/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.19367 s, 429 kB/s  <--- No performance degradation
{code}
I also tested with the disk formatted as other filesystems (e.g., {{xfs}}, {{btrfs}}) and with the disk mounted without the {{data=ordered}} option ({{data=ordered}} is the default for {{ext4}} and {{ext3}}; it can be changed by specifying a different mount option, e.g., {{data=journal}}); in both cases the performance issue does not occur. See [this doc|https://www.ibm.com/developerworks/library/l-fs8/index.html] for the difference between {{data=ordered}} and {{data=journal}}:
{quote}
Theoretically, data=journal mode is the slowest journaling mode of all, since data gets written to disk twice rather than once. However, it turns out that in certain situations, data=journal mode can be blazingly fast.
{quote}
It seems only SUSE hits this performance issue, since by default it sets the disk's IO scheduler to {{cfq}} and formats the filesystem as {{ext4}} with the {{data=ordered}} option. I tested other distros (CoreOS, CentOS 7.2 and Ubuntu 16.04) and they do not have the issue, since some set the disk's IO scheduler to {{deadline}} by default (Ubuntu 16.04) and some format the disk as {{xfs}} by default (CentOS 7.2). So I think this is not a general performance issue, since most distros are not affected, and it can be worked around on the fly by switching the IO scheduler to {{deadline}}.
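Note that switching the scheduler through sysfs only lasts until reboot. If a host should stay on {{deadline}}, the setting can be pinned with a udev rule or via the kernel command line. Both are standard mechanisms, but the file paths below are typical rather than universal, so treat this as a sketch to adapt per distro:
{code}
# /etc/udev/rules.d/60-io-scheduler.rules  (path is typical, adjust per distro)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"

# Or, for all disks at boot, via the kernel command line in GRUB
# (edit /etc/default/grub, then regenerate the config with
# update-grub or grub2-mkconfig):
#   GRUB_CMDLINE_LINUX_DEFAULT="... elevator=deadline"
{code}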
But in the future, when we support the blkio control functionality ([MESOS-7843|https://issues.apache.org/jira/browse/MESOS-7843]), setting the IO scheduler to {{deadline}} will be a problem: the blkio proportional weight policy requires the {{cfq}} scheduler, so if the scheduler is set to {{deadline}}, none of the {{blkio.weight}}, {{blkio.weight_device}} and {{blkio.leaf_weight\[_device\]}} proportional weight policy files will take effect.

> Add support for cgroups blkio subsystem blkio statistics.
> ---------------------------------------------------------
>
>                 Key: MESOS-6162
>                 URL: https://issues.apache.org/jira/browse/MESOS-6162
>             Project: Mesos
>          Issue Type: Task
>          Components: cgroups, containerization
>            Reporter: haosdent
>            Assignee: Jason Lai
>              Labels: cgroups, containerizer, mesosphere
>             Fix For: 1.4.0
>
>
> Noted that the cgroups blkio subsystem may have a performance issue; refer to
> https://github.com/opencontainers/runc/issues/861

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)