[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem blkio statistics.

2017-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174787#comment-16174787
 ] 

Qian Zhang commented on MESOS-6162:
---

I ran more tests for this performance issue with Mesos itself (rather than only 
testing manually with {{dd}} as in my previous post). I used {{mesos-execute}} 
to launch a task that runs {{dd}} like this:
{code}
mesos-execute --master=192.168.1.6:5050 --name=test --command="dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync"
{code}
I found that this performance issue *always* happens when the combination 
{{ext4/ext3 with the data=ordered option}} + {{cfq IO scheduler}} is met, *no 
matter whether `cgroups/blkio` isolation is enabled or not*. That is, when the 
combination is met, the task takes much longer to complete (~16s) than it does 
when the combination is not met (~1.2s), regardless of whether `cgroups/blkio` 
is enabled.

So it seems this performance issue has nothing to do with `cgroups/blkio`, since 
it happens even when `cgroups/blkio` is not enabled at all. However, one odd 
thing I found is that if the process is assigned to the *root* blkio cgroup, 
the performance issue does *not* happen even when the combination is met:
{code}
# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs 
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync 
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.19546 s, 428 kB/s  <--- No performance issue.
{code}

So the conclusion is that when the combination is met: 
# If the process is not assigned to any blkio cgroup (i.e., `cgroups/blkio` 
isolation is not enabled), the performance issue will happen.
# If the process is assigned to a sub blkio cgroup (i.e., `cgroups/blkio` 
isolation is enabled), the performance issue will happen.
# If the process is assigned to the root blkio cgroup, the performance issue 
will not happen.
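
To see which of these cases a given process falls into, one option is to read its blkio cgroup path from {{/proc/<pid>/cgroup}}. The following is only a minimal sketch, assuming the cgroup v1 layout used in this thread (lines of the form {{id:controllers:path}}); the function names are hypothetical helpers, not part of Mesos.

```shell
#!/bin/sh
# Sketch only: assumes cgroup v1 style lines such as "4:blkio:/test" on stdin.
# Function names are illustrative, not an existing tool or Mesos API.

# Print the cgroup path of the hierarchy that has the blkio controller attached.
blkio_cgroup_path() {
    awk -F: '
        {
            # The second field is a comma-separated controller list.
            n = split($2, ctrl, ",")
            for (i = 1; i <= n; i++) {
                if (ctrl[i] == "blkio") {
                    # The path is everything after the second colon.
                    path = $0
                    sub(/^[^:]*:[^:]*:/, "", path)
                    print path
                    exit
                }
            }
        }'
}

# Succeed only if the blkio cgroup path read from stdin is the root ("/").
is_root_blkio_cgroup() {
    [ "$(blkio_cgroup_path)" = "/" ]
}
```

For the current shell this could be used as {{blkio_cgroup_path < /proc/$$/cgroup}}; a result of {{/}} corresponds to case 3, anything else to a non-root blkio cgroup.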

I think cases 1 and 2 can happen in the Mesos context but not case 3, since a 
container launched by Mesos is never assigned to the root blkio cgroup. 
Originally I thought we should add a note about the performance issue to the 
`cgroups/blkio` doc, but now I think that may not be the right place for it. 
Instead, we should add the note to {{mesos-containerizer.md}} and 
{{persistent-volume.md}}.


> Add support for cgroups blkio subsystem blkio statistics.
> -
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups, containerization
>Reporter: haosdent
>Assignee: Jason Lai
>  Labels: cgroups, containerizer, mesosphere
> Fix For: 1.4.0
>
>
> Noted that cgroups blkio subsystem may have performance issue, refer to 
> https://github.com/opencontainers/runc/issues/861



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem blkio statistics.

2017-08-27 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143301#comment-16143301
 ] 

Qian Zhang commented on MESOS-6162:
---

For the [performance issue|https://github.com/opencontainers/runc/issues/861] 
mentioned in the description of this ticket, after some experiments, I found it 
will happen only when the IO scheduler for the disk is set to {{cfq}} and the 
filesystem is {{ext4}}/{{ext3}} with the {{data=ordered}} option. 
{code}
# pwd
/mnt
# mount | grep mnt 
/dev/sdb on /mnt type ext4 (rw,relatime,data=ordered)
# cat /sys/block/sdb/queue/scheduler 
noop deadline [cfq] 
# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs 
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync   
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.51425 s, 338 kB/s
# echo $$ >/sys/fs/cgroup/blkio/test/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync   
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 16.0301 s, 31.9 kB/s  <--- Performance degradation when we put the process into "test" blkio cgroup
{code}
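
The check for this combination can be sketched as a small script. This is an illustration under assumptions: it expects the scheduler line in the bracketed format shown above and a {{mount}} line that lists {{data=ordered}} explicitly, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

# Print the active IO scheduler, i.e. the name inside [brackets], from stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Succeed if a line of `mount` output describes ext4/ext3 with data=ordered.
mount_line_is_ordered_ext() {
    case "$1" in
        *" type ext4 ("*data=ordered*|*" type ext3 ("*data=ordered*) return 0 ;;
        *) return 1 ;;
    esac
}

# $1: content of /sys/block/<dev>/queue/scheduler
# $2: matching line of `mount` output
# Succeed if both conditions described in this comment hold.
combination_met() {
    [ "$(printf '%s\n' "$1" | active_scheduler)" = "cfq" ] &&
        mount_line_is_ordered_ext "$2"
}
```

With the values from the session above, {{combination_met "$(cat /sys/block/sdb/queue/scheduler)" "$(mount | grep ' /mnt ')"}} would succeed, which is the configuration where the slow {{dd}} run was observed.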

If we change the IO scheduler to {{deadline}} (see [this 
doc|https://www.kernel.org/doc/Documentation/block/switching-sched.txt] for 
more info about switching the IO scheduler, and see [CFQ 
scheduler|https://www.kernel.org/doc/Documentation/block/cfq-iosched.txt] and 
[deadline 
scheduler|https://www.kernel.org/doc/Documentation/block/deadline-iosched.txt] 
for more info about the CFQ and deadline IO schedulers), we will not have this 
performance issue.
{code}
# echo deadline > /sys/block/sdb/queue/scheduler  
# cat /sys/block/sdb/queue/scheduler
noop [deadline] cfq 
# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.21094 s, 423 kB/s
# echo $$ > /sys/fs/cgroup/blkio/test/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.19367 s, 429 kB/s  <--- No performance degradation
{code}
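
To compare runs like these numerically rather than by eye, the elapsed time can be pulled out of the summary line that {{dd}} prints. A minimal sketch, assuming the English summary format shown above; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes dd's summary line looks like
#   "512000 bytes (512 kB, 500 KiB) copied, 1.51425 s, 338 kB/s"

# Print the elapsed seconds found in a dd summary read from stdin.
dd_elapsed_seconds() {
    sed -n 's/.* copied, \([0-9.]*\) s,.*/\1/p'
}

# $1: baseline seconds, $2: seconds of the run being compared.
# Print how many times slower the second run was, with one decimal place.
slowdown() {
    LC_ALL=C awk -v base="$1" -v other="$2" 'BEGIN {
        if (base <= 0) exit 1      # avoid dividing by zero
        printf "%.1f\n", other / base
    }'
}
```

For the two {{cfq}} runs earlier in this comment, {{slowdown 1.51425 16.0301}} prints {{10.6}}, i.e. roughly a tenfold slowdown in the "test" cgroup.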

I also found that if the disk is formatted with another filesystem (e.g., 
{{xfs}} or {{btrfs}}), or is mounted without the {{data=ordered}} option, we 
will not have this performance issue. ({{data=ordered}} is the default for 
{{ext4}} and {{ext3}}; we can override it by specifying a different option when 
mounting the disk, e.g., {{data=journal}}.) See [this 
doc|https://www.ibm.com/developerworks/library/l-fs8/index.html] for the 
difference between {{data=ordered}} and {{data=journal}}.
{quote}
Theoretically, data=journal mode is the slowest journaling mode of all, since 
data gets written to disk twice rather than once. However, it turns out that in 
certain situations, data=journal mode can be blazingly fast.
{quote}

It seems only SUSE has this performance issue, since by default it sets the 
disk's IO scheduler to {{cfq}} and uses {{ext4}} with the {{data=ordered}} 
option. I tested other distros (CoreOS, CentOS 7.2 and Ubuntu 16.04) and they 
do not have the issue, because some of them set the disk's IO scheduler to 
{{deadline}} by default (Ubuntu 16.04) and some of them format the disk as 
{{xfs}} by default (CentOS 7.2).

So I think this is not a general performance issue, since most distros do not 
have it, and it can be fixed on the fly by switching the IO scheduler to 
{{deadline}}. But in the future, when we support blkio control functionality 
([MESOS-7843|https://issues.apache.org/jira/browse/MESOS-7843]), setting the IO 
scheduler to {{deadline}} will be a problem, because blkio control requires the 
IO scheduler to be {{CFQ}}. If it is set to {{deadline}}, the {{blkio.weight}}, 
{{blkio.weight_device}} and {{blkio.leaf_weight\[_device\]}} proportional 
weight policy files will NOT take effect.
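
To illustrate that dependency, here is a hedged sketch of a guard that refuses to write {{blkio.weight}} unless {{cfq}} is the active scheduler. It is not how Mesos does or will do this; the function name and argument order are hypothetical, and the 10 to 1000 range is the one given in the kernel's blkio controller documentation for CFQ.

```shell
#!/bin/sh
# Sketch only: a manual guard, not Mesos code.
# $1: scheduler file, e.g. /sys/block/sdb/queue/scheduler
# $2: blkio cgroup directory, e.g. /sys/fs/cgroup/blkio/test
# $3: proportional weight (kernel blkio docs give 10..1000 as the allowed range)
set_blkio_weight() {
    # The active scheduler is the name shown inside [brackets].
    sched=$(sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1")
    if [ "$sched" != "cfq" ]; then
        echo "active scheduler is '${sched:-unknown}'; blkio.weight would not take effect" >&2
        return 1
    fi
    # Reject anything that is not a plain non-negative integer.
    case "$3" in
        ''|*[!0-9]*) echo "weight must be an integer" >&2; return 2 ;;
    esac
    if [ "$3" -lt 10 ] || [ "$3" -gt 1000 ]; then
        echo "weight must be between 10 and 1000" >&2
        return 2
    fi
    printf '%s\n' "$3" > "$2/blkio.weight"
}
```

For example, {{set_blkio_weight /sys/block/sdb/queue/scheduler /sys/fs/cgroup/blkio/test 500}} (run as root) would only write the weight while {{cfq}} is selected for {{sdb}}.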



[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem blkio statistics.

2017-07-31 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108168#comment-16108168
 ] 

Jason Lai commented on MESOS-6162:
--

Thanks a lot for shepherding my changes while I was away on vacation, 
[~gilbert]! And no worries! Glad to get my code committed upstream, regardless 
of the committer name. We still have a lot of tasks to collaborate on going 
forward :)



[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem blkio statistics.

2017-07-31 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108152#comment-16108152
 ] 

Gilbert Song commented on MESOS-6162:
-

[~jasonlai], sorry, I forgot to update the commits to be under your name. I made 
some changes and it was purely a mistake. I apologize for that.

/cc [~ctrlhxj] [~zhitao]



[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2017-06-09 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045210#comment-16045210
 ] 

Jason Lai commented on MESOS-6162:
--

I had a long-standing diff that I hadn't submitted yet. It is now rebased onto 
master and squashed into one commit at: https://reviews.apache.org/r/59960/ 
[~gilbert] [~jieyu]

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Jason Lai
>
> Noted that cgroups blkio subsystem may have performance issue, refer to 
> https://github.com/opencontainers/runc/issues/861



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2017-06-09 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044040#comment-16044040
 ] 

Jason Lai commented on MESOS-6162:
--

[~haoyixin] Hi! We didn't get to prioritize this while the diff was pending 
review. But we'll resurrect this task, given the incoming demand for it. Will 
keep you updated as we progress.



[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2017-06-09 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044032#comment-16044032
 ] 

Gilbert Song commented on MESOS-6162:
-

[~haoyixin], sorry for the delay. I chatted with Jason. He already has a local 
implementation. Considering that a couple of companies are interested in this 
feature, we will try to ship it by the end of next week. I will shepherd.



[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2017-06-08 Thread Hao Yixin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043980#comment-16043980
 ] 

Hao Yixin commented on MESOS-6162:
--

[~gilbert]
Hi Gilbert, we met at Qihoo 360 and talked about this.
How is this going? It seems to have been dead in the water for months.



[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2016-11-09 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651768#comment-15651768
 ] 

Jason Lai commented on MESOS-6162:
--

Thanks Haosdent! I also asked Zhitao to temporarily hold this task for me 
until I can assign it to myself.

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Zhitao Li
>
> Noted that cgroups blkio subsystem may have performance issue, refer to 
> https://github.com/opencontainers/runc/issues/861



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2016-10-28 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15617272#comment-15617272
 ] 

haosdent commented on MESOS-6162:
-

No problem, thank you!

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>
> Noted that cgroups blkio subsystem may have performance issue, refer to 
> https://github.com/opencontainers/runc/issues/861



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2016-10-28 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616497#comment-15616497
 ] 

Jason Lai commented on MESOS-6162:
--

Hi haosdent! I'm not sure if you're working on this ticket. At Uber we have been 
collecting blkio stats for Docker containers, and I would like to take this 
task as an effort to maintain feature parity with our Docker containers. Are 
you okay with that?
