Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2014-02-10 Thread Guang Yang
Thanks all for the help.

We finally identified the root cause of the issue: lock contention during FileStore
folder splitting. Here is the tracking ticket (thanks Inktank for the fix!):
http://tracker.ceph.com/issues/7207
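
For anyone hitting the same symptom: to the best of my knowledge the splitting behaviour
is controlled by the FileStore options filestore_merge_threshold and filestore_split_multiple
(a subdirectory is split once it holds roughly split_multiple * merge_threshold * 16 objects).
A rough sketch of how one might inspect and raise them; the values below are illustrative
examples only, not a recommendation:

# Check the current values on one of the slow OSDs via the admin socket:
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.598.asok config show | grep filestore_merge_threshold
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.598.asok config show | grep filestore_split_multiple

# Illustrative ceph.conf change (tune for your own object counts, then restart the OSDs):
# [osd]
#     filestore merge threshold = 40
#     filestore split multiple = 8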

Thanks,
Guang


On Tuesday, December 31, 2013 8:22 AM, Guang Yang  wrote:
 
Thanks Wido, my comments inline...

[Quoted headers and original message trimmed here; the full exchange appears below in this thread.]

Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-31 Thread Guang Yang
Thanks Mark, my comments inline...

Date: Mon, 30 Dec 2013 07:36:56 -0600
From: Mark Nelson 
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph cluster performance degrade (radosgw)
    after running some time

On 12/30/2013 05:45 AM, Guang wrote:

[Quoted original message and Mark's reply trimmed here; both appear in full later in this thread.]

Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-31 Thread Guang Yang
Thanks Wido, my comments inline...

[Quoted headers, original message, and Wido's reply trimmed here; they appear in full later in this thread.]

Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-30 Thread Guang Yang
Thanks Wido, my comments inline...

>Date: Mon, 30 Dec 2013 14:04:35 +0100
>From: Wido den Hollander 
>To: ceph-users@lists.ceph.com
>Subject: Re: [ceph-users] Ceph cluster performance degrade (radosgw)
>    after running some time

>On 12/30/2013 12:45 PM, Guang wrote:
>
> [Original message trimmed here; the full post appears at the end of this thread.]

What filesystem are you using? Btrfs or XFS?

Btrfs still suffers from a performance degradation over time. So if you 
run btrfs, that might be the problem.

[yguang] We are running on XFS; the journal and data share the same disk, on
separate partitions.
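
For reference, one quick way to confirm where each OSD's journal lives, assuming the
default /var/lib/ceph/osd/ceph-<id> data paths (an illustrative check, not from the
original reply):

# The journal is a file or symlink inside each OSD's data directory:
ls -l /var/lib/ceph/osd/ceph-*/journal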

Wido



Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-30 Thread Mark Nelson

On 12/30/2013 05:45 AM, Guang wrote:

[Original message trimmed here; the full post appears at the end of this thread.]


You may want to use the dump_historic_ops command in the admin socket 
for the slow OSDs.  That will give you some clues regarding where the 
ops are hanging up in the OSD.  You can also crank the osd debugging way 
up on that node and search through the logs to see if there are any 
patterns or trends (consistent slowness, pauses, etc).  It may also be 
useful to look and see if that OSD is pegging CPU and if so attach 
strace or perf to it and see what it's doing.
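
For example (a sketch assuming the default admin socket path and osd.598 from the
report; the exact admin socket command set can vary by release):

# Show the slowest recent ops and which stage they spent their time in:
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.598.asok dump_historic_ops

# Show the ops currently in flight:
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.598.asok dump_ops_in_flight

# Temporarily crank up debugging on that daemon:
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.598.asok config set debug_osd 20

# Find the osd.598 process and attach perf/strace if it is pegging a core:
ps aux | grep '[c]eph-osd.*-i 598'
sudo perf top -p <pid from the previous command>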


Normally in this situation I'd say to be wary of the disk going bad, but 
in this case it may be something else.






Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-30 Thread Wido den Hollander

On 12/30/2013 12:45 PM, Guang wrote:

[Original message trimmed here; the full post appears at the end of this thread.]



What filesystem are you using? Btrfs or XFS?

Btrfs still suffers from a performance degradation over time. So if you 
run btrfs, that might be the problem.
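
A quick way to check which filesystem backs the OSD data directories, assuming the
default /var/lib/ceph/osd mount points, is something like:

# Each line shows device, mount point and filesystem type:
mount | grep '/var/lib/ceph/osd'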


Wido






--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


[ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-30 Thread Guang
Hi ceph-users and ceph-devel,
Merry Christmas and Happy New Year!

We have a Ceph cluster fronted by radosgw; our customers access the cluster through
the S3 API.

The basic information of the cluster is:
bash-4.1$ ceph -s
   cluster b9cb3ea9-e1de-48b4-9e86-6921e2c537d2
    health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
    monmap e1: 3 mons at {osd151=10.194.0.68:6789/0,osd152=10.193.207.130:6789/0,osd153=10.193.207.131:6789/0}, election epoch 40, quorum 0,1,2 osd151,osd152,osd153
    osdmap e129885: 787 osds: 758 up, 758 in
     pgmap v1884502: 22203 pgs: 22125 active+clean, 1 active+clean+scrubbing, 1 active+clean+inconsistent, 76 active+clean+scrubbing+deep; 96319 GB data, 302 TB used, 762 TB / 1065 TB avail
    mdsmap e1: 0/0/1 up

(Note: when the latency peak happened, there was no scrubbing, recovery, or
backfilling going on.)

The performance of the cluster (write-only traffic) was stable until Dec 25th, when
our monitoring of the radosgw access logs showed a significant increase in both the
average latency and the 99th-percentile latency.

I then picked one storage host and grepped its OSD logs for slow requests, and found
that most of the slow requests were waiting for sub-ops. Take host osd22 as an example.

osd[561-571] are hosted by osd22.
-bash-4.1$ for i in {561..571}; do grep "slow request" ceph-osd.$i.log | grep 
"2013-12-25 16"| grep osd_op | grep -oP "\d+,\d+" ; done > ~/slow_osd.txt
-bash-4.1$ cat ~/slow_osd.txt | sort | uniq -c | sort -nr
   3586 656,598
    289 467,629
    284 598,763
    279 584,598
    203 172,598
    182 598,6
    155 629,646
     83 631,598
     65 631,593
     21 616,629
     20 609,671
     20 609,390
     13 609,254
     12 702,629
     12 629,641
     11 665,613
     11 593,724
     11 361,591
     10 591,709
      9 681,609
      9 609,595
      9 591,772
      8 613,662
      8 575,591
      7 674,722
      7 609,603
      6 585,605
      5 613,691
      5 293,629
      4 774,591
      4 717,591
      4 613,776
      4 538,629
      4 485,629
      3 702,641
      3 608,629
      3 593,580
      3 591,676

It turns out most of the slow requests were waiting on osd.598 and osd.629. I ran the
same procedure on another host and saw the same pattern.
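
As a small follow-up (not part of the original pipeline), splitting the "a,b" pairs so
each OSD id is counted on its own makes the hot OSDs stand out even more clearly:

# Count how often each individual OSD id appears in the slow-request pairs:
-bash-4.1$ tr ',' '\n' < ~/slow_osd.txt | sort | uniq -c | sort -nr | head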

I then turned to the host carrying osd.598 and dumped the perf counters of its OSDs
for comparison.

-bash-4.1$ for i in {594..604}; do sudo ceph --admin-daemon 
/var/run/ceph/ceph-osd.$i.asok perf dump | ~/do_calc_op_latency.pl; done
op_latency,subop_latency,total_ops
0.192097526753471,0.0344513450167198,7549045
1.99137797628122,1.42198426157216,9184472
0.198062399664129,0.0387090378926376,6305973
0.621697271315762,0.396549768986993,9726679
29.5222496247375,18.246379615,10860858
0.229250239525916,0.0557482067611005,8149691
0.208981698303654,0.0375553180438224,6623842
0.47474766302086,0.292583928601509,9838777
0.339477790083925,0.101288409388438,9340212
0.186448840141895,0.0327296517417626,7081410
0.807598201207144,0.0139762289702332,6093531
(osd 598 is op hotspot as well)
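
The do_calc_op_latency.pl script itself is not shown in the thread; below is a rough
jq-based equivalent, assuming each counter is exposed as an avgcount/sum pair under the
"osd" section of "perf dump" and that the op counts are nonzero (a reconstruction of
what the script presumably computes, not the original):

# Print average op latency, average subop latency and op count per OSD:
for i in {594..604}; do
  sudo ceph --admin-daemon /var/run/ceph/ceph-osd.$i.asok perf dump | \
    jq -r '[.osd.op_latency.sum / .osd.op_latency.avgcount,
            .osd.subop_latency.sum / .osd.subop_latency.avgcount,
            .osd.op_latency.avgcount] | @csv'
done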

This further confirmed that osd.598 was having performance problems (around 30 seconds
of average op latency!).
sar shows slightly higher disk I/O for osd.598's disk (/dev/sdf), but the difference in
disk latency is nowhere near as significant as what the OSD perf counters show.
reads  kbread  writes  kbwrite  %busy  avgqu  await  svctm
37.3   459.9   89.8    4106.9   61.8   1.6    12.2   4.9
42.3   545.8   91.8    4296.3   69.7   2.4    17.6   5.2
42.0   483.8   93.1    4263.6   68.8   1.8    13.3   5.1
39.7   425.5   89.4    4327.0   68.5   1.8    14.0   5.3

Another disk at the same time for comparison (/dev/sdb).
reads  kbread  writes  kbwrite  %busy  avgqu  await  svctm
34.2   502.6   80.1    3524.3   53.4   1.3    11.8   4.7
35.3   560.9   83.7    3742.0   56.0   1.2    9.8    4.7
30.4   371.5   78.8    3631.4   52.2   1.7    15.8   4.8
33.0   389.4   78.8    3597.6   54.2   1.4    12.1   4.8

Any idea why a couple of OSDs are so slow that they impact the performance of the
entire cluster?

Thanks,
Guang