[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Frank Schilder
> Not with the most recent Ceph releases.

Actually, this depends. If it's SSDs whose IOPS profit from higher iodepth,
it is very likely to improve performance, because to this day each OSD has only
one kv_sync_thread, and this is typically the bottleneck under heavy IOPS load.
Having 2-4 kv_sync_threads per SSD, meaning 2-4 OSDs per disk, will help a lot
if this thread is saturated.

For NVMes this is usually not required.

The question still remains: do you have enough CPU? If you have 13 disks with 4
OSDs each, you will need a core count of at least 50-ish per host. Newer OSDs
might be able to utilize even more on fast disks. You will also need 4 times
the RAM.
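
If you go that route, here is a minimal sketch of how the split is usually done.
The device path and OSD count are placeholders; I believe cephadm's OSD service
spec has an equivalent osds_per_device field if you deploy that way.

# let ceph-volume carve one NVMe into 4 OSDs (it creates the LVs for you)
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1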

> I suspect your PGs are too few though.

In addition, on these drives you should aim for 150-200 PGs per OSD (another
reason to go 4x OSDs: 4x the PGs per drive). We have 198 PGs/OSD on average and
this helps a lot with IO, recovery, everything.
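
A rough sanity check, using assumed example numbers rather than your actual
layout (3 hosts x 13 drives x 4 OSDs = 156 OSDs, one replicated pool of size 3):

# target ~175 PGs/OSD  =>  total pg_num ~ 156 * 175 / 3 = 9100, rounded to a power of two
ceph osd pool set <pool> pg_num 8192
# verify the resulting per-OSD PG count (PGS column) and what the autoscaler thinks
ceph osd df
ceph osd pool autoscale-status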

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Anthony D'Atri 
Sent: Tuesday, May 21, 2024 3:06 PM
To: 서민우
Cc: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Please discuss about Slow Peering



I have some additional questions:
We use 13 disks (3.2TB NVMe) per server and allocate one OSD to each disk. In
other words, one node has 13 OSDs.
Do you think this is inefficient?
Is it better to create more OSDs by creating LVs on each disk?

Not with the most recent Ceph releases.  I suspect your PGs are too few though.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Anthony D'Atri
> 
> 
> I have some additional questions:
> We use 13 disks (3.2TB NVMe) per server and allocate one OSD to each disk. In 
> other words, one node has 13 OSDs.
> Do you think this is inefficient?
> Is it better to create more OSDs by creating LVs on each disk?

Not with the most recent Ceph releases.  I suspect your PGs are too few though.

>> 
>> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread 서민우
I compared your advice to my current setup.
Increasing 'osd_memory_target' seems to have worked: I changed it from 4G to 8G,
and the latency in our test decreased by about 50%.
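
For reference, a sketch of how such a change can be applied cluster-wide; the
value is simply the 8 GiB from my test expressed in bytes:

# raise the per-OSD memory target to 8 GiB for all OSDs
ceph config set osd osd_memory_target 8589934592
# confirm the configured value
ceph config get osd osd_memory_target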

I have some additional questions:
We use 13 disks (3.2TB NVMe) per server and allocate one OSD to each disk.
In other words, one node has 13 OSDs.
Do you think this is inefficient?
Is it better to create more OSDs by creating LVs on each disk?


On Tue, May 21, 2024 at 6:49 PM, Frank Schilder wrote:

> We are using the read-intensive kioxia drives (octopus cluster) in RBD
> pools and are very happy with them. I don't think it's the drives.
>
> The last possibility I could think of is CPU. We run 4 OSDs per 1.92TB
> Kioxia drive to utilize their performance (single OSD per disk doesn't cut
> it at all) and have 2x16-core Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
> per server. During normal operations the CPU is only lightly loaded. During
> peering this load peaks at or above 100%. If not enough CPU power is
> available, peering will be hit very badly. What you could check is:
>
> - number of cores: at least 1 HT per OSD, better 1 core per OSD.
> - cstates disabled: we run with virtualization-performance profile and the
> CPU is basically always at all-core boost (3.2GHz)
> - sufficient RAM: we run these OSDs with 6G memory limit, that's 24G per
> disk! Still, the servers have 50% OSD RAM utilisation and 50% buffers, so
> there is enough for fast peak allocations during peering.
> - check vm.min_free_kbytes: the default is way too low for OSD hosts, we
> use vm.min_free_kbytes=4194304 (4G), this can have latency impact for
> network connections
> - swap disabled: disable swap on OSD hosts
> - sysctl network tuning: check that your network parameters are
> appropriate for your network cards, the kernel defaults are still for 1G
> connections; there are great tuning guides online. Here are some of our
> settings for 10G NICs:
>
> # Increase autotuning TCP buffer limits
> # 10G fiber/64MB buffers (67108864)
> net.core.rmem_max = 67108864
> net.core.wmem_max = 67108864
> net.core.rmem_default = 67108864
> net.core.wmem_default = 67108864
> net.core.optmem_max = 40960
> net.ipv4.tcp_rmem = 22500   218450 67108864
> net.ipv4.tcp_wmem = 22500  81920 67108864
>
> - last check: are you using WPQ or MCLOCK? The mclock scheduler still has
> serious issues and switching to WPQ might help.
>
> If none of these help, I'm out of ideas. For us the Kioxia drives work
> like a charm; it's the pool that is easiest to manage and maintain with
> super-fast recovery and really good sustained performance.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: 서민우 
> Sent: Tuesday, May 21, 2024 11:25 AM
> To: Anthony D'Atri
> Cc: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Please discuss about Slow Peering
>
> We used the "kioxia kcd6xvul3t20" model.
> Is there any infamous information about this model?
>
> On Fri, May 17, 2024 at 2:58 AM, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> If using jumbo frames, also ensure that they're consistently enabled on
> all OS instances and network devices.
>
> > On May 16, 2024, at 09:30, Frank Schilder <fr...@dtu.dk> wrote:
> >
> > This is a long shot: if you are using octopus, you might be hit by this
> pglog-dup problem:
> https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. They don't
> mention slow peering explicitly in the blog, but it's also a consequence
> because the up+acting OSDs need to go through the PG_log during peering.
> >
> > We are also using octopus and I'm not sure if we have ever seen slow ops
> caused by peering alone. It usually happens when a disk cannot handle load
> under peering. We have, unfortunately, disks that show random latency
> spikes (firmware update pending). You can try to monitor OPS latencies for
> your drives when peering and look for something that sticks out. People on
> this list were reporting quite bad results for certain infamous NVMe
> brands. If you state your model numbers, someone else might recognize it.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: 서민우 <smw940...@gmail.com>
> > Sent: Thursday, May 16, 2024 7:39 AM
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Please discuss about Slow Peering
> >
> > Env:
> > - OS: Ubuntu 20.04
> > - Ceph Version: Octopus 15.0.0.1
> > - OSD Disk: 2.9TB NVMe
> > - BlockStorage (Replication 3)
> >
> > Symptom:
> > - Peering when OSD's node up is very slow. Peering speed varies from PG
> to
> > PG, and some PG may even take 10 seconds. But, there is no log for 10
> > seconds.
> > - I checked the effect of client VM's. Actually, Slow queries of mysql
> > occur at the same time.
> >
> > There are Ceph OSD logs of both Best and Worst.
> >
> > Best Peering Case (0.5 Seconds)
> > 

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Frank Schilder
We are using the read-intensive kioxia drives (octopus cluster) in RBD pools 
and are very happy with them. I don't think it's the drives.

The last possibility I could think of is CPU. We run 4 OSDs per 1.92TB Kioxia 
drive to utilize their performance (single OSD per disk doesn't cut it at all) 
and have 2x16-core Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz per server. 
During normal operations the CPU is only lightly loaded. During peering this 
load peaks at or above 100%. If not enough CPU power is available, peering will 
be hit very badly. What you could check is:

- number of cores: at least 1 HT per OSD, better 1 core per OSD.
- cstates disabled: we run with virtualization-performance profile and the CPU 
is basically always at all-core boost (3.2GHz)
- sufficient RAM: we run these OSDs with 6G memory limit, that's 24G per disk! 
Still, the servers have 50% OSD RAM utilisation and 50% buffers, so there is 
enough for fast peak allocations during peering.
- check vm.min_free_kbytes: the default is way too low for OSD hosts, we use 
vm.min_free_kbytes=4194304 (4G), this can have latency impact for network 
connections
- swap disabled: disable swap on OSD hosts
- sysctl network tuning: check that your network parameters are appropriate for 
your network cards; the kernel defaults are still tuned for 1G connections. There are 
great tuning guides online; here are some of our settings for 10G NICs:

# Increase autotuning TCP buffer limits
# 10G fiber/64MB buffers (67108864)
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 22500   218450 67108864
net.ipv4.tcp_wmem = 22500  81920 67108864

- last check: are you using WPQ or MCLOCK? The mclock scheduler still has 
serious issues and switching to WPQ might help (see the sketch below).
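
Applied as commands, the last two points might look roughly like this (a sketch:
the sysctl file name is just an example, and OSDs need a restart to pick up the
scheduler change):

# switch the op queue scheduler to WPQ, then restart the OSDs
ceph config set osd osd_op_queue wpq
# persist the net.* and vm.min_free_kbytes values from above in
# /etc/sysctl.d/90-ceph-osd.conf, then load them
sysctl --system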

If none of these help, I'm out of ideas. For us the Kioxia drives work like a 
charm; it's the pool that is easiest to manage and maintain, with super-fast 
recovery and really good sustained performance.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: 서민우 
Sent: Tuesday, May 21, 2024 11:25 AM
To: Anthony D'Atri
Cc: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Please discuss about Slow Peering

We used the "kioxia kcd6xvul3t20" model.
Is there any infamous information about this model?

On Fri, May 17, 2024 at 2:58 AM, Anthony D'Atri <anthony.da...@gmail.com> wrote:
If using jumbo frames, also ensure that they're consistently enabled on all OS 
instances and network devices.

> On May 16, 2024, at 09:30, Frank Schilder <fr...@dtu.dk>
> wrote:
>
> This is a long shot: if you are using octopus, you might be hit by this 
> pglog-dup problem: 
> https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. They don't 
> mention slow peering explicitly in the blog, but it's also a consequence 
> because the up+acting OSDs need to go through the PG_log during peering.
>
> We are also using octopus and I'm not sure if we have ever seen slow ops 
> caused by peering alone. It usually happens when a disk cannot handle load 
> under peering. We have, unfortunately, disks that show random latency spikes 
> (firmware update pending). You can try to monitor OPS latencies for your 
> drives when peering and look for something that sticks out. People on this 
> list were reporting quite bad results for certain infamous NVMe brands. If 
> you state your model numbers, someone else might recognize it.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: 서민우 <smw940...@gmail.com>
> Sent: Thursday, May 16, 2024 7:39 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Please discuss about Slow Peering
>
> Env:
> - OS: Ubuntu 20.04
> - Ceph Version: Octopus 15.0.0.1
> - OSD Disk: 2.9TB NVMe
> - BlockStorage (Replication 3)
>
> Symptom:
> - Peering when OSD's node up is very slow. Peering speed varies from PG to
> PG, and some PG may even take 10 seconds. But, there is no log for 10
> seconds.
> - I checked the effect of client VM's. Actually, Slow queries of mysql
> occur at the same time.
>
> There are Ceph OSD logs of both Best and Worst.
>
> Best Peering Case (0.5 Seconds)
> 2024-04-11T15:32:44.693+0900 7f108b522700  1 osd.7 pg_epoch: 27368 pg[6.8]
> state: transitioning to Primary
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
> state: Peering, affected_by_map, going to Reset
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
> start_peering_interval up [7,6,11] -> [6,11], acting [7,6,11] -> [6,11],
> acting_primary 7 -> 6, up_primary 7 -> 6, role 0 -> -1, features acting
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
> state: transitioning to Primary
> 2024-04-11T15:32:45.165+0900 7f108f52a700  

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread 서민우
We used the "kioxia kcd6xvul3t20" model.
Is there any infamous information about this model?

On Fri, May 17, 2024 at 2:58 AM, Anthony D'Atri wrote:

> If using jumbo frames, also ensure that they're consistently enabled on
> all OS instances and network devices.
>
> > On May 16, 2024, at 09:30, Frank Schilder  wrote:
> >
> > This is a long shot: if you are using octopus, you might be hit by this
> pglog-dup problem:
> https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. They don't
> mention slow peering explicitly in the blog, but it's also a consequence
> because the up+acting OSDs need to go through the PG_log during peering.
> >
> > We are also using octopus and I'm not sure if we have ever seen slow ops
> caused by peering alone. It usually happens when a disk cannot handle load
> under peering. We have, unfortunately, disks that show random latency
> spikes (firmware update pending). You can try to monitor OPS latencies for
> your drives when peering and look for something that sticks out. People on
> this list were reporting quite bad results for certain infamous NVMe
> brands. If you state your model numbers, someone else might recognize it.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: 서민우 
> > Sent: Thursday, May 16, 2024 7:39 AM
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Please discuss about Slow Peering
> >
> > Env:
> > - OS: Ubuntu 20.04
> > - Ceph Version: Octopus 15.0.0.1
> > - OSD Disk: 2.9TB NVMe
> > - BlockStorage (Replication 3)
> >
> > Symptom:
> > - Peering when OSD's node up is very slow. Peering speed varies from PG
> to
> > PG, and some PG may even take 10 seconds. But, there is no log for 10
> > seconds.
> > - I checked the effect of client VM's. Actually, Slow queries of mysql
> > occur at the same time.
> >
> > There are Ceph OSD logs of both Best and Worst.
> >
> > Best Peering Case (0.5 Seconds)
> > 2024-04-11T15:32:44.693+0900 7f108b522700  1 osd.7 pg_epoch: 27368
> pg[6.8]
> > state: transitioning to Primary
> > 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371
> pg[6.8]
> > state: Peering, affected_by_map, going to Reset
> > 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371
> pg[6.8]
> > start_peering_interval up [7,6,11] -> [6,11], acting [7,6,11] -> [6,11],
> > acting_primary 7 -> 6, up_primary 7 -> 6, role 0 -> -1, features acting
> > 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377
> pg[6.8]
> > state: transitioning to Primary
> > 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377
> pg[6.8]
> > start_peering_interval up [6,11] -> [7,6,11], acting [6,11] -> [7,6,11],
> > acting_primary 6 -> 7, up_primary 6 -> 7, role -1 -> 0, features acting
> >
> > Worst Peering Case (11.6 Seconds)
> > 2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377
> pg[30.20]
> > state: transitioning to Stray
> > 2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377
> pg[30.20]
> > start_peering_interval up [0,1] -> [0,7,1], acting [0,1] -> [0,7,1],
> > acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
> > 2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378
> pg[30.20]
> > state: transitioning to Stray
> > 2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378
> pg[30.20]
> > start_peering_interval up [0,7,1] -> [0,7,1], acting [0,7,1] -> [0,1],
> > acting_primary 0 -> 0, up_primary 0 -> 0, role 1 -> -1, features acting
> > 2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390
> pg[30.20]
> > state: transitioning to Stray
> > 2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390
> pg[30.20]
> > start_peering_interval up [0,7,1] -> [0,7,1], acting [0,1] -> [0,7,1],
> > acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
> >
> > *I wish to know about*
> > - Why some PG's take 10 seconds until Peering finishes.
> > - Why Ceph log is quiet during peering.
> > - Is this symptom intended in Ceph.
> >
> > *And please give some advice,*
> > - Is there any way to improve peering speed?
> > - Or, Is there a way to not affect the client when peering occurs?
> >
> > P.S
> > - I checked the symptoms in the following environments.
> > -> Octopus Version, Reef Version, Cephadm, Ceph-Ansible
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Please discuss about Slow Peering

2024-05-16 Thread Anthony D'Atri
If using jumbo frames, also ensure that they're consistently enabled on all OS 
instances and network devices. 
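
A quick way to verify that end to end, assuming a 9000-byte MTU (the peer host
and NIC name are placeholders):

# 8972 = 9000 minus 28 bytes of IP+ICMP headers; -M do forbids fragmentation
ping -M do -s 8972 -c 3 <peer-osd-host>
# confirm the MTU configured on the interface itself
ip link show dev <nic>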

> On May 16, 2024, at 09:30, Frank Schilder  wrote:
> 
> This is a long shot: if you are using octopus, you might be hit by this 
> pglog-dup problem: 
> https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. They don't 
> mention slow peering explicitly in the blog, but it's also a consequence 
> because the up+acting OSDs need to go through the PG_log during peering.
> 
> We are also using octopus and I'm not sure if we have ever seen slow ops 
> caused by peering alone. It usually happens when a disk cannot handle load 
> under peering. We have, unfortunately, disks that show random latency spikes 
> (firmware update pending). You can try to monitor OPS latencies for your 
> drives when peering and look for something that sticks out. People on this 
> list were reporting quite bad results for certain infamous NVMe brands. If 
> you state your model numbers, someone else might recognize it.
> 
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: 서민우 
> Sent: Thursday, May 16, 2024 7:39 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Please discuss about Slow Peering
> 
> Env:
> - OS: Ubuntu 20.04
> - Ceph Version: Octopus 15.0.0.1
> - OSD Disk: 2.9TB NVMe
> - BlockStorage (Replication 3)
> 
> Symptom:
> - Peering when OSD's node up is very slow. Peering speed varies from PG to
> PG, and some PG may even take 10 seconds. But, there is no log for 10
> seconds.
> - I checked the effect of client VM's. Actually, Slow queries of mysql
> occur at the same time.
> 
> There are Ceph OSD logs of both Best and Worst.
> 
> Best Peering Case (0.5 Seconds)
> 2024-04-11T15:32:44.693+0900 7f108b522700  1 osd.7 pg_epoch: 27368 pg[6.8]
> state: transitioning to Primary
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
> state: Peering, affected_by_map, going to Reset
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
> start_peering_interval up [7,6,11] -> [6,11], acting [7,6,11] -> [6,11],
> acting_primary 7 -> 6, up_primary 7 -> 6, role 0 -> -1, features acting
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
> state: transitioning to Primary
> 2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
> start_peering_interval up [6,11] -> [7,6,11], acting [6,11] -> [7,6,11],
> acting_primary 6 -> 7, up_primary 6 -> 7, role -1 -> 0, features acting
> 
> Worst Peering Case (11.6 Seconds)
> 2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
> state: transitioning to Stray
> 2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
> start_peering_interval up [0,1] -> [0,7,1], acting [0,1] -> [0,7,1],
> acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
> 2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
> state: transitioning to Stray
> 2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
> start_peering_interval up [0,7,1] -> [0,7,1], acting [0,7,1] -> [0,1],
> acting_primary 0 -> 0, up_primary 0 -> 0, role 1 -> -1, features acting
> 2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
> state: transitioning to Stray
> 2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
> start_peering_interval up [0,7,1] -> [0,7,1], acting [0,1] -> [0,7,1],
> acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
> 
> *I wish to know about*
> - Why some PG's take 10 seconds until Peering finishes.
> - Why Ceph log is quiet during peering.
> - Is this symptom intended in Ceph.
> 
> *And please give some advice,*
> - Is there any way to improve peering speed?
> - Or, Is there a way to not affect the client when peering occurs?
> 
> P.S
> - I checked the symptoms in the following environments.
> -> Octopus Version, Reef Version, Cephadm, Ceph-Ansible
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Please discuss about Slow Peering

2024-05-16 Thread Frank Schilder
This is a long shot: if you are using octopus, you might be hit by this 
pglog-dup problem: https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. 
They don't mention slow peering explicitly in the blog, but it's also a 
consequence because the up+acting OSDs need to go through the PG_log during 
peering.

We are also using octopus and I'm not sure if we have ever seen slow ops caused 
by peering alone. It usually happens when a disk cannot handle load under 
peering. We have, unfortunately, disks that show random latency spikes 
(firmware update pending). You can try to monitor OPS latencies for your drives 
when peering and look for something that sticks out. People on this list were 
reporting quite bad results for certain infamous NVMe brands. If you state your 
model numbers, someone else might recognize them.
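
Two examples of what that monitoring could look like; osd.7 and the device path
are placeholders taken from the logs below, and the ceph daemon commands have to
run on the host that carries the OSD:

# pglog memory held by the OSD (the pglog-dup problem shows up as a huge osd_pglog mempool)
ceph daemon osd.7 dump_mempools | grep -A 3 osd_pglog
# per-drive latency while peering; watch for await/util spikes on a single device
iostat -x 1 /dev/nvme0n1
# slowest recent ops on a suspect OSD
ceph daemon osd.7 dump_historic_ops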

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: 서민우 
Sent: Thursday, May 16, 2024 7:39 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Please discuss about Slow Peering

Env:
- OS: Ubuntu 20.04
- Ceph Version: Octopus 15.0.0.1
- OSD Disk: 2.9TB NVMe
- BlockStorage (Replication 3)

Symptom:
- Peering when an OSD node comes up is very slow. Peering speed varies from PG to
PG, and some PGs may even take 10 seconds. But there is no log output during those
10 seconds.
- I checked the effect on client VMs: slow MySQL queries actually occur at the
same time.

There are Ceph OSD logs of both Best and Worst.

Best Peering Case (0.5 Seconds)
2024-04-11T15:32:44.693+0900 7f108b522700  1 osd.7 pg_epoch: 27368 pg[6.8]
state: transitioning to Primary
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
state: Peering, affected_by_map, going to Reset
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27371 pg[6.8]
start_peering_interval up [7,6,11] -> [6,11], acting [7,6,11] -> [6,11],
acting_primary 7 -> 6, up_primary 7 -> 6, role 0 -> -1, features acting
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
state: transitioning to Primary
2024-04-11T15:32:45.165+0900 7f108f52a700  1 osd.7 pg_epoch: 27377 pg[6.8]
start_peering_interval up [6,11] -> [7,6,11], acting [6,11] -> [7,6,11],
acting_primary 6 -> 7, up_primary 6 -> 7, role -1 -> 0, features acting

Worst Peering Case (11.6 Seconds)
2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
state: transitioning to Stray
2024-04-11T15:32:45.169+0900 7f108b522700  1 osd.7 pg_epoch: 27377 pg[30.20]
start_peering_interval up [0,1] -> [0,7,1], acting [0,1] -> [0,7,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting
2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
state: transitioning to Stray
2024-04-11T15:32:46.173+0900 7f108b522700  1 osd.7 pg_epoch: 27378 pg[30.20]
start_peering_interval up [0,7,1] -> [0,7,1], acting [0,7,1] -> [0,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role 1 -> -1, features acting
2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
state: transitioning to Stray
2024-04-11T15:32:57.794+0900 7f108b522700  1 osd.7 pg_epoch: 27390 pg[30.20]
start_peering_interval up [0,7,1] -> [0,7,1], acting [0,1] -> [0,7,1],
acting_primary 0 -> 0, up_primary 0 -> 0, role -1 -> 1, features acting

*I wish to know*
- Why some PGs take 10 seconds until peering finishes.
- Why the Ceph log is quiet during peering.
- Whether this behaviour is intended in Ceph.

*And please give some advice:*
- Is there any way to improve peering speed?
- Or is there a way to avoid affecting clients while peering occurs?

P.S
- I checked the symptoms in the following environments.
-> Octopus Version, Reef Version, Cephadm, Ceph-Ansible
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io