[ceph-users] Re: What is a pgmap?
Here's what I learned about PG maps from my investigation of the code.

First, they don't seem to be involved in deciding what needs reconstruction when a dead OSD is revived. There is a version number stored with the PGs that is probably used for that.

The PG map looks like nothing but statistics, the ones you see in a 'ceph status' (or more specifically, 'ceph pg stat') report, and I don't think those statistics affect any automatic operation.

The PG map gets updated (version incremented) mainly when an OSD sends those statistics to the monitor cluster. Each OSD sends a statistics report to a monitor every 6 seconds (the default; it's the osd_heartbeat_interval configuration variable). If those statistics differ at all from the previous report, the monitor generates a new PG map. Because the stats include I/O rates, they do tend to be different every time.

But there is a limit of one update per second (the default; it's the 'paxos_propose_interval' configuration variable) on updates to any of the maps in the monitor database, so on any normal size system, you'll see updates once a second.

-- 
Bryan Henderson                                   San Jose, California
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
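The interplay between the per-OSD report interval and the monitor's update limit described in the message above can be illustrated with a toy model. This is a minimal sketch, not Ceph code: the function name, the evenly staggered report times, and the assumption that every report carries changed statistics are hypothetical simplifications. Only the 6-second and 1-second defaults are taken from the message.

```python
import math


def pgmap_updates(num_osds: int,
                  report_interval: float = 6.0,
                  propose_interval: float = 1.0,
                  duration: float = 60.0) -> int:
    """Count PG map updates during `duration` seconds in a toy model."""
    # Assumption: OSD i first reports at an evenly staggered offset and then
    # every `report_interval` seconds; every report carries changed stats.
    report_times = []
    for osd in range(num_osds):
        t = osd * report_interval / num_osds
        while t < duration:
            report_times.append(t)
            t += report_interval
    report_times.sort()

    updates = 0
    last_update = -math.inf
    for t in report_times:
        # Rate limit: accept at most one update per `propose_interval`.
        if t - last_update >= propose_interval:
            updates += 1
            last_update = t
    return updates


if __name__ == "__main__":
    for n in (1, 3, 12):
        print(n, "OSDs ->", pgmap_updates(n), "updates per minute")
```

Under these assumptions a single OSD produces 10 updates per minute, and the count saturates at 60 per minute once reports arrive at least once a second, which is the "once a second" behaviour the message expects on a normal size system.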
[ceph-users] Re: What is a pgmap?
>I thought it was a method (the method?) to know if a PG comes back from a
>crashed OSD/host, to know if it was up-to-date or old since it would have
>an older timestamp.

Thanks. That's a reasonable theory. Maybe I'll look in the code and see if I can confirm it. And it means on my cluster, once an hour would probably be sufficient.

>I was sure it was updated exactly once per second.

Because there's an infamous cluster log message every time the pgmap updates, I know for me it is about 10 times a minute, in a pattern that is neither periodic nor random. Maybe once per second is the maximum frequency and it depends upon how frequently PGs are written to.

-- 
Bryan Henderson                                   San Jose, California
[ceph-users] Re: What is a pgmap?
Unfortunately, my e-mail client does not collect threads properly. I think I got my answer. From Janne Johansson:

> Since using computer time and date is fraught with peril, having the whole
> cluster just bump that single number every second (and writing it to the PG
> on each write) would allow a mostly idle PG that comes back after an hour
> of unexpected downtime to easily know if it needs no recovery, a little bit
> of delta to get up-to-date or a full copy from the primary in order to
> become a part of the replica set for that PG.

So an increase every second is expected.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Frank Schilder
Sent: 14 May 2020 12:37
To: Nghia Viet Tran; Bryan Henderson; Ceph users mailing list
Subject: [ceph-users] Re: What is a pgmap?
[ceph-users] Re: What is a pgmap?
Hi,

I also observe an increase in pgmap version every second or so, see snippet below. I run mimic 13.2.8 without any PG scaling/upmapping. Why does the version increase so often?

May 14 12:33:50 ceph-03 journal: cluster 2020-05-14 12:33:48.521546 mgr.ceph-02 mgr.27460080 192.168.32.66:0/63 114833 : cluster [DBG] pgmap v114860: 2545 pgs: 2 active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 PiB / 1.8 PiB avail; 4.8 MiB/s rd, 11 MiB/s wr, 1.48 kop/s
May 14 12:33:50 ceph-02 journal: 2020-05-14 12:33:50.543 7fdb57c5b700 0 log_channel(cluster) log [DBG] : pgmap v114861: 2545 pgs: 2 active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 PiB / 1.8 PiB avail; 5.6 MiB/s rd, 11 MiB/s wr, 1.21 kop/s
May 14 12:33:52 ceph-02 journal: 2020-05-14 12:33:52.565 7fdb57c5b700 0 log_channel(cluster) log [DBG] : pgmap v114862: 2545 pgs: 2 active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 PiB / 1.8 PiB avail; 8.9 MiB/s rd, 16 MiB/s wr, 1.59 kop/s

The version increases every second, here from pgmap v114860 to pgmap v114862.

Current cluster status:

[root@gnosis]# ceph status
  cluster:
    id:     ---
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-02(active), standbys: ceph-01, ceph-03
    mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 288 osds: 268 up, 268 in

  data:
    pools:   10 pools, 2545 pgs
    objects: 80.80 M objects, 195 TiB
    usage:   249 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     2543 active+clean
             2    active+clean+scrubbing+deep

  io:
    client:   20 MiB/s rd, 21 MiB/s wr, 578 op/s rd, 1.08 kop/s wr

Thanks for any info!

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Nghia Viet Tran
Sent: 14 May 2020 03:49:38
To: Bryan Henderson; Ceph users mailing list
Subject: [ceph-users] Re: What is a pgmap?

If your Ceph cluster is running the latest version of Ceph, then the pg_autoscaler is probably the reason. After a period of time, Ceph will check the cluster status and increase/decrease the number of PGs in the cluster if needed.

On 5/14/20, 03:37, "Bryan Henderson" wrote:

    I'm surprised I couldn't find this explained anywhere (I did look), but ...
    What is the pgmap and why does it get updated every few seconds on a tiny
    cluster that's mostly idle?

    I do know what a placement group (PG) is and that when documentation talks
    about placement group maps, it is talking about something else -- mapping of
    PGs to OSDs by CRUSH and OSD maps.

    --
    Bryan Henderson                                   San Jose, California
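The update rate in the cluster-log excerpt quoted in this message can be checked directly from its timestamps. The following is a small sketch that parses the three pgmap lines (trimmed after the PG count for brevity) and computes the time between consecutive versions. The helper names and the regular expression are illustrative assumptions, not part of any Ceph tool.

```python
import re
from datetime import datetime

# Cluster-log lines from the message, trimmed after the PG count.
LOG_LINES = [
    "May 14 12:33:50 ceph-03 journal: cluster 2020-05-14 12:33:48.521546 "
    "mgr.ceph-02 mgr.27460080 192.168.32.66:0/63 114833 : cluster [DBG] "
    "pgmap v114860: 2545 pgs",
    "May 14 12:33:50 ceph-02 journal: 2020-05-14 12:33:50.543 7fdb57c5b700 0 "
    "log_channel(cluster) log [DBG] : pgmap v114861: 2545 pgs",
    "May 14 12:33:52 ceph-02 journal: 2020-05-14 12:33:52.565 7fdb57c5b700 0 "
    "log_channel(cluster) log [DBG] : pgmap v114862: 2545 pgs",
]

# First ISO-style timestamp on the line, then the pgmap version number.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+).*?pgmap v(\d+)"
)


def parse_pgmap_events(lines):
    """Return a list of (version, timestamp) tuples for lines that match."""
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((int(match.group(2)), stamp))
    return events


def update_intervals(events):
    """Seconds elapsed between consecutive pgmap versions."""
    return [
        (later[1] - earlier[1]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_pgmap_events(LOG_LINES)
    for gap in update_intervals(events):
        print(f"{gap:.3f} s between versions")
```

For these three lines the gaps come out at roughly two seconds each, so this particular excerpt shows one new pgmap version about every two seconds rather than every second. Three lines are a very small sample, so this says little about the long-run rate.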
[ceph-users] Re: What is a pgmap?
On Wed, 13 May 2020 at 22:37, Bryan Henderson wrote:

> I'm surprised I couldn't find this explained anywhere (I did look), but ...
> What is the pgmap and why does it get updated every few seconds on a tiny
> cluster that's mostly idle?

I was sure it was updated exactly once per second.

> I do know what a placement group (PG) is and that when documentation talks
> about placement group maps, it is talking about something else -- mapping of
> PGs to OSDs by CRUSH and OSD maps.

I thought it was a method (the method?) to know if a PG comes back from a crashed OSD/host, to know if it was up-to-date or old since it would have an older timestamp.

Since using computer time and date is fraught with peril, having the whole cluster just bump that single number every second (and writing it to the PG on each write) would allow a mostly idle PG that comes back after an hour of unexpected downtime to easily know if it needs no recovery, a little bit of delta to get up-to-date or a full copy from the primary in order to become a part of the replica set for that PG.

-- 
May the most significant bit of your life be positive.
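The three outcomes named in this message (no recovery, a small delta, or a full copy) can be expressed as a comparison of version counters. This is only a sketch of the idea as stated here, not how Ceph decides recovery; the first message in this archive reports that the pgmap itself does not appear to be involved. All names are hypothetical, and `oldest_replayable_version` is an added assumption standing for the oldest version from which changes could still be replayed.

```python
from enum import Enum


class Recovery(Enum):
    NONE = "no recovery"
    DELTA = "delta catch-up"
    FULL_COPY = "full copy from primary"


def classify_recovery(returning_version: int,
                      current_version: int,
                      oldest_replayable_version: int) -> Recovery:
    """Decide what a returning PG copy needs, based only on version counters."""
    # The returning copy already has the latest version: nothing to do.
    if returning_version >= current_version:
        return Recovery.NONE
    # Assumption: changes since `oldest_replayable_version` can be replayed.
    if returning_version >= oldest_replayable_version:
        return Recovery.DELTA
    # Too far behind to replay: copy everything from the primary.
    return Recovery.FULL_COPY


if __name__ == "__main__":
    print(classify_recovery(100, 100, 40))  # up to date
    print(classify_recovery(90, 100, 40))   # slightly behind
    print(classify_recovery(10, 100, 40))   # far behind
```

A monotonically increasing counter is used here instead of wall-clock time for the reason the message gives: clocks on different hosts cannot be trusted to agree.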