[ceph-users] Re: What is a pgmap?

2020-05-19 Thread Bryan Henderson
Here's what I learned about PG maps from my investigation of the code.

First, they don't seem to be involved in deciding what needs reconstruction
when a dead OSD is revived.  There is a version number stored with the PGs
that is probably used for that.

The PG map looks like nothing but statistics - the ones you see in a
'ceph status' (or, more specifically, 'ceph pg stat') report - and I don't
think those statistics affect any automatic operation.

The PG map gets updated (version incremented) mainly when an OSD sends those
statistics to the monitor cluster.  Each OSD sends a statistics report every 6
seconds (default - it's the osd_heartbeat_interval configuration variable) to
a monitor.  If those statistics differ at all from the previous report, the
monitor generates a new PG map.  Because the stats include I/O rates, they
do tend to be different every time.

But there is a limit of one update per second (default - it's the
'paxos_propose_interval' configuration variable) on updates to any of the maps
in the monitor database, so on any normal size system, you'll see updates once
a second.
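
The interaction between the reporting period and the one-per-second limit
described above can be sketched as a toy model. This is not Ceph code: the
function names, the evenly staggered report timers, and the assumption that
every report differs from the previous one are all illustrative.

```python
def staggered_reports(num_osds, period=6.0, duration=60.0):
    """Arrival times of stats reports, assuming evenly staggered OSD timers."""
    times = []
    for osd in range(num_osds):
        t = osd * period / num_osds
        while t < duration:
            times.append(t)
            t += period
    return sorted(times)


def simulate_pgmap_updates(report_times, propose_interval=1.0):
    """Times at which a new PG map version would be published.

    Every report is assumed to change the stats. Reports that arrive while
    an update is already pending are folded into that update, and two
    updates are never closer together than propose_interval seconds.
    """
    publish_times = []
    scheduled = None      # time of the most recently scheduled update
    next_allowed = 0.0    # earliest time the next update may happen
    for t in sorted(report_times):
        if scheduled is not None and t <= scheduled:
            continue      # folded into the already scheduled update
        scheduled = max(t, next_allowed)
        publish_times.append(scheduled)
        next_allowed = scheduled + propose_interval
    return publish_times


if __name__ == "__main__":
    for n in (1, 3, 268):
        updates = simulate_pgmap_updates(staggered_reports(n))
        print(f"{n} OSDs: {len(updates)} updates for one minute of reports")
```

Under this model a single reporter on a 6-second period yields 10 updates
per minute, while a few hundred staggered reporters saturate at the
one-per-second limit.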

-- 
Bryan Henderson   San Jose, California
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What is a pgmap?

2020-05-16 Thread Bryan Henderson
>I thought it was a method (the method?) to know if a PG comes back from a
>crashed OSD/host, to know if it was up-to-date or old since it would have
>an older timestamp.

Thanks.  That's a reasonable theory.  Maybe I'll look in the code and see if
I can confirm it.

And it means on my cluster, once an hour would probably be sufficient.

>I was sure it was updated exactly once per second.

Because there's an infamous cluster log message every time the pgmap updates,
I know for me it is about 10 times a minute, in a pattern that is neither
periodic nor random.  Maybe once per second is the maximum frequency and it
depends upon how frequently PGs are written to.

-- 
Bryan Henderson   San Jose, California


[ceph-users] Re: What is a pgmap?

2020-05-14 Thread Frank Schilder
Unfortunately, my e-mail client does not collect threads properly.

Think I got my answer.

From Janne Johansson:
> Since using computer time and date is fraught with peril, having the whole
> cluster just bump that single number every second (and writing it to the PG
> on each write) would allow a mostly idle PG that comes back after an hour
> of unexpected downtime to easily know if it needs no recovery, a little bit
> of delta to get up-to-date or a full copy from the primary in order to
> become a part of the replica set for that PG.

So an increase every second is expected.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14




[ceph-users] Re: What is a pgmap?

2020-05-14 Thread Frank Schilder
Hi, I also observe an increase in pgmap version every second or so, see snippet 
below. I run mimic 13.2.8 without any PG scaling/upmapping. Why does the 
version increase so often?

May 14 12:33:50 ceph-03 journal: cluster 2020-05-14 12:33:48.521546 mgr.ceph-02 
mgr.27460080 192.168.32.66:0/63 114833 : cluster [DBG] pgmap v114860: 2545 pgs: 
2 active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 
1.5 PiB / 1.8 PiB avail; 4.8 MiB/s rd, 11 MiB/s wr, 1.48 kop/s

May 14 12:33:50 ceph-02 journal: 2020-05-14 12:33:50.543 7fdb57c5b700  0 
log_channel(cluster) log [DBG] : pgmap v114861: 2545 pgs: 2 
active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 
PiB / 1.8 PiB avail; 5.6 MiB/s rd, 11 MiB/s wr, 1.21 kop/s

May 14 12:33:52 ceph-02 journal: 2020-05-14 12:33:52.565 7fdb57c5b700  0 
log_channel(cluster) log [DBG] : pgmap v114862: 2545 pgs: 2 
active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 
PiB / 1.8 PiB avail; 8.9 MiB/s rd, 16 MiB/s wr, 1.59 kop/s

The version increases every second or two, here from pgmap v114860 to pgmap
v114862. Current cluster status:

[root@gnosis]# ceph status
  cluster:
    id:     ---
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-02(active), standbys: ceph-01, ceph-03
    mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 288 osds: 268 up, 268 in

  data:
    pools:   10 pools, 2545 pgs
    objects: 80.80 M objects, 195 TiB
    usage:   249 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     2543 active+clean
             2    active+clean+scrubbing+deep

  io:
    client:   20 MiB/s rd, 21 MiB/s wr, 578 op/s rd, 1.08 kop/s wr
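
To measure how often the pgmap version actually advances, the cluster log
lines quoted above can be parsed for their timestamps and version numbers.
The following is a minimal sketch: the regular expression and function names
are assumptions made for this example, and the sample lines are the three
log entries from this message, unwrapped and cut off after the PG count.

```python
import re
from datetime import datetime

# First full "YYYY-MM-DD HH:MM:SS.fraction" stamp on the line, then the version.
PGMAP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+).*?pgmap v(\d+):"
)


def pgmap_updates(log_lines):
    """Return sorted (version, timestamp) pairs for lines that report a pgmap."""
    updates = []
    for line in log_lines:
        match = PGMAP_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            updates.append((int(match.group(2)), stamp))
    return sorted(updates)


def update_intervals(updates):
    """Seconds elapsed between consecutive pgmap versions."""
    return [
        (later[1] - earlier[1]).total_seconds()
        for earlier, later in zip(updates, updates[1:])
    ]


SAMPLE = [
    "May 14 12:33:50 ceph-03 journal: cluster 2020-05-14 12:33:48.521546 "
    "mgr.ceph-02 mgr.27460080 192.168.32.66:0/63 114833 : cluster [DBG] "
    "pgmap v114860: 2545 pgs",
    "May 14 12:33:50 ceph-02 journal: 2020-05-14 12:33:50.543 7fdb57c5b700  0 "
    "log_channel(cluster) log [DBG] : pgmap v114861: 2545 pgs",
    "May 14 12:33:52 ceph-02 journal: 2020-05-14 12:33:52.565 7fdb57c5b700  0 "
    "log_channel(cluster) log [DBG] : pgmap v114862: 2545 pgs",
]

if __name__ == "__main__":
    found = pgmap_updates(SAMPLE)
    for version, stamp in found:
        print(version, stamp.time())
    print("intervals (s):", update_intervals(found))
```

The timestamp used is the one written by the logging daemon, not the journal
prefix at the start of the line, since the journal prefix only has
one-second resolution.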

Thanks for any info!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Nghia Viet Tran 
Sent: 14 May 2020 03:49:38
To: Bryan Henderson; Ceph users mailing list
Subject: [ceph-users] Re: What is a pgmap?

If your Ceph cluster is running the latest version of Ceph, then the
pg_autoscaler is probably the reason. After a period of time, Ceph will
check the cluster status and increase/decrease the number of PGs in the
cluster if needed.

On 5/14/20, 03:37, "Bryan Henderson"  wrote:

I'm surprised I couldn't find this explained anywhere (I did look), but ...

What is the pgmap and why does it get updated every few seconds on a tiny
cluster that's mostly idle?

I do know what a placement group (PG) is and that when documentation talks
about placement group maps, it is talking about something else -- mapping of
PGs to OSDs by CRUSH and OSD maps.

--
Bryan Henderson   San Jose, California


[ceph-users] Re: What is a pgmap?

2020-05-14 Thread Janne Johansson
On Wed, 13 May 2020 at 22:37, Bryan Henderson wrote:

> I'm surprised I couldn't find this explained anywhere (I did look), but ...
> What is the pgmap and why does it get updated every few seconds on a tiny
> cluster that's mostly idle?
>
>
I was sure it was updated exactly once per second.


> I do know what a placement group (PG) is and that when documentation talks
> about placement group maps, it is talking about something else -- mapping
> of PGs to OSDs by CRUSH and OSD maps.

I thought it was a method (the method?) to know if a PG comes back from a
crashed OSD/host, to know if it was up-to-date or old since it would have
an older timestamp.

Since using computer time and date is fraught with peril, having the whole
cluster just bump that single number every second (and writing it to the PG
on each write) would allow a mostly idle PG that comes back after an hour
of unexpected downtime to easily know if it needs no recovery, a little bit
of delta to get up-to-date or a full copy from the primary in order to
become a part of the replica set for that PG.

-- 
May the most significant bit of your life be positive.