[ceph-users] Re: Ceph EC PG calculation

Szabo, Istvan (Agoda) Wed, 18 Nov 2020 04:23:02 -0800

Hi,

Thank you Frank.


And how does this affect the non-EC pools afterwards? They use the same 
device class, which is SSD.
So I'd calculate with 100 PGs/OSD, because the cluster will grow.
If I calculate for the EC pool alone, it comes to 512. But I still have many replicated pools 😊

Or should I just leave the autoscaler in warn mode and act when it tells me to?

To be honest, I just want to be sure my setup is correct and that I haven't 
missed something or done something wrong.


-----Original Message-----
From: Frank Schilder <fr...@dtu.dk> 
Sent: Wednesday, November 18, 2020 3:11 PM
To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>; ceph-users@ceph.io
Subject: Re: Ceph EC PG calculation


Roughly speaking, if you have N OSDs, a replication factor of R and aim for P 
PGs/OSD on average, you can assign (N*P)/R PGs to the pool.

Example: a 4+2 EC pool counts as replication 6 (k+m chunks per PG). There are 
36 OSDs. If you want to place, say, 50 PGs per OSD, you can assign

(36*50)/6=300 PGs

to the EC pool. You may pick a close power of 2 if you wish and then calculate 
how many PGs will be placed on each OSD on average. For example, we choose 256 
PGs, then

256*6/36 = 42.7 PGs per OSD will be added.
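
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the (N*P)/R rule of thumb with the numbers from this example; the function names are made up for the sketch and are not part of any Ceph tool.

```python
import math


def pool_pg_budget(num_osds: int, pgs_per_osd: float, size: int) -> float:
    """(N*P)/R: PGs a pool can have for a target per-OSD load."""
    return num_osds * pgs_per_osd / size


def nearest_power_of_two(value: float) -> int:
    """Closest power of two to value (ties go to the larger one)."""
    lower = 2 ** math.floor(math.log2(value))
    upper = lower * 2
    return lower if (value - lower) < (upper - value) else upper


def added_pgs_per_osd(pg_num: int, size: int, num_osds: int) -> float:
    """Average number of PG shards each OSD receives from this pool."""
    return pg_num * size / num_osds


# 4+2 EC pool (size 6) on 36 OSDs, aiming for 50 PGs per OSD.
budget = pool_pg_budget(num_osds=36, pgs_per_osd=50, size=6)
pg_num = nearest_power_of_two(budget)
load = added_pgs_per_osd(pg_num, size=6, num_osds=36)
print(f"budget={budget:.0f} pg_num={pg_num} added PGs/OSD={load:.1f}")
```

With these inputs the script prints a budget of 300, a pg_num of 256 and about 42.7 added PGs per OSD, matching the figures above.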

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Sent: 18 November 2020 04:58:38
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph EC PG calculation

Hi,

I have 36 OSDs and get this error:
Error ERANGE:  pg_num 4096 size 6 would mean 25011 total pgs, which exceeds max 
10500 (mon_max_pg_per_osd 250 * num_in_osds 42)

If I want to calculate the maximum number of PGs for my cluster, how does that 
work when I have an EC pool?

I have a 4+2 EC data pool, and the others are replicated.

These are the pools:
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 597 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 598 flags hashpspool stripe_width 0 application rgw
pool 6 'sin.rgw.log' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 599 flags hashpspool stripe_width 0 application rgw
pool 7 'sin.rgw.control' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 600 flags hashpspool stripe_width 0 application rgw
pool 8 'sin.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 601 lfor 0/393/391 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 10 'sin.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 602 lfor 0/529/527 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 11 'sin.rgw.buckets.data.old' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 603 flags hashpspool stripe_width 0 application rgw
pool 12 'sin.rgw.buckets.data' erasure profile data-ec size 6 min_size 5 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 604 flags hashpspool,ec_overwrites stripe_width 16384 application rgw
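
The total in the ERANGE message can be reproduced from this pool list. The sketch below assumes the monitor sums pg_num * size over all pools and compares the result with mon_max_pg_per_osd * num_in_osds; that is inferred from the numbers in the error text rather than taken from the Ceph source, and the helper name is hypothetical. Note that num_in_osds is 42 because the six NVMe OSDs are counted as well.

```python
MON_MAX_PG_PER_OSD = 250  # value shown in the error message
NUM_IN_OSDS = 42          # 36 SSD + 6 NVMe OSDs

# pool name -> (pg_num, size), copied from the pool list above
POOLS = {
    "device_health_metrics": (1, 3),
    ".rgw.root": (32, 3),
    "sin.rgw.log": (32, 3),
    "sin.rgw.control": (32, 3),
    "sin.rgw.meta": (8, 3),
    "sin.rgw.buckets.index": (8, 3),
    "sin.rgw.buckets.data.old": (32, 3),
    "sin.rgw.buckets.data": (32, 6),
}


def projected_total(pools, changes=None):
    """Sum of pg_num * size over all pools, with optional pg_num overrides."""
    merged = dict(pools)
    for name, pg_num in (changes or {}).items():
        _, size = merged[name]
        merged[name] = (pg_num, size)
    return sum(pg_num * size for pg_num, size in merged.values())


limit = MON_MAX_PG_PER_OSD * NUM_IN_OSDS
requested = projected_total(POOLS, {"sin.rgw.buckets.data": 4096})
print(f"requested total={requested} limit={limit}")

# Largest power-of-two pg_num for the EC pool that stays under the cap.
candidates = [2 ** i for i in range(5, 13)]  # 32 .. 4096
largest = max(
    p for p in candidates
    if projected_total(POOLS, {"sin.rgw.buckets.data": p}) <= limit
)
print(f"largest pg_num under the cap={largest}")
```

Under that assumption the requested total comes out at 25011 against a limit of 10500, as in the error. The cap is only the hard limit enforced by the monitor; it says nothing about what a sensible per-OSD target is.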

So how can I calculate the PGs?

This is my osd tree:
ID   CLASS  WEIGHT     TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
-1         534.38354  root default
-5          89.06392      host cephosd-6s01
36   nvme    1.74660          osd.36                up   1.00000  1.00000
  0    ssd   14.55289          osd.0                 up   1.00000  1.00000
  8    ssd   14.55289          osd.8                 up   1.00000  1.00000
15    ssd   14.55289          osd.15                up   1.00000  1.00000
18    ssd   14.55289          osd.18                up   1.00000  1.00000
24    ssd   14.55289          osd.24                up   1.00000  1.00000
30    ssd   14.55289          osd.30                up   1.00000  1.00000
-3          89.06392      host cephosd-6s02
37   nvme    1.74660          osd.37                up   1.00000  1.00000
  1    ssd   14.55289          osd.1                 up   1.00000  1.00000
11    ssd   14.55289          osd.11                up   1.00000  1.00000
17    ssd   14.55289          osd.17                up   1.00000  1.00000
23    ssd   14.55289          osd.23                up   1.00000  1.00000
28    ssd   14.55289          osd.28                up   1.00000  1.00000
35    ssd   14.55289          osd.35                up   1.00000  1.00000
-11          89.06392      host cephosd-6s03
41   nvme    1.74660          osd.41                up   1.00000  1.00000
  2    ssd   14.55289          osd.2                 up   1.00000  1.00000
  6    ssd   14.55289          osd.6                 up   1.00000  1.00000
13    ssd   14.55289          osd.13                up   1.00000  1.00000
19    ssd   14.55289          osd.19                up   1.00000  1.00000
26    ssd   14.55289          osd.26                up   1.00000  1.00000
32    ssd   14.55289          osd.32                up   1.00000  1.00000
-13          89.06392      host cephosd-6s04
38   nvme    1.74660          osd.38                up   1.00000  1.00000
  5    ssd   14.55289          osd.5                 up   1.00000  1.00000
  7    ssd   14.55289          osd.7                 up   1.00000  1.00000
14    ssd   14.55289          osd.14                up   1.00000  1.00000
20    ssd   14.55289          osd.20                up   1.00000  1.00000
25    ssd   14.55289          osd.25                up   1.00000  1.00000
31    ssd   14.55289          osd.31                up   1.00000  1.00000
-9          89.06392      host cephosd-6s05
40   nvme    1.74660          osd.40                up   1.00000  1.00000
  3    ssd   14.55289          osd.3                 up   1.00000  1.00000
10    ssd   14.55289          osd.10                up   1.00000  1.00000
12    ssd   14.55289          osd.12                up   1.00000  1.00000
21    ssd   14.55289          osd.21                up   1.00000  1.00000
29    ssd   14.55289          osd.29                up   1.00000  1.00000
33    ssd   14.55289          osd.33                up   1.00000  1.00000
-7          89.06392      host cephosd-6s06
39   nvme    1.74660          osd.39                up   1.00000  1.00000
  4    ssd   14.55289          osd.4                 up   1.00000  1.00000
  9    ssd   14.55289          osd.9                 up   1.00000  1.00000
16    ssd   14.55289          osd.16                up   1.00000  1.00000
22    ssd   14.55289          osd.22                up   1.00000  1.00000
27    ssd   14.55289          osd.27                up   1.00000  1.00000
34    ssd   14.55289          osd.34                up   1.00000  1.00000

These are the CRUSH rules:
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_nvme",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -21,
                "item_name": "default~nvme"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "replicated_ssd",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 3,
        "rule_name": "sin.rgw.buckets.data.new",
        "ruleset": 3,
        "type": 3,
        "min_size": 3,
        "max_size": 6,
        "steps": [
            {
                "op": "set_chooseleaf_tries",
                "num": 5
            },
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_indep",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

So everything other than the data pool is on SSD and NVMe with replica 3.
If I calculate the PGs for the EC pool as 36 OSDs * 100 / 6 = 600, does that 
mean the maximum pg_num for the EC pool is 512?
But how does this affect the replicated SSD pools then?
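
One way to look at the shared SSD budget is to subtract what the replicated pools already place on the SSD OSDs before sizing the EC pool. The sketch below is a rough estimate under stated assumptions: PGs are spread evenly over the 36 SSD OSDs, the target is 100 PGs per OSD, and pool 11 (crush_rule 0, which spans the whole default root) is counted entirely against the SSDs as an upper bound. The variable names are illustrative only.

```python
SSD_OSDS = 36
TARGET_PGS_PER_OSD = 100
EC_SIZE = 6  # k=4 + m=2

# (pg_num, size) of replicated pools that land on SSD OSDs:
# pools 1, 2, 6, 7 use crush_rule 2 (default~ssd); pool 11 uses
# crush_rule 0 and is counted here in full as a conservative assumption.
replicated_on_ssd = [(1, 3), (32, 3), (32, 3), (32, 3), (32, 3)]

used = sum(pg_num * size for pg_num, size in replicated_on_ssd)
used_per_osd = used / SSD_OSDS

# Whatever is left of the per-OSD target can go to the EC pool.
remaining_per_osd = TARGET_PGS_PER_OSD - used_per_osd
ec_budget = remaining_per_osd * SSD_OSDS / EC_SIZE

# Round down to a power of two so the target is not exceeded.
ec_pg_num = 2 ** (int(ec_budget).bit_length() - 1)

total_per_osd = (used + ec_pg_num * EC_SIZE) / SSD_OSDS
print(f"replicated pools use {used_per_osd:.2f} PGs/OSD")
print(f"EC budget {ec_budget:.1f} -> pg_num {ec_pg_num}")
print(f"resulting load {total_per_osd:.1f} PGs/OSD")
```

With the current pg_num values the replicated pools take roughly 11 PGs per SSD OSD, so 512 PGs for the EC pool would still stay a little under 100 per OSD. If the replicated pools are grown later, their share has to be subtracted again.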

This is the EC pool definition:
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Thank you in advance.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io