Hi,

I have 36 OSDs and get this error:
Error ERANGE:  pg_num 4096 size 6 would mean 25011 total pgs, which exceeds max 
10500 (mon_max_pg_per_osd 250 * num_in_osds 42)

If I want to calculate the max PGs for my cluster, how does the calculation work when I have an EC pool?

I have a 4+2 (k=4, m=2) EC data pool, and the other pools are replicated.

These are the pools:
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 2 
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 597 
flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash 
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 598 flags 
hashpspool stripe_width 0 application rgw
pool 6 'sin.rgw.log' replicated size 3 min_size 2 crush_rule 2 object_hash 
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 599 flags 
hashpspool stripe_width 0 application rgw
pool 7 'sin.rgw.control' replicated size 3 min_size 2 crush_rule 2 object_hash 
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 600 flags 
hashpspool stripe_width 0 application rgw
pool 8 'sin.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash 
rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 601 lfor 0/393/391 
flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 10 'sin.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 
object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 602 
lfor 0/529/527 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 
application rgw
pool 11 'sin.rgw.buckets.data.old' replicated size 3 min_size 2 crush_rule 0 
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 603 
flags hashpspool stripe_width 0 application rgw
pool 12 'sin.rgw.buckets.data' erasure profile data-ec size 6 min_size 5 
crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn 
last_change 604 flags hashpspool,ec_overwrites stripe_width 16384 application 
rgw
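As far as I can tell, the 25011 in the error is just the sum of pg_num × size over all pools, with the requested pg_num of 4096 for the EC pool (a sketch, using the sizes from the pool dump above):

```python
# Each pool contributes pg_num * size PG instances
# (replicas for replicated pools, k+m shards for EC pools).
pools = {
    "device_health_metrics":    (1,    3),
    ".rgw.root":                (32,   3),
    "sin.rgw.log":              (32,   3),
    "sin.rgw.control":          (32,   3),
    "sin.rgw.meta":             (8,    3),
    "sin.rgw.buckets.index":    (8,    3),
    "sin.rgw.buckets.data.old": (32,   3),
    "sin.rgw.buckets.data":     (4096, 6),  # requested pg_num, EC size = k+m = 6
}
total = sum(pg_num * size for pg_num, size in pools.values())
print(total)  # 25011, matching the ERANGE error

limit = 250 * 42  # mon_max_pg_per_osd * num_in_osds
print(limit)  # 10500
```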

So how can I calculate the PGs?

This is my osd tree:
ID   CLASS  WEIGHT     TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
-1         534.38354  root default
-5          89.06392      host cephosd-6s01
36   nvme    1.74660          osd.36                up   1.00000  1.00000
  0    ssd   14.55289          osd.0                 up   1.00000  1.00000
  8    ssd   14.55289          osd.8                 up   1.00000  1.00000
15    ssd   14.55289          osd.15                up   1.00000  1.00000
18    ssd   14.55289          osd.18                up   1.00000  1.00000
24    ssd   14.55289          osd.24                up   1.00000  1.00000
30    ssd   14.55289          osd.30                up   1.00000  1.00000
-3          89.06392      host cephosd-6s02
37   nvme    1.74660          osd.37                up   1.00000  1.00000
  1    ssd   14.55289          osd.1                 up   1.00000  1.00000
11    ssd   14.55289          osd.11                up   1.00000  1.00000
17    ssd   14.55289          osd.17                up   1.00000  1.00000
23    ssd   14.55289          osd.23                up   1.00000  1.00000
28    ssd   14.55289          osd.28                up   1.00000  1.00000
35    ssd   14.55289          osd.35                up   1.00000  1.00000
-11          89.06392      host cephosd-6s03
41   nvme    1.74660          osd.41                up   1.00000  1.00000
  2    ssd   14.55289          osd.2                 up   1.00000  1.00000
  6    ssd   14.55289          osd.6                 up   1.00000  1.00000
13    ssd   14.55289          osd.13                up   1.00000  1.00000
19    ssd   14.55289          osd.19                up   1.00000  1.00000
26    ssd   14.55289          osd.26                up   1.00000  1.00000
32    ssd   14.55289          osd.32                up   1.00000  1.00000
-13          89.06392      host cephosd-6s04
38   nvme    1.74660          osd.38                up   1.00000  1.00000
  5    ssd   14.55289          osd.5                 up   1.00000  1.00000
  7    ssd   14.55289          osd.7                 up   1.00000  1.00000
14    ssd   14.55289          osd.14                up   1.00000  1.00000
20    ssd   14.55289          osd.20                up   1.00000  1.00000
25    ssd   14.55289          osd.25                up   1.00000  1.00000
31    ssd   14.55289          osd.31                up   1.00000  1.00000
-9          89.06392      host cephosd-6s05
40   nvme    1.74660          osd.40                up   1.00000  1.00000
  3    ssd   14.55289          osd.3                 up   1.00000  1.00000
10    ssd   14.55289          osd.10                up   1.00000  1.00000
12    ssd   14.55289          osd.12                up   1.00000  1.00000
21    ssd   14.55289          osd.21                up   1.00000  1.00000
29    ssd   14.55289          osd.29                up   1.00000  1.00000
33    ssd   14.55289          osd.33                up   1.00000  1.00000
-7          89.06392      host cephosd-6s06
39   nvme    1.74660          osd.39                up   1.00000  1.00000
  4    ssd   14.55289          osd.4                 up   1.00000  1.00000
  9    ssd   14.55289          osd.9                 up   1.00000  1.00000
16    ssd   14.55289          osd.16                up   1.00000  1.00000
22    ssd   14.55289          osd.22                up   1.00000  1.00000
27    ssd   14.55289          osd.27                up   1.00000  1.00000
34    ssd   14.55289          osd.34                up   1.00000  1.00000

These are the CRUSH rules:
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_nvme",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -21,
                "item_name": "default~nvme"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "replicated_ssd",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 3,
        "rule_name": "sin.rgw.buckets.data.new",
        "ruleset": 3,
        "type": 3,
        "min_size": 3,
        "max_size": 6,
        "steps": [
            {
                "op": "set_chooseleaf_tries",
                "num": 5
            },
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -2,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_indep",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

So everything other than the data pool is on SSD and NVMe with replica 3.
If I calculate the PGs for the EC pool as 36 OSDs * 100 / 6 = 600, does that 
mean the max pg_num for the EC pool is 512 (the nearest power of two below 600)?
But how does this affect the replicated SSD pools then?
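To illustrate the arithmetic in my question, this is how I understand the usual "~100 PGs per OSD" rule of thumb (the target of 100 and the power-of-two rounding are the common guidance, not something from my cluster):

```python
def suggested_pg_num(num_osds, size, target_per_osd=100):
    """Rule-of-thumb pg_num: num_osds * target / size,
    rounded down to the nearest power of two."""
    raw = num_osds * target_per_osd / size
    p = 1
    while p * 2 <= raw:
        p *= 2
    return p

# EC pool (size = k+m = 6) on the 36 SSD OSDs: 36*100/6 = 600 -> 512
print(suggested_pg_num(36, 6))
```

But since the replicated pools live on the same SSD OSDs, I assume their PGs count against the same per-OSD budget, which is what I am unsure about.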

This is the EC profile for the data pool:
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Thank you in advance.
