Without having looked too closely yet: you might need "chooseleaf" where
the rule is supposed to select OSDs. This is just an example from one of
my test nodes:
step take default
step choose indep 0 type host
step chooseleaf indep 2 type osd
step emit
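An untested sketch of what that might look like for your 4-tor layout
(assuming you want 2 chunks per tor, each on a different host; adjust the
types as needed for your tree):
step take default
step choose indep 4 type tor
step chooseleaf indep 2 type host
step emit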
Quoting Denis Polom <[email protected]>:
Hi folks,
on our Ceph v19.2.1 cluster, distributed across 4 failure domains, we have
an EC pool using the following CRUSH rule:
{
    "rule_id": 4,
    "rule_name": "ec53",
    "type": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_indep",
            "num": 4,
            "type": "tor"
        },
        {
            "op": "choose_indep",
            "num": 2,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
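(A quick way to sanity-check what a rule like this actually selects is
something along these lines, using rule id 4 and 8 chunks as above; output
omitted here:)
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 4 --num-rep 8 --show-mappings --show-bad-mappings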
and the following EC profile:
crush-device-class=
crush-failure-domain=tor
crush-root=default
jerasure-per-chunk-alignment=false
k=5
m=3
plugin=jerasure
technique=reed_sol_van
w=8
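(k=5 plus m=3 gives 8 chunks in total, so with the 4 tor failure domains
below the rule has to place exactly 2 chunks per tor.)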
here is our crush tree:
ID CLASS WEIGHT TYPE NAME
-40 15.71931 root metadata
-41 3.49318 tor s1-bos-e-10-ng.metadata
-42 1.74660 host osd1-metadata
98 ssd 1.74660 osd.98
-49 1.74660 host osd2-metadata
99 ssd 1.74660 osd.99
-124 3.49318 tor s1-bos-f-7-ng.metadata
-52 1.74660 host osd3-metadata
103 ssd 1.74660 osd.103
-76 1.74660 host osd9-metadata
106 ssd 1.74660 osd.106
-56 5.23978 tor s1-bos-f-8-ng.metadata
-55 1.74660 host osd4-metadata
85 ssd 1.74660 osd.85
-61 1.74660 host osd5-metadata
93 ssd 1.74660 osd.93
-64 1.74660 host osd6-metadata
97 ssd 1.74660 osd.97
-67 3.49318 tor s1-bos-f-9-ng.metadata
-70 1.74660 host osd7-metadata
104 ssd 1.74660 osd.104
-73 1.74660 host osd8-metadata
105 ssd 1.74660 osd.105
-1 2733.78516 root default
-21 745.57776 tor s1-bos-e-10-ng
-7 80.40544 host osd1
0 hdd 7.30959 osd.0
10 hdd 7.30959 osd.10
26 hdd 7.30959 osd.26
28 hdd 7.30959 osd.28
44 hdd 7.30959 osd.44
53 hdd 7.30959 osd.53
62 hdd 7.30959 osd.62
70 hdd 7.30959 osd.70
79 hdd 7.30959 osd.79
89 hdd 7.30959 osd.89
107 hdd 7.30959 osd.107
-88 116.95337 host osd12
128 hdd 7.30959 osd.128
139 hdd 7.30959 osd.139
142 hdd 7.30959 osd.142
146 hdd 7.30959 osd.146
150 hdd 7.30959 osd.150
152 hdd 7.30959 osd.152
153 hdd 7.30959 osd.153
154 hdd 7.30959 osd.154
191 hdd 7.30959 osd.191
192 hdd 7.30959 osd.192
193 hdd 7.30959 osd.193
194 hdd 7.30959 osd.194
195 hdd 7.30959 osd.195
196 hdd 7.30959 osd.196
199 hdd 7.30959 osd.199
200 hdd 7.30959 osd.200
-94 116.95337 host osd15
156 hdd 7.30959 osd.156
157 hdd 7.30959 osd.157
158 hdd 7.30959 osd.158
159 hdd 7.30959 osd.159
160 hdd 7.30959 osd.160
161 hdd 7.30959 osd.161
162 hdd 7.30959 osd.162
163 hdd 7.30959 osd.163
164 hdd 7.30959 osd.164
165 hdd 7.30959 osd.165
166 hdd 7.30959 osd.166
167 hdd 7.30959 osd.167
171 hdd 7.30959 osd.171
172 hdd 7.30959 osd.172
177 hdd 7.30959 osd.177
183 hdd 7.30959 osd.183
-17 80.40544 host osd2
4 hdd 7.30959 osd.4
15 hdd 7.30959 osd.15
19 hdd 7.30959 osd.19
33 hdd 7.30959 osd.33
38 hdd 7.30959 osd.38
52 hdd 7.30959 osd.52
58 hdd 7.30959 osd.58
69 hdd 7.30959 osd.69
73 hdd 7.30959 osd.73
81 hdd 7.30959 osd.81
95 hdd 7.30959 osd.95
-115 87.71503 host osd22
275 hdd 7.30959 osd.275
276 hdd 7.30959 osd.276
277 hdd 7.30959 osd.277
278 hdd 7.30959 osd.278
279 hdd 7.30959 osd.279
280 hdd 7.30959 osd.280
281 hdd 7.30959 osd.281
282 hdd 7.30959 osd.282
283 hdd 7.30959 osd.283
284 hdd 7.30959 osd.284
285 hdd 7.30959 osd.285
286 hdd 7.30959 osd.286
-118 87.71503 host osd23
287 hdd 7.30959 osd.287
288 hdd 7.30959 osd.288
289 hdd 7.30959 osd.289
290 hdd 7.30959 osd.290
291 hdd 7.30959 osd.291
292 hdd 7.30959 osd.292
293 hdd 7.30959 osd.293
294 hdd 7.30959 osd.294
295 hdd 7.30959 osd.295
296 hdd 7.30959 osd.296
297 hdd 7.30959 osd.297
298 hdd 7.30959 osd.298
-130 87.71503 host osd25
311 hdd 7.30959 osd.311
312 hdd 7.30959 osd.312
313 hdd 7.30959 osd.313
314 hdd 7.30959 osd.314
315 hdd 7.30959 osd.315
316 hdd 7.30959 osd.316
317 hdd 7.30959 osd.317
318 hdd 7.30959 osd.318
319 hdd 7.30959 osd.319
320 hdd 7.30959 osd.320
321 hdd 7.30959 osd.321
322 hdd 7.30959 osd.322
-134 87.71503 host osd29
359 hdd 7.30959 osd.359
360 hdd 7.30959 osd.360
361 hdd 7.30959 osd.361
362 hdd 7.30959 osd.362
363 hdd 7.30959 osd.363
364 hdd 7.30959 osd.364
365 hdd 7.30959 osd.365
366 hdd 7.30959 osd.366
367 hdd 7.30959 osd.367
368 hdd 7.30959 osd.368
369 hdd 7.30959 osd.369
370 hdd 7.30959 osd.370
-127 774.81604 tor s1-bos-f-7-ng
-79 116.95337 host osd10
108 hdd 7.30959 osd.108
109 hdd 7.30959 osd.109
110 hdd 7.30959 osd.110
111 hdd 7.30959 osd.111
112 hdd 7.30959 osd.112
113 hdd 7.30959 osd.113
114 hdd 7.30959 osd.114
115 hdd 7.30959 osd.115
116 hdd 7.30959 osd.116
117 hdd 7.30959 osd.117
118 hdd 7.30959 osd.118
119 hdd 7.30959 osd.119
120 hdd 7.30959 osd.120
121 hdd 7.30959 osd.121
122 hdd 7.30959 osd.122
123 hdd 7.30959 osd.123
-85 116.95337 host osd11
125 hdd 7.30959 osd.125
126 hdd 7.30959 osd.126
127 hdd 7.30959 osd.127
129 hdd 7.30959 osd.129
138 hdd 7.30959 osd.138
144 hdd 7.30959 osd.144
149 hdd 7.30959 osd.149
151 hdd 7.30959 osd.151
188 hdd 7.30959 osd.188
189 hdd 7.30959 osd.189
190 hdd 7.30959 osd.190
197 hdd 7.30959 osd.197
198 hdd 7.30959 osd.198
201 hdd 7.30959 osd.201
202 hdd 7.30959 osd.202
203 hdd 7.30959 osd.203
-91 116.95337 host osd13
130 hdd 7.30959 osd.130
134 hdd 7.30959 osd.134
135 hdd 7.30959 osd.135
140 hdd 7.30959 osd.140
143 hdd 7.30959 osd.143
147 hdd 7.30959 osd.147
148 hdd 7.30959 osd.148
155 hdd 7.30959 osd.155
168 hdd 7.30959 osd.168
169 hdd 7.30959 osd.169
170 hdd 7.30959 osd.170
176 hdd 7.30959 osd.176
178 hdd 7.30959 osd.178
180 hdd 7.30959 osd.180
182 hdd 7.30959 osd.182
184 hdd 7.30959 osd.184
-121 87.71503 host osd24
299 hdd 7.30959 osd.299
300 hdd 7.30959 osd.300
301 hdd 7.30959 osd.301
302 hdd 7.30959 osd.302
303 hdd 7.30959 osd.303
304 hdd 7.30959 osd.304
305 hdd 7.30959 osd.305
306 hdd 7.30959 osd.306
307 hdd 7.30959 osd.307
308 hdd 7.30959 osd.308
309 hdd 7.30959 osd.309
310 hdd 7.30959 osd.310
-131 87.71503 host osd26
323 hdd 7.30959 osd.323
324 hdd 7.30959 osd.324
325 hdd 7.30959 osd.325
326 hdd 7.30959 osd.326
327 hdd 7.30959 osd.327
328 hdd 7.30959 osd.328
329 hdd 7.30959 osd.329
330 hdd 7.30959 osd.330
331 hdd 7.30959 osd.331
332 hdd 7.30959 osd.332
333 hdd 7.30959 osd.333
334 hdd 7.30959 osd.334
-5 80.40544 host osd3
2 hdd 7.30959 osd.2
12 hdd 7.30959 osd.12
21 hdd 7.30959 osd.21
34 hdd 7.30959 osd.34
42 hdd 7.30959 osd.42
50 hdd 7.30959 osd.50
54 hdd 7.30959 osd.54
63 hdd 7.30959 osd.63
100 hdd 7.30959 osd.100
101 hdd 7.30959 osd.101
102 hdd 7.30959 osd.102
-135 87.71503 host osd30
371 hdd 7.30959 osd.371
372 hdd 7.30959 osd.372
373 hdd 7.30959 osd.373
374 hdd 7.30959 osd.374
375 hdd 7.30959 osd.375
376 hdd 7.30959 osd.376
377 hdd 7.30959 osd.377
378 hdd 7.30959 osd.378
379 hdd 7.30959 osd.379
380 hdd 7.30959 osd.380
381 hdd 7.30959 osd.381
382 hdd 7.30959 osd.382
-13 80.40544 host osd9
8 hdd 7.30959 osd.8
13 hdd 7.30959 osd.13
24 hdd 7.30959 osd.24
32 hdd 7.30959 osd.32
39 hdd 7.30959 osd.39
48 hdd 7.30959 osd.48
56 hdd 7.30959 osd.56
64 hdd 7.30959 osd.64
75 hdd 7.30959 osd.75
84 hdd 7.30959 osd.84
94 hdd 7.30959 osd.94
-25 584.76685 tor s1-bos-f-8-ng
-97 87.71503 host osd16
216 hdd 7.30959 osd.216
217 hdd 7.30959 osd.217
218 hdd 7.30959 osd.218
219 hdd 7.30959 osd.219
220 hdd 7.30959 osd.220
221 hdd 7.30959 osd.221
222 hdd 7.30959 osd.222
223 hdd 7.30959 osd.223
224 hdd 7.30959 osd.224
225 hdd 7.30959 osd.225
226 hdd 7.30959 osd.226
227 hdd 7.30959 osd.227
-100 87.71503 host osd17
204 hdd 7.30959 osd.204
205 hdd 7.30959 osd.205
206 hdd 7.30959 osd.206
207 hdd 7.30959 osd.207
208 hdd 7.30959 osd.208
209 hdd 7.30959 osd.209
210 hdd 7.30959 osd.210
211 hdd 7.30959 osd.211
212 hdd 7.30959 osd.212
213 hdd 7.30959 osd.213
214 hdd 7.30959 osd.214
215 hdd 7.30959 osd.215
-103 80.40544 host osd18
228 hdd 7.30959 osd.228
229 hdd 7.30959 osd.229
230 hdd 7.30959 osd.230
231 hdd 7.30959 osd.231
232 hdd 7.30959 osd.232
233 hdd 7.30959 osd.233
234 hdd 7.30959 osd.234
235 hdd 7.30959 osd.235
236 hdd 7.30959 osd.236
237 hdd 7.30959 osd.237
238 hdd 7.30959 osd.238
-132 87.71503 host osd27
335 hdd 7.30959 osd.335
336 hdd 7.30959 osd.336
337 hdd 7.30959 osd.337
338 hdd 7.30959 osd.338
339 hdd 7.30959 osd.339
340 hdd 7.30959 osd.340
341 hdd 7.30959 osd.341
342 hdd 7.30959 osd.342
343 hdd 7.30959 osd.343
344 hdd 7.30959 osd.344
345 hdd 7.30959 osd.345
346 hdd 7.30959 osd.346
-11 80.40544 host osd4
3 hdd 7.30959 osd.3
16 hdd 7.30959 osd.16
18 hdd 7.30959 osd.18
31 hdd 7.30959 osd.31
36 hdd 7.30959 osd.36
49 hdd 7.30959 osd.49
59 hdd 7.30959 osd.59
68 hdd 7.30959 osd.68
72 hdd 7.30959 osd.72
77 hdd 7.30959 osd.77
82 hdd 7.30959 osd.82
-19 80.40544 host osd5
6 hdd 7.30959 osd.6
11 hdd 7.30959 osd.11
25 hdd 7.30959 osd.25
27 hdd 7.30959 osd.27
41 hdd 7.30959 osd.41
46 hdd 7.30959 osd.46
60 hdd 7.30959 osd.60
66 hdd 7.30959 osd.66
80 hdd 7.30959 osd.80
88 hdd 7.30959 osd.88
91 hdd 7.30959 osd.91
-3 80.40544 host osd6
7 hdd 7.30959 osd.7
14 hdd 7.30959 osd.14
20 hdd 7.30959 osd.20
30 hdd 7.30959 osd.30
37 hdd 7.30959 osd.37
47 hdd 7.30959 osd.47
57 hdd 7.30959 osd.57
67 hdd 7.30959 osd.67
74 hdd 7.30959 osd.74
83 hdd 7.30959 osd.83
96 hdd 7.30959 osd.96
-23 628.62439 tor s1-bos-f-9-ng
-82 116.95337 host osd14
124 hdd 7.30959 osd.124
131 hdd 7.30959 osd.131
132 hdd 7.30959 osd.132
133 hdd 7.30959 osd.133
136 hdd 7.30959 osd.136
137 hdd 7.30959 osd.137
141 hdd 7.30959 osd.141
145 hdd 7.30959 osd.145
173 hdd 7.30959 osd.173
174 hdd 7.30959 osd.174
175 hdd 7.30959 osd.175
179 hdd 7.30959 osd.179
181 hdd 7.30959 osd.181
185 hdd 7.30959 osd.185
186 hdd 7.30959 osd.186
187 hdd 7.30959 osd.187
-106 87.71503 host osd19
239 hdd 7.30959 osd.239
240 hdd 7.30959 osd.240
241 hdd 7.30959 osd.241
242 hdd 7.30959 osd.242
243 hdd 7.30959 osd.243
244 hdd 7.30959 osd.244
245 hdd 7.30959 osd.245
246 hdd 7.30959 osd.246
247 hdd 7.30959 osd.247
248 hdd 7.30959 osd.248
249 hdd 7.30959 osd.249
250 hdd 7.30959 osd.250
-109 87.71503 host osd20
251 hdd 7.30959 osd.251
252 hdd 7.30959 osd.252
253 hdd 7.30959 osd.253
254 hdd 7.30959 osd.254
255 hdd 7.30959 osd.255
256 hdd 7.30959 osd.256
257 hdd 7.30959 osd.257
258 hdd 7.30959 osd.258
259 hdd 7.30959 osd.259
260 hdd 7.30959 osd.260
261 hdd 7.30959 osd.261
262 hdd 7.30959 osd.262
-112 87.71503 host osd21
263 hdd 7.30959 osd.263
264 hdd 7.30959 osd.264
265 hdd 7.30959 osd.265
266 hdd 7.30959 osd.266
267 hdd 7.30959 osd.267
268 hdd 7.30959 osd.268
269 hdd 7.30959 osd.269
270 hdd 7.30959 osd.270
271 hdd 7.30959 osd.271
272 hdd 7.30959 osd.272
273 hdd 7.30959 osd.273
274 hdd 7.30959 osd.274
-133 87.71503 host osd28
347 hdd 7.30959 osd.347
348 hdd 7.30959 osd.348
349 hdd 7.30959 osd.349
350 hdd 7.30959 osd.350
351 hdd 7.30959 osd.351
352 hdd 7.30959 osd.352
353 hdd 7.30959 osd.353
354 hdd 7.30959 osd.354
355 hdd 7.30959 osd.355
356 hdd 7.30959 osd.356
357 hdd 7.30959 osd.357
358 hdd 7.30959 osd.358
-15 80.40544 host osd7
1 hdd 7.30959 osd.1
9 hdd 7.30959 osd.9
22 hdd 7.30959 osd.22
35 hdd 7.30959 osd.35
43 hdd 7.30959 osd.43
51 hdd 7.30959 osd.51
55 hdd 7.30959 osd.55
65 hdd 7.30959 osd.65
76 hdd 7.30959 osd.76
86 hdd 7.30959 osd.86
92 hdd 7.30959 osd.92
-9 80.40544 host osd8
5 hdd 7.30959 osd.5
17 hdd 7.30959 osd.17
23 hdd 7.30959 osd.23
29 hdd 7.30959 osd.29
40 hdd 7.30959 osd.40
45 hdd 7.30959 osd.45
61 hdd 7.30959 osd.61
71 hdd 7.30959 osd.71
78 hdd 7.30959 osd.78
87 hdd 7.30959 osd.87
90 hdd 7.30959 osd.90
The Ceph MGR balancer is running in upmap mode; clients are "luminous".
During an outage of one FD (s1-bos-f-9-ng) there were many PGs in an
inactive state, for example:
pg 11.558 is stuck inactive for 102s, current state
undersized+degraded+peered, last acting
[176,169,NONE,NONE,199,NONE,227,221]
This clearly shows that the PG was distributed unevenly across the FDs;
according to the CRUSH rule it should be placed on 2 OSDs per FD.
# for i in 176 169 199 227 221;do echo -n "$i: "; ceph osd find $i |
jq -r .crush_location.tor;done
176: s1-bos-f-7-ng
169: s1-bos-f-7-ng
199: s1-bos-e-10-ng
227: s1-bos-f-8-ng
221: s1-bos-f-8-ng
# ceph osd tree | grep tor
-41 3.49318 tor s1-bos-e-10-ng.metadata
-124 3.49318 tor s1-bos-f-7-ng.metadata
-56 5.23978 tor s1-bos-f-8-ng.metadata
-67 3.49318 tor s1-bos-f-9-ng.metadata
-21 745.57776 tor s1-bos-e-10-ng
-127 774.81604 tor s1-bos-f-7-ng
-25 584.76685 tor s1-bos-f-8-ng
-23 628.62439 tor s1-bos-f-9-ng
After our FD came back up I ran pg-upmap-items manually to correct the
placement.
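For example, roughly like this (hypothetical target OSD, picked from
s1-bos-f-9-ng just to illustrate the pattern for pg 11.558 above):
ceph osd pg-upmap-items 11.558 169 250   # remap the chunk from osd.169 (f-7-ng) to an OSD in the under-filled FD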
But after enabling the MGR balancer there were again attempts to move PGs
to OSDs under a different FD, which breaks the CRUSH rule.
Now, I know the FD weights in this cluster's CRUSH map are very different.
So I tried to use osdmaptool to simulate the balancer.
Our balancer settings were:
mgr advanced mgr/balancer/mode upmap
mgr advanced mgr/balancer/upmap_max_deviation 1
mgr advanced mgr/balancer/upmap_max_optimizations 100
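For reference, a simulation with those settings can be run roughly like
this (a sketch; the pool name is a placeholder):
ceph osd getmap -o osdmap.bin
osdmaptool osdmap.bin --upmap upmaps.txt --upmap-pool <ec-pool-name> --upmap-deviation 1 --upmap-max 100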
So I tried to raise the max deviation to a level at which the balancer
would no longer try to move PGs to OSDs in a different FD and break the
CRUSH rule. I ended up at a value of 50, and I honestly don't know whether
the balancer will ever do anything at that value.
From my perspective the MGR balancer is ignoring the CRUSH rule, at least
on EC pools. I observed this behaviour on another of our Ceph clusters as
well, and again only on an EC pool, even though the FD weights there were
almost the same. I even tried TheJJ's ceph-balancer, and it breaks the
CRUSH rule too.
Any ideas or similar experiences?
Thank you
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]