[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Nico Schottelius


Stefan Kooman  writes:
>> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
>> this?
>
> Do you have: osd_class_update_on_start enabled?

So this one is a bit funky. It seems to be off, but the behaviour would
indicate it isn't. Checking the typical configurations:

[10:38:53] black2.place6:~# ceph config-key get config/global/osd_class_update_on_start; echo ""
obtained 'config/global/osd_class_update_on_start'
false

[10:39:59] black2.place6:~# ceph-conf -D | grep osd_class_update_on_start
osd_class_update_on_start = true

[10:47:24] black2.place6:~# grep osd_class_update_on_start /etc/ceph/ceph.conf

[10:52:59] black2.place6:~# ceph config dump | grep osd_class_update_on_start
global  advanced osd_class_update_on_start  false
[10:53:38] black2.place6:~#

So it looks like it's already disabled.

I am not sure where ceph-conf reads the value true from, but I assume
it's a builtin.
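
As far as I understand it, the checks above look at different layers: ceph-conf only parses the local ceph.conf on top of the compiled-in defaults, while "ceph config dump" shows the monitors' config database. A minimal sketch for comparing the stored value with what a daemon reports is below. The name compare_option_layers is made up for illustration, and it assumes a release that has the centralized config commands "ceph config get" and "ceph config show".

```shell
# Hypothetical helper (not part of Ceph): print the value stored in the
# monitors' config database next to the value the running daemon reports.
# Assumes "ceph config get" and "ceph config show" are available.
compare_option_layers() {
    osd="$1"    # daemon name, e.g. osd.4
    opt="$2"    # option name, e.g. osd_class_update_on_start
    echo "mon config db: $(ceph config get "$osd" "$opt")"
    echo "running daemon: $(ceph config show "$osd" "$opt")"
}
```

Calling "compare_option_layers osd.4 osd_class_update_on_start" on a node with an admin keyring should show whether the two views disagree.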

I was also searching for osd_class_update_on_start on the Internet and
it seems there is no reference to it in the ceph documentation. Do you
have any pointers to it?

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Stefan Kooman

On 3/22/21 3:52 PM, Nico Schottelius wrote:


> Hello,
>
> follow up from my mail from 2020 [0], it seems that OSDs sometimes have
> "multiple classes" assigned:
>
> [15:47:15] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush rm-device-class osd.4
> done removing class of osd(s): 4
> [15:47:17] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush rm-device-class osd.4
> osd.4 belongs to no class,
> [15:47:20] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush set-device-class xruk osd.4
> set osd(s) 4 to class 'xruk'
> [15:47:45] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush set-device-class xruk osd.4
> osd.4 already set to class xruk. set-device-class item id 4 name 'osd.4' device_class 'xruk': no change.
> [15:47:47] server6.place6:/var/lib/ceph/osd/ceph-4# /usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf --cluster ceph --setuser ceph --setgroup ceph
>
> 2021-03-22 15:48:02.773 7fe2f81e4d80 -1 osd.4 94608 log_to_monitors {default=true}
> 2021-03-22 15:48:02.777 7fe2f81e4d80 -1 osd.4 94608 mon_cmd_maybe_osd_create fail: 'osd.4 has already bound to class 'xruk', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class ' to remove old class first': (16) Device or resource busy
> [15:48:02] server6.place6:/var/lib/ceph/osd/ceph-4#
> [15:48:02] server6.place6:/var/lib/ceph/osd/ceph-4#
>
> We are running ceph 14.2.9.
>
> As written before, it also seems that the affected OSD is peering with
> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
> this?


Do you have: osd_class_update_on_start enabled?

On our cluster NVMe OSDs would wrongly try to add themselves to the "SSD"
class (which didn't succeed). But maybe sometimes your OSDs do manage to
put themselves in a wrong class? Just guessing. But I would turn that
off. The same for this parameter:


osd_crush_update_on_start
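
A minimal sketch of turning both options off for all OSDs through the monitors' config database, assuming a release with "ceph config set" (Mimic or later). The function name disable_osd_start_updates is made up for illustration and is not a Ceph command.

```shell
# Hypothetical helper: store "false" for both startup-update options in the
# monitors' config database, for every OSD (section "osd").
# Stops at the first failing command.
disable_osd_start_updates() {
    for opt in osd_class_update_on_start osd_crush_update_on_start; do
        ceph config set osd "$opt" false || return 1
    done
}
```

Keep in mind that with osd_crush_update_on_start off, OSDs no longer place themselves in the CRUSH map at startup, so CRUSH location changes have to be made by hand.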

Gr. Stefan


[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Stefan Kooman

On 3/23/21 11:00 AM, Nico Schottelius wrote:


> Stefan Kooman  writes:
>
>>> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
>>> this?
>>
>> Do you have: osd_class_update_on_start enabled?
>
> So this one is a bit funky. It seems to be off, but the behaviour would
> indicate it isn't. Checking the typical configurations:
>
> [10:38:53] black2.place6:~# ceph config-key get config/global/osd_class_update_on_start; echo ""
> obtained 'config/global/osd_class_update_on_start'
> false
>
> [10:39:59] black2.place6:~# ceph-conf -D | grep osd_class_update_on_start
> osd_class_update_on_start = true
>
> [10:47:24] black2.place6:~# grep osd_class_update_on_start /etc/ceph/ceph.conf
>
> [10:52:59] black2.place6:~# ceph config dump | grep osd_class_update_on_start
> global  advanced osd_class_update_on_start  false
> [10:53:38] black2.place6:~#
>
> So it looks like it's already disabled.


What does a "ceph daemon osd.$id config get osd_class_update_on_start" 
give on that host for an OSD that is running there?


It depends on settings on logging of the OSD daemons, but in our case it 
was logged to the daemon log I believe (or syslog, dunno anymore).




> I am not sure where ceph-conf reads the value true from, but I assume
> it's a builtin.
>
> I was also searching for osd_class_update_on_start in the Internet and
> it seems there is no reference to it in the ceph documentation. Do you
> have any pointers to it?


Not anymore with new Ceph documentation. But the parameter is self 
explaining, it will try to put itself into the proper class at startup. 
Source code: src/common/options.cc


    Option("osd_class_update_on_start", Option::TYPE_BOOL, Option::LEVEL_ADVANCED)
    .set_default(true)
    .set_description("set OSD device class on startup"),

Gr. Stefan


[ceph-users] Re: Device class not deleted/set correctly

2021-03-25 Thread Nico Schottelius



Stefan Kooman  writes:

> On 3/23/21 11:00 AM, Nico Schottelius wrote:
>> Stefan Kooman  writes:
>>>> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix
>>>> this?
>>>
>>> Do you have: osd_class_update_on_start enabled?
>> So this one is a bit funky. It seems to be off, but the behaviour would
>> indicate it isn't. Checking the typical configurations:
>> [10:38:53] black2.place6:~# ceph config-key get config/global/osd_class_update_on_start; echo ""
>> obtained 'config/global/osd_class_update_on_start'
>> false
>> [10:39:59] black2.place6:~# ceph-conf -D | grep osd_class_update_on_start
>> osd_class_update_on_start = true
>> [10:47:24] black2.place6:~# grep osd_class_update_on_start /etc/ceph/ceph.conf
>> [10:52:59] black2.place6:~# ceph config dump | grep osd_class_update_on_start
>> global  advanced osd_class_update_on_start  false
>> [10:53:38] black2.place6:~#
>> So it looks like it's already disabled.
>
> What does a "ceph daemon osd.$id config get osd_class_update_on_start"
> give on that host for an OSD that is running there?

That returns

[12:52:24] server6.place6:~# ceph daemon osd.4 config get osd_class_update_on_start
{
"osd_class_update_on_start": "false"
}

for all involved OSDs.
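
For reference, a sketch of how the same query could be run against every OSD on a host in one go. It assumes the admin sockets live under the default path /var/run/ceph and follow the default ceph-osd.<id>.asok naming. The function name check_local_osds is made up for illustration.

```shell
# Hypothetical helper: ask every local OSD admin socket for one option.
# The socket directory is an optional second argument and defaults to
# /var/run/ceph (an assumption about this host's layout).
check_local_osds() {
    opt="$1"
    run_dir="${2:-/var/run/ceph}"
    for sock in "$run_dir"/ceph-osd.*.asok; do
        [ -e "$sock" ] || continue      # no sockets found: glob stayed literal
        printf '%s: ' "$sock"
        ceph daemon "$sock" config get "$opt"
    done
}
```

Usage would be "check_local_osds osd_class_update_on_start" as root on the OSD host.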

> It depends on settings on logging of the OSD daemons, but in our case
> it was logged to the daemon log I believe (or syslog, dunno anymore).

It's so strange, because none of the configurations say to use an "hdd"
class. We don't use that class anywhere else either (none of our classes
is hdd), so I suspect some builtin is trying to set the class.

>> I am not sure where ceph-conf reads the value true from, but I assume
>> it's a builtin.
>> I was also searching for osd_class_update_on_start in the Internet and
>> it seems there is no reference to it in the ceph documentation. Do you
>> have any pointers to it?
>
> Not anymore with new Ceph documentation.

Out of curiosity, do you have any clue why it's not in there anymore?

> But the parameter is self
> explaining, it will try to put itself into the proper class at
> startup. Source code: src/common/options.cc
>
> Option("osd_class_update_on_start", Option::TYPE_BOOL,
> Option::LEVEL_ADVANCED)
> .set_default(true)
> .set_description("set OSD device class on startup"),

The description I am somewhat missing is "set based on which criteria?"

In any case, it seems that the running OSD has the correct class
assigned. However I can see that that OSD has connections open to
unrelated osds:

tcp6   0  0 2a0a:e5c0:2:1:21b:21ff:febc:bf30:6805 2a0a:e5c0:2:1:21b:21ff:febb:68dc:57280 ESTABLISHED 17034/ceph-osd

So something is "not good" or "not correct" with this osd. This
particular one is in a special class that serves 1 pool with only 3 osds
in it. However, this osd has around 200 connections established to, as
far as I can see, most (all?) other osds in the cluster.
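
A rough sketch of how such a count can be taken from "netstat -tnp" output, assuming the column layout shown above (foreign address in column 5, state in column 6, PID/program in column 7). The function name count_osd_peers is made up. It counts distinct remote addresses, so it gives the number of peer hosts rather than the number of peer OSDs.

```shell
# Hypothetical helper: read netstat -tnp style lines on stdin and print the
# number of distinct remote addresses with an ESTABLISHED ceph-osd connection.
count_osd_peers() {
    awk '$6 == "ESTABLISHED" && $7 ~ /ceph-osd/ {
             peer = $5
             sub(/:[0-9]+$/, "", peer)   # strip the port, keep the address
             seen[peer] = 1
         }
         END { n = 0; for (p in seen) n++; print n }'
}
```

Usage: netstat -tnp | count_osd_peers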

To my understanding, it seems wrong that ceph osds form a complete mesh,
especially if they will never exchange data with the osds they are
connected to.

Can somebody confirm that osds should only connect to osds they share
data with?

And if my assumption is correct: is there any way to tell this osd to
behave correctly and only establish connections to osds of the same
class? (i.e. correctly assigning the class)

Best regards,

Nico




[ceph-users] Re: Device class not deleted/set correctly

2021-03-25 Thread Konstantin Shalygin
It is set based on whether the device is rotational or not, as reported
by the kernel.


k
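
The kernel exposes that flag in sysfs as /sys/block/<dev>/queue/rotational (1 for rotational, 0 otherwise). A minimal sketch for looking at it is below. The function name device_class_hint and the mapping of 1 to "hdd" and 0 to "ssd" are assumptions made for illustration; this is not the OSD's actual detection code, and it does not cover NVMe.

```shell
# Hypothetical helper: print the class suggested by the kernel's rotational
# flag for a block device such as sda. The sysfs root is an optional second
# argument and defaults to /sys.
device_class_hint() {
    dev="$1"
    flag_file="${2:-/sys}/block/$dev/queue/rotational"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$flag_file")" in
        1) echo "hdd" ;;        # rotational (assumed mapping)
        0) echo "ssd" ;;        # non-rotational (assumed mapping)
        *) echo "unknown" ;;
    esac
}
```

Usage: device_class_hint sda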


> On 25 Mar 2021, at 15:06, Nico Schottelius wrote:
> 
> The description I am somewhat missing is "set based on which criteria?"


[ceph-users] Re: Device class not deleted/set correctly

2021-03-30 Thread Stefan Kooman

On 3/25/21 1:05 PM, Nico Schottelius wrote:


>>> it seems there is no reference to it in the ceph documentation. Do you
>>> have any pointers to it?
>>
>> Not anymore with new Ceph documentation.
>
> Out of curiosity, do you have any clue why it's not in there anymore?


It might still be, but I cannot find it anymore ...

Gr. Stefan