[ceph-users] rebalance stuck backfill_toofull, OSD NOT full

2019-11-08 Thread Philippe D'Anjou
v14.2.4 

I'm seeing the following issue:
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
    pg 1.285 is active+remapped+backfill_toofull, acting [118,94,84]

BUT (ceph osd df for osd.118):
ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META   AVAIL   %USE  VAR  PGS STATUS
118 hdd   9.09470 0.8      9.1 TiB 7.4 TiB 7.4 TiB 12 KiB 19 GiB 1.7 TiB 81.53 1.16  38 up

Even with the backfillfull ratio raised to 0.94, nothing is moving (including after restarting the OSD).
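
For reference, that ratio is the one controlled by set-backfillfull-ratio; a sketch of the adjustment, with the 0.94 value from above:

# ceph osd dump | grep ratio
# ceph osd set-backfillfull-ratio 0.94

The first command shows the current full/backfillfull/nearfull ratios, so the change can be verified.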

This is dangerous because it blocks recovery. It happens because of a bug in the PG distribution algorithm: due to the improper balance my PG counts are all over the place, so some OSDs are half empty and a few are up to 90% full.
How do I fix this rebalance issue now? I have already googled and only came up with adjusting the ratios or restarting the OSD, but nothing helps.
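
For what it's worth, the standard tool for evening out PG counts on Nautilus is the upmap balancer; a sketch, assuming all clients are Luminous or newer:

# ceph osd set-require-min-compat-client luminous
# ceph balancer mode upmap
# ceph balancer on
# ceph balancer status

Once the stuck PG's OSDs drop back below the backfillfull ratio, the backfill_toofull state should clear on its own.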
Thanks for any help.


Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread George Shuklin
Did you reinstall the mons as well? If not, check whether you've removed that OSD's auth entry (ceph auth ls).
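
A sketch of that check, using osd.112 from the fsid-clash log further down the thread; ceph auth del removes a stale entry:

# ceph auth ls | grep -A 3 'osd.112'
# ceph auth del osd.112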


Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
Can you post your 'ceph osd tree' output in a pastebin?
Do you mean the OSDs reporting the fsid mismatch are from the old, removed node?


Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
Hi,

The fifth node in the cluster was affected by hardware failure and hence
the node was replaced in the ceph cluster. But we were not able to replace
it properly and hence we uninstalled the ceph in all the nodes, deleted the
pools and also zapped the osd's and recreated them as new ceph cluster. But
not sure where from the reference for the old fifth nodes(failed nodes)
osd's fsid's are coming from still. Is this creating the problem. Because I
am seeing that the OSD's in the fifth node are showing up in the ceph
status whereas the other nodes osd's are showing down.
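
For what it's worth, the fsid the mon has registered can be compared against the fsid actually on disk; a sketch, using osd.112 from the fsid-clash message in this thread as the example:

# ceph osd dump | grep '^osd.112 '
(the uuid at the end of that line is what the mon has registered)
# ceph-volume lvm list
(run on the OSD host; the "osd fsid" field is what is on disk)

If they differ, the stale registration can be dropped with 'ceph osd purge 112 --yes-i-really-mean-it' before the OSD is re-created.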


Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
I saw many lines like this:

mon.cn1@0(leader).osd e1805 preprocess_boot from osd.112 v2:10.50.11.45:6822/158344 clashes with existing osd: different fsid (ours: 85908622-31bd-4728-9be3-f1f6ca44ed98 ; theirs: 127fdc44-c17e-42ee-bcd4-d577c0ef4479)

The OSD boot is ignored when the fsids mismatch. What did you do before this happened?
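
To enumerate every OSD affected, grepping the leader mon's log works; a sketch, assuming the default log path on cn1:

# grep 'clashes with existing osd' /var/log/ceph/ceph-mon.cn1.log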

nokia ceph wrote on Fri, Nov 8, 2019 at 8:29 PM:
>
> Hi,
>
> Please find the osd.0 which is restarted after the debug_mon is increased to 
> 20.
>
> cn1.chn8be1c1.cdn ~# date;systemctl restart ceph-osd@0.service
> Fri Nov  8 12:25:05 UTC 2019
>
> cn1.chn8be1c1.cdn ~# systemctl status ceph-osd@0.service -l
> ● ceph-osd@0.service - Ceph object storage daemon osd.0
>Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
> enabled-runtime; vendor preset: disabled)
>   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>└─90-ExecStart_NUMA.conf
>Active: active (running) since Fri 2019-11-08 12:25:06 UTC; 29s ago
>   Process: 298505 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster 
> ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 298512 (ceph-osd)
>CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>└─298512 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph 
> --setgroup ceph
>
> Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage 
> daemon osd.0...
> Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage 
> daemon osd.0.
> Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 12:25:11.538 
> 7f8515323d80 -1 osd.0 1795 log_to_monitors {default=true}
> Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 12:25:11.689 
> 7f850792e700 -1 osd.0 1795 set_numa_affinity unable to identify public 
> interface 'dss-client' numa node: (2) No such file or directory

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
Is osd.0 still in the down state after the restart? If so, maybe the problem is on the mon side.
Can you set the leader mon's debug_mon=20, restart one of the down-state OSDs, and then attach the mon log file?
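
A sketch of that procedure, assuming cn1 is the current leader as the logs elsewhere in the thread suggest:

# ceph tell mon.cn1 injectargs '--debug_mon 20'
(restart the down OSD and wait for it to attempt to boot)
# ceph tell mon.cn1 injectargs '--debug_mon 1'

The file to attach is then /var/log/ceph/ceph-mon.cn1.log from that window.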

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
Hi,

Below is the status of the OSD after restart.

# systemctl status ceph-osd@0.service
● ceph-osd@0.service - Ceph object storage daemon osd.0
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ceph-osd@.service.d
           └─90-ExecStart_NUMA.conf
   Active: active (running) since Fri 2019-11-08 10:32:51 UTC; 1min 1s ago
  Process: 219213 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 219218 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
           └─219218 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage daemon osd.0...
Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage daemon osd.0.
Nov 08 10:33:03 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:03.785 7f9adeed4d80 -1 osd.0 1795 log_to_monitors {default=true}
Nov 08 10:33:05 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:05.474 7f9ad14df700 -1 osd.0 1795 set_numa_affinity unable to identify public interface 'dss-client' numa n...r directory

Hint: Some lines were ellipsized, use -l to show in full.

I have attached the log file gathered while this restart was initiated.



Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
Try to restart some of the down OSDs from 'ceph osd tree' and see what happens.



[ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
Adding my official mail id

---------- Forwarded message ---------
From: nokia ceph 
Date: Fri, Nov 8, 2019 at 3:57 PM
Subject: OSD's not coming up in Nautilus
To: Ceph Users 




[ceph-users] OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
Hi Team,

We have a five-node Ceph cluster which we upgraded from Luminous to Nautilus, and everything was going well until yesterday, when we noticed that the Ceph OSDs are marked down and not recognized by the monitors as running, even though the OSD processes are running.

We noticed that the admin keyring and the mon keyring were missing on the nodes, so we recreated them with the commands below.

ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds allow

ceph-authtool --create_keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
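
Worth noting, as an aside: --gen-key mints brand-new keys, so a recreated keyring only authenticates if it matches what the monitors have stored. A sketch of the consistency check, assuming a still-working keyring is available to run it:

# ceph auth get client.admin
# cat /etc/ceph/ceph.client.admin.keyring

If the two keys differ, commands using that keyring will fail to authenticate.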

In the mon logs we find the lines below.

2019-11-08 09:01:50.525 7ff61722b700  0 log_channel(audit) log [DBG] : from='client.? 10.50.11.44:0/2398064782' entity='client.admin' cmd=[{"prefix": "df", "format": "json"}]: dispatch
2019-11-08 09:02:37.686 7ff61722b700  0 log_channel(cluster) log [INF] : mon.cn1 calling monitor election
2019-11-08 09:02:37.686 7ff61722b700  1 mon.cn1@0(electing).elector(31157) init, last seen epoch 31157, mid-election, bumping
2019-11-08 09:02:37.688 7ff61722b700 -1 mon.cn1@0(electing) e3 failed to get devid for : udev_device_new_from_subsystem_sysname failed on ''
2019-11-08 09:02:37.770 7ff61722b700  0 log_channel(cluster) log [INF] : mon.cn1 is new leader, mons cn1,cn2,cn3,cn4,cn5 in quorum (ranks 0,1,2,3,4)
2019-11-08 09:02:37.857 7ff613a24700  0 log_channel(cluster) log [DBG] : monmap e3: 5 mons at {cn1=[v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0],cn2=[v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0],cn3=[v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0],cn4=[v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0],cn5=[v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0]}



# ceph mon dump
dumped monmap epoch 3
epoch 3
fsid 9dbf207a-561c-48ba-892d-3e79b86be12f
last_changed 2019-09-03 07:53:39.031174
created 2019-08-23 18:30:55.970279
min_mon_release 14 (nautilus)
0: [v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0] mon.cn1
1: [v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0] mon.cn2
2: [v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0] mon.cn3
3: [v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0] mon.cn4
4: [v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0] mon.cn5


# ceph -s
  cluster:
id: 9dbf207a-561c-48ba-892d-3e79b86be12f
health: HEALTH_WARN
85 osds down
3 hosts (72 osds) down
1 nearfull osd(s)
1 pool(s) nearfull
Reduced data availability: 2048 pgs inactive
too few PGs per OSD (17 < min 30)
1/5 mons down, quorum cn2,cn3,cn4,cn5

  services:
mon: 5 daemons, quorum cn2,cn3,cn4,cn5 (age 57s), out of quorum: cn1
mgr: cn1(active, since 73m), standbys: cn2, cn3, cn4, cn5
osd: 120 osds: 35 up, 120 in; 909 remapped pgs

  data:
pools:   1 pools, 2048 pgs
objects: 0 objects, 0 B
usage:   176 TiB used, 260 TiB / 437 TiB avail
pgs: 100.000% pgs unknown
 2048 unknown


The osd logs show the below logs.

2019-11-08 09:05:33.332 7fd1a36eed80  0 _get_class not permitted to load kvs
2019-11-08 09:05:33.332 7fd1a36eed80  0 _get_class not permitted to load lua
2019-11-08 09:05:33.337 7fd1a36eed80  0 _get_class not permitted to load sdk
2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features
43262930805112, adjusting msgr requires for clients
2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features
43262930805112 was 8705, adjusting msgr requires for mons
2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features
1009090060360105984, adjusting msgr requires for osds

Please let us know what the issue might be. There seem to be no network issues on any of the servers' public or private interfaces.


[ceph-users] Ceph patch mimic release 13.2.7-8?

2019-11-08 Thread Erikas Kučinskis
Hello everyone,

Does anybody know when the next stable update for Ceph Mimic is coming?

We are hitting a really nasty bug that I hope will be fixed in the next patch release: OSDs going down on snaptrim. I hit it nine months ago on version 13.2.4 and reported it as tracker bug #38124.

A few months after I reported it, 13.2.6 came out without the fix. I'm hoping it will make it into the next point release.

Regards,

Erikas.