In case you see this: I experienced multiple occurrences of wrong CRUSH map
placement, i.e. an OSD running on node A would appear under node B in the map.
This was a direct result of using ceph.target for starts. Ceph would not
report this as a problem, i.e. health stayed OK.

Performance testing also showed lower-than-expected RBD performance in this
state. After fixing the placement manually, performance returned to expected
levels.
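A minimal sketch of how the mismatch can be spotted, by listing which host
bucket the CRUSH map files each OSD under (to compare against where the daemon
actually runs). The `ceph osd tree` sample below is an assumption: the column
layout is from this era's plain-text output, and the second host name is made
up; on a live cluster you would pipe in the real command instead.

```shell
# Sample `ceph osd tree` output (format assumed; sm-cld-mtl-014 is a
# made-up host name for illustration). On a live cluster, replace the
# here-string with: ceph osd tree | awk '...'
sample='ID WEIGHT  TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.00000 root default
-2 2.00000     host sm-cld-mtl-013
44 1.00000         osd.44                   up  1.00000          1.00000
45 1.00000         osd.45                   up  1.00000          1.00000
-3 2.00000     host sm-cld-mtl-014
42 1.00000         osd.42                   up  1.00000          1.00000
43 1.00000         osd.43                   up  1.00000          1.00000'

# Track the current host bucket; print each OSD with the host CRUSH
# currently places it under.
printf '%s\n' "$sample" | awk '
  $3 == "host"                     { h = $4 }
  $1 ~ /^[0-9]+$/ && $3 ~ /^osd\./ { print $3, "->", h }
'
```

Comparing that list against `hostname` on each node shows which OSDs have
hopped.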

Best Regards
Wade
On Thu, May 5, 2016 at 1:47 PM Ben Hines <bhi...@gmail.com> wrote:

> Never mind, they just came back. It looks like I had some other issues, such
> as manually enabled ceph-osd@#.service files in the systemd config for OSDs
> that had been moved to different nodes.
>
> The root problem is clearly that ceph-osd-prestart updates the CRUSH map
> before the OSD has successfully started at all. If there are duplicate IDs,
> for example due to leftover files or some such, then a working OSD on another
> node may be forcibly moved in the CRUSH map to a node where it doesn't
> exist. I would expect OSDs to update their own location in CRUSH rather
> than having this be a prestart step.
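One mitigation, assuming the automatic placement itself is what bites here:
the `osd crush update on start` option (present in releases of this era,
default true) tells the start-up hook not to (re)place the OSD in the CRUSH
map. Placement then has to be maintained explicitly, e.g. with
`ceph osd crush create-or-move`. A sketch of the ceph.conf fragment:

```ini
[osd]
; Stop the start-up hook from moving this node's OSDs in the CRUSH map.
; CRUSH locations must then be managed by hand, e.g. with
; `ceph osd crush create-or-move`.
osd crush update on start = false
```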
>
> -Ben
>
>
> On Wed, May 4, 2016 at 10:27 PM, Ben Hines <bhi...@gmail.com> wrote:
>
>> Centos 7.2.
>>
>> ... and I think I just figured it out. One node had directories from
>> former OSDs in /var/lib/ceph/osd. When restarting other OSDs on this host,
>> Ceph apparently added those to the CRUSH map, too.
>>
>> [root@sm-cld-mtl-013 osd]# ls -la /var/lib/ceph/osd/
>> total 128
>> drwxr-x--- 8 ceph ceph  90 Feb 24 14:44 .
>> drwxr-x--- 9 ceph ceph 106 Feb 24 14:44 ..
>> drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-42
>> drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-43
>> drwxr-xr-x 1 root root 278 May  4 22:21 ceph-44
>> drwxr-xr-x 1 root root 278 May  4 22:21 ceph-45
>> drwxr-xr-x 1 root root 278 May  4 22:25 ceph-67
>> drwxr-xr-x 1 root root 304 May  4 22:25 ceph-86
>>
>>
>> 42 and 43 are on a different host, yet when 'systemctl start
>> ceph.target' is used, the OSD prestart adds them to the CRUSH map anyway:
>>
>>
>> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.67 at :/0 osd_data
>> /var/lib/ceph/osd/ceph-67 /var/lib/ceph/osd/ceph-67/journal
>> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.45 at :/0 osd_data
>> /var/lib/ceph/osd/ceph-45 /var/lib/ceph/osd/ceph-45/journal
>> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: WARNING: will not setuid/gid:
>> /var/lib/ceph/osd/ceph-42 owned by 0:0 and not requested 167:167
>> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.529176
>> 7f00cca7c900 -1 ** ERROR: unable to open OSD superblock on
>> /var/lib/ceph/osd/ceph-43: (2) No such file or directory
>> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.534657
>> 7fb55c17e900 -1 ** ERROR: unable to open OSD superblock on
>> /var/lib/ceph/osd/ceph-42: (2) No such file or directory
>> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service: main
>> process exited, code=exited, status=1/FAILURE
>> May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@43.service entered
>> failed state.
>> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service failed.
>> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service: main
>> process exited, code=exited, status=1/FAILURE
>> May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@42.service entered
>> failed state.
>> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service failed.
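A hedged way to spot leftover directories like ceph-42 and ceph-43 before
they poison the map: a stale directory has no `superblock` file, which is
exactly what the errors above complain about. The sketch below simulates this
under a temp dir; on a real node you would set OSD_ROOT=/var/lib/ceph/osd
and, for each stale ID, `systemctl disable ceph-osd@<id>` before removing the
directory.

```shell
# Simulated /var/lib/ceph/osd: ceph-44 is a live OSD (has a superblock),
# ceph-42 is a leftover from a removed OSD.
OSD_ROOT="$(mktemp -d)"
mkdir -p "$OSD_ROOT/ceph-42" "$OSD_ROOT/ceph-44"
touch "$OSD_ROOT/ceph-44/superblock"

# Flag every OSD directory that has no superblock file.
for d in "$OSD_ROOT"/ceph-*; do
  [ -e "$d/superblock" ] || echo "stale: $(basename "$d")"
done
```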
>>
>>
>>
>> -Ben
>>
>> On Tue, May 3, 2016 at 7:16 PM, Wade Holler <wade.hol...@gmail.com>
>> wrote:
>>
>>> Hi Ben,
>>>
>>> What OS+Version ?
>>>
>>> Best Regards,
>>> Wade
>>>
>>>
>>> On Tue, May 3, 2016 at 2:44 PM Ben Hines <bhi...@gmail.com> wrote:
>>>
>>>> My CRUSH map keeps putting some OSDs on the wrong node. Restarting them
>>>> fixes it temporarily, but they eventually hop back to the other node,
>>>> where they aren't really running.
>>>>
>>>> Is there anything known to cause this that I should look for?
>>>>
>>>> Ceph 9.2.1
>>>>
>>>> -Ben
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>
>
