Resending in case this email was lost

On Tue, Jan 23, 2018 at 10:50 PM Mayank Kumar <krmaya...@gmail.com> wrote:

> Thanks, Burkhard, for the detailed explanation. Regarding the following:
>
> >>>The ceph client (librbd accessing a volume in this case) gets
> asynchronous notification from the ceph mons in case of relevant changes,
> e.g. updates to the osd map reflecting the failure of an OSD.
> I have some more questions:
> 1: Do the asynchronous notifications for both the osdmap and the monmap
> come from the mons?
> 2: Are these asynchronous notifications retriable?
> 3: Is it possible for these asynchronous notifications to be lost?
> 4: Do the monmap and osdmap reside in kernel space or user space? The
> reason I am asking is: for an rbd volume that is already mounted on a host,
> will it continue to receive those asynchronous notifications for changes
> to both osd and mon IPs or not? If all mon IPs change, but the mon
> configuration file is updated to reflect the new mon IPs, will an already
> mounted rbd volume still be able to contact the osds and mons, or is there
> some form of caching in kernel space for an already mounted rbd volume?
> (A rough sketch of what I mean follows right after these questions.)
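>
> To make question 1 a bit more concrete, here is a rough sketch of what I
> have in mind, using the python-rados bindings from user space. This is
> only an illustration: the conf path is a placeholder and the 'epoch'
> field name in the JSON output is my assumption.
>
>     import json
>     import rados
>
>     # Connect using the local ceph.conf; this is where the initial mon
>     # list comes from.
>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>     cluster.connect()
>
>     # Ask the mons for the current maps; the epoch changes whenever the
>     # map is updated, which is what the notifications should reflect.
>     for prefix in ('mon dump', 'osd dump'):
>         cmd = json.dumps({'prefix': prefix, 'format': 'json'})
>         ret, outbuf, outs = cluster.mon_command(cmd, b'')
>         if ret == 0:
>             print(prefix, 'epoch', json.loads(outbuf).get('epoch'))
>
>     cluster.shutdown()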
>
>
> Some more context on why I have all these doubts:
> We internally had a Ceph cluster with rbd volumes being provisioned by
> Kubernetes. With the existing rbd volumes still mounted, we wiped out the
> old Ceph cluster and created a brand new Ceph cluster, but the existing
> rbd volumes from the old cluster remained. Any Kubernetes pod that landed
> on the same host as an old rbd volume could not be created, because the
> volume failed to attach and mount. Looking at the kernel messages we saw
> the following:
>
> -- Logs begin at Fri 2018-01-19 02:05:38 GMT, end at Fri 2018-01-19 19:23:14 GMT. --
>
> Jan 19 19:20:39 host1.com kernel: libceph: osd2 10.231.171.131:6808 socket closed (con state CONNECTING)
>
> Jan 19 19:18:30 host1.com kernel: libceph: osd28 10.231.171.52:6808 socket closed (con state CONNECTING)
>
> Jan 19 19:18:30 host1.com kernel: libceph: osd0 10.231.171.131:6800 socket closed (con state CONNECTING)
>
> Jan 19 19:15:40 host1.com kernel: libceph: osd21 10.231.171.99:6808 wrong peer at address
>
> Jan 19 19:15:40 host1.com kernel: libceph: wrong peer, want 10.231.171.99:6808/42661, got 10.231.171.99:6808/73168
>
> Jan 19 19:15:34 host1.com kernel: libceph: osd11 10.231.171.114:6816 wrong peer at address
>
> Jan 19 19:15:34 host1.com kernel: libceph: wrong peer, want 10.231.171.114:6816/130908, got 10.231.171.114:6816/85562
>
> The new Ceph cluster had new osd and mon IPs.
>
> So my question: since these messages are coming from the kernel module,
> why can't the kernel module figure out that the mon and osd IPs have
> changed? Is there some caching in the kernel? When rbd create/attach is
> called on that host, it is passed the new mon IPs, so doesn't that update
> the already mounted old rbd volumes?
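>
> In case it helps frame the question: my (possibly wrong) assumption is
> that the kernel client keeps its own cached copies of the maps, which
> appear to be exposed through debugfs. A rough sketch of how one might
> peek at them, assuming debugfs is mounted at /sys/kernel/debug, the
> file names have not changed across kernel versions, and the script runs
> as root:
>
>     import glob
>
>     # One directory per kernel ceph client instance (fsid.client<id>).
>     for client in glob.glob('/sys/kernel/debug/ceph/*'):
>         # monmap/osdmap show the maps this particular client has cached.
>         for name in ('monmap', 'osdmap'):
>             try:
>                 with open('%s/%s' % (client, name)) as f:
>                     print(client, name)
>                     print(f.read())
>             except OSError as exc:
>                 print('cannot read %s/%s: %s' % (client, name, exc))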
>
> I hope I have made my doubts clear; I am a beginner in Ceph with very
> limited knowledge.
>
> Thanks for your help again
> Mayank
>
>
> On Tue, Jan 23, 2018 at 1:24 AM, Burkhard Linke <
> burkhard.li...@computational.bio.uni-giessen.de> wrote:
>
>> Hi,
>>
>>
>> On 01/23/2018 09:53 AM, Mayank Kumar wrote:
>>
>>> Hi Ceph Experts
>>>
>>> I am a new user of Ceph and am currently using Kubernetes to deploy Ceph
>>> RBD volumes. We are doing some initial work rolling it out to internal
>>> customers, and in doing that we are using the IP of the host as the IP of
>>> the osds and mons. This means that if a host goes down, we lose that IP.
>>> While we are still experimenting with these behaviors, I wanted to see
>>> what the community thinks about the following scenario:
>>>
>>> 1: an rbd volume is already attached and mounted on host A
>>> 2: the osd on which this rbd volume resides dies and never comes back up
>>> 3: another osd is put in its place. I don't know the intricacies here,
>>> but I am assuming the data for this rbd volume either moves to different
>>> osds or goes back to the newly installed osd
>>> 4: the new osd has a completely new IP
>>> 5: will the rbd volume attached to host A learn the new osd IP on which
>>> its data resides, and will everything just continue to work?
>>>
>>> What if all the mons have also changed IPs?
>>>
>> A volume does not reside "on an OSD". The volume is striped, and each
>> stripe is stored in a placement group; the placement group in turn is
>> distributed across several OSDs depending on the crush rules and the
>> number of replicas.
>>
>> If an OSD dies, Ceph will backfill the now missing replicas onto another
>> OSD, provided another OSD satisfying the crush rules is available. The
>> same process is also triggered if an OSD is added.
>>
>> This process is somewhat transparent to the ceph client, as long as
>> enough replicas are present. The ceph client (librbd accessing a volume in
>> this case) gets asynchronous notification from the ceph mons in case of
>> relevant changes, e.g. updates to the osd map reflecting the failure of an
>> OSD. Traffic to the OSDs will be automatically rerouted depending on the
>> crush rules as explained above. The OSD map also contains the IP addresses
>> of all OSDs, so changes to the IP addresses are just another update to the
>> map.
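>>
>> To illustrate that point, a small sketch with the python-rados bindings
>> that reads the OSD addresses straight out of the osdmap; the exact JSON
>> field names ('osds', 'public_addr') may differ between releases, so
>> treat this as an approximation rather than a reference:
>>
>>     import json
>>     import rados
>>
>>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>>     cluster.connect()
>>
>>     # The osdmap carries the address of every OSD; clients learn new
>>     # addresses simply by receiving a newer epoch of this map.
>>     ret, outbuf, outs = cluster.mon_command(
>>         json.dumps({'prefix': 'osd dump', 'format': 'json'}), b'')
>>     if ret == 0:
>>         for osd in json.loads(outbuf).get('osds', []):
>>             print('osd.%s' % osd.get('osd'), osd.get('public_addr'))
>>
>>     cluster.shutdown()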
>>
>> The only problem you might run into is changing the IP addresses of the
>> mons. There's also a mon map listing all active mons; if the mon a ceph
>> client is using dies or is removed, the client will switch to another
>> active mon from the map. This works fine in a running system; you can
>> change the IP addresses of the mons one by one without any interruption to
>> the client (theoretically...).
>>
>> The problem is starting the ceph client. In this case the client uses the
>> list of mons from the ceph configuration file to contact one mon and
>> receive the initial mon map. If you change the hostnames/IP addresses of
>> the mons, you also need to update the ceph configuration file.
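>>
>> As a sketch of that bootstrap step (python-rados, with placeholder
>> addresses and keyring path, so adjust to your setup): the initial mon
>> list can also be handed to the client directly instead of coming from
>> the configuration file, but either way at least one of the listed
>> addresses has to be reachable to fetch the current mon map:
>>
>>     import rados
>>
>>     # The client only needs one reachable address from this list to
>>     # fetch the current monmap; after that it follows monmap updates.
>>     cluster = rados.Rados(
>>         rados_id='admin',
>>         conf={
>>             'mon_host': '192.0.2.10,192.0.2.11,192.0.2.12',  # placeholders
>>             'keyring': '/etc/ceph/ceph.client.admin.keyring',
>>         })
>>     cluster.connect()
>>     print('connected, fsid', cluster.get_fsid())
>>     cluster.shutdown()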
>>
>> The above outline is how it should work, given a valid ceph and network
>> setup. YMMV.
>>
>> Regards,
>> Burkhard
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>