That's great, thank you so much. I will try to get this patch into my test env ASAP, but will likely wait for the official release cut for prod. I really appreciate you adding this to the product.
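(For the test env I'll probably just check out the PR branch directly; a rough sketch, assuming a plain git clone of the ceph-iscsi repo and an arbitrary local branch name:

git clone https://github.com/ceph/ceph-iscsi.git
cd ceph-iscsi
git fetch origin pull/156/head:gw-delete-fix
git checkout gw-delete-fix

then reinstall and restart rbd-target-api from that checkout.)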
Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Thu, Dec 5, 2019 at 4:14 PM Mike Christie <mchri...@redhat.com> wrote:

> On 12/04/2019 02:34 PM, Wesley Dillingham wrote:
> > I have never had a permanent loss of a gateway, but I'm a believer in
> > Murphy's law and want to have a plan. Glad to hear that there is a
> > solution in the works; curious when might that be available in a
> > release? If sooner rather than later, I'll plan to upgrade immediately,
>
> It should be in the next release, which I think we would just make when
> the patch gets merged, since we have a good number of fixes sitting in
> the repo.
>
> Patch/PR is here, if you have a non-production setup and are used to
> applying patches and testing upstream:
>
> https://github.com/ceph/ceph-iscsi/pull/156
>
> > otherwise, if far down the queue, I would like to know if I should
> > ready a standby server.
> >
> > Thanks so much for all your great work on this product.
> >
> > Respectfully,
> >
> > *Wes Dillingham*
> > w...@wesdillingham.com
> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> >
> >
> > On Wed, Dec 4, 2019 at 11:18 AM Mike Christie <mchri...@redhat.com> wrote:
> >
> > On 12/04/2019 08:26 AM, Gesiel Galvão Bernardes wrote:
> > > Hi,
> > >
> > > On Wed, Dec 4, 2019 at 00:31, Mike Christie <mchri...@redhat.com> wrote:
> > >
> > > On 12/03/2019 04:19 PM, Wesley Dillingham wrote:
> > > > Thanks. If I am reading this correctly, the ability to remove an
> > > > iSCSI gateway would allow the remaining iSCSI gateways to take over
> > > > for the removed gateway's LUNs as of > 3.0. That's good; we run 3.2.
> > > > However, because the actual update of the central config object
> > > > happens from the to-be-deleted iSCSI gateway, regardless of where
> > > > the gwcli command is issued, it will fail to actually remove said
> > > > gateway from the object if that gateway is not functioning.
> > >
> > > Yes.
> > >
> > > > I guess this still leaves the question of how to proceed when one of
> > > > the iSCSI gateways fails permanently? Is that possible, or is it
> > > > potentially possible other than manually intervening on the config
> > >
> > > You could edit the gateway.cfg manually, but I would not do it,
> > > because it's error prone.
> > >
> > > It's probably safest to run in degraded mode and wait for an updated
> > > ceph-iscsi package with a fix. If you are running into the problem
> > > right now, I can bump the priority.
> > >
> > > I permanently lost a gateway. I cannot leave it running "degraded",
> > > because I need to add another redundancy gateway, and it does not
> > > allow that with the gateway "offline".
> > >
> > > In this case, what can I do? If I create a new gateway with the same
> > > name and IP as the lost one, and then try to use "delete" in gwcli,
> > > will it work?
> >
> > Yes.
> >
> > If you can have a temporary stop in services, you can also do the
> > following as a workaround:
> >
> > 0. Stop applications accessing the iSCSI LUNs, and have the initiators
> > log out of the iSCSI target.
> >
> > 1. Stop the ceph-iscsi service. On all iSCSI gw nodes do:
> >
> > systemctl stop rbd-target-api
> >
> > 2. Delete gateway.cfg. This will delete the configuration info like the
> > target and its ACL and LUN mappings. It does not delete the actual
> > images or pools that you have data on.
> >
> > rados -p rbd rm gateway.cfg
> >
> > 3. Start the ceph-iscsi services again. On all iSCSI gw nodes do:
> >
> > systemctl start rbd-target-api
> >
> > 4. Re-set up the target with gwcli (see the rough example below). For
> > the image/disk setup stage, instead of doing the "create" command, do
> > the "attach" command:
> >
> > attach pool=your_pool image=image_name
> >
> > Then just re-add your target, ACLs and LUN mappings.
> >
> > 5. On the initiator side, log back in to the iSCSI target.
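> >
> > For step 4, with made-up names (the target IQN, gateway names/IPs, disk
> > and client IQN below are just placeholders, and the exact gwcli paths
> > and arguments can differ a bit between ceph-iscsi versions), the
> > re-setup would look roughly like:
> >
> > /iscsi-targets> create iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
> > /iscsi-targets/iqn...:ceph-igw/gateways> create ceph-gw-1 10.172.19.21
> > /iscsi-targets/iqn...:ceph-igw/gateways> create ceph-gw-2 10.172.19.22
> > /disks> attach pool=rbd image=disk_1
> > /iscsi-targets/iqn...:ceph-igw/hosts> create iqn.1994-05.com.redhat:rh7-client
> > /iscsi-targets/iqn...:ceph-igw/hosts/iqn...:rh7-client> disk add rbd/disk_1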
> >
> > > > object? If it's not possible, would the best course of action be to
> > > > have standby hardware and quickly recreate the node, or perhaps run
> > > > the gateways more ephemerally, from a VM or container?
> > > >
> > > > Thanks again.
> > > >
> > > > Respectfully,
> > > >
> > > > *Wes Dillingham*
> > > > w...@wesdillingham.com
> > > > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> > > >
> > > >
> > > > On Tue, Dec 3, 2019 at 2:45 PM Mike Christie <mchri...@redhat.com> wrote:
> > > >
> > > > I do not think it's going to do what you want when the node you want
> > > > to delete is down.
> > > >
> > > > It looks like we only temporarily stop the gw from being exported.
> > > > It does not update the gateway.cfg, because we do the config removal
> > > > call on the node we want to delete.
> > > >
> > > > So gwcli would report success and the ls command will show it as no
> > > > longer running/exported, but if you restart the rbd-target-api
> > > > service then it will show up again.
> > > >
> > > > There is an internal command to do what you want. I will post a PR
> > > > for gwcli so it can be used by the dashboard.
> > > >
> > > >
> > > > On 12/03/2019 01:19 PM, Jason Dillaman wrote:
> > > > > If I recall correctly, the recent ceph-iscsi release supports the
> > > > > removal of a gateway via "gwcli". I think the Ceph dashboard can
> > > > > do that as well.
> > > > >
> > > > > On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham <w...@wesdillingham.com> wrote:
> > > > >>
> > > > >> We utilize 4 iSCSI gateways in a cluster and have noticed the
> > > > >> following during patching cycles, when we sequentially reboot
> > > > >> single iSCSI gateways:
> > > > >>
> > > > >> "gwcli" often hangs on the still-up iSCSI GWs, but sometimes
> > > > >> still functions and gives the message:
> > > > >>
> > > > >> "1 gateway is inaccessible - updates will be disabled"
> > > > >>
> > > > >> This got me thinking about what the course of action would be
> > > > >> should an iSCSI gateway fail permanently or semi-permanently,
> > > > >> say a hardware issue. What would be the best course of action to
> > > > >> instruct the remaining iSCSI gateways that one of them is no
> > > > >> longer available, that they should allow updates again, and that
> > > > >> they should take ownership of the now-defunct node's LUNs?
> > > > >>
> > > > >> I'm guessing pulling down the RADOS config object, rewriting it,
> > > > >> and re-putting it, followed by an rbd-target-api restart, might
> > > > >> do the trick, but I am hoping there is a more "in-band" and less
> > > > >> potentially devastating way to do this.
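> > > > >>
> > > > >> Something along the lines of (just to illustrate the idea, not a
> > > > >> tested procedure; object name and pool per your setup):
> > > > >>
> > > > >> rados -p rbd get gateway.cfg /tmp/gateway.cfg
> > > > >> # hand-edit /tmp/gateway.cfg to remove the dead gateway's entries
> > > > >> rados -p rbd put gateway.cfg /tmp/gateway.cfg
> > > > >> systemctl restart rbd-target-api   # on the surviving gateways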
> > > > >>
> > > > >> Thanks for any insights.
> > > > >>
> > > > >> Respectfully,
> > > > >>
> > > > >> Wes Dillingham
> > > > >> w...@wesdillingham.com
> > > > >> LinkedIn
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io