It’s a fair point – in our case we are based on CentOS, so we are self-supported
anyway (the business does not like paying support costs).  At the time we evaluated
LIO, SCST and STGT, with a directive to use ALUA support instead of IP
failover.  In the end we went with SCST as it had more mature ALUA support at
the time and was easier to integrate into Pacemaker to support the ALUA
failover; it also seemed to perform fairly well.

However given the road we have gone down and the issues we are facing as we 
scale up and load up the storage, having a vendor support channel would be a 
relief.


From: Samuel Soulard [mailto:samuel.soul...@gmail.com]
Sent: Thursday, 12 October 2017 11:20 AM
To: Adrian Saul <adrian.s...@tpgtelecom.com.au>
Cc: Zhu Lingshan <ls...@suse.com>; dilla...@redhat.com; ceph-users 
<ceph-us...@ceph.com>
Subject: RE: [ceph-users] Ceph-ISCSI

Yes, I looked at this solution, and it seems interesting.  However, one point
that often sticks with business requirements is commercial support.

With Red Hat or SUSE, you get support provided with the solution.  I'm not
sure what support channel SCST offers.

Sam

On Oct 11, 2017 20:05, "Adrian Saul" <adrian.s...@tpgtelecom.com.au> wrote:

As an aside, SCST iSCSI will support ALUA and does PGRs through the use of
DLM.  We have been using that with Solaris and Hyper-V initiators for RBD-backed
storage, but we still have some ongoing issues with ALUA (probably our
current config; we need to lab-test the later recommendations).



> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Jason Dillaman
> Sent: Thursday, 12 October 2017 5:04 AM
> To: Samuel Soulard <samuel.soul...@gmail.com>
> Cc: ceph-users <ceph-us...@ceph.com>; Zhu Lingshan <ls...@suse.com>
> Subject: Re: [ceph-users] Ceph-ISCSI
>
> On Wed, Oct 11, 2017 at 1:10 PM, Samuel Soulard
> <samuel.soul...@gmail.com> wrote:
> > Hmmm, if you fail over the identity of the LIO configuration, including
> > PGRs (I believe they are files on disk), this would work, no?  Using
> > 2 iSCSI gateways which have shared storage to store the LIO
> > configuration and PGR data.
>
> Are you referring to the Active Persist Through Power Loss (APTPL) support
> in LIO where it writes the PR metadata to "/var/target/pr/aptpl_<wwn>"? I
> suppose that would work for a Pacemaker failover if you had a shared file
> system mounted between all your gateways *and* the initiator requests
> APTPL mode(?).
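
As a minimal sketch of that shared-filesystem condition (only the
/var/target/pr/aptpl_<wwn> layout comes from the paragraph above; the shared
mountpoint, helper names and the idea of wiring this into a resource agent are
assumptions, not an existing tool):

    #!/usr/bin/env python3
    # Hypothetical sanity check for the failover scenario described above:
    # verify that LIO's APTPL persistent-reservation files live on a shared
    # filesystem, so a Pacemaker failover would carry the PR state along.
    import glob
    import os

    PR_DIR = "/var/target/pr"       # where LIO writes APTPL PR metadata
    SHARED_MOUNT = "/var/target"    # assumed shared-filesystem mountpoint

    def aptpl_state_files(wwn_prefix=""):
        """Return APTPL PR files, optionally filtered by a target WWN prefix."""
        return sorted(glob.glob(os.path.join(PR_DIR, "aptpl_" + wwn_prefix + "*")))

    if __name__ == "__main__":
        print("APTPL PR files found:", len(aptpl_state_files()))
        if not os.path.ismount(SHARED_MOUNT):
            print("warning:", SHARED_MOUNT, "is not a separate mount;",
                  "PR state would not survive a failover to another gateway")
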
>
> > Also, when you said it "fails over to another port", did you mean a
> > port on another iSCSI gateway?  I believe LIO with multiple target
> > portal IPs on the same node for path redundancy works with PGRs.
>
> Yes, I was referring to the case of multiple active iSCSI gateways, where
> PGRs are not currently distributed to all gateways in the group.
>
> > In my scenario, if my assumptions are correct, you would only have 1
> > iSCSI gateway available through 2 target portal IPs (for data path
> > redundancy).  If this first iSCSI gateway fails, both target portal IPs
> > fail over to the standby node with the PGR data that is available on
> > shared storage.
> >
> >
> > Sam
> >
> > On Wed, Oct 11, 2017 at 12:52 PM, Jason Dillaman <jdill...@redhat.com>
> > wrote:
> >>
> >> On Wed, Oct 11, 2017 at 12:31 PM, Samuel Soulard
> >> <samuel.soul...@gmail.com> wrote:
> >> > Hi to all,
> >> >
> >> > What if you're using an iSCSI gateway based on LIO and krbd (that
> >> > is, an RBD block device mounted on the iSCSI gateway and published
> >> > through LIO)?  The LIO target portal (virtual IP) would fail over to
> >> > another node.  This would theoretically provide support for PGRs,
> >> > since LIO does support SPC-3.  Granted, it is not distributed and is
> >> > limited to a single node's throughput, but this would achieve the
> >> > high availability required by some environments.
> >>
> >> Yes, LIO technically supports PGR but it's not distributed to other
> >> nodes. If you have a pacemaker-initiated target failover to another
> >> node, the PGR state would be lost / missing after migration (unless I
> >> am missing something like a resource agent that attempts to preserve
> >> the PGRs). For initiator-initiated failover (e.g. a target is alive
> >> but the initiator cannot reach it), after it fails over to another
> >> port the PGR data won't be available.
> >>
> >> > Of course, multiple target portals would be awesome since the
> >> > available throughput would scale linearly, but since that isn't
> >> > here right now, this would at least provide an alternative.
> >>
> >> It would definitely be great to go active/active but there are
> >> concerns of data-corrupting edge conditions when using MPIO since it
> >> relies on client-side failure timers that are not coordinated with
> >> the target.
> >>
> >> For example, if an initiator writes to sector X down path A and there
> >> is a delay to the path A target (i.e. the target and initiator timeout
> >> timers are not in sync), and MPIO fails over to path B, quickly
> >> performs the write to sector X and then performs a second write to
> >> sector X, there is a possibility that eventually path A will unblock
> >> and overwrite the new value in sector X with the old value. The safe
> >> way to handle that would require setting the initiator-side IO timeouts
> >> to such high values as to cause higher-level subsystems to mark the
> >> MPIO path as failed should a failure actually occur.
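
To make that ordering concrete, here is a toy timeline in Python (the sector
values and timings are invented purely for illustration; this is not output
from any real tool):

    # Toy timeline of the MPIO edge case described above.  The write issued
    # via path A at t=0 stalls at the target; the list below records when
    # each write actually reaches the backing store.
    sector = {"X": "v0"}   # initial on-disk value of sector X
    events = [
        (5,  "B", "v1"),   # MPIO has failed over; the retried write lands via path B
        (6,  "B", "v2"),   # a newer write to the same sector lands via path B
        (30, "A", "v1"),   # the original write, stalled on path A, finally lands
    ]
    for t, path, value in events:
        sector["X"] = value
        print("t=%2ds  path %s writes %r  -> sector X = %r" % (t, path, value, sector["X"]))
    print("final:", sector["X"], "(the newer 'v2' was overwritten by the stale 'v1')")
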
> >>
> >> The iSCSI MC/S protocol would address these concerns, since in theory
> >> path B could discover that the retried IO was actually a retry, but
> >> alas it's not available in either the Linux Open-iSCSI or the ESX iSCSI
> >> initiator.
> >>
> >> > On Wed, Oct 11, 2017 at 12:26 PM, David Disseldorp <dd...@suse.de>
> >> > wrote:
> >> >>
> >> >> Hi Jason,
> >> >>
> >> >> Thanks for the detailed write-up...
> >> >>
> >> >> On Wed, 11 Oct 2017 08:57:46 -0400, Jason Dillaman wrote:
> >> >>
> >> >> > On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López
> >> >> > <jorp...@unizar.es> wrote:
> >> >> >
> >> >> > > As far as I am able to understand, there are 2 ways of setting up
> >> >> > > iSCSI for Ceph:
> >> >> > >
> >> >> > > 1- using the kernel (lrbd), only available on SUSE, CentOS, Fedora...
> >> >> > >
> >> >> >
> >> >> > The target_core_rbd approach is only utilized by SUSE (and its
> >> >> > derivatives like PetaSAN) as far as I know. This was the initial
> >> >> > approach for Red Hat-derived kernels as well until the upstream
> >> >> > kernel maintainers indicated that they really do not want a
> >> >> > specialized target backend for just krbd.
> >> >> > The next attempt was to re-use the existing target_core_iblock
> >> >> > to interface with krbd via the kernel's block layer, but that
> >> >> > hit similar upstream walls trying to get support for SCSI
> >> >> > command passthrough to the block layer.
> >> >> >
> >> >> >
> >> >> > > 2- using userspace (tcmu, ceph-iscsi-conf, ceph-iscsi-cli)
> >> >> > >
> >> >> >
> >> >> > The TCMU approach is what upstream and Red Hat-derived kernels
> >> >> > will support going forward.
> >> >>
> >> >> SUSE is also in the process of migrating to the upstream tcmu
> >> >> approach, for the reasons that you gave in (1).
> >> >>
> >> >> ...
> >> >>
> >> >> > The TCMU approach also does not currently support SCSI
> >> >> > persistent group reservations (needed for Windows clustering)
> >> >> > because that support isn't available in the upstream kernel. The
> >> >> > SUSE kernel has an approach that utilizes two round-trips to the
> >> >> > OSDs for each IO to simulate PGR support. Earlier this summer I
> >> >> > believe SUSE started to look into how to get generic PGR support
> >> >> > merged into the upstream kernel using corosync/dlm to
> >> >> > synchronize the states between multiple nodes in the target. I
> >> >> > am not sure of the current state of that work, but it would
> >> >> > benefit all LIO targets when complete.
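
As a very rough userspace illustration of why emulating PGR costs two OSD
round-trips per IO (this is not the SUSE kernel code; the pool name, the
per-LUN reservation object, its xattr encoding and the helper are all invented
for the sketch), using the python-rados bindings:

    # Round-trip 1: consult reservation state held in the cluster.
    # Round-trip 2: perform the actual data IO.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("rbd")             # assumed pool

    RESERVATION_OBJ = "iscsi_pr_state.lun0"       # hypothetical per-LUN PR object

    def guarded_write(data_obj, data, offset, initiator_id):
        # Round-trip 1: read the current reservation holder from the cluster.
        try:
            holder = ioctx.get_xattr(RESERVATION_OBJ, "pr_holder").decode()
        except (rados.ObjectNotFound, rados.NoData):
            holder = None                         # no reservation registered
        if holder not in (None, initiator_id):
            raise PermissionError("reservation held by " + holder)
        # Round-trip 2: the actual data IO.
        ioctx.write(data_obj, data, offset)
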
> >> >>
> >> >> Zhu Lingshan (cc'ed) worked on a prototype for tcmu PR support.
> >> >> IIUC, whether DLM or the underlying Ceph cluster gets used for PR
> >> >> state storage is still under consideration.
> >> >>
> >> >> Cheers, David
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jason
> >
> >
>
>
>
> --
> Jason
Confidentiality: This email and any attachments are confidential and may be 
subject to copyright, legal or some other professional privilege. They are 
intended solely for the attention and use of the named addressee(s). They may 
only be copied, distributed or disclosed with the consent of the copyright 
owner. If you have received this email by mistake or by breach of the 
confidentiality clause, please notify the sender immediately by return email 
and delete or destroy all copies of the email. Any confidentiality, privilege 
or copyright is not waived or lost because this email has been sent to you by 
mistake.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
