Oh yeah, I am not completely sure (have not tested myself), but if you
were doing a setup where you were not using a clustering app like
windows/redhat clustering that uses PRs, did not use vmfs and were
instead accessing the disks exported by LIO/TGT directly in the vm
(either using the guest's iscsi client or as a raw esx device), and were
not using ESX clustering, then you might be safe doing active/passive or
active/active with no modifications needed other than some scripts to
distribute the setup info across LIO/TGT nodes.
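
A minimal sketch of the kind of script I mean, assuming the only per-node
difference is the portal IP (as in the setup quoted further down, where the
primary's config is copied to the backup and the IPs are changed). The
function names and addresses are made up for illustration. Nothing here is a
tgt or LIO API, and actually copying the result to each node is left out.

```python
import re


def rewrite_portal_ip(config_text, primary_ip, node_ip):
    """Return config_text with each whole-address match of primary_ip
    replaced by node_ip. Everything else (serials, LUN ids, ...) is kept
    identical on purpose."""
    # The lookarounds stop 10.0.0.1 from matching inside 10.0.0.10.
    pattern = re.compile(r"(?<![\d.])" + re.escape(primary_ip) + r"(?![\d.])")
    return pattern.sub(node_ip, config_text)


def render_for_nodes(config_text, primary_ip, node_ips):
    """Build one config string per target node, keyed by that node's IP."""
    return {ip: rewrite_portal_ip(config_text, primary_ip, ip) for ip in node_ips}


if __name__ == "__main__":
    # Hypothetical config fragment, not real targets.conf or LIO syntax.
    primary = "portal 10.0.0.1:3260\nserial abc123\n"
    for ip, text in render_for_nodes(primary, "10.0.0.1", ["10.0.0.2"]).items():
        print("--- config for", ip)
        print(text)
```

Keeping the rewrite as a pure function makes it easy to diff the generated
configs before pushing them to the nodes.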

Were any of you trying this type of setup when you were describing your
results? If so, were you running oracle or something like that? Just
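
For the active/passive ATS case in the quoted message below, where only one
LIO/TGT node executes commands and the locking is a plain local mutex, the
idea can be sketched roughly like this. It is a toy in-memory model, not tgt
or LIO code. The class name, the 512-byte sector size and the one-sector
limit are assumptions for illustration.

```python
import threading

SECTOR_SIZE = 512  # assumed sector size for this toy model


class LocalLun:
    """Toy in-memory LUN served by a single target node."""

    def __init__(self, num_sectors):
        self._data = bytearray(num_sectors * SECTOR_SIZE)
        self._lock = threading.Lock()  # the "normal old mutex"

    def read(self, lba):
        off = lba * SECTOR_SIZE
        with self._lock:
            return bytes(self._data[off:off + SECTOR_SIZE])

    def compare_and_write(self, lba, expected, new):
        """Write `new` only if the sector currently equals `expected`.

        Returns True on success and False on a miscompare. The compare
        and the write happen under one lock, so two callers can never
        both succeed from the same starting contents.
        """
        if len(expected) != SECTOR_SIZE or len(new) != SECTOR_SIZE:
            raise ValueError("expected and new must each be one sector")
        off = lba * SECTOR_SIZE
        with self._lock:
            if self._data[off:off + SECTOR_SIZE] != expected:
                return False
            self._data[off:off + SECTOR_SIZE] = new
            return True
```

With active/active, each node would have its own copy of this mutex, which is
why something has to coordinate the compare and the write across nodes.
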

On 01/27/2015 08:58 PM, Mike Christie wrote:
> I do not know about perf, but here is some info on what is safe and
> general info.
> - If you are not using VAAI then it will use older style RESERVE/RELEASE
> commands only.
> If you are using VAAI ATS, and doing active/active then you need
> something, like the lock/sync talked about in the slides/hammer doc,
> that would coordinate multiple ATS/COMPARE_AND_WRITEs from executing at
> the same time on the same sectors. You probably do not ever see problems
> today, because it seems ESX normally does this command for only one
> sector and I do not think there are multiple commands for the same
> sectors in flight normally.
> For active/passive, ATS is simple since you only have the one LIO/TGT
> node executing commands at a time, so the locking is done locally using
> a normal old mutex.
> - tgt and LIO both support SCSI-3 persistent reservations. This is not
> really needed for ESX vmfs though since it uses ATS or older
> RESERVE/RELEASE. If you were using a cluster app like windows
> clustering, red hat cluster, etc in ESX or in normal non vm use, then
> you need something extra to support SCSI-3 PRs in both active/active or
> active/passive.
> For AA, you need something like what is described in that doc/video.
> For AP, you would need to copy over the PR state from one node to the
> other when failing over/back across nodes. For LIO this is in /var/target.
> Depending on how you do AP (what ALUA states you use if you do ALUA),
> you might also need to always distribute the PR info if you are doing
> windows clustering. Windows wants to see a consistent view of the PR
> info from all ports if you do something like ALUA active-optimized and
> standby states for active/passive.
> - I do not completely understand the comment about using LIO as a
> backend for tgt. You would either use tgt or LIO to export a rbd device.
> Not both at the same time like using LIO for some sort of tgt backend.
> Maybe people meant using the "RBD backend" instead of "LIO backend".
> - There are some other setup complications that you can see here
> http://comments.gmane.org/gmane.linux.scsi.target.devel/7044
> if you are using ALUA. I think tgt does not support ALUA, but LIO does.
> On 01/23/2015 04:25 PM, Zoltan Arnold Nagy wrote:
>> Correct me if I'm wrong, but tgt doesn't have full SCSI-3 persistence
>> support when _not_ using the LIO
>> backend for it, right?
>> AFAIK you can either run tgt with its own iSCSI implementation or you
>> can use tgt to manage your LIO targets.
>> I assume when you're running tgt with the rbd backend code you're
>> skipping all the in-kernel LIO parts (in which case
>> the RedHat patches won't help a bit), and you won't have proper
>> active-active support, since the initiators
>> have no way to synchronize state (and more importantly, no way to 
>> synchronize write caching! [I can think
>> of some really ugly hacks to get around that, tho...]).
>> On 01/23/2015 05:46 PM, Jake Young wrote:
>>> Thanks for the feedback Nick and Zoltan,
>>> I have been seeing periodic kernel panics when I used LIO.  It was
>>> either due to LIO or the kernel rbd mapping.  I have seen this on
>>> Ubuntu precise with kernel 3.14.14 and again in Ubuntu trusty with the
>>> utopic kernel (currently 3.16.0-28).  Ironically, this is the primary
>>> reason I started exploring a redundancy solution for my iSCSI proxy
>>> node.  So, yes, these crashes have nothing to do with running the
>>> Active/Active setup.
>>> I am moving my entire setup from LIO to rbd enabled tgt, which I've
>>> found to be much more stable and gives equivalent performance.
>>> I've been testing active/active LIO since July of 2014 with VMWare and
>>> I've never seen any vmfs corruption.  I am now convinced (thanks Nick)
>>> that it is possible.  The reason I have not seen any corruption may
>>> have to do with how VMWare happens to be configured.
>>> Originally, I had made a point to use round robin path selection in
>>> the VMware hosts; but as I did performance testing, I found that it
>>> actually didn't help performance.  When the host switches iSCSI
>>> targets there is a short "spin up time" for LIO to get to 100% IO
>>> capability.  Since round robin switches targets every 30 seconds (60
>>> seconds? I forget), this seemed to be significant.  A secondary goal
>>> for me was to end up with a config that required minimal tuning from
>>> VMWare and the target software; so the obvious choice is to leave
>>> VMWare's path selection at the default which is Fixed and picks the
>>> first target in ASCII-betical order.  That means I am actually
>>> functioning in Active/Passive mode.
>>> Jake
>>> On Fri, Jan 23, 2015 at 8:46 AM, Zoltan Arnold Nagy
>>> <zol...@linux.vnet.ibm.com <mailto:zol...@linux.vnet.ibm.com>> wrote:
>>>     Just to chime in: it will look fine, feel fine, but underneath
>>>     it's quite easy to get VMFS corruption. Happened in our tests.
>>>     Also if you're running LIO, from time to time expect a kernel
>>>     panic (haven't tried with the latest upstream, as I've been using
>>>     Ubuntu 14.04 on my "export" hosts for the test, so might have
>>>     improved...).
>>>     As of now I would not recommend this setup without being aware of
>>>     the risks involved.
>>>     There have been a few upstream patches getting the LIO code in
>>>     better cluster-aware shape, but no idea if they have been merged
>>>     yet. I know RedHat has a guy on this.
>>>     On 01/21/2015 02:40 PM, Nick Fisk wrote:
>>>>     Hi Jake,
>>>>     Thanks for this. I have been going through it and have a pretty
>>>>     good idea of what you are doing now. However, I may be missing
>>>>     something looking through your scripts, because I’m still not
>>>>     quite understanding how you are making sure locking happens
>>>>     correctly with the ESXi ATS SCSI command.
>>>>     From this slide
>>>> http://xo4t.mjt.lu/link/xo4t/gzyhtx3/1/_9gJVMUrSdvzGXYaZfCkVA/aHR0cHM6Ly93aWtpLmNlcGguY29tL0BhcGkvZGVraS9maWxlcy8zOC9oYW1tZXItY2VwaC1kZXZlbC1zdW1taXQtc2NzaS10YXJnZXQtY2x1c3RlcmluZy5wZGY
>>>>     (Page 8)
>>>>     It seems to indicate that for a true active/active setup the two
>>>>     targets need to be aware of each other and exchange locking
>>>>     information for it to work reliably. I’ve also watched the video
>>>>     from the Ceph developer summit where this is discussed, and it
>>>>     seems that Ceph and the kernel need changes to allow this locking
>>>>     to be pushed back to the RBD layer so it can be shared. From what
>>>>     I can see browsing through the Linux Git repo, these patches
>>>>     haven’t made the mainline kernel yet.
>>>>     Can you shed any light on this? As tempting as having
>>>>     active/active is, I’m wary about using the configuration until I
>>>>     understand how the locking is working and if fringe cases
>>>>     involving multiple ESXi hosts writing to the same LUN on
>>>>     different targets could spell disaster.
>>>>     Many thanks,
>>>>     Nick
>>>>     *From:*Jake Young [mailto:jak3...@gmail.com]
>>>>     *Sent:* 14 January 2015 16:54
>>>>     *To:* Nick Fisk
>>>>     *Cc:* Giuseppe Civitella; ceph-users
>>>>     *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
>>>>     Yes, it's active/active and I found that VMWare can switch from
>>>>     path to path with no issues or service impact.
>>>>     I posted some config files here: github.com/jak3kaj/misc
>>>>     One set is from my LIO nodes, both the primary and secondary
>>>>     configs so you can see what I needed to make unique.  The other
>>>>     set (targets.conf) is from my tgt nodes.  They are both 4 LUN
>>>>     configs.
>>>>     Like I said in my previous email, there is no performance
>>>>     difference between LIO and tgt.  The only service I'm running on
>>>>     these nodes is a single iscsi target instance (either LIO or tgt).
>>>>     Jake
>>>>     On Wed, Jan 14, 2015 at 8:41 AM, Nick Fisk <n...@fisk.me.uk
>>>>     <mailto:n...@fisk.me.uk>> wrote:
>>>>         Hi Jake,
>>>>         I can’t remember the exact details, but it was something to
>>>>         do with a potential problem when using the pacemaker resource
>>>>         agents. I think it was to do with a potential hanging issue
>>>>         when one LUN on a shared target failed and then it tried to
>>>>         kill all the other LUNs to fail the target over to another
>>>>         host. This then leaves the TCM part of LIO locking the RBD
>>>>         which also can’t fail over.
>>>>         That said I did try multiple LUNs on one target as a test and
>>>>         didn’t experience any problems.
>>>>         I’m interested in the way you have your setup configured
>>>>         though. Are you saying you effectively have an active/active
>>>>         configuration with a path going to either host, or are you
>>>>         failing the iSCSI IP between hosts? If it’s the former, have
>>>>         you had any problems with scsi locking/reservations…etc
>>>>         between the two targets?
>>>>         I can see the advantage to that configuration as you
>>>>         reduce/eliminate a lot of the troubles I have had with
>>>>         resources failing over.
>>>>         Nick
>>>>         *From:*Jake Young [mailto:jak3...@gmail.com
>>>>         <mailto:jak3...@gmail.com>]
>>>>         *Sent:* 14 January 2015 12:50
>>>>         *To:* Nick Fisk
>>>>         *Cc:* Giuseppe Civitella; ceph-users
>>>>         *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
>>>>         Nick,
>>>>         Where did you read that having more than 1 LUN per target
>>>>         causes stability problems?
>>>>         I am running 4 LUNs per target. 
>>>>         For HA I'm running two linux iscsi target servers that map
>>>>         the same 4 rbd images. The two targets have the same serial
>>>>         numbers, T10 address, etc.  I copy the primary's config to
>>>>         the backup and change IPs. This way VMWare thinks they are
>>>>         different target IPs on the same host. This has worked very
>>>>         well for me. 
>>>>         One suggestion I have is to try using rbd enabled tgt. The
>>>>         performance is equivalent to LIO, but I found it is much
>>>>         better at recovering from a cluster outage. I've had LIO lock
>>>>         up the kernel or simply not recognize that the rbd images are
>>>>         available; where tgt will eventually present the rbd images
>>>>         again. 
>>>>         I have been slowly adding servers and am expanding my test
>>>>         setup to a production setup (nice thing about ceph). I now
>>>>         have 6 OSD hosts with 7 disks on each. I'm using the LSI
>>>>         Nytro cache raid controller, so I don't have a separate
>>>>         journal and have 40Gb networking. I plan to add another 6 OSD
>>>>         hosts in another rack in the next 6 months (and then another
>>>>         6 next year). I'm doing 3x replication, so I want to end up
>>>>         with 3 racks. 
>>>>         Jake
>>>>         On Wednesday, January 14, 2015, Nick Fisk <n...@fisk.me.uk
>>>>         <mailto:n...@fisk.me.uk>> wrote:
>>>>             Hi Giuseppe,
>>>>             I am working on something very similar at the moment. I
>>>>             currently have it working on some test hardware and it seems
>>>>             to be working reasonably well.
>>>>             I say reasonably as I have had a few instabilities, but
>>>>             these are on the HA side; the LIO and RBD side of things
>>>>             have been rock solid so far. The main problems I have had
>>>>             seem to be around recovering from failure, with resources
>>>>             ending up in an unmanaged state. I’m not currently using
>>>>             fencing so this may be part of the cause.
>>>>             As a brief description of my configuration:
>>>>             4 hosts, each having 2 OSDs and also running the monitor role
>>>>             3 additional hosts in an HA cluster which act as iSCSI
>>>>             proxy nodes.
>>>>             I’m using the IP, RBD, iSCSITarget and iSCSILUN resource
>>>>             agents to provide an HA iSCSI LUN which maps back to an RBD.
>>>>             All the agents for each RBD are in a group so they follow
>>>>             each other between hosts.
>>>>             I’m using 1 LUN per target as I read somewhere there are
>>>>             stability problems using more than 1 LUN per target.
>>>>             Performance seems OK; I can get about 1.2k random IOs
>>>>             out of the iSCSI LUN. This seems to be about right for the
>>>>             Ceph cluster size, so I don’t think the LIO part is
>>>>             causing any significant overhead.
>>>>             We should be getting our production hardware shortly
>>>>             which will have 40 OSDs with journals and an SSD caching
>>>>             tier, so within the next month or so I will have a better
>>>>             idea of running it in a production environment and the
>>>>             performance of the system.
>>>>             Hope that helps, if you have any questions, please let me
>>>>             know.
>>>>             Nick
>>>>             *From:*ceph-users
>>>>             [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of
>>>>             *Giuseppe Civitella
>>>>             *Sent:* 13 January 2015 11:23
>>>>             *To:* ceph-users
>>>>             *Subject:* [ceph-users] Ceph, LIO, VMWARE anyone?
>>>>             Hi all,
>>>>             I'm working on a lab setup with Ceph serving rbd
>>>>             images as iSCSI datastores to VMWARE via a LIO box. Has
>>>>             anyone already done something similar and is willing
>>>>             to share some knowledge? Any production deployments? What
>>>>             about LIO's HA and LUN performance?
>>>>             Thanks 
>>>>             Giuseppe
>>>>     _______________________________________________
>>>>     ceph-users mailing list
>>>>     ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>>>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com