Yes, it's active/active, and I found that VMware can switch from path to path with no issues or service impact.
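If you want to check it from the ESXi side, it's just standard multipathing; something like the following (the naa device ID below is a placeholder, not a real one) will list the NMP devices with their paths and switch a device to round robin so I/O actually goes down both targets:

    esxcli storage nmp device list
    esxcli storage nmp device set --device naa.60000000000000000 --psp VMW_PSP_RR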
I posted some config files here: github.com/jak3kaj/misc

One set is from my LIO nodes, both the primary and secondary configs, so you can see what I needed to make unique. The other set (targets.conf) is from my tgt nodes. Both are 4-LUN configs.

Like I said in my previous email, there is no performance difference between LIO and tgt. The only service I'm running on these nodes is a single iSCSI target instance (either LIO or tgt).
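If you don't want to dig through the repo, the tgt side boils down to something like the sketch below (the IQN, serial numbers, and image names here are placeholders rather than my real values, and I've only shown 2 of the 4 LUNs). The point is that both nodes carry an identical target definition with pinned device identities; only the portal IP each node listens on differs:

    # /etc/tgt/targets.conf (sketch) -- same file on both nodes
    <target iqn.2015-01.com.example:rbd-store>
        driver iscsi
        bs-type rbd                      # rbd-enabled tgt talks to the cluster directly
        <backing-store rbd/vmware-lun0>
            lun 1
            scsi_id 35000000aa0000000    # pinned so both targets present
            scsi_sn 0aa0000000           # the same device identity to VMware
        </backing-store>
        <backing-store rbd/vmware-lun1>
            lun 2
            scsi_id 35000000aa0000001
            scsi_sn 0aa0000001
        </backing-store>
    </target>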
Jake

On Wed, Jan 14, 2015 at 8:41 AM, Nick Fisk <n...@fisk.me.uk> wrote:

> Hi Jake,
>
> I can't remember the exact details, but it was something to do with a
> potential problem when using the pacemaker resource agents. I think it
> was to do with a potential hanging issue when one LUN on a shared
> target failed and it then tried to kill all the other LUNs to fail the
> target over to another host. This then leaves the TCM part of LIO
> locking the RBD, which also can't fail over.
>
> That said, I did try multiple LUNs on one target as a test and didn't
> experience any problems.
>
> I'm interested in the way you have your setup configured, though. Are
> you saying you effectively have an active/active configuration with a
> path going to either host, or are you failing the iSCSI IP between
> hosts? If it's the former, have you had any problems with SCSI
> locking/reservations, etc. between the two targets?
>
> I can see the advantage of that configuration, as you reduce/eliminate
> a lot of the trouble I have had with resources failing over.
>
> Nick
>
> From: Jake Young [mailto:jak3...@gmail.com]
> Sent: 14 January 2015 12:50
> To: Nick Fisk
> Cc: Giuseppe Civitella; ceph-users
> Subject: Re: [ceph-users] Ceph, LIO, VMWARE anyone?
>
> Nick,
>
> Where did you read that having more than 1 LUN per target causes
> stability problems?
>
> I am running 4 LUNs per target.
>
> For HA I'm running two Linux iSCSI target servers that map the same 4
> rbd images. The two targets have the same serial numbers, T10 address,
> etc. I copy the primary's config to the backup and change the IPs.
> This way VMware thinks they are different target IPs on the same host.
> This has worked very well for me.
>
> One suggestion I have is to try using rbd-enabled tgt. The performance
> is equivalent to LIO, but I found it is much better at recovering from
> a cluster outage. I've had LIO lock up the kernel or simply not
> recognize that the rbd images are available, whereas tgt will
> eventually present the rbd images again.
>
> I have been slowly adding servers and am expanding my test setup to a
> production setup (nice thing about Ceph). I now have 6 OSD hosts with
> 7 disks on each. I'm using the LSI Nytro cache RAID controller, so I
> don't have a separate journal, and I have 40Gb networking. I plan to
> add another 6 OSD hosts in another rack in the next 6 months (and then
> another 6 next year). I'm doing 3x replication, so I want to end up
> with 3 racks.
>
> Jake
>
> On Wednesday, January 14, 2015, Nick Fisk <n...@fisk.me.uk> wrote:
>
> Hi Giuseppe,
>
> I am working on something very similar at the moment. I currently have
> it working on some test hardware, and it seems to be working
> reasonably well.
>
> I say reasonably as I have had a few instabilities, but these are on
> the HA side; the LIO and RBD side of things has been rock solid so
> far. The main problems I have had seem to be around recovering from
> failure, with resources ending up in an unmanaged state. I'm not
> currently using fencing, so this may be part of the cause.
>
> As a brief description of my configuration:
>
> 4 hosts, each having 2 OSDs, also running the monitor role
> 3 additional hosts in a HA cluster which act as iSCSI proxy nodes
>
> I'm using the IP, RBD, iSCSITarget and iSCSILUN resource agents to
> provide a HA iSCSI LUN which maps back to an RBD. All the agents for
> each RBD are in a group so they follow each other between hosts.
>
> I'm using 1 LUN per target, as I read somewhere there are stability
> problems using more than 1 LUN per target.
>
> Performance seems OK; I can get about 1.2k random IOPS out of the
> iSCSI LUN. This seems about right for the Ceph cluster size, so I
> don't think the LIO part is causing any significant overhead.
>
> We should be getting our production hardware shortly, which will have
> 40 OSDs with journals and an SSD caching tier, so within the next
> month or so I will have a better idea of running it in a production
> environment and of the performance of the system.
>
> Hope that helps; if you have any questions, please let me know.
>
> Nick
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Giuseppe Civitella
> Sent: 13 January 2015 11:23
> To: ceph-users
> Subject: [ceph-users] Ceph, LIO, VMWARE anyone?
>
> Hi all,
>
> I'm working on a lab setup regarding Ceph serving rbd images as iSCSI
> datastores to VMware via a LIO box. Is there someone who already did
> something similar and wants to share some knowledge? Any production
> deployments? What about LIO's HA and LUN performance?
>
> Thanks
> Giuseppe
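PS, for anyone trying to reproduce the pacemaker layout Nick describes above: the per-RBD agent group would look roughly like this in crm shell syntax. The names, IPs, IQN and image below are made up; the LUN agent's actual name in resource-agents is iSCSILogicalUnit; and the RBD agent's parameter names may differ depending on which version of Ceph's OCF script you have, so treat this as a sketch, not a working config:

    primitive p_rbd ocf:ceph:rbd \
        params user=admin pool=rbd name=disk1 cephconf=/etc/ceph/ceph.conf
    primitive p_ip ocf:heartbeat:IPaddr2 \
        params ip=192.168.0.50 cidr_netmask=24
    primitive p_target ocf:heartbeat:iSCSITarget \
        params implementation=lio iqn=iqn.2015-01.com.example:disk1
    primitive p_lun ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2015-01.com.example:disk1 lun=1 \
            path=/dev/rbd/rbd/disk1
    # one group per RBD, so the image map, IP, target and LUN
    # always fail over between hosts together, in that order
    group g_iscsi_disk1 p_rbd p_ip p_target p_lun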
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com