> On 6 Apr 2017, at 08:42, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> I assume Brady is referring to the death spiral LIO gets into with some 
> initiators, including vmware, if an IO takes longer than about 10s.

We have occasionally seen this issue with VMware+LIO, almost always when 
upgrading OSD nodes. Didn’t realise it was a known issue! Apart from that, 
though, we've found LIO to be generally far more performant and stable 
(especially in our multipathing setup), so we would like to stick with it if 
possible.

I’m wondering, are there any additional steps we should be taking to minimise 
the risk of LIO timeouts during upgrades? At the moment, we set the “noout” 
flag on the cluster, stop the node’s services, upgrade the packages and reboot. 
For instance, is there a way to drain connections from clients to a particular 
node before shutting down its OSDs?
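
For reference, here is a minimal sketch of the per-node procedure described 
above. It assumes a systemd-based install where the node's Ceph daemons are 
grouped under ceph.target, and a Debian-style system upgraded with apt-get; 
the package list and the helper names (upgrade_commands, run) are 
illustrative only, not our actual tooling. It defaults to a dry run that only 
prints the commands.

```python
import shlex
import subprocess


def upgrade_commands(packages=("ceph",)):
    """Return the ordered commands for upgrading a single OSD node.

    Assumptions (not confirmed for any particular cluster): systemd manages
    the daemons via ceph.target, and packages are upgraded with apt-get.
    """
    return [
        # Stop the cluster from marking stopped OSDs "out" and rebalancing.
        ["ceph", "osd", "set", "noout"],
        # Stop all Ceph services on this node.
        ["systemctl", "stop", "ceph.target"],
        # Upgrade only the named, already-installed packages.
        ["apt-get", "install", "--only-upgrade", "-y", *packages],
        # Reboot the node.
        ["systemctl", "reboot"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts the sequence on the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(upgrade_commands(), dry_run=True)
```

Clearing the flag afterwards with "ceph osd unset noout", once the node's OSDs 
have rejoined, is not shown here.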

Thanks,

Oliver.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
