On 2013-11-20 08:35, Jefferson Ogata wrote:
> Indeed, using iptables with REJECT and tcp-reset, this seems to piss off
> the initiators, creating immediate i/o errors. But one can use DROP on
> incoming SYN packets and let established connections drain. I've been
> trying to get this to work but am finding that it takes so long for some
> connections to drain that something times out. I haven't given up on
> this approach, tho. Testing this stuff can be tricky because if i make
> one mistake, stonith kicks in and i end up having to wait 5-10 minutes
> for the machine to reboot and resync its DRBD devices.

Follow-up on this: the original race condition I reported still occurs with this strategy. If existing TCP connections are allowed to drain (by blocking only SYN packets and still passing packets from established initiator connections), then the initiator can also send new requests to the target during the takedown process. The takedown removes LUNs from the live target, and the initiator generates an i/o error if it happens to access a LUN that has already been removed while its connection is still up.

This happens because the configuration looks something like this (crm):

group foo portblock vip iSCSITarget:target iSCSILogicalUnit:lun1 iSCSILogicalUnit:lun2 iSCSILogicalUnit:lun3 portunblock

On takedown, the resources in the group are stopped in reverse order. If portblock is tweaked to pass packets for existing connections so they can drain, there's a window while lun3, lun2, and lun1 are being removed from the target during which this race condition can occur. The connection isn't removed until iSCSITarget runs to stop the target.

A way to handle this that should actually work is to write a new RA that deletes the connections from the target *before* the LUNs are removed during takedown. The config would look something like this, then:

group foo portblock vip iSCSITarget:target iSCSILogicalUnit:lun1 iSCSILogicalUnit:lun2 iSCSILogicalUnit:lun3 tgtConnections portunblock

On takedown, then, portunblock will block new incoming connections, tgtConnections will shut down existing connections and wait for them to drain, and only then will the LUNs be removed, safely, before the target is taken down.
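
To make the ordering argument concrete, here is a rough sketch in Python (not the RA itself, just a toy model) that compares the two group definitions above. It assumes that a group stops its members in the reverse of the listed order, and that initiator connections go away only when tgtConnections stops or, if that resource is absent, when the target itself stops. The function names are made up for illustration.

```python
# Group members in start order, copied from the two configurations above.
ORIGINAL = ["portblock", "vip", "target", "lun1", "lun2", "lun3", "portunblock"]
PROPOSED = ["portblock", "vip", "target", "lun1", "lun2", "lun3",
            "tgtConnections", "portunblock"]

LUNS = {"lun1", "lun2", "lun3"}


def stop_order(group):
    """Return the takedown order: the reverse of the start order."""
    return list(reversed(group))


def luns_removed_while_connected(group):
    """List the LUNs that are removed while initiator connections may still be up.

    Modeling assumption: connections are closed when tgtConnections stops,
    or otherwise only when the target itself stops.
    """
    closer = "tgtConnections" if "tgtConnections" in group else "target"
    at_risk = []
    for resource in stop_order(group):
        if resource == closer:
            break  # connections are gone from here on
        if resource in LUNS:
            at_risk.append(resource)
    return at_risk


if __name__ == "__main__":
    print(luns_removed_while_connected(ORIGINAL))  # ['lun3', 'lun2', 'lun1']
    print(luns_removed_while_connected(PROPOSED))  # []
```

With the original group, all three LUNs are removed while connections can still be live, which is the window described above. With tgtConnections placed just before portunblock, no LUN is removed until the connections are gone.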

I'll write this RA today and see how that works.