CCing: netapp-linux-community

On 06/17/2014 04:56 AM, Hans van Kranenburg wrote:
> I ran into this same issue today.
>
> iSCSI target: NetApp FAS2240-2, NetApp Release 8.1.2 7-Mode
> iSCSI initiator: Debian GNU/Linux (kernel 3.2.57-3, amd64)
>
> It seems that as soon as an UNMAP iSCSI command is fired at the NetApp, the result is:
>
> [4230017.814546] sd 12:0:0:0: [sdc] Unhandled sense code
> [4230017.814610] sd 12:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [4230017.814682] sd 12:0:0:0: [sdc] Sense Key : Medium Error [current]
> [4230017.814748] sd 12:0:0:0: [sdc] Add. Sense: Incompatible medium installed
> [4230017.814818] sd 12:0:0:0: [sdc] CDB: Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> [4230017.814986] end_request: I/O error, dev sdc, sector 4372576896
>
> What does this message mean, exactly (besides telling us there's no CD in the CD-ROM drive, which is a not so practical re-use of message numbers here :) )?
>
> I guess multipath tries to fire the same request at the next (and next) connection because of the I/O error, which results in a situation where all paths are failing.
>
> The reason testing with squeeze and 2.6.32 succeeds while wheezy/jessie fails is that in the squeeze case mkfs does not issue discard requests by default, because device mapper did not support them back then.
>
> I don't really know why this is happening. What I do know is that it takes down the entire multipath/iSCSI connection, because all paths start failing. In my case, the Debian machine is a Xen dom0 which runs a number of virtual machines. All of them experienced 100% disk iowait right away. I think "You've seen mkfs fail because all the paths were faulty." should be "You've seen all paths go faulty because you did an mkfs, which tried to discard the new empty space." :-)
>
> Additional interesting info: when doing the same on an iSCSI target that is a NetApp FAS2040, running either 8.1RC2 or 8.1, I can use iSCSI UNMAP. Well, at least when using Debian kernels 3.2.46-1+deb7u1 and 3.2.51-1, which were on the iSCSI initiators the last time I tried this.
>
> (Well, actually, it seems that NetApp equipment can respond quite badly to UNMAP (high load/latency spikes, even without using snapshots), which is why we only use discard/unmap when removing old LVM logical volumes, by slow-discarding them. Anyway, using it on the newer NetApp system has a clearly different result.)
>
> I just started researching this situation and found this bug report. I'd appreciate hearing whether the poster of this bug has made any progress on this topic since Mar 5 2014.
>
> Attached is a slightly munged syslog file showing what happened this afternoon when trying to use fstrim on a mounted ext4 filesystem, which was on LVM on iSCSI on this NetApp.
>
> Although I do not have a dedicated test setup containing a spare NetApp FAS2240, I know this issue impacts the iSCSI initiator running Linux rather than the storage array itself, so I'd be happy to help debug this issue to find out what exactly is causing it and how we could improve on it. Is it really a multipath issue, a kernel issue, or is the NetApp software to blame? (I cannot find anything related in the release notes since 8.1.2.) Should iSCSI and/or multipath handle this response differently?
Okay! Let me check in the lab and try to reproduce it. If you do not hear back from me, please feel free to ping me again. Meanwhile, can you run sg_inq on the SCSI device?

-- 
Ritesh Raj Sarraf | Linux Engineering | NetApp Inc.
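For reference, a minimal way to collect that information, assuming the affected LUN is still /dev/sdc as in the log above (sg_inq, sg_vpd and sg_readcap come from the sg3-utils package):

    # Standard INQUIRY data (vendor, product, firmware revision):
    sg_inq /dev/sdc

    # Logical Block Provisioning VPD page: shows what the target advertises for UNMAP:
    sg_vpd --page=lbpv /dev/sdc

    # READ CAPACITY(16): the LBPME bit indicates thin-provisioning/UNMAP support:
    sg_readcap --long /dev/sdc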
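The "slow discard" of old LVM logical volumes mentioned in the quoted report could be approximated along these lines; this is only a sketch, assuming bash, util-linux's blkdiscard, and a hypothetical LV path /dev/vg0/old-lv, with an arbitrary chunk size and pause:

    DEV=/dev/vg0/old-lv                  # hypothetical example path
    SIZE=$(blockdev --getsize64 "$DEV")  # device size in bytes
    STEP=$((1024 * 1024 * 1024))         # discard 1 GiB per iteration

    off=0
    while [ "$off" -lt "$SIZE" ]; do
        len=$STEP
        # Clamp the last chunk so we never discard past the end of the device.
        [ $((off + len)) -gt "$SIZE" ] && len=$((SIZE - off))
        blkdiscard --offset "$off" --length "$len" "$DEV"
        off=$((off + len))
        sleep 1                          # pause to spread the load on the array
    done

Newer util-linux versions also offer a --step option for blkdiscard, which does essentially the same chunked discard in a single invocation.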