Hi!

Got a problem here a couple of days ago when I ran a snapshot stream over Fibre 
Channel from my home/business/devel server to the clone backup server.

Systems: OmniOS 5.11     omnios-r151018-95eaa7e on both systems, initiator on 
one and target on the other. Also the same hardware: Dell Precision 
workstations with dual six-core Xeons and 96 GB of registered RAM, an Intel 
quad-port GbE NIC, and QLogic QLE2462 HBAs.

Configured with one LUN provisioned from the target/backup system to the 
initiator system as a backup LUN, and that backup LUN configured as a zpool on 
the initiator system. I should also say that I run this FC connection 
point-to-point, no switch involved, over a single fibre pair, 10 m.
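
For completeness, the setup is roughly as follows. The dataset and pool names 
below are placeholders, not my exact ones, and I'm assuming a zvol-backed LU 
here.

On the target, omni2:

  zfs create -V 500G tank/backuplun
  sbdadm create-lu /dev/zvol/rdsk/tank/backuplun
  stmfadm add-view <GUID from sbdadm list-lu>
  stmfadm list-lu -v
  stmfadm list-target -v

On the initiator, omni, the LU shows up as an ordinary disk under /scsi_vhci, 
and the backup pool sits directly on it:

  fcinfo hba-port                  # check that the link is online
  # cX below is whatever controller number format shows for the LU
  zpool create backpool cXt600144F0C648AE73000057EF6D370001d0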

I did a zfs send/recv of a snapshot, and I thought it took a long time. It was 
around 67 GB. Then the initiator system crashed and dumped. It rebooted, and I 
got into it again without any trouble.
What I immediately saw was that the zpool "backpool" that was backed by the FC 
LUN was no longer present. I ran a zpool import, and it was back again. I did 
another test and sent a much smaller snapshot, around 450 MB, and that worked 
fine, although I thought it also took a lot of time.
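
To be precise about what the transfer was: just a plain local send/recv into 
the pool on the FC LUN, roughly like this (dataset and snapshot names are 
placeholders):

  zfs send tank/data@backup | zfs receive backpool/data

and after the reboot the pool had to be brought back by hand:

  zpool import backpool
  zpool status backpool            # to verify it's healthy after the import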

I once again tried with the bigger snapshot, and the same thing happened: the 
system crashed and dumped. I have those two dump files, but I don't know 
whether this is a problem on the target side or the initiator side.

I can provide access to dump files.
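
In case anyone wants to tell me what to look for, I can run the usual first 
steps on the dumps myself, something like this (dump directory as per dumpadm, 
file numbers may differ):

  cd /var/crash/omni
  savecore -f vmdump.0      # expand to unix.0/vmcore.0 if not already done
  mdb -k unix.0 vmcore.0
    ::status                # panic string and summary
    ::stack                 # stack of the panicking thread
    ::msgbuf                # kernel messages leading up to the panic
    $q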

Here is some information from the two systems that I find interesting:

The initiator system, omni:

root@omni:/var/log# dmesg | grep qlc
Oct  2 18:33:08 omni qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(0,0): Loop 
OFFLINE


root@omni:/# dmesg | grep scsi
Oct  2 18:34:58 omni scsi: [ID 243001 kern.info] 
/pci@19,0/pci8086,3410@9/pci1077,138@0/fp@0,0 (fcp0):
Oct  2 18:34:58 omni genunix: [ID 408114 kern.info] 
/scsi_vhci/disk@g600144f0c648ae73000057ef6d370001 (sd5) offline
Oct  2 18:34:58 omni genunix: [ID 483743 kern.info] 
/scsi_vhci/disk@g600144f0c648ae73000057ef6d370001 (sd5) multipath status: 
failed: path 4 fp0/disk@w2101001b32a19a92,0 is offline
root@omni:/# dmesg | grep multipath
Oct  2 18:34:58 omni genunix: [ID 483743 kern.info] 
/scsi_vhci/disk@g600144f0c648ae73000057ef6d370001 (sd5) multipath status: 
failed: path 4 fp0/disk@w2101001b32a19a92,0 is offline

As you can see, the loop is marked offline at the time of the crash. But 
notably strange: there is also a message about a failed multipath path...? Why 
is that? There is no multipathing configured here...
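
My guess is that the LU is simply enumerated under scsi_vhci (mpxio) even with 
only one path, given the /scsi_vhci/... device path above, so the "multipath 
status: failed" line just means the one and only path went away, but I'd 
appreciate confirmation. For reference, this is what I'd check on the 
initiator side, nothing exotic:

  fcinfo hba-port                           # port state, should say online
  fcinfo remote-port -p <local port WWN>    # is the target port still visible?
  mpathadm list lu                          # the LU and its (single) path count
  zpool status backpool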

The target system, omni2:

root@omni2:/root# grep stmf /var/adm/messages
Oct  2 09:56:37 omni2 pseudo: [ID 129642 kern.info] pseudo-device: stmf_sbd0
Oct  2 09:56:37 omni2 genunix: [ID 936769 kern.info] stmf_sbd0 is 
/pseudo/stmf_sbd@0
Oct  2 09:56:46 omni2 pseudo: [ID 129642 kern.info] pseudo-device: stmf0
Oct  2 09:56:46 omni2 genunix: [ID 936769 kern.info] stmf0 is /pseudo/stmf@0
Oct  2 09:57:31 omni2 pseudo: [ID 129642 kern.info] pseudo-device: stmf0
Oct  2 09:57:31 omni2 genunix: [ID 936769 kern.info] stmf0 is /pseudo/stmf@0
Oct  2 09:57:31 omni2 stmf_sbd: [ID 690249 kern.warning] WARNING: 
ioctl(DKIOCINFO) failed 25


There is this warning, ioctl(DKIOCINFO) failed 25, that I tried to find out 
more about, but without success.
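
The only part I could decode is the error number itself; if I read 
/usr/include/sys/errno.h right, 25 should be ENOTTY ("inappropriate ioctl for 
device"):

  grep -w 25 /usr/include/sys/errno.h

Whether it matters that DKIOCINFO apparently isn't supported by whatever backs 
the LU, I can't say.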


Perhaps it is simply that the FC connection isn't good enough. The cable 
shouldn't be the problem, since it is brand new, but it could of course be 
something with the HBAs. I could get another cable and set up multipathing to 
see how that behaves, but let's start with this first.
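
Before I spend money on more hardware, I'll repeat the big send and watch the 
error counters on the initiator while it runs, roughly:

  iostat -En                  # soft/hard/transport error counts per device
  fcinfo hba-port -l          # link failure / loss-of-sync / invalid CRC counters
  tail -f /var/adm/messages   # watch for qlc loop offline events as they happen

If the CRC or loss-of-sync counters climb during the transfer, I'll take a 
harder look at the cable and the SFPs before blaming the HBAs.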

Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert

