On Wednesday, June 6, 2012 8:36:04 PM UTC-5, Mark Felder wrote:
Hi guys, I'm excitedly posting this from my phone. Good news for you guys, bad
news for us -- we were building HA storage on VMware for a client and can now
replicate the crash on demand. I'll be posting details when I get home to my
PC tonight, but this is hopefully enough to replicate the crash for any
curious followers:
ESXi 5
FreeBSD 9 or 9-STABLE
HAST
1 cpu is fine
1GB of ram
UFS SUJ on HAST device
No special loader.conf, sysctl, etc
No need for VMware Tools
Run bonnie++ on the HAST device
We can get the crash to happen on the first run of bonnie++ right now. I'll
post the exact specs and precise command run in the PR. We found an old post
from 2004 when we looked up the process state obtained from CTRL+T -- flswai
-- which describes the symptoms nearly perfectly.
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2004-02/0250.html
Hopefully this gets us closer to a fix...
Is this a crash or a hang? Over the past couple of weeks, I've been working
with a FreeBSD 9.1-RC1 system under VMware ESXi 5.0 with a 64GB UFS root FS
and a 2TB ZFS filesystem attached via a virtual LSI SAS interface. Sometimes
during heavy I/O load (rsync from other servers) on the ZFS FS, this shows up
in /var/log/messages:
Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 5 ee 60 16 0 1 0 0
Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 02:14:55 backups kernel: (da1:mpt0:0:1:0): Retrying command
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 3 ef 42 51 0 1 0 0
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 02:18:44 backups kernel: (da1:mpt0:0:1:0): Retrying command
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 3 ef 64 51 0 1 0 0
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 02:18:48 backups kernel: (da1:mpt0:0:1:0): Retrying command
Sep 21 02:18:49 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 3 ef 66 51 0 1 0 0
Sep 21 02:18:49 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 02:18:49 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
...
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): WRITE(10). CDB: 2a 0 41 f3 94 99 0 1 0 0
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): CAM status: SCSI Status Error
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): SCSI status: Busy
Sep 21 05:06:18 backups kernel: (da1:mpt0:0:1:0): Retrying command
These have been happening roughly every other day.
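For anyone who wants to see which blocks the failing writes were aimed at, here
is a minimal Python sketch that decodes the WRITE(10) CDBs from the log entries
above and counts them per day. It assumes each log entry sits on a single line,
as it would in the real /var/log/messages (mail clients tend to wrap them). The
function names and the regular expression are my own, for illustration only;
the byte layout is the standard 10-byte WRITE(10) format (opcode 0x2a, 4-byte
big-endian LBA in bytes 2-5, 2-byte transfer length in bytes 7-8).

```python
import re
from collections import Counter

# Illustrative pattern for the first line of each error report shown above.
# It assumes the syslog layout "Mon DD HH:MM:SS host kernel: (dev): ...".
CDB_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d+) [\d:]+ \S+ kernel: "
    r"\((?P<dev>[^)]+)\): WRITE\(10\)\. CDB: (?P<cdb>[0-9a-f ]+)$"
)


def decode_write10(cdb_text):
    """Return (lba, blocks) for a WRITE(10) CDB given as hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB: %r" % cdb_text)
    lba = int.from_bytes(bytes(cdb[2:6]), "big")      # bytes 2-5
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")   # bytes 7-8
    return lba, blocks


def summarize(log_text):
    """Count failed WRITE(10) reports per day and list their targets."""
    per_day = Counter()
    writes = []
    for line in log_text.splitlines():
        m = CDB_RE.match(line.strip())
        if not m:
            continue  # skip status/retry lines and anything unrelated
        lba, blocks = decode_write10(m.group("cdb"))
        per_day[m.group("day")] += 1
        writes.append((m.group("dev"), lba, blocks))
    return per_day, writes


if __name__ == "__main__":
    print(decode_write10("2a 0 5 ee 60 16 0 1 0 0"))  # -> (99508246, 256)
```

Every CDB in the excerpt ends in "0 1 0 0", which decodes to a transfer length
of 256 blocks per write; with 512-byte sectors (an assumption, the log does not
say) that would be 128 KiB.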
mpt0 and em0 were sharing IRQ 18, so today I put
hint.mpt.0.msi_enable=1
into /boot/device.hints and rebooted; now mpt0 is using IRQ 256. I'll see if
it helps.
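A quick way to spot this kind of sharing is to look for interrupt lines in
`vmstat -i` that list more than one device. The sketch below is a rough Python
helper for that, assuming the usual FreeBSD layout of "irqN: dev [dev ...]
total rate"; the function name and the sample text in the usage example are
made up for illustration and are not output from the system described here.

```python
import re

# One interrupt line: "irq18: em0 mpt0      12345      10"
IRQ_RE = re.compile(r"^(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_text):
    """Return {irq_name: [devices]} for IRQs used by more than one device."""
    shared = {}
    for line in vmstat_text.splitlines():
        m = IRQ_RE.match(line)
        if not m:
            continue  # header, cpu timer lines, "Total", etc.
        devices = m.group(2).split()
        if len(devices) > 1:
            shared[m.group(1)] = devices
    return shared


if __name__ == "__main__":
    # Hypothetical sample in the vmstat -i layout, not real output.
    sample = (
        "interrupt                          total       rate\n"
        "irq1: atkbd0                           2          0\n"
        "irq18: em0 mpt0                    12345         10\n"
        "Total                              12347         10\n"
    )
    print(shared_irqs(sample))
```

After switching a device to MSI it should show up alone on its own line, for
example under irq256, and drop out of the result.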
Guy
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions