Marvell MV88SX6081 on FreeBSD 8.0-RELEASE

2009-11-29 Thread Stephane LAPIE
Hello,

Has anyone had the opportunity to test a MV88SX6081 SATA controller on
FreeBSD 8.0? My amd64 dual-Opteron system (using 4G ECC RAM) works
just fine with it on FreeBSD 7.2, but can't handle any high-speed disk
I/O when booted with a FreeBSD 8.0 kernel.

I have two controllers, as follows:
atap...@pci0:17:4:0: class=0x01 card=0x11ab11ab chip=0x608111ab rev=0x09 hdr=0x00
    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
    device     = 'MV88SX6081 8-port SATA II PCI-X Controller'
    class      = mass storage
    subclass   = SCSI
atap...@pci0:18:4:0: class=0x01 card=0x11ab11ab chip=0x608111ab rev=0x09 hdr=0x00
    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
    device     = 'MV88SX6081 8-port SATA II PCI-X Controller'
    class      = mass storage
    subclass   = SCSI
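
For readers comparing against their own hardware: the chip= value in the
pciconf output above packs the PCI device ID into the high 16 bits and the
vendor ID into the low 16 bits. The following is a minimal Python sketch of
that split; the function name split_chip_id is made up for illustration and
is not part of any FreeBSD tool.

```python
def split_chip_id(chip):
    """Split a 32-bit pciconf chip= value into (vendor_id, device_id)."""
    vendor_id = chip & 0xFFFF          # low 16 bits: PCI vendor ID
    device_id = (chip >> 16) & 0xFFFF  # high 16 bits: PCI device ID
    return vendor_id, device_id


# Value taken from the pciconf output in this message.
vendor, device = split_chip_id(0x608111AB)
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
```

For chip=0x608111ab this gives vendor 0x11ab (Marvell) and device 0x6081,
matching the MV88SX6081 name reported by pciconf.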

http://www.supermicro.com/products/accessories/addon/AOC-SAT2-MV8.cfm

On FreeBSD 8.0, attempting to scrub a ZFS pool results in a few I/O
bursts (confirmed with zpool iostat), before totally freezing down and
locking the ZFS pool (the system is still up and only ZFS based file
systems are unusable in this state), probably to avoid data corruption.
Occasionally I also witness a "READ_DMA48 soft error (ECC corrected)"
error message showing up, on a random hard disk.

I already gave the hard disks a thorough check and they work just fine.

On FreeBSD 7.2-STABLE, the scrub proceeds nicely and the I/O peaks at
300MB/s (confirmed with zpool iostat) on the pool without a hitch. I
could also confirm that attempts at booting a FreeBSD 8.0-RELEASE kernel
did not damage my ZFS pool checksums or anything.

Therefore, I am inclined to think the motherboard/memory (a "TYAN
Thunder K8WE S2895") would be at fault here, and that "something" in
FreeBSD 8.0 brings out this very specific problem, but I would first
like to hear about any tests of the aforementioned controller on FreeBSD
8.0 on another environment before upgrading the hardware.

Thanks in advance for your time,

P.S. : Here is the dmesg trace for FreeBSD 8.0.

Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-RELEASE #8: Wed Nov 25 03:48:44 JST 2009

darks...@eirei-no-za.yomi.darkbsd.org:/usr/storage/tech/eirei-no-za.yomi.darkbsd.org/usr/obj/usr/storage/tech/eirei-no-za.yomi.darkbsd.org/usr/src/sys/DARK-2009KERN
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual Core AMD Opteron(tm) Processor 275 (2210.20-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2

Features=0x178bfbff
  Features2=0x1
  AMD Features=0xe2500800
  AMD Features2=0x3
real memory  = 5100273664 (4864 MB)
avail memory = 4109283328 (3918 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-27 on motherboard
ioapic2  irqs 28-31 on motherboard
ioapic3  irqs 32-55 on motherboard
kbd1 at kbdmux0
iscsi: version 2.1.0
acpi0:  on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
acpi_button0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pci0:  at device 0.0 (no driver attached)
isab0:  at device 1.0 on pci0
isa0:  on isab0
nfsmb0:  port
0xa000-0xa03f,0xa040-0xa07f at device 1.1 on pci0
smbus0:  on nfsmb0
smb0:  on smbus0
nfsmb1:  on nfsmb0
smbus1:  on nfsmb1
smb1:  on smbus1
ohci0:  mem 0xdd80-0xdd800fff irq 20
at device 2.0 on pci0
ohci0: [ITHREAD]
usbus0:  on ohci0
atapci0:  port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1400-0x140f at device 6.0 on pci0
ata0:  on atapci0
ata0: [ITHREAD]
ata1:  on atapci0
ata1: [ITHREAD]
pcib1:  at device 9.0 on pci0
pci1:  on pcib1
vgapci0:  port 0x2000-0x207f mem
0xde00-0xde7f,0xdd90-0xdd90 at device 4.0 on pci1
pcib2:  at device 14.0 on pci0
pci2:  on pcib2
amdtemp0:  on hostb3
amdtemp1:  on hostb7
pcib3:  port 0xcf8-0xcff on acpi0
pci16:  on pcib3
pcib4:  at device 10.0 on pci16
pci17:  on pcib4
atapci1:  port 0x3000-0x30ff mem
0xde90-0xde9f,0xdec0-0xdeff irq 24 at device 4.0 on pci17
atapci1: [ITHREAD]
ata2:  on atapci1
ata2: [ITHREAD]
ata3:  on atapci1
ata3: [ITHREAD]
ata4:  on atapci1
ata4: [ITHREAD]
ata5:  on atapci1
ata5: [ITHREAD]
ata6:  on atapci1
ata6: [ITHREAD]
ata7:  on atapci1
ata7: [ITHREAD]
ata8:  on atapci1
ata8: [ITHREAD]
ata9:  on atapci1
ata9: [ITHREAD]
pcib5:  at device 11.0 on pci16
pci18:  on pcib5
atapci2:  port 0x4000-0x40ff mem
0xdf40-0xdf4f,0xdf00-0xdf3f irq 28 at device 4.0 on pci18
atapci2: [ITHREAD]
ata10:  on atapci2
ata10: [ITHREAD]
ata11:  on atap

Re: Marvell MV88SX6081 on FreeBSD 8.0-RELEASE

2009-11-29 Thread Dieter
In message <4b13159d.9010...@darkbsd.org>, Stephane LAPIE writes:

> On FreeBSD 8.0, attempting to scrub a ZFS pool results in a few I/O
> bursts (confirmed with zpool iostat), before totally freezing down and
> locking the ZFS pool (the system is still up and only ZFS based file
> systems are unusable in this state), probably to avoid data corruption.
> Occasionally I also witness a "READ_DMA48 soft error (ECC corrected)"
> error message showing up, on a random hard disk.

And there are these, which you didn't mention:
> ad1: FAILURE - SET_MULTI status=51 error=4
> ad12: FAILURE - SET_MULTI status=51 error=4
(No, I don't know what that means, sorry.)

> ad18: 1430799MB  at ata9-master SATA300
> ad20: 1430799MB  at ata10-master SATA300

How old is the SD1A disk?  As you may know, Seagate had various troubles
with the .11 firmware.  I have some of the ST31500341AS CC1H and mine are
new enough that they are supposed to be ok.

> Therefore, I am inclined to think the motherboard/memory (a "TYAN
> Thunder K8WE S2895") would be at fault here,

I have been told that Tyan does a good job with memory, although even
assuming that's true it could still be a memory problem.  My Tyan
board has a memory scrubbing feature that can be turned on in
firmware, but I've never tried it.  Also, you could try rotating the
SIMMs and see if anything changes.
___
freebsd-hardware@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to "freebsd-hardware-unsubscr...@freebsd.org"


Re: Marvell MV88SX6081 on FreeBSD 8.0-RELEASE

2009-11-30 Thread Stephane LAPIE
Dieter wrote:
> In message <4b13159d.9010...@darkbsd.org>, Stephane LAPIE writes:
> 
>> On FreeBSD 8.0, attempting to scrub a ZFS pool results in a few I/O
>> bursts (confirmed with zpool iostat), before totally freezing down and
>> locking the ZFS pool (the system is still up and only ZFS based file
>> systems are unusable in this state), probably to avoid data corruption.
>> Occasionally I also witness a "READ_DMA48 soft error (ECC corrected)"
>> error message showing up, on a random hard disk.
> 
> And there are these, which you didn't mention:
>> ad1: FAILURE - SET_MULTI status=51 error=4
>> ad12: FAILURE - SET_MULTI status=51 error=4
> (No, I don't know what that means, sorry.)

Ah, actually I do have an idea as to what these are about : These two
disks are SSDs I'm using in my ZFS pool as cache devices. So I guess
this is the kind of error that says "I tried to use an access mode this
device was not designed for". In the same fashion, I also get such error
messages when trying SMART checks on ad0, which is a flash card.

ad0: FAILURE - SMART status=51 error=4

So, I think these could safely be discarded under "not a normal disk
device". (Unless I'm completely mistaken about this...)
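
As a rough aid for reading these messages, here is a minimal Python sketch
that decodes the status and error fields into the standard ATA register bit
names. It assumes the two numbers are printed in hexadecimal and that the
line has exactly the shape shown in this thread; the helper names (decode,
parse_failure) are hypothetical.

```python
import re

# Standard ATA status register bits (highest bit first).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
# Standard ATA error register bits (highest bit first).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "ILI"}

# Matches lines such as "ad1: FAILURE - SET_MULTI status=51 error=4".
LINE_RE = re.compile(
    r"^(\w+): FAILURE - (\S+) status=([0-9a-fA-F]+) error=([0-9a-fA-F]+)")


def decode(value, names):
    """Return the names of all bits that are set in value."""
    return [name for bit, name in names.items() if value & bit]


def parse_failure(line):
    """Parse one FAILURE line; return None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    dev, cmd, status, error = m.groups()
    return {"device": dev, "command": cmd,
            "status": decode(int(status, 16), STATUS_BITS),
            "error": decode(int(error, 16), ERROR_BITS)}


print(parse_failure("ad1: FAILURE - SET_MULTI status=51 error=4"))
```

Under that assumption, status=51 decodes to DRDY, DSC and ERR, and error=4
to ABRT (command aborted), which would fit a device rejecting a command it
does not support.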

>> ad18: 1430799MB  at ata9-master SATA300
>> ad20: 1430799MB  at ata10-master SATA300
> 
> How old is the SD1A disk?  As you may know, Seagate had various troubles
> with the .11 firmware.  I have some of the ST31500341AS CC1H and mine are
> new enough that they are supposed to be ok.

I bought all these disks at roughly the same time around April, but
strangely got an odd one in the lot and didn't bother to change it.

From what I checked with the Seagate page about the firmware issues,
SD1A is not affected. I also have three spare disks in a drawer in case
a problem occurs.

>> Therefore, I am inclined to think the motherboard/memory (a "TYAN
>> Thunder K8WE S2895") would be at fault here,
> 
> I have been told that Tyan does a good job with memory, although even
> assuming that's true it could still be a memory problem.  My Tyan
> board has a memory scrubbing feature that can be turned on in
> firmware, but I've never tried it.  Also, you could try rotating the
> SIMMs and see if anything changes.

The memory is brand new, from Corsair. I tried swapping the sticks
(while still using the same slots) and strangely enough, some
combinations just... don't boot at all.

Also, completely unrelated to the SATA controller, some quirks on this
motherboard (though at BIOS level) have been annoying me quite a lot :
- Booting FreeBSD from anything besides an IDE device has an 80% chance
of freezing the computer at BTX level.
- Sometimes the Option ROMs (this including the VGA card) are not loaded
properly because of an "out of memory" problem at BIOS level.

However, once the system is booted, it can go on for several months.
(Though, I have witnessed one "Fatal error 12: Page fault"-type kernel
panic in six months)

Thanks again for your time.
-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo





Re: Marvell MV88SX6081 on FreeBSD 8.0-RELEASE

2009-11-30 Thread Dieter
In message <4b1392cf.5090...@darkbsd.org>, Stephane LAPIE writes:

> >> Therefore, I am inclined to think the motherboard/memory (a "TYAN
> >> Thunder K8WE S2895") would be at fault here,
> >
> > I have been told that Tyan does a good job with memory, although even
> > assuming that's true it could still be a memory problem.  My Tyan
> > board has a memory scrubbing feature that can be turned on in
> > firmware, but I've never tried it.  Also, you could try rotating the
> > SIMMs and see if anything changes.
> 
> The memory is brand new, from Corsair. I tried swapping the sticks
> (while still using the same slots) and strangely enough, some
> combinations just... don't boot at all.

That sounds like a major clue that you probably have a very bad stick
of memory (probably a hard error).  I would try booting with just 1 stick
at a time (or whatever the minimum is for your board) and isolate the bad
stick.

Most likely with 7.2 something landed on the bad location that doesn't
actually get used, but with 8.0 something disk related lands there.

> Also, completely unrelated to the SATA controller, some quirks on this
> motherboard (though at BIOS level) have been annoying me quite a lot :
> - Booting FreeBSD from anything besides an IDE device has an 80% chance
> of freezing the computer at BTX level.
> - Sometimes the Option ROMs (this including the VGA card) are not loaded
> properly because of an "out of memory" problem at BIOS level.
> 
> However, once the system is booted, it can go on for several months.
> (Though, I have witnessed one "Fatal error 12: Page fault"-type kernel
> panic in six months)

Let me guess, "Phoenix - AwardBIOS"?
On mine they can't even spell the name of the board correctly:
"TYAN Tomact K8E BIOS V1.00   022105"
(should be Tomcat)  Such quality control.

Mine hangs in boot if I have 2 JMB363 cards in the 2 PCIe x1 slots.
Moved one to the x16 slot and it boots.  I've been blaming the
JMB363 cards but maybe the Phoenix AwardBIOS is the problem child?

Most of the time one of the cards doesn't do its display the drives
and give me 5.1 nanoseconds to hit some control character to
enter a setup-a-raid thingy.  And frequently FreeBSD doesn't see
one of the controllers and thus doesn't make it to multiuser.  I'd
expect these events to be correlated but oddly they don't seem to be.
Sometimes it takes several reboots to get all the controllers seen.
I haven't seen an "out of memory" message, but the way things fly by
perhaps I just missed it.  It always works correctly the first time
after a power cycle, so my theory is that the expansion cards aren't
getting reset properly.  I have a firewire PCI card that got into
a funky mode and rebooting didn't fix it but a power cycle did.

Tyan is supposed to be tier 1 but they aren't doing themselves
any favors with that pathetic excuse for firmware.


Re: Marvell MV88SX6081 on FreeBSD 8.0-RELEASE

2009-11-30 Thread Stephane LAPIE
Dieter wrote:
> That sounds like a major clue that you probably have a very bad stick
> of memory (probably a hard error).  I would try booting with just 1 stick
> at a time (or whatever the minimum is for your board) and isolate the bad
> stick.
> 
> Most likely with 7.2 something landed on the bad location that doesn't
> actually get used, but with 8.0 something disk related lands there.

I'll try that the next time I have a chance to reboot that server.

> Let me guess, "Phoenix - AwardBIOS"?

...The very one.

As per kenv output :

smbios.bios.reldate="11/18/2008"
smbios.bios.vendor="Phoenix Technologies Ltd."
smbios.bios.version="2004Q3"
smbios.chassis.maker="TYAN Computer Corp"
smbios.memory.enabled="2097152"
smbios.planar.maker="Tyan Computer Corporation"
smbios.planar.product="S2895"
smbios.planar.serial="0123456789"
smbios.planar.version="TYAN Thunder K8WE S2895"
smbios.socket.enabled="2"
smbios.socket.populated="2"
smbios.system.maker="TYAN Computer Corp."
smbios.system.product="S2895"
smbios.system.serial="0123456789"
smbios.system.version="TYAN Thunder K8WE S2895"
smbios.version="2.33"
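
For anyone who wants to compare BIOS details across machines, the kenv
output above is easy to load into a dictionary. This is a minimal Python
sketch, assuming every line has the key="value" shape shown here; the
function name parse_kenv is made up for illustration.

```python
import re

# One kenv line: a dotted key, an equals sign, and a double-quoted value.
KENV_RE = re.compile(r'^([\w.]+)="(.*)"$')


def parse_kenv(text):
    """Turn kenv-style key="value" lines into a dict, skipping other lines."""
    result = {}
    for line in text.splitlines():
        m = KENV_RE.match(line.strip())
        if m:
            result[m.group(1)] = m.group(2)
    return result


# A few lines copied from the kenv output in this message.
sample = '''smbios.bios.reldate="11/18/2008"
smbios.bios.vendor="Phoenix Technologies Ltd."
smbios.planar.product="S2895"'''

info = parse_kenv(sample)
print(info["smbios.bios.vendor"], info["smbios.bios.reldate"])
```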


> On mine they can't even spell the name of the board correctly:
> "TYAN Tomact K8E BIOS V1.00   022105"
> (should be Tomcat)  Such quality control.

Ouch.

> Mine hangs in boot if I have 2 JMB363 cards in the 2 PCIe x1 slots.
> Moved one to the x16 slot and it boots.  I've been blaming the
> JMB363 cards but maybe the Phoenix AwardBIOS is the problem child?

In my case I guess that would be it. I have never seen FreeBSD freeze on
BTX level when loading from media, on any other computer.

> Most of the time one of the cards doesn't do its display the drives
> and give me 5.1 nanoseconds to hit some control character to
> enter a setup-a-raid thingy.  And frequently FreeBSD doesn't see
> one of the controllers and thus doesn't make it to multiuser.  I'd

I also happen to have these ones, but with my Intel NICs (no way I'm
using the Marvell default ones, I get lost interrupt messages all over
the place when I crank up network I/O a bit)

> expect these events to be correlated but oddly they don't seem to be.
> Sometimes it takes several reboots to get all the controllers seen.
> I haven't seen an "out of memory" message, but the way things fly by
> perhaps I just missed it.  It always works correctly the first time
> after a power cycle, so my theory is that the expansion cards aren't
> getting reset properly.  I have a firewire PCI card that got into
> a funky mode and rebooting didn't fix it but a power cycle did.

Just for the record, it's more like "Could not load Option ROM" (I
recall it was because it was out of memory buffers for these, but I
don't have the exact message text available at hand), and it pops up
randomly ; sometimes as you said, full power cycles do help. Sometimes
they don't.

> Tyan is supposed to be tier 1 but they aren't doing themselves
> any favors with that pathetic excuse for firmware.

Definitely not...
-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo





Re: Marvell MV88SX6081 on FreeBSD 8.0-RELEASE

2009-12-01 Thread Alexander Motin
Stephane LAPIE wrote:
> Dieter wrote:
>> In message <4b13159d.9010...@darkbsd.org>, Stephane LAPIE writes:
>> And there are these, which you didn't mention:
>>> ad1: FAILURE - SET_MULTI status=51 error=4
>>> ad12: FAILURE - SET_MULTI status=51 error=4
>> (No, I don't know what that means, sorry.)
> 
> Ah, actually I do have an idea as to what these are about : These two
> disks are SSDs I'm using in my ZFS pool as cache devices. So I guess
> this is the kind of error that says "I tried to use an access mode this
> device was not designed for".

It is probably the result of a bug in the ata(4) code. It was fixed
recently by r199749. It is not critical when the drive operates in DMA mode.

-- 
Alexander Motin