On 2021/03/02 00:09, Mark Schneider wrote:
> Hi,
> 
> Thank you for your feedback.
> 
> Also, the OpenBSD 6.9beta snapshot crashes when I set up RAID5 with three
> "Samsung PRO 860 1TB" SSDs.
> OpenBSD obsd69b.it-infra.org 6.9 GENERIC.MP#368 amd64
> 
> obsd69b# dmesg | grep  -i bios
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdc312018 (61 entries)
> bios0: vendor American Megatrends Inc. version "2201" date 03/23/2015
> bios0: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
> acpi0 at bios0: ACPI 5.0

Can you isolate softraid from the equation? Are the drives reliable with
this hardware configuration when not using softraid? I guess it would
need testing with simultaneous writes to the 3 drives to give a closer
match to the situation with softraid.
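
A minimal sketch of such a test, assuming plain dd writes are an acceptable
stand-in for the softraid I/O pattern. The helper name parallel_write is
hypothetical, and the targets are whatever raw devices the three SSDs attach
as. Note that it overwrites the targets.

```shell
#!/bin/sh
# Hypothetical helper: write zeroes to every target at the same time,
# with no softraid volume involved. DESTROYS data on the targets.
# usage: parallel_write <bs> <count> <target>...
parallel_write() {
    bs=$1
    count=$2
    shift 2
    pids=""
    for target in "$@"; do
        # One background dd per target so the writes overlap.
        dd if=/dev/zero of="$target" bs="$bs" count="$count" &
        pids="$pids $!"
    done
    rc=0
    # Wait for each dd and remember whether any of them failed.
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return $rc
}
```

For example, with assumed device names: parallel_write 10M 1024 /dev/rsd1c
/dev/rsd2c /dev/rsd3c. A non-zero return status means at least one dd failed.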

> > > bs=10M count=1024
> > > 
> > > # Error messages
> > > 
> > > uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at      sr_validate_io+0x44:    cmpl     $0,0x40(%r9)
> > > ddb{2}>

$ objdump -dlr softraid.o | less
...skipping...
0000000000009cc0 <sr_validate_io>:
sr_validate_io():
/usr/src/sys/dev/softraid.c:4569
    9cc0:       4c 8b 1d 00 00 00 00    mov    0(%rip),%r11        # 9cc7 <sr_validate_io+0x7>
                        9cc3: R_X86_64_PC32     __retguard_3962+0xfffffffffffffffc
    9cc7:       4c 33 1c 24             xor    (%rsp),%r11
    9ccb:       55                      push   %rbp
    9ccc:       48 89 e5                mov    %rsp,%rbp
    9ccf:       57                      push   %rdi
    9cd0:       56                      push   %rsi
    9cd1:       52                      push   %rdx
    9cd2:       57                      push   %rdi
    9cd3:       41 53                   push   %r11
    9cd5:       50                      push   %rax
/usr/src/sys/dev/softraid.c:4570
    9cd6:       4c 8b 47 08             mov    0x8(%rdi),%r8
/usr/src/sys/dev/softraid.c:4577
    9cda:       49 8b 88 70 09 00 00    mov    0x970(%r8),%rcx
    9ce1:       83 b9 94 00 00 00 00    cmpl   $0x0,0x94(%rcx)
    9ce8:       0f 84 a2 01 00 00       je     9e90 <sr_validate_io+0x1d0>
    9cee:       b8 01 00 00 00          mov    $0x1,%eax
/usr/src/sys/dev/softraid.c:4580
    9cf3:       41 83 b8 20 0a 00 00    cmpl   $0x1,0xa20(%r8)
    9cfa:       01
    9cfb:       0f 84 69 01 00 00       je     9e6a <sr_validate_io+0x1aa>
    9d01:       4c 8b 0f                mov    (%rdi),%r9
/usr/src/sys/dev/softraid.c:4586
    9d04:       41 83 79 40 00          cmpl   $0x0,0x40(%r9)
    9d09:       74 47                   je     9d52 <sr_validate_io+0x92>
/usr/src/sys/dev/softraid.c:4592

putting sr_validate_io+0x44 at the xs->datalen dereference,

4580         if (sd->sd_vol_status == BIOC_SVOFFLINE) {
4581                 DNPRINTF(SR_D_DIS, "%s: %s device offline\n",
4582                     DEVNAME(sd->sd_sc), func);
4583                 goto bad;
4584         }
4585
4586         if (xs->datalen == 0) {
4587                 printf("%s: %s: illegal block count for %s\n",
4588                     DEVNAME(sd->sd_sc), func, sd->sd_meta->ssd_devname);
4589                 goto bad;
4590         }

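...so null/invalid xs? The fault address 0x40 in the uvm_fault line matches
the 0x40(%r9) operand with %r9 == 0, which would mean xs is NULL.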

"trace" and "sh reg" from ddb would give more clues.
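
For reference, the ddb offset can be mapped onto the objdump listing by adding
it to the function's start address. A small sketch in shell arithmetic, using
only the two values shown above (the variable names are illustrative):

```shell
#!/bin/sh
# Start of sr_validate_io in the objdump listing.
func_start=0x9cc0
# Offset reported by ddb: sr_validate_io+0x44.
ddb_offset=0x44
# Address of the faulting instruction within softraid.o, in hex.
fault_insn=$(printf '%x' $(($func_start + $ddb_offset)))
echo "$fault_insn"
```

The result is the address to look for in the left-hand column of the
disassembly; the source line is the nearest softraid.c annotation above it.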
