On 2021/03/02 00:09, Mark Schneider wrote:
> Hi,
>
> Thank you for your feedback.
>
> Also OpenBSD 6.9beta snapshot is crashing when I set up RAID5 with three
> "Samsung PRO 860 1TB" SSDs.
> OpenBSD obsd69b.it-infra.org 6.9 GENERIC.MP#368 amd64
>
> obsd69b# dmesg | grep -i bios
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdc312018 (61 entries)
> bios0: vendor American Megatrends Inc. version "2201" date 03/23/2015
> bios0: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
> acpi0 at bios0: ACPI 5.0

Can you isolate softraid from the equation? Are the drives reliable with
this hardware configuration when not using softraid? I guess it would
need testing with simultaneous writes to the 3 drives to give a closer
match to the situation with softraid.

> > > bs=10M count=1024
> > >
> > > # Error messages
> > >
> > > uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at      sr_validate_io+0x44:    cmpl    $0,0x40(%r9)
> > > ddb{2}>

$ objdump -dlr softraid.o | less
...skipping...
0000000000009cc0 <sr_validate_io>:
sr_validate_io():
/usr/src/sys/dev/softraid.c:4569
    9cc0:       4c 8b 1d 00 00 00 00    mov    0(%rip),%r11        # 9cc7 <sr_validate_io+0x7>
                        9cc3: R_X86_64_PC32     __retguard_3962+0xfffffffffffffffc
    9cc7:       4c 33 1c 24             xor    (%rsp),%r11
    9ccb:       55                      push   %rbp
    9ccc:       48 89 e5                mov    %rsp,%rbp
    9ccf:       57                      push   %rdi
    9cd0:       56                      push   %rsi
    9cd1:       52                      push   %rdx
    9cd2:       57                      push   %rdi
    9cd3:       41 53                   push   %r11
    9cd5:       50                      push   %rax
/usr/src/sys/dev/softraid.c:4570
    9cd6:       4c 8b 47 08             mov    0x8(%rdi),%r8
/usr/src/sys/dev/softraid.c:4577
    9cda:       49 8b 88 70 09 00 00    mov    0x970(%r8),%rcx
    9ce1:       83 b9 94 00 00 00 00    cmpl   $0x0,0x94(%rcx)
    9ce8:       0f 84 a2 01 00 00       je     9e90 <sr_validate_io+0x1d0>
    9cee:       b8 01 00 00 00          mov    $0x1,%eax
/usr/src/sys/dev/softraid.c:4580
    9cf3:       41 83 b8 20 0a 00 00    cmpl   $0x1,0xa20(%r8)
    9cfa:       01
    9cfb:       0f 84 69 01 00 00       je     9e6a <sr_validate_io+0x1aa>
    9d01:       4c 8b 0f                mov    (%rdi),%r9
/usr/src/sys/dev/softraid.c:4586
    9d04:       41 83 79 40 00          cmpl   $0x0,0x40(%r9)
    9d09:       74 47                   je     9d52 <sr_validate_io+0x92>
/usr/src/sys/dev/softraid.c:4592

putting sr_validate_io+0x44 at the xs->datalen dereference:

4580         if (sd->sd_vol_status == BIOC_SVOFFLINE) {
4581                 DNPRINTF(SR_D_DIS, "%s: %s device offline\n",
4582                     DEVNAME(sd->sd_sc), func);
4583                 goto bad;
4584         }
4585
4586         if (xs->datalen == 0) {
4587                 printf("%s: %s: illegal block count for %s\n",
4588                     DEVNAME(sd->sd_sc), func, sd->sd_meta->ssd_devname);
4589                 goto bad;
4590         }

...so null/invalid xs? "trace" and "sh reg" from ddb would give more clues.
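The step from the fault address to a NULL xs can be made explicit with a little arithmetic. This is a minimal sketch, assuming the second value printed in the uvm_fault() line (0x40 here) is the faulting virtual address; the helper names infer_base_register() and is_null_base() are hypothetical and are not part of softraid or ddb. A memory operand of the form disp(%reg) accesses reg + disp, so subtracting the displacement from the fault address recovers the base register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of softraid or ddb).
 * An operand like "0x40(%r9)" accesses %r9 + 0x40, so the base
 * register value is the fault address minus the displacement.
 * This simple model only handles the non-wrapping case, where the
 * fault address is at or above the displacement.
 */
static uint64_t
infer_base_register(uint64_t fault_va, uint64_t disp)
{
	assert(fault_va >= disp);	/* precondition of this model */
	return fault_va - disp;
}

/*
 * Hypothetical helper: the inferred base is 0 exactly when the
 * fault address equals the displacement, i.e. the access was
 * disp bytes past a NULL pointer.
 */
static bool
is_null_base(uint64_t fault_va, uint64_t disp)
{
	return fault_va == disp;
}
```

With the values from the report (fault address 0x40, faulting instruction cmpl $0,0x40(%r9)), infer_base_register(0x40, 0x40) yields 0, which points at %r9 (xs) being NULL rather than an arbitrary wild pointer. "sh reg" in ddb would confirm the actual %r9 value directly.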