Taylor R Campbell wrote:
>> Date: Fri, 4 Oct 2024 10:37:24 +0200
>> From: BERTRAND Joël <[email protected]>
>>
>> -tco* at tcoichbus? # TCO watch dog timer
>> +tco* at ichlpcib? # TCO watch dog timer
>
> This is a curious change to make; what prompted it? Are you using the
> watchdog timer? I'm slightly surprised this builds at all, and I'm
> not sure it will work.
I don't remember making this configuration change... My config file was
written a long time ago.
For your information, the server crashed last night.
>> I upgraded my tree maybe 10 days ago. Before this upgrade, the system
>> was stable (uptime greater than 120 days).
>
> When was your tree previously updated? This might help to narrow down
> which change might have introduced the problem. (And, if you can
> bisect, that would be even more helpful!)
The last running kernel had an uptime greater than 100 days. I have rebooted
with an up-to-date -10.0 kernel. Thus, I think the faulty patch was
introduced after May 2024.
>> I've just rebuilt a new kernel. I don't know if anyone uses a system
>> with a similar configuration (I suspect a bad interaction between ccd
>> and iscsi). But how can I find more information to debug this?
I have rebuilt a kernel (same tree) with all diagnostic options. It
panics in the iscsi routines when iscsictl tries to connect to the first
iscsi volume.
[ 74.238270] panic: mutex_vector_enter,517: uninitialized lock
(lock=0xffff938021d86010, from=ffffffff80f71234)
[ 74.238270] cpu1: Begin traceback...
[ 74.238270] vpanic() at netbsd:vpanic+0x183
[ 74.238270] panic() at netbsd:panic+0x3c
[ 74.238270] lockdebug_wantlock() at netbsd:lockdebug_wantlock+0x180
[ 74.248268] mutex_enter() at netbsd:mutex_enter+0x23f
[ 74.248268] send_pdu() at netbsd:send_pdu+0x1b5
[ 74.248268] send_logout() at netbsd:send_logout+0x1d4
[ 74.248268] kill_connection() at netbsd:kill_connection+0x2fa
[ 74.248268] kill_session() at netbsd:kill_session+0x134
[ 74.248268] iscsiioctl() at netbsd:iscsiioctl+0x30f
[ 74.248268] sys_ioctl() at netbsd:sys_ioctl+0x56d
[ 74.248268] syscall() at netbsd:syscall+0x196
[ 74.248268] --- syscall (number 54) ---
[ 74.248268] netbsd:syscall+0x196:
[ 74.248268] cpu1: End traceback...
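To illustrate the class of bug this message points to, here is a minimal userland sketch in C. It is only a model: it is not the iscsi driver code and not how the kernel's lock debugging is implemented, and every name in it (model_mutex, MODEL_LOCK_MAGIC, and so on) is hypothetical. It shows a lock being entered before it has been initialized, which is what the traceback suggests happens when send_pdu() calls mutex_enter() on the kill_session() path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker for a lock that went through model_mutex_init(). */
#define MODEL_LOCK_MAGIC 0x4d555458u

/* Hypothetical stand-in for a kernel mutex; not the real kmutex_t. */
struct model_mutex {
	uint32_t magic;	/* MODEL_LOCK_MAGIC once initialized */
	bool held;	/* true while the lock is taken */
};

/* Mark the lock as initialized and free. */
void
model_mutex_init(struct model_mutex *m)
{
	m->magic = MODEL_LOCK_MAGIC;
	m->held = false;
}

/*
 * Take the lock. Returns 0 on success and -1 in the situations where
 * a diagnostic kernel would panic instead: the lock was never
 * initialized, or it is already held (this model is single-threaded,
 * so a second enter can only be a bug).
 */
int
model_mutex_enter(struct model_mutex *m)
{
	if (m->magic != MODEL_LOCK_MAGIC)
		return -1;	/* "uninitialized lock" */
	if (m->held)
		return -1;	/* already held */
	m->held = true;
	return 0;
}

/* Release the lock. */
void
model_mutex_exit(struct model_mutex *m)
{
	m->held = false;
}
```

In this model, calling model_mutex_enter() on a zeroed structure fails, while the same call after model_mutex_init() succeeds. If the real problem is similar, a teardown path may be reaching a lock that the setup path has not yet initialized, but that is only a guess from the traceback.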
You can download the faulty kernel (with and without debug options) at
ftp://newton.systella.fr (files NETBSD.;1 and NETBSD.GDB;1).
Please note that this server runs OpenVMS; use binary transfer.
> You could try a current kernel. If the problem is there in current,
> it may be detected -- and reported in a more obvious way -- by the new
> heartbeat(9) diagnostic where each CPU's progress is periodically
> checked on by some other CPU
I will try.
Please also note that I cannot reboot my server with shutdown -r now
unless I have first killed altqd (with kill -9). For me, it's not a real
issue, as this server is two floors below my office, but for some users,
if the server were far away...
Best regards,
JB
