epoch callback panic

2022-04-01 Thread Peter Holm
markj@ asked me to post this one:

panic: rw lock 0xf801bccb1410 not unlocked
cpuid = 4
time = 1648770125
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00e48a3d10
vpanic() at vpanic+0x17f/frame 0xfe00e48a3d60
panic() at panic+0x43/frame 0xfe00e48a3dc0
_rw_destroy() at _rw_destroy+0x35/frame 0xfe00e48a3dd0
in_lltable_destroy_lle_unlocked() at in_lltable_destroy_lle_unlocked+0x1a/frame 0xfe00e48a3df0
epoch_call_task() at epoch_call_task+0x13a/frame 0xfe00e48a3e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa7/frame 0xfe00e48a3ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfe00e48a3ef0
fork_exit() at fork_exit+0x80/frame 0xfe00e48a3f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e48a3f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Details @ https://people.freebsd.org/~pho/stress/log/log0275.txt

- Peter
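

For context: the panic string comes from the unlock assertion in
_rw_destroy().  Sketched from memory rather than quoted from
sys/kern/kern_rwlock.c, the check is roughly:

    void
    _rw_destroy(volatile uintptr_t *c)
    {
            struct rwlock *rw = rwlock2rw(c);

            /*
             * An rw lock must be fully released before it is
             * destroyed; a still-held lock trips this assertion
             * and panics with the message seen above.
             */
            KASSERT(rw->rw_lock == RW_UNLOCKED,
                ("rw lock %p not unlocked", rw));
            ...
    }

So the deferred epoch callback reached in_lltable_destroy_lle_unlocked()
while something still held the llentry's lle_lock.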



Re: epoch callback panic

2022-04-01 Thread Hans Petter Selasky

On 4/1/22 19:07, Peter Holm wrote:
> markj@ asked me to post this one:
>
> panic: rw lock 0xf801bccb1410 not unlocked
> cpuid = 4
> time = 1648770125
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00e48a3d10
> vpanic() at vpanic+0x17f/frame 0xfe00e48a3d60
> panic() at panic+0x43/frame 0xfe00e48a3dc0
> _rw_destroy() at _rw_destroy+0x35/frame 0xfe00e48a3dd0
> in_lltable_destroy_lle_unlocked() at in_lltable_destroy_lle_unlocked+0x1a/frame 0xfe00e48a3df0
> epoch_call_task() at epoch_call_task+0x13a/frame 0xfe00e48a3e40
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa7/frame 0xfe00e48a3ec0
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfe00e48a3ef0
> fork_exit() at fork_exit+0x80/frame 0xfe00e48a3f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e48a3f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>
> Details @ https://people.freebsd.org/~pho/stress/log/log0275.txt



Hi,

Maybe you need to grab the lock before destroying it?

Is this easily reproducible?

--HPS
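

Worth noting: rw locks differ from mutexes here.  mtx_destroy() may be
called on a mutex the caller still owns, but rw_destroy() requires the
lock to be unowned, so teardown has to unlock before destroying.  A
minimal sketch, with lle standing in for the entry being torn down:

    rw_wlock(&lle->lle_lock);
    /* ... final teardown of the entry under the lock ... */
    rw_wunlock(&lle->lle_lock);     /* release first ... */
    rw_destroy(&lle->lle_lock);     /* ... destroy only once unlocked */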




Re: epoch callback panic

2022-04-01 Thread Hans Petter Selasky

On 4/1/22 22:33, Hans Petter Selasky wrote:
> Hi,
>
> Maybe you need to grab the lock before destroying it?
>
> Is this easily reproducible?
>
> --HPS


Can you figure out the owner of the lock?

I guess the owner is not in an epoch section like it should be!

--HPS
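

The pattern being alluded to, in a hedged sketch: llentry readers are
expected to stay inside a network epoch section for as long as they
hold a pointer to the entry, so the deferred destructor cannot run
until every such reader has left.  lookup_lle() below is a hypothetical
helper standing in for the real lltable lookup path, and llt/dst are
assumed to come from the caller:

    struct epoch_tracker et;
    struct llentry *lle;

    NET_EPOCH_ENTER(et);
    lle = lookup_lle(llt, dst);         /* hypothetical lookup helper */
    if (lle != NULL) {
            LLE_RLOCK(lle);
            /* ... read the entry ... */
            LLE_RUNLOCK(lle);           /* drop before leaving the epoch */
    }
    NET_EPOCH_EXIT(et);

An owner that takes lle_lock outside such a section can still be
holding it when epoch_call_task() finally runs the destructor, which is
exactly the state the assertion caught.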



Re: epoch callback panic

2022-04-01 Thread Peter Holm
On Fri, Apr 01, 2022 at 10:33:15PM +0200, Hans Petter Selasky wrote:
> On 4/1/22 19:07, Peter Holm wrote:
> > markj@ asked me to post this one:
> > 
> > panic: rw lock 0xf801bccb1410 not unlocked
> > cpuid = 4
> > time = 1648770125
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00e48a3d10
> > vpanic() at vpanic+0x17f/frame 0xfe00e48a3d60
> > panic() at panic+0x43/frame 0xfe00e48a3dc0
> > _rw_destroy() at _rw_destroy+0x35/frame 0xfe00e48a3dd0
> > in_lltable_destroy_lle_unlocked() at in_lltable_destroy_lle_unlocked+0x1a/frame 0xfe00e48a3df0
> > epoch_call_task() at epoch_call_task+0x13a/frame 0xfe00e48a3e40
> > gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa7/frame 0xfe00e48a3ec0
> > gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfe00e48a3ef0
> > fork_exit() at fork_exit+0x80/frame 0xfe00e48a3f30
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e48a3f30
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > 
> > Details @ https://people.freebsd.org/~pho/stress/log/log0275.txt
> > 
> 
> Hi,
> 
> Maybe you need to grab the lock before destroying it?
> 

This was on a pristine main-n254137-31190aa02eef0.

> Is this easily reproducible?
> 

No.  I have only seen this once, when running rsync between an NFS
mount and a SU file system.

- Peter

> --HPS



Re: epoch callback panic

2022-04-01 Thread Bjoern A. Zeeb

On 1 Apr 2022, at 20:51, Peter Holm wrote:


> On Fri, Apr 01, 2022 at 10:33:15PM +0200, Hans Petter Selasky wrote:
> > On 4/1/22 19:07, Peter Holm wrote:
> > > markj@ asked me to post this one:
> > >
> > > panic: rw lock 0xf801bccb1410 not unlocked
> > > cpuid = 4
> > > time = 1648770125
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00e48a3d10
> > > vpanic() at vpanic+0x17f/frame 0xfe00e48a3d60
> > > panic() at panic+0x43/frame 0xfe00e48a3dc0
> > > _rw_destroy() at _rw_destroy+0x35/frame 0xfe00e48a3dd0
> > > in_lltable_destroy_lle_unlocked() at in_lltable_destroy_lle_unlocked+0x1a/frame 0xfe00e48a3df0
> > > epoch_call_task() at epoch_call_task+0x13a/frame 0xfe00e48a3e40
> > > gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa7/frame 0xfe00e48a3ec0
> > > gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfe00e48a3ef0
> > > fork_exit() at fork_exit+0x80/frame 0xfe00e48a3f30
> > > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e48a3f30
> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > >
> > > Details @ https://people.freebsd.org/~pho/stress/log/log0275.txt
> >
> > Hi,
> >
> > Maybe you need to grab the lock before destroying it?
>
> This was on a pristine main-n254137-31190aa02eef0.


If there was no other memory corruption, and my memory serves me right,
la_flags = 0x1 was LLE_DELETED, which gives a good hint on the call path.


If I had to bet, it’s coming out of the 2nd condition in
in_scrubprefixlle ... I’d check lltable_delete_addr ... and so on ...


/bz
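

Assembling the hints in this thread into a call-path sketch; the
intermediate steps are reconstructed from the function names mentioned
above and are not verified against the source:

    in_scrubprefixlle()                   /* address/prefix teardown */
      -> lltable_delete_addr()            /* marks the lle LLE_DELETED (0x1) */
        -> llentry_free()                 /* unlinks entry, drops the reference */
          -> (epoch-deferred) in_lltable_destroy_lle_unlocked()
            -> rw_destroy(&lle->lle_lock) /* panics if anyone still holds it */

la_flags == 0x1 (LLE_DELETED) in the dump is consistent with the entry
having been deleted through this path while another thread still held
its lock.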