Re: crashes in amd64 8.99.51/9.99.2 with panic: pr_find_pagehead: [npfcn4pl]

2019-08-07 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo writes:

> Geoff Wing writes:
>
>> 8.99.51 crash:
>> panic: pr_find_pagehead: [npfcn4pl] item 0x98a0b89491b8 poolid 182 != 181
>
> I'm seeing these on amd64 and aarch64, both current at the time that the
> release branch for version 9 was created.

...but no longer, after rmind supplied patches that christos committed. :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: crashes in amd64 8.99.51/9.99.2 with panic: pr_find_pagehead: [npfcn4pl]

2019-08-04 Thread Tom Ivar Helbekkmo
Geoff Wing writes:

> 8.99.51 crash:
> panic: pr_find_pagehead: [npfcn4pl] item 0x98a0b89491b8 poolid 182 != 181

I'm seeing these on amd64 and aarch64, both current at the time that the
release branch for version 9 was created.  The crashes seem to happen
when there's quite a bit of disk activity: both systems crash during the
nightly jobs cron run, and a surefire way to provoke a crash is to
install a new version of the system, by running 'tar xzpf' on each of
the set files in turn.  (I've tried, several times, to get a complete
distribution from the above-mentioned point in time successfully
installed on these, but still don't have all the X stuff completed.)

-tih


Re: crashes in amd64 8.99.51/9.99.2 with panic: pr_find_pagehead: [npfcn4pl]

2019-08-01 Thread Maxime Villard

> 8.99.51 crash:
> panic: pr_find_pagehead: [npfcn4pl] item 0x98a0b89491b8 poolid 182 != 181
> cpu1: Begin traceback...
> vpanic() at netbsd:vpanic+0x160
> snprintf() at netbsd:snprintf
> pool_put() at netbsd:pool_put+0x6b9
> pool_cache_invalidate_groups() at netbsd:pool_cache_invalidate_groups+0x71
> pool_cache_invalidate() at netbsd:pool_cache_invalidate+0xd5
> pool_reclaim() at netbsd:pool_reclaim+0xa7
> pool_drain() at netbsd:pool_drain+0x85
> uvmpd_pool_drain_thread() at netbsd:uvmpd_pool_drain_thread+0x74
> cpu1: End traceback...


Mmh, interesting, there is a pool mismatch. (This is a bug detection feature
I added recently.)

Here NPF called pool_cache_put() on the wrong pool. It seems that a buffer
allocated from conn_cache[1] ended up being freed in conn_cache[0], probably
in npf_conn_destroy().
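
To make this class of bug concrete, here is a small user-space sketch. It is
only an illustration, not the kernel's pool(9) code and not the actual NPF
source: every toy_* name is hypothetical, the real check is implemented
differently inside the pool code, and the "buggy" destroy routine is merely one
guess at how such a mismatch could arise. The ids 181 and 182 are borrowed from
the panic message; which cache index carries which id is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a pool with a unique id (cf. "poolid" in the panic). */
struct toy_pool {
	unsigned id;      /* unique per pool */
	size_t   nalloc;  /* items currently outstanding */
};

/* Header placed in front of each item; padded so the payload stays aligned. */
union toy_item_hdr {
	unsigned    poolid;  /* id of the pool the item was allocated from */
	max_align_t align;
};

/* Allocate an item and stamp it with the owning pool's id. */
static void *toy_pool_get(struct toy_pool *pp, size_t size)
{
	union toy_item_hdr *h = malloc(sizeof(*h) + size);

	if (h == NULL)
		return NULL;
	h->poolid = pp->id;
	pp->nalloc++;
	return h + 1;
}

/*
 * Return an item to a pool. On an id mismatch the item is left untouched
 * and -1 is returned; this is the point where the kernel would panic.
 */
static int toy_pool_put(struct toy_pool *pp, void *item)
{
	union toy_item_hdr *h = (union toy_item_hdr *)item - 1;

	if (h->poolid != pp->id)
		return -1;
	pp->nalloc--;
	free(h);
	return 0;
}

/* Two caches, loosely modelled on conn_cache[0] and conn_cache[1]. */
static struct toy_pool conn_cache[2] = { { 181, 0 }, { 182, 0 } };

struct toy_conn {
	int idx;  /* which cache this connection came from */
};

static struct toy_conn *toy_conn_create(int idx)
{
	struct toy_conn *c;

	assert(idx == 0 || idx == 1);
	c = toy_pool_get(&conn_cache[idx], sizeof(*c));
	if (c != NULL)
		c->idx = idx;
	return c;
}

/* Buggy variant (assumed for illustration): always frees into cache 0. */
static int toy_conn_destroy_buggy(struct toy_conn *c)
{
	return toy_pool_put(&conn_cache[0], c);
}

/* Correct variant: frees into the cache the item was allocated from. */
static int toy_conn_destroy(struct toy_conn *c)
{
	return toy_pool_put(&conn_cache[c->idx], c);
}
```

In this model, an item created with toy_conn_create(1) and passed to
toy_conn_destroy_buggy() is rejected with -1 and stays allocated, which is the
toy equivalent of "poolid 182 != 181". The same item is accepted by
toy_conn_destroy(). Without the id check the wrong free would go unnoticed and
silently corrupt the pools' bookkeeping.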

Mindaugas, can you have a look? Thanks