Re: panic: biodone2 already

2018-08-18 Thread Michael van Elst
On Fri, Aug 17, 2018 at 12:59:57PM +0200, Jaromir Dolecek wrote:
> 
> Yes, one of the problems is that the code happily handles stale bufs. It does
> not clear the buf pointer when the response is already handled. We should add
> some KASSERTs there for this and clear the response structure on reuse.

Yes, that also makes it clear whether a request is processed twice (and
not just a buffer).

> Also, the DPRINTF() in the loop assumes bp is valid, so it uses a stale
> pointer for the disk flush ...

It also tries to handle invalid operation codes, but then assumes other
data (i.e. the buffer pointer) is valid.

Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: panic: biodone2 already

2018-08-17 Thread Jaromir Dolecek



> On 17 August 2018 at 07:07, Michael van Elst  wrote:
> 
>> On Fri, Aug 17, 2018 at 02:23:16AM +, Emmanuel Dreyfus wrote:
>> 
>>blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
>>struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
>>bp = xbdreq->req_bp;
>> 
>> It decides to call dk_done for the last occurrence and return. Next
>> call to xbd_handler finds the same offending buf_t leading the queue.
>> dk_done is called again, leading to the panic.
> 

It should not do this, since cons should equal prod and it should not enter the
loop. I was investigating whether it could be some interaction between
DIOCCACHEFLUSH and bio, or raw vs. block I/O, but found nothing yet.

Yes, one of the problems is that the code happily handles stale bufs. It does
not clear the buf pointer when the response is already handled. We should add
some KASSERTs there for this and clear the response structure on reuse.

Also, the DPRINTF() in the loop assumes bp is valid, so it uses a stale pointer
for the disk flush ...

The whole xbd code really needs a cleanup and proper MP-ification.

I was not able to repeat the panic on my machine yet, though. Maybe I need a
bigger virtual disk.

Re: panic: biodone2 already

2018-08-16 Thread Michael van Elst
On Fri, Aug 17, 2018 at 02:23:16AM +, Emmanuel Dreyfus wrote:

> blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
> struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
> bp = xbdreq->req_bp;
> 
> It decides to call dk_done for the last occurence and return. Next
> call to xbd_handler finds the same offending buf_t leading the queue.
> dk_done is called again, leading to the panic.

The ring returns the response with the request ID. Is that ID returned
multiple times, or do several requests reference the same buffer?
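
A quick way to tell those two cases apart would be to instrument the response
loop, roughly like this (untested sketch; XBD_RING_SIZE stands in for whatever
the real size of the sc_reqs array is):

	blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
	struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
	int j, shared = 0;

	/* Case 1: the same request ID comes back more than once.
	 * Logging every completion makes duplicates visible. */
	printf("xbd: response slot %u id %u bp %p\n",
	    i, (unsigned)rep->id, xbdreq->req_bp);

	/* Case 2: several outstanding requests reference the same
	 * buffer.  Scan the request array for other slots pointing
	 * at the same buf. */
	for (j = 0; j < XBD_RING_SIZE; j++) {
		if (j != (int)rep->id && xbdreq->req_bp != NULL &&
		    sc->sc_reqs[j].req_bp == xbdreq->req_bp)
			shared++;
	}
	if (shared != 0)
		printf("xbd: bp %p also referenced by %d other requests\n",
		    xbdreq->req_bp, shared);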

> Where should I look for the code where the receive ring is populated in
> the dom0?

The answers are pushed back to the ring by dom0 in xbdback_iodone(),
which calls xbdback_send_reply().


Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: panic: biodone2 already

2018-08-16 Thread Emmanuel Dreyfus
On Fri, Aug 10, 2018 at 06:55:46AM -, Michael van Elst wrote:
> a queued operation eventually returns with a call to xbd_handler.
> - for every buffer returned, dk_done is called which finally ends
>   in invoking biodone.

After adding debug statements, I can now tell that the offending buf_t
is queued once, but xbd_handler finds it hundreds of times in the
response queue here:
for (i = sc->sc_ring.rsp_cons; i != resp_prod; i++) {
	blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
	struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
	bp = xbdreq->req_bp;


It decides to call dk_done for the last occurrence and returns. The next
call to xbd_handler finds the same offending buf_t leading the queue, and
dk_done is called again, leading to the panic.
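
For anyone who wants to reproduce the observation, the instrumentation is along
these lines (illustrative only, not the exact statements I used; the sring
member spelling is from memory):

	printf("xbd_handler: rsp_cons %u rsp_prod %u\n",
	    sc->sc_ring.rsp_cons, sc->sc_ring.sring->rsp_prod);

	for (i = sc->sc_ring.rsp_cons; i != resp_prod; i++) {
		blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
		struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
		bp = xbdreq->req_bp;

		/* A response that is still in the ring on the next
		 * interrupt shows up again with the same slot/id/bp. */
		printf("xbd_handler: slot %u id %u bp %p\n",
		    i, (unsigned)rep->id, bp);
		/* ... rest of the loop unchanged ... */
	}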

Where should I look for the code where the receive ring is populated in
the dom0?


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: panic: biodone2 already

2018-08-10 Thread Michael van Elst
m...@netbsd.org (Emmanuel Dreyfus) writes:

>I can tell that in vfs_bio.c, bread() -> bio_doread() will call
>VOP_STRATEGY once for the offending buf_t, but biodone() is called twice
>in interrupt context for the buf_t, leading to the biodone2 already
>panic later.

>Since you know the xbd code you could save me some time: where do we go
>below VOP_STRATEGY?

The buffer is passed to xbdstrategy, which tries to run the I/O queue.
-> on error it finishes with biodone
Normally it calls dk_strategy, which iterates over xbd_diskstart.
-> on error from xbd_diskstart it again finishes with biodone

xbd_diskstart tries to push a single buffer; it either
- queues the buffer to the "hardware" (a hypervisor event) or
- returns EAGAIN to retry the operation or
- returns another error to fail the operation

A queued operation eventually returns with a call to xbd_handler.
- for every buffer returned, dk_done is called, which finally ends
  up invoking biodone.

Hypervisor events link to an 'xbd_req' structure (one per request,
which can be I/O but also a cache flush operation). For I/O requests
the 'xbd_req' structure points to the buffer.
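
Schematically, the relationship looks like this (only req_bp is taken from the
code quoted elsewhere in the thread; the other member names are placeholders):

	/* One xbd_req per outstanding ring request.  The response only
	 * carries the request id; everything else is looked up here. */
	struct xbd_req {
		struct buf *req_bp;	/* the buffer, for I/O requests;
					 * unused for cache flushes */
		/* ... grant references, request type, ... */
	};

	/*
	 * Completion path:
	 *   event channel interrupt
	 *     -> xbd_handler(): for each new response rep,
	 *          xbdreq = &sc->sc_reqs[rep->id];
	 *          dk_done(..., xbdreq->req_bp)   (for I/O requests)
	 *            -> biodone()
	 */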

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: panic: biodone2 already

2018-08-09 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> > xbd is not mpsafe, so it shouldn't even be a race due to parallel
> > processing on different CPUs. Maybe it would be useful to check if the
> > problem still happens when you assign just a single CPU to the DOMU.
> 
> I get the crash with vcpu = 1 for the domU. I also tried to pin a single
> CPU for the test domU, and it still crashes:

I started tracing the code to see where the problem comes from. So far,
I can tell that in vfs_bio.c, bread() -> bio_doread() will call
VOP_STRATEGY once for the offending buf_t, but biodone() is called twice
in interrupt context for the buf_t, leading to the biodone2 already
panic later.

Since you know the xbd code, you could save me some time: where do we go
below VOP_STRATEGY?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-08 Thread Emmanuel Dreyfus
On Wed, Aug 08, 2018 at 10:30:23AM +0700, Robert Elz wrote:
> This suggests to me that something is getting totally scrambled in
> the buf headers when things get busy.

I tried to crash with BIOHIST enabled; here is the story about the
buf_t that triggers the panic because it has BO_DONE:

1533732322.896722 bread#23928@0: called!
1533732322.896727 biowait#53190@0: called: bp=0xa57d96c0, oflags=0x0, 
ret_addr=0x804c2e40
1533732322.896727 biowait#53190@0: waiting bp=0xa57d96c0
1533732322.897058 biodone#16699@0: called: bp=0xa57d96c0, softint 
scheduled
1533732322.897058 biodone#16700@0: called: bp=0xa57d96c0, softint 
scheduled
1533732322.897064 biointr#16597@0: called!
1533732322.897064 biointr#16597@0: bp=0xa57d96c0
1533732322.897064 biodone2#16777@0: called: bp=0xa57d96c0
1533732322.897064 biodone2#16777@0: wake-up
1533732322.897065 biointr#16597@0: bp=0xa57d96c0
1533732322.897065 biodone2#16778@0: called: bp=0xa57d96c0

There does not seem to be anything corrupted. The buf_t is handled
twice in biointr because it has been inserted twice into ci_data.cpu_biodone
by biodone.
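
For context, the mechanism in vfs_bio.c is roughly the following (simplified
and from memory; the list entry field and the softint handle name may not be
exact):

	/* biodone(), when called from interrupt context (as both calls
	 * in the trace above are): queue the buf on the per-CPU list
	 * and schedule the biointr soft interrupt. */
	TAILQ_INSERT_TAIL(&curcpu()->ci_data.cpu_biodone, bp, b_actq);
	softint_schedule(biodone_sih);

	/* biointr(), the soft interrupt: drain the per-CPU list and
	 * finish each buffer.  A buf that was queued twice is finished
	 * twice, and the second biodone2() finds BO_DONE already set. */
	while (!TAILQ_EMPTY(&ci->ci_data.cpu_biodone)) {
		bp = TAILQ_FIRST(&ci->ci_data.cpu_biodone);
		TAILQ_REMOVE(&ci->ci_data.cpu_biodone, bp, b_actq);
		biodone2(bp);
	}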

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Robert Elz  wrote:

> This suggests to me that something is getting totally scrambled in
> the buf headers when things get busy.

I tried dumping the buf_t before the panic, to check if it could be
completely corrupted, but it seems that is not the case. Its blkno is
4904744; the filesystem has 131891200 blocks.

bp = 0xa5c1e000
bp->b_error = 0
bp->b_resid = 0
bp->b_flags = 0x10
bp->b_prio = 1
bp->b_bufsize = 2048
bp->b_bcount = 2048
bp->b_dev = 0x8e00
bp->b_blkno = 4904744
bp->b_proc = 0x0
bp->b_saveaddr = 0x0
bp->b_private = 0x0
bp->b_dcookie = 0
bp->b_refcnt = 1
bp->b_lblkno = 0
bp->b_cflags = 0x10
bp->b_vp = 0xa00005c2b2a8
bp->b_oflags = 0x200
panic: biodone2 already

db{0}> show vnode 0xa5c2b2a8
OBJECT 0xa5c2b2a8: locked=1, pgops=0x8058ac00, npages=0,
refs=2

vnode 0xa5c2b2a8 flags 0x30
tag VT_UFS(1) type VDIR(2) mount 0xa448c000 typedata 0x0
usecount 2 writecount 0 holdcount 1
size 200 writesize 200 numoutput 0
data 0xa5c2cf00 lock 0xa5c2b3d8
state LOADED key(0xa448c000 8) 3a 9f 04 00 00 00 00 00
lrulisthd 0x8067bb80


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-07 Thread Robert Elz
For what it is worth, and in this case it might not be much, I did a similar
test on my test amd64 DomU last night.

Running dump and /etc/daily in parallel did nothing, but running lots of
them in parallel eventually did cause a crash.

But I saw a different crash -- rather than a biodone2, I got a KASSERT
from one I added as part of attempting to diagnose the babylon5
"install failures" - that is, if my test kernel ever gets an I/O error,
the KASSERT (which is just KASSERT(0)) causes a crash.   This
was intended for generic kernels that run in qemu - but I use the
same sources for my generic testing, and simply left that there.
My DomU test system *never* gets an I/O error, so it simply did
not matter (its filesystem is on a raid on the Dom0, and neither the
Dom0 nor the raid report anything even smelling like I/O errors;
what's more, the Dom0 is more likely to crash than ever allow a
real I/O error through to the DomU).

This is the I/O error that occurred...

[ 485570.8105971] xbd0a: error writing fsbn 49691936 of 49691936-49691951 (xbd0 
bn 49691936; cn 24263 tn 0 sn 1312)panic: kernel diagnostic assertion "0" 
failed: file "/readonly/release/testing/src/sys/kern/subr_disk.c", line 163 


What's kind of interesting about that is that the DomU filesystem is ...

format  FFSv1
endian  little-endian
magic   11954   timeWed Aug  8 03:57:00 2018
superblock location 8192id  [ 57248bd0 6db5a772 ]
cylgrp  dynamic inodes  4.4BSD  sblock  FFSv2   fslevel 4
nbfree  4037762 ndir2334nifree  2009341 nffree  3116
ncg 624 size33554432blocks  33289830

(no idea how I managed to make it FFSv1, but that doesn't matter).

What is interesting is "blocks  33289830" when compared with the
I/O error "error writing fsbn 49691936", which is 16402106 blocks
beyond the end of the filesystem.

This suggests to me that something is getting totally scrambled in
the buf headers when things get busy.

kre



Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Jaromír Doleček  wrote:

> Thanks. Could you please try a -current kernel for the DOMU and see if it
> crashes the same way? If possible, a DOMU kernel from the daily builds, to
> rule out a local compiler issue.

It crashes the same way with a kernel from 201808050730. Here is the uname
-a output:
NetBSD bacasable 8.99.23 NetBSD 8.99.23 (XEN3PAE_DOMU) #0: Sun Aug  5
06:48:50 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/xen/compile/XEN3PAE_DOMU
i386

> xbd is not mpsafe, so it shouldn't be even race due to parallell
> processing on different CPUs. Maybe it would be useful to check if the
> problem still happens when you assign just single CPU to the DOMU.

I get the crash with vcpu = 1 for the domU. I also tried to pin a single
CPU for the test domU, and it still crashes:

xl vcpu-pin bacasable 0 0
xl vcpu-pin $other_domU all 1-3

An interesting point: adding the -X flag to dump seems to let it work
without a panic. It may be luck, but it has not crashed so far.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-07 Thread Jaromír Doleček
2018-08-07 18:42 GMT+02:00 Emmanuel Dreyfus :
>  kern/53506

Thanks. Could you please try a -current kernel for the DOMU and see if it
crashes the same way? If possible, a DOMU kernel from the daily builds, to
rule out a local compiler issue.

There are not really many differences in the xbd/evtchn code itself
between 8.0 and -current, however. There was some interrupt code
reorganization which might have affected this, but that happened after
netbsd-8 was created.

xbd is not mpsafe, so it shouldn't even be a race due to parallel
processing on different CPUs. Maybe it would be useful to check if the
problem still happens when you assign just a single CPU to the DOMU.

Jaromir


Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Martin Husemann  wrote:

> Please file a PR.

 kern/53506

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> /sbin/dump -a0f /dev/null /
> sh /etc/daily

The second command can be replaced by a simple
grep -r something /etc

But so far I have not managed to crash it without running dump.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-07 Thread Martin Husemann
On Tue, Aug 07, 2018 at 05:30:27PM +0200, Emmanuel Dreyfus wrote:
> I can reproduce the crash at will: running the two following commands at the
> same time reliably triggers a "panic: biodone2 already"
> 
> /sbin/dump -a0f /dev/null /
> sh /etc/daily

Please file a PR.

Martin


Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Jaromír Dole?ek  wrote:

> This is always a bug: the driver processes the same buf twice. It can do harm.
> If the buf is reused for some other I/O, the system can fail to store
> data, or claim to have read data when it didn't.

I can reproduce the crash at will: running the two following commands at the
same time reliably triggers a "panic: biodone2 already"

/sbin/dump -a0f /dev/null /
sh /etc/daily

That crashes a NetBSD-8.0 domU, either i386 or amd64. Here, the only
machines that have been spared by the bug so far are the ones where the
dump is finished before /etc/daily starts. All the other ones keep
rebooting at least once a day.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Here it is. 

And here is another flavor below.

I am now convinced the problem came with NetBSD 8.0:
I found that two other domUs have crashed on the daily backup since
the NetBSD 8.0 upgrade, and the panic is also biodone2 already.
I will start downgrading today.

uvm_fault(0xc06d4960, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip 0xc0360835 cs 0x9 eflags 0x210282 cr2 0x1c ilevel 0 esp 
0xc59010b4
curlwp 0xc56af000 pid 0 lid 39 lowest kstack 0xdd8bc2c0
panic: trap
cpu0: Begin traceback...
vpanic(c04da74f,dd8bde0c,dd8bde90,c010bb65,c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282)
 at netbsd:vpanic+0x12d
panic(c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282,1c,0,c59010b4,dd8bde9c) at 
netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
uvm_aio_aiodone_pages(dd8bdf20,8,0,0,0,0,0,0,0,0) at 
netbsd:uvm_aio_aiodone_pages+0x15
uvm_aio_aiodone(c59010b4,0,c52b0118,c52b0110,c52b0104,c52b00c0,c03ca660,c56af000,0,c0102031)
 at netbsd:uvm_aio_aiodone+0x9a
workqueue_worker(c52b00c0,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
netbsd:workqueue_worker+0xbe
cpu0: End traceback...

dumping to dev 142,1 offset 8
dump uvm_fault(0xc06d2be0, 0xfd894000, 2) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0x2 eip 0xc010806d cs 0x9 eflags 0x210202 cr2 0xfd894fff 
ilevel 0x8 esp 0xc06ba580
curlwp 0xc56af000 pid 0 lid 39 lowest kstack 0xdd8bc2c0
Skipping crash dump on recursive panic
panic: trap
cpu0: Begin traceback...
vpanic(c04da74f,dd8bdc98,dd8bdd1c,c010bb65,c04da74f,dd8bdd28,dd8bdd28,27,dd8bc2c0,210202)
 at netbsd:vpanic+0x12d
panic(c04da74f,dd8bdd28,dd8bdd28,27,dd8bc2c0,210202,fd894fff,8,c06ba580,dd8bdd28)
 at netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
dodumpsys(dd8bde0c,0,104,c01091fa,8,0,fffa,104,c04da74f,dd8bddf0) at 
netbsd:dodumpsys+0x39d
dumpsys(104,0,c04da74f,dd8bde0c,5,c56af000,6,dd8bde00,c03c4e48,c04da74f) at 
netbsd:dumpsys+0x14
vpanic(c04da74f,dd8bde0c,dd8bde90,c010bb65,c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282)
 at netbsd:vpanic+0x13b
panic(c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282,1c,0,c59010b4,dd8bde9c) at 
netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
uvm_aio_aiodone_pages(dd8bdf20,8,0,0,0,0,0,0,0,0) at 
netbsd:uvm_aio_aiodone_pages+0x15
uvm_aio_aiodone(c59010b4,0,c52b0118,c52b0110,c52b0104,c52b00c0,c03ca660,c56af000,0,c0102031)
 at netbsd:uvm_aio_aiodone+0x9a
workqueue_worker(c52b00c0,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
netbsd:workqueue_worker+0xbe
cpu0: End traceback...



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Martin Husemann  wrote:

> What driver is this?

xbd; this is a NetBSD-8.0/i386 Xen domU on top of a NetBSD-8.0/amd64
dom0 running on Xen 4.8.3.

In the dom0, the disk image is in a file in an FFSv2 filesystem on a
RAIDframe RAID 1, with two wd disks.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-06 Thread Martin Husemann
On Mon, Aug 06, 2018 at 08:37:56PM +0200, Emmanuel Dreyfus wrote:
> cpu0: Begin traceback...
> vpanic(c04da74f,dcbdfdd4,dcbdfe58,c010bb65,c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202)
>  at netbsd:vpanic+0x12d
> panic(c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202,fd893fff,8,c06ba580,ff491) 
> at netbsd:panic+0x18
> trap() at netbsd:trap+0xc75
> --- trap (number 6) ---
> dodumpsys(dcbdff48,0,104,c01091fa,8,0,fffb,104,c04f9e2c,dcbdff2c) at 
> netbsd:dodumpsys+0x39d
> dumpsys(104,0,c04f9e2c,dcbdff48,c5bf9950,c057fe00,dcc182cc,dcbdff3c,c03c4e48,c04f9e2c)
>  at netbsd:dumpsys+0x14
> vpanic(c04f9e2c,dcbdff48,dcbdff54,c03fdd2f,c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d)
>  at netbsd:vpanic+0x13b
> panic(c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d,0,0,dcc18004,dcbdff8c)
>  at netbsd:panic+0x18
> biodone2(0,0,dcc18004,dcbdff8c,c039e66d,0,dcbdff88,c52ae7e0,0,dcc18004) at 
> netbsd:biodone2+0xff
> biointr(0,dcbdff88,c52ae7e0,0,dcc18004,c039e610,c52ae7e0,0,c0102031,dcc18004) 
> at netbsd:biointr+0x2d
> softint_thread(dcc18004,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
> netbsd:softint_thread+0x5d
> cpu0: End traceback...

What driver is this?

Martin



Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Jaromír Doleček  wrote:

> Can you give full backtrace?

Here it is. I wonder whether things would change without -o log.

cpu0: Begin traceback...
vpanic(c04da74f,dcbdfdd4,dcbdfe58,c010bb65,c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202)
 at netbsd:vpanic+0x12d
panic(c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202,fd893fff,8,c06ba580,ff491) 
at netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
dodumpsys(dcbdff48,0,104,c01091fa,8,0,fffb,104,c04f9e2c,dcbdff2c) at 
netbsd:dodumpsys+0x39d
dumpsys(104,0,c04f9e2c,dcbdff48,c5bf9950,c057fe00,dcc182cc,dcbdff3c,c03c4e48,c04f9e2c)
 at netbsd:dumpsys+0x14
vpanic(c04f9e2c,dcbdff48,dcbdff54,c03fdd2f,c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d)
 at netbsd:vpanic+0x13b
panic(c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d,0,0,dcc18004,dcbdff8c)
 at netbsd:panic+0x18
biodone2(0,0,dcc18004,dcbdff8c,c039e66d,0,dcbdff88,c52ae7e0,0,dcc18004) at 
netbsd:biodone2+0xff
biointr(0,dcbdff88,c52ae7e0,0,dcc18004,c039e610,c52ae7e0,0,c0102031,dcc18004) 
at netbsd:biointr+0x2d
softint_thread(dcc18004,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
netbsd:softint_thread+0x5d
cpu0: End traceback...


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-06 Thread Jaromír Doleček
This is always a bug: the driver processes the same buf twice. It can do harm.
If the buf is reused for some other I/O, the system can fail to store
data, or claim to have read data when it didn't.
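
To spell out why: the "release the mutex and exit" change suggested in the
original mail would amount to the following (shown only as an illustration of
why it merely hides the problem):

	mutex_enter(bp->b_objlock);
	if (ISSET(bp->b_oflags, BO_DONE)) {
		/*
		 * Hypothetical "ignore the duplicate" variant: this
		 * avoids the panic, but the second completion still
		 * happened.  If the buf had meanwhile been reused for
		 * another transfer, that I/O would now be marked done
		 * before it actually finished.
		 */
		mutex_exit(bp->b_objlock);
		return;
	}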

Can you give full backtrace?

Jaromir

2018-08-06 17:56 GMT+02:00 Emmanuel Dreyfus :
> Hello
>
> I have a Xen domU that now frequently panics with "biodone2 already". I
> suspect it started after upgrading to NetBSD 8.0.
>
> The offending code is in src/sys/kern/vfs_bio.c (see below). I wonder if
> we could not just release the mutex and exit in that case. Can it do
> any harm?
>
> static void
> biodone2(buf_t *bp)
> {
> 	void (*callout)(buf_t *);
> 
> 	SDT_PROBE1(io, kernel, ,done, bp);
> 
> 	BIOHIST_FUNC(__func__);
> 	BIOHIST_CALLARGS(biohist, "bp=%#jx", (uintptr_t)bp, 0, 0, 0);
> 
> 	mutex_enter(bp->b_objlock);
> 	/* Note that the transfer is done. */
> 	if (ISSET(bp->b_oflags, BO_DONE))
> 		panic("biodone2 already");
>
> Cou
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org


panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Hello

I have a Xen domU that now frequently panics with "biodone2 already". I
suspect it started after upgrading to NetBSD 8.0.

The offending code is in src/sys/kern/vfs_bio.c (see below). I wonder if
we could not just release the mutex and exit in that case. Can it do
any harm?

static void
biodone2(buf_t *bp)
{
	void (*callout)(buf_t *);

	SDT_PROBE1(io, kernel, ,done, bp);

	BIOHIST_FUNC(__func__);
	BIOHIST_CALLARGS(biohist, "bp=%#jx", (uintptr_t)bp, 0, 0, 0);

	mutex_enter(bp->b_objlock);
	/* Note that the transfer is done. */
	if (ISSET(bp->b_oflags, BO_DONE))
		panic("biodone2 already");

Cou

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org