Re: panic: biodone2 already

2018-08-09 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> > xbd is not mpsafe, so it shouldn't even be a race due to parallel
> > processing on different CPUs. Maybe it would be useful to check if the
> > problem still happens when you assign just a single CPU to the DOMU.
> 
> I get the crash with vcpu = 1 for the domU. I also tried to pin a single
> cpu for the test domU, I still get it to crash:

I started tracing the code to see where the problem comes from. So far,
I can tell that in vfs_bio.c, bread() -> bio_doread() will call
VOP_STRATEGY once for the offending buf_t, but biodone() is called twice
in interrupt context for the same buf_t, leading to the "biodone2
already" panic later.

Since you know the xbd code, you could save me some time: where do we go
below VOP_STRATEGY?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: verbose vflushbuf()

2018-08-09 Thread Emmanuel Dreyfus
J. Hannken-Illjes  wrote:

> For me it triggers for mounted block devices only and I suppose the
> vnode lock doesn't help here.

I have not yet fully understood the thing, but I suspect it is related
to snapshots.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Mindaugas Rasiukevicius
Jason Thorpe  wrote:
> 
> 
> > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
> > 
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the
> > build box kernels.
> 
> Indeed.  And there are better ways to do what veriexec does, in any case.
> 

Many years ago I wrote a diff to make fileassoc MP-safe:

http://www.netbsd.org/~rmind/fileassoc.diff

If somebody wants to finish it, I am glad to help.

-- 
Mindaugas


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Joerg Sonnenberger
On Fri, Aug 10, 2018 at 12:29:49AM +0200, Joerg Sonnenberger wrote:
> On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> > 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> > >> 100.00    2054     14.18 kernel_lock
> > >>  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
> > >>  23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
> > >>  16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
> > > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > > it still used only by veriexec?  We can easily option that out of the
> > > build box kernels.
> > 
> > Or even better, make it less heavy?
> > 
> > It's not really intuitive that you could improve filesystem
> > performance by removing this obscure component.
> 
> If it is not in use, fileassoc_file_delete will already short-circuit.

...and of course, the check seems to be just useless. So yes, it should
be possible to make it much less heavy.
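
One cheap way to get there (a sketch of the general idea only; the
counter name fileassoc_nentries and the exact shape are illustrative,
not the actual fileassoc code):

	int
	fileassoc_file_delete(struct vnode *vp)
	{
		/* Hypothetical early-out: nothing registered, nothing to do. */
		if (fileassoc_nentries == 0)
			return ENOENT;

		KERNEL_LOCK(1, NULL);
		/* ... existing lookup and removal, as today ... */
		KERNEL_UNLOCK_ONE(NULL);
		return 0;
	}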

Joerg


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Joerg Sonnenberger
On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> >> 100.00    2054     14.18 kernel_lock
> >>  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
> >>  23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
> >>  16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the build
> > box kernels.
> 
> Or even better, make it less heavy?
> 
> It's not really intuitive that you could improve filesystem
> performance by removing this obscure component.

If it is not in use, fileassoc_file_delete will already short-circuit.

Joerg


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Thor Lancelot Simon
On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> With the patch applied:
> 
> Elapsed time: 1564.93 seconds.
> 
> -- Kernel lock spin
> 
> Total%   Count   Time/ms  Lock          Caller
> ------ ------- --------- ------------- ------------------------------
> 100.00    2054     14.18 kernel_lock
>  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
>  23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
>  16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
>   5.29     662      0.75 kernel_lock   VOP_POLL+93
>   5.29      95      0.75 kernel_lock   biodone2+81
>   0.91      15      0.13 kernel_lock   sleepq_block+1c5
>   0.60      21      0.08 kernel_lock   frag6_fasttimo+1a
>   0.29       9      0.04 kernel_lock   ip_slowtimo+1a
>   0.27       2      0.04 kernel_lock   VFS_SYNC+65
>   0.07       2      0.01 kernel_lock   callout_softclock+42c
>   0.06       3      0.01 kernel_lock   nd6_timer_work+49
>   0.05       4      0.01 kernel_lock   frag6_slowtimo+1f
>   0.01       4      0.00 kernel_lock   kevent1+698
> 
> so .. no need to worry about kernel_lock for this load any more.

Actually, I wonder if we could kill off the time spent by fileassoc.  Is
it still used only by veriexec?  We can easily option that out of the build
box kernels.

-- 
 Thor Lancelot Simon t...@panix.com
  "Whether or not there's hope for change is not the question.  If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right."  --Andrei Sakharov


Re: verbose vflushbuf()

2018-08-09 Thread David Holland
On Thu, Aug 09, 2018 at 12:44:28PM +, Emmanuel Dreyfus wrote:
 > It seems we have something like a debug message left in 
 > src/sys/kern/vfs_subr.c:vflushbuf()
 > 
 > 	if (dirty) {
 > 		vprint("vflushbuf: dirty", vp);
 > 		goto loop;
 > 	}
 > 
 > It has been there for a while (7 years). Is there a reason
 > why it is always enabled? I have a machine that hit
 > this place in a loop, getting stuck for hours printing
 > messages on the console. Is it safe to #ifdef DEBUG this
 > printf?

Probably, but I don't think it's supposed to happen and possibly it
should be a panic:

/*
 * Called with the underlying vnode locked, which should prevent new dirty
 * buffers from being queued.
 */
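
A minimal sketch of that alternative (illustrative only, not a proposed
patch):

	if (dirty) {
		vprint("vflushbuf: dirty", vp);
		panic("vflushbuf: dirty buffers on a locked vnode");
	}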


-- 
David A. Holland
dholl...@netbsd.org


verbose vflushbuf()

2018-08-09 Thread Emmanuel Dreyfus
Hello

It seems we have something like a debug message left in 
src/sys/kern/vfs_subr.c:vflushbuf()

	if (dirty) {
		vprint("vflushbuf: dirty", vp);
		goto loop;
	}

It has been there for a while (7 years). Is there a reason
why it is always enabled? I have a machine that hit
this place in a loop, getting stuck for hours printing
messages on the console. Is it safe to #ifdef DEBUG this
printf?
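
That is, something like this (sketch only):

	if (dirty) {
#ifdef DEBUG
		vprint("vflushbuf: dirty", vp);
#endif
		goto loop;
	}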

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: repeated panics in mutex_vector_enter (from unp_thread)

2018-08-09 Thread Edgar Fuß
> Reader / writer lock error: lockdebug_wantlock: locking against myself
Turns out this is an entirely different problem.
The backtrace is
lockdebug_more<-rw_enter<-fr_check<-pfil_run_hooks<-ip6_output<-nd6_ns_output<-nd6_output<-fr_fastroute<-fr_send_ip<-fr_send_reset<-fr_check<-pfil_run_hooks<-ip6_input<-ip6intr<-softint_dispatch.

I guess this has probably long since been fixed.

So I'll try building and running an -8 kernel.


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Martin Husemann
With the patch applied:

Elapsed time: 1564.93 seconds.

-- Kernel lock spin

Total%   Count   Time/ms  Lock          Caller
------ ------- --------- ------------- ------------------------------
100.00    2054     14.18 kernel_lock
 47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
 23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
 16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
  5.29     662      0.75 kernel_lock   VOP_POLL+93
  5.29      95      0.75 kernel_lock   biodone2+81
  0.91      15      0.13 kernel_lock   sleepq_block+1c5
  0.60      21      0.08 kernel_lock   frag6_fasttimo+1a
  0.29       9      0.04 kernel_lock   ip_slowtimo+1a
  0.27       2      0.04 kernel_lock   VFS_SYNC+65
  0.07       2      0.01 kernel_lock   callout_softclock+42c
  0.06       3      0.01 kernel_lock   nd6_timer_work+49
  0.05       4      0.01 kernel_lock   frag6_slowtimo+1f
  0.01       4      0.00 kernel_lock   kevent1+698

so .. no need to worry about kernel_lock for this load any more.

Mindaugas, can you please commit your patch and request pullup?

Martin