On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:
> On Thursday, September 15, 2016 08:49:48 PM Slawa Olhovchenkov wrote:
> > On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> > 
> > > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > > > 
> > > > > I have a strange issue with nginx on FreeBSD 11.
> > > > > I have FreeBSD 11 installed over STABLE-10.
> > > > > nginx built for FreeBSD 10 and run without recompiling works fine.
> > > > > nginx built for FreeBSD 11 crashes inside rbtree lookups: the next
> > > > > node is totally corrupted.
> > > > > 
> > > > > I see two potential causes:
> > > > > 
> > > > > 1) a clang 3.8 code generation issue
> > > > > 2) a system library issue
> > > > > 
> > > > > Maybe I am missing something?
> > > > > 
> > > > > How can I find the real cause?
> > > > 
> > > > I have found the real cause, and it looks like a show-stopper for the
> > > > RELEASE. I use nginx with AIO, and an AIO operation issued by one
> > > > nginx process corrupts memory in another nginx process. Yes, this is
> > > > cross-process memory corruption.
> > > > 
> > > > In the last case, the process with pid 1060 dumped core at 15:45:14.
> > > > The corrupted memory is at 0x860697000.
> > > > I know of good memory at 0x86067f800.
> > > > Dumping (from the core) this region to a file and analyzing it with
> > > > hexdump, I found the start of the corrupt region -- offset 0000c8c0
> > > > from 0x86067f800.
> > > > 0x86067f800 + 0xc8c0 = 0x86068c0c0
> > > > 
> > > > I had previously enabled debug logging of started AIO operations to
> > > > the nginx error log (memory address, file name, offset, and size of
> > > > the transfer).
> > > > 
> > > > grep -i 86068c0c0 error.log near 15:45:14 gives the target file.
> > > > grep ce949665cbcd.hls error.log near 15:45:14 gives the following:
> > > > 
> > > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 000000082065DB60 
> > > > start 000000086068C0C0 561b0   2646736 ce949665cbcd.hls
> > > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 000000081F1FFB60 
> > > > start 000000086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 00000008216B6B60 
> > > > start 000000086472B7C0 7ff70   2999424 ce949665cbcd.hls
> > > 
> > > Does nginx only use AIO for regular files or does it also use it with 
> > > sockets?
> > 
> > Only for regular files.
> > 
> > > You can try using this patch as a diagnostic (you will need to
> > > run with INVARIANTS enabled,
> > 
> > How much debug output will this produce?
> > I have about 5-10K AIOs per second.
> > 
> > > or at least enabled for vfs_aio.c):
> > 
> > How can I do this (enable INVARIANTS for vfs_aio.c)?
> 
> Include INVARIANT_SUPPORT in your kernel and add a line with:
> 
> #define INVARIANTS
> 
> at the top of sys/kern/vfs_aio.c.
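[Concretely, the two pieces look like this; MYKERNEL is an illustrative kernel config name, not from the message:]

```c
/*
 * 1. In the kernel configuration file (e.g. sys/amd64/conf/MYKERNEL),
 *    compile in the assertion support code:
 *
 *        options INVARIANT_SUPPORT
 *
 * 2. At the very top of sys/kern/vfs_aio.c, before any #include, turn
 *    KASSERT() into a real check for this one file only:
 */
#define INVARIANTS
```

With INVARIANT_SUPPORT in the kernel but INVARIANTS defined only in this file, the KASSERTs in the patch fire without paying the cost of assertions kernel-wide.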
> 
> > 
> > > Index: vfs_aio.c
> > > ===================================================================
> > > --- vfs_aio.c     (revision 305811)
> > > +++ vfs_aio.c     (working copy)
> > > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> > >    * aio_aqueue() acquires a reference to the file that is
> > >    * released in aio_free_entry().
> > >    */
> > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > +     ("%s: vmspace mismatch", __func__));
> > >   if (cb->aio_lio_opcode == LIO_READ) {
> > >           auio.uio_rw = UIO_READ;
> > >           if (auio.uio_resid == 0)
> > > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> > >  {
> > >  
> > >   vmspace_switch_aio(job->userproc->p_vmspace);
> > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > +     ("%s: vmspace mismatch", __func__));
> > >  }
> > > 
> > > If this panics, then vmspace_switch_aio() is not working for
> > > some reason.
> > 
> > This issue occurs rarely. Will this panic fire only when the issue
> > happens, or on any AIO request? (This is a production server.)
> 
> It would panic in the case that we are going to write into the wrong
> process (so about as rare as your issue).
> 

vmspace_switch_aio() allows a context switch to happen with the old
curpmap but the new proc->p_vmspace. This is a weird condition, where
curproc->p_vmspace->vm_pmap is not equal to curcpu->pc_curpmap. I do
not see an obvious place that would immediately break; e.g., even for
a context switch between the assignment of newvm to p_vmspace and
pmap_activate(), the context-switch call to pmap_activate_sw() seems to
do the right thing.

Still, just in case, try this:

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index a23468e..fbaa6c1 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
        if (oldvm == newvm)
                return;
 
+       critical_enter();
        /*
         * Point to the new address space and refer to it.
         */
@@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
 
        /* Activate the new mapping. */
        pmap_activate(curthread);
+       critical_exit();
 
        /* Remove the daemon's reference to the old address space. */
        KASSERT(oldvm->vm_refcnt > 1,
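[For reference, with the patch applied the function would read roughly as follows. This is a sketch reconstructed from the diff context above, not verbatim source; the reference-counting lines and KASSERT message are filled in from memory of the stable/11 code and may differ slightly:]

```c
void
vmspace_switch_aio(struct vmspace *newvm)
{
	struct vmspace *oldvm;

	oldvm = curproc->p_vmspace;
	if (oldvm == newvm)
		return;

	/*
	 * Block preemption across the window where p_vmspace already
	 * points at the new vmspace but its pmap is not yet activated,
	 * so no context switch can observe the mismatched state.
	 */
	critical_enter();

	/* Point to the new address space and refer to it. */
	curproc->p_vmspace = newvm;
	atomic_add_int(&newvm->vm_refcnt, 1);

	/* Activate the new mapping. */
	pmap_activate(curthread);

	critical_exit();

	/* Remove the daemon's reference to the old address space. */
	KASSERT(oldvm->vm_refcnt > 1,
	    ("vmspace_switch_aio: oldvm dropping last reference"));
	vmspace_free(oldvm);
}
```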