On Fri, Jul 12, 2013 at 08:05:27AM +0200, Andre Albsmeier wrote:
> On Fri, 12-Jul-2013 at 08:01:12 +0200, Konstantin Belousov wrote:
> > On Fri, Jul 12, 2013 at 07:24:40AM +0200, Andre Albsmeier wrote:
> > > On Thu, 04-Jul-2013 at 19:25:28 +0200, Konstantin Belousov wrote:
> > > > On Thu, Jul 04, 2013 at 04:29:19PM +0200, Andre Albsmeier wrote:
> > > > > OK, the patch is applied. I will reboot the machine later
> > > > > and see what happens tomorrow morning. However,
> > > > > it might take a few days, since everything was fine for
> > > > > the last 2 weeks.
> > > > > 
> > > > > BTW, should this patch be used in general or is it just
> > > > > for debugging? My understanding is that it is something
> > > > > which could stay in the code...
> > > > 
> > > > Patch is to improve debugging.
> > > > 
> > > > I will probably commit it after the issue is closed.  The argument against
> > > > the commit is that the change imposes a small performance penalty
> > > > due to the save and restore of %ebp (I doubt that this is measurable
> > > > by any means).  Also, arguably, such a change should be done for all
> > > > functions in support.s, but bcopy() is the hot spot.
> > > 
> > > Got a new one, 2 hours old ;-)
> > > 
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "i386-marcel-freebsd"...
> > > 
> > > Unread portion of the kernel message buffer:
> > > 
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > > fault virtual address   = 0xcd5ec000
> > > fault code              = supervisor write, page not present
> > > instruction pointer     = 0x20:0xc07cb2fe
> > > stack pointer           = 0x28:0xd82e45cc
> > > frame pointer           = 0x28:0xd82e45d4
> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >                         = DPL 0, pres 1, def32 1, gran 1
> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > current process         = 18714 (mksnap_ffs)
> > > trap number             = 12
> > > panic: page fault
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper(c08207eb,d82e4418,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd82e43e8
> > > kdb_backtrace(c081df13,c08a82e0,c0801bfa,d82e4424,d82e4424,...) at kdb_backtrace+0x29/frame 0xd82e43f4
> > > panic(c0801bfa,c0845a01,c2b067d4,1,1,...) at panic+0xc9/frame 0xd82e4418
> > > trap_fatal(c0ff6000,cd5ec000,2,0,c08b6bf4,...) at trap_fatal+0x353/frame 0xd82e4458
> > > trap_pfault(baa8454b,21510,0,c2b06620,c08b6bf0,...) at trap_pfault+0x2d7/frame 0xd82e44a0
> > > trap(d82e458c) at trap+0x41a/frame 0xd82e4580
> > > calltrap() at calltrap+0x6/frame 0xd82e4580
> > > --- trap 0xc, eip = 0xc07cb2fe, esp = 0xd82e45cc, ebp = 0xd82e45d4 ---
> > > bcopy(c36ed000,cd5e6000,8000,8000,c281b980,...) at bcopy+0x1a/frame 0xd82e45d4
> > > ffs_snapshot(c2b35a90,c2ed0400,0,0,0,...) at ffs_snapshot+0x2933/frame 0xd82e490c
> > > ffs_mount(c2b35a90,c322e200,ff,d82e4c08,c2ccbc8c,...) at ffs_mount+0x15ee/frame 0xd82e4a3c
> > > vfs_donmount(c2b06620,10313108,0,c2b74d80,c2b74d80,...) at vfs_donmount+0x196b/frame 0xd82e4c2c
> > > sys_nmount(c2b06620,d82e4ccc,c2b06908,d82e4c6c,c0605015,...) at sys_nmount+0x63/frame 0xd82e4c50
> > > syscall(d82e4d08) at syscall+0x2ce/frame 0xd82e4cfc
> > > Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xd82e4cfc
> > > --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x180bdf37, esp = 0xbfbfd65c, ebp = 0xbfbfddd8 ---
> > > Uptime: 4d20h0m44s
> > > Physical memory: 503 MB
> > > Dumping 104 MB: 89 73 57 41 25 9
> > > 
> > > No symbol "stopped_cpus" in current context.
> > > No symbol "stoppcbs" in current context.
> > > #0  doadump (textdump=1) at pcpu.h:249
> > > 249     pcpu.h: No such file or directory.
> > >         in pcpu.h
> > > (kgdb) where
> > > #0  doadump (textdump=1) at pcpu.h:249
> > > #1  0xc05fdddd in kern_reboot (howto=260) at /src/src-9/sys/kern/kern_shutdown.c:449
> > > #2  0xc05fe028 in panic (fmt=<value optimized out>) at /src/src-9/sys/kern/kern_shutdown.c:637
> > > #3  0xc07cd1d3 in trap_fatal (frame=0xd82e458c, eva=3445538816)
> > >     at /src/src-9/sys/i386/i386/trap.c:1044
> > > #4  0xc07cd4b7 in trap_pfault (frame=0xd82e458c, usermode=0, eva=3445538816)
> > >     at /src/src-9/sys/i386/i386/trap.c:957
> > > #5  0xc07ce05a in trap (frame=0xd82e458c) at /src/src-9/sys/i386/i386/trap.c:555
> > > #6  0xc07ba88c in calltrap () at /src/src-9/sys/i386/i386/exception.s:170
> > > #7  0xc07cb2fe in bcopy () at /src/src-9/sys/i386/i386/support.s:198
> > > #8  0xc072be13 in ffs_snapshot (mp=0xc2b35a90, snapfile=0xc2ed0400 "s5-2013.07.12-03.15.01")
> > >     at /src/src-9/sys/ufs/ffs/ffs_snapshot.c:793
> > > #9  0xc0748e8e in ffs_mount (mp=0xc2b35a90) at /src/src-9/sys/ufs/ffs/ffs_vfsops.c:483
> > > #10 0xc068a72b in vfs_donmount (td=0xc2b06620, fsflags=271659272, fsoptions=0xc2b74d80)
> > >     at /src/src-9/sys/kern/vfs_mount.c:948
> > > #11 0xc068a8e3 in sys_nmount (td=0xc2b06620, uap=0xd82e4ccc) at /src/src-9/sys/kern/vfs_mount.c:417
> > > #12 0xc07cd7ae in syscall (frame=0xd82e4d08) at subr_syscall.c:135
> > > #13 0xc07ba8f1 in Xint0x80_syscall () at /src/src-9/sys/i386/i386/exception.s:270
> > > #14 0x00000033 in ?? ()
> > > Previous frame inner to this frame (corrupt stack?)
> > 
> > Please show me the first 100 lines of the output of dumpfs(8) on the
> > filesystem where snapshot creation caused the panic.
> 
> OK, dumpfs /dev/stripe/p | head -100:
> 
> magic 11954 (UFS1)    time    Fri Jul 12 08:02:40 2013
> id    [ 517fa356 4ecc9335 ]
> ncg   82      size    17774144        blocks  17737399
> bsize 32768   shift   15      mask    0xffff8000
> fsize 4096    shift   12      mask    0xfffff000
> frag  8       shift   3       fsbtodb 3
> minfree       8%      optim   time    symlinklen 60
> maxbpg        4096    maxcontig 4     contigsumsize 4
> nbfree        1958555 ndir    695     nifree  1123668 nffree  5395
> cpg   1       bpg     27415   fpg     219320  ipg     13824
> nindir        8192    inopb   256     nspf    8       maxfilesize     18016597801566207
> sbsize        4096    cgsize  32768   cgoffset 0      cgmask  0xffffffff
> csaddr        456     cssize  4096
> rotdelay 0ms  rps     60      trackskew 0     interleave 1
> nsect 1754560 npsect  1754560 spc     1754560
> sblkno        8       cblkno  16      iblkno  24      dblkno  456
> cgrotor       50      fmod    0       ronly   0       clean   0
> metaspace 0   avgfpdir 64     avgfilesize 16384
> flags soft-updates 
> fsmnt /palveli
> volname               swuid   0       providersize    17774144

UFS1, weird.

I believe I see the problem.  The UFS1 superblock is not aligned on an
fs block boundary, and the bcopy() call tried to copy a full block.
In fact, whenever the snapshot operation did not trap, you probably
got data corruption in an unrelated buffer.
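
To make the arithmetic concrete, here is a minimal sketch in C (not
kernel code).  It assumes the UFS1 superblock sits at byte offset 8192
(SBLOCK_UFS1), a value the dumpfs output above does not show.  The
helpers blk_offset() and copy_overrun() are hypothetical names used
only for illustration; blk_offset() mimics what blkoff() computes for
a power-of-two block size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of byte position `loc` inside its filesystem block.
 * Hypothetical stand-in for blkoff(fs, loc); bsize must be a power of two.
 */
static uint32_t
blk_offset(uint32_t bsize, uint64_t loc)
{
	assert(bsize != 0 && (bsize & (bsize - 1)) == 0);
	return ((uint32_t)(loc & (bsize - 1)));
}

/*
 * Number of bytes written past the end of a bsize-byte buffer when
 * `len` bytes are copied to offset `off` inside that buffer.
 */
static uint32_t
copy_overrun(uint32_t bsize, uint32_t off, uint32_t len)
{
	uint64_t end = (uint64_t)off + len;

	return (end > bsize ? (uint32_t)(end - bsize) : 0);
}
```

With bsize 32768 and sbsize 4096 from the dumpfs output, and the
assumed superblock offset of 8192, blk_offset(32768, 8192) is 8192.
Copying a full block from there overruns the buffer by
copy_overrun(32768, 8192, 32768) = 8192 bytes, while copying only
sbsize bytes gives copy_overrun(32768, 8192, 4096) = 0.  Under the same
assumption, the bcopy() destination in the backtrace (0xcd5e6000) minus
8192 plus 32768 is 0xcd5ec000, which matches the fault virtual address
and would be consistent with the copy running off the end of the buffer.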

Please try the patch below.

diff --git a/sys/ufs/ffs/ffs_snapshot.c b/sys/ufs/ffs/ffs_snapshot.c
index ad157aa..c37706b 100644
--- a/sys/ufs/ffs/ffs_snapshot.c
+++ b/sys/ufs/ffs/ffs_snapshot.c
@@ -792,7 +792,7 @@ out1:
                brelse(nbp);
        } else {
                loc = blkoff(fs, fs->fs_sblockloc);
-               bcopy((char *)copy_fs, &nbp->b_data[loc], fs->fs_bsize);
+               bcopy((char *)copy_fs, &nbp->b_data[loc], (u_int)fs->fs_sbsize);
                bawrite(nbp);
        }
        /*
