Re: FS hang when creating snapshots on a UFS SU+J setup
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
>
>     umount
>     tunefs -j disable
>     mount
>     cd
>     rm .sujournal

Success! Thanks, Mr. McKusick.

I posted about this problem on the FreeBSD forum (http://forums.freebsd.org/showthread.php?t=25787), but wanted to emphasize one thing: in two VirtualBox VMs that were created in exactly the same way, the dump issue did not occur in the absolutely fresh FreeBSD-9.0 install (not even portsnap yet), but it did occur in the system I had installed some ports on (an Apache/MySQL/Python stack, a few additional GNU build tools, and some other miscellaneous ports).

I don't know if this means anything; I'm just hoping it might help. Presumably SU+J would be a good thing. ;)

Regards,
Dale
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: FS hang when creating snapshots on a UFS SU+J setup
On Wed, Jan 11, 2012 at 11:12:35PM +0530, Gautam Mani wrote:
>
> Do let me know if I can try something further.
>

I reproduced this again and here is the core.txt crash summary if it helps.

http://pastebin.com/hTGMXX6A

Thanks
Gautam
Re: FS hang when creating snapshots on a UFS SU+J setup
On Wed, Jan 11, 2012 at 10:30:39AM +0100, Yamagi Burmeister wrote:
> Hello,
> I've done some tests to verify that the problem only occurs when SU+J
> is used, but not SU without J. In fact, I did run the following two
> loops on different TTYs in parallel:

I can confirm this using a similar technique. The panic is only seen with SU+J and not with just SU. I ran a similar

    cp -R /root /var/tmp ; rm -rf /var/tmp/root

and the panic was triggered with dump -0L... I got the panic (again in less than a minute of issuing the dump command), and I also got the "giving up on dirty" kind of message. I took a picture of the screen; I am not sure if that helps!

http://picpaste.com/11012012519-LF0sWlpw.jpg

> Since it's much more likely that the problems described above arise
> when the filesystem is loaded (for example by the first loop) while
> taking the snapshot this looks like some kind of race condition or
> something like that.
>

Earlier I have seen this happen with dump without any high load, or at least with very minimal load, again with /var, because some logs were being written or a cron job was running and writing to it. That didn't panic, as I indicated in my previous email; it hogged the CPU and forced a power-cycle.

Do let me know if I can try something further.

Thanks
Gautam
Re: FS hang when creating snapshots on a UFS SU+J setup
Hello,

I've done some tests to verify that the problem only occurs when SU+J is used, but not SU without J. In fact, I ran the following two loops on different TTYs in parallel:

    while 1
        cp -r /usr/src /root
        rm -Rf /root/src
    end

    while 1
        mksnap_ffs / /.snap/snap
        rm -f /.snap/snap
    end

With SU without J the system survives this for at least 1 hour. But as soon as SU+J is used it most likely deadlocks or even panics within the first 1 or 2 minutes. What exactly happens seems to vary... In most cases the system just deadlocks, sometimes as al...@bsdgate.org describes, and sometimes it's completely unresponsive to any input. I've seen kernel messages like "fsync: giving up on dirty". Several times the system panicked, in most cases printing the generic "panic: page fault while in kernel mode" and one time printing "panic: snapacct_ufs2: bad block". I've never seen the same backtrace twice. One time the system suddenly rebooted, as if a triple fault or something like that had happened.

Since it's much more likely that the problems described above arise when the filesystem is loaded (for example by the first loop) while taking the snapshot, this looks like some kind of race condition or something like that.

Some more information from an older debug session can be found at:
http://deponie.yamagi.org/freebsd/debug/snapshots_panic/

On Tue, 10 Jan 2012 10:30:13 -0800 Kirk McKusick wrote:

> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister
> > To: j...@freebsd.org, mckus...@freebsd.org
> > Cc: freebsd-current@freebsd.org, br...@bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> >
> > Hello,
> >
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. We are several people experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. how can we
> > help debugging and / or fixing this problem.
> >
> > Thank you,
> > Yamagi
>
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
>
>     umount
>     tunefs -j disable
>     mount
>     cd
>     rm .sujournal
>
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
>
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
>
>     umount
>     tunefs -j enable
>     mount
>
> When responding to me, it is best to use my
> email as I tend to read it more regularly.
>
> Kirk McKusick
>

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
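The two loops above look like csh syntax. A rough POSIX sh equivalent, as a sketch only: the function names, the iteration counts and the `RUN` switch are illustrative assumptions and not part of the original test. With `RUN=echo` the commands are only printed, since running them for real on an SU+J file system may hang or panic the machine, as described in this thread.

```shell
#!/bin/sh
# Sketch of the two reproduction loops in POSIX sh.
# RUN=echo prints each command instead of executing it; change it to
# "RUN=" only on a disposable test machine.
RUN=echo

load_loop() {        # $1 = number of iterations (the original loops forever)
    i=0
    while [ "$i" -lt "$1" ]; do
        $RUN cp -r /usr/src /root
        $RUN rm -Rf /root/src
        i=$((i + 1))
    done
}

snap_loop() {        # $1 = number of iterations
    i=0
    while [ "$i" -lt "$1" ]; do
        $RUN mksnap_ffs / /.snap/snap
        $RUN rm -f /.snap/snap
        i=$((i + 1))
    done
}

# Print two rounds of each loop.
load_loop 2
snap_loop 2
```

On a test box the two functions would be run at the same time, for example `load_loop 1000 & snap_loop 1000; wait`, which mirrors running them on two TTYs.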
Re: FS hang when creating snapshots on a UFS SU+J setup
On Tue 10/01/12 19:30, "Kirk McKusick" mckus...@mckusick.com wrote:

> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister
> > To: jeff@freebsd.org, mckusick@freebsd.org
> > Cc: freebsd-curr...@freebsd.org, bryce@bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> >
> > Hello,
> >
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. We are several people experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. how can we
> > help debugging and / or fixing this problem.
> >
> > Thank you,
> > Yamagi
>
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
>
>     umount
>     tunefs -j disable
>     mount
>     cd
>     rm .sujournal
>
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
>
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
>
>     umount
>     tunefs -j enable
>     mount
>
> When responding to me, it is best to use my k...@mckusick.com
> email as I tend to read it more regularly.
>
> Kirk McKusick
>

Hi,

I agree that I had not disabled journaling completely before doing a clean full fsck. After taking the requested actions, I was not able to recover from this race condition with SUJ, but snapshots are still OK with SU only. Here are the investigations I have done (sorry for the length).

This test system was freshly installed from the 9.0 RC1 ISO (18 October, after the fix) and csup'ed to 9_RELENG (40G available). It is a very basic setup, with just dovecot running, on GENERIC.

Operations and results:

Since it's the root fs: clean shutdown, boot into single-user mode, disable SUJ, mount read-write, remove the .sujournal and the bad snapshot file, clean halt.

I rebooted into single-user mode again, then ran fsck_ufs -y /dev/ufs/ROOTFS. I got some very minor fixups: free block count wrong, summary information bad, and blocks missing in bitmaps.

After a normal reboot, I took a successful snapshot without soft updates journaling, just SU.

I rebooted into single-user mode again, reactivated SUJ, then rebooted in normal mode. I took a snapshot, and again mksnap_ffs was eating all the CPU, not suspendable, not killable.

So I tried to figure out what's going on, using systat -v, gstat, top -SCHP and strace / truss / ktrace (on ramfs and NFS) to track mksnap_ffs. Here are some results:

gstat: 26 seconds of intense I/O activity, like a normal snapshot. A bad sparse snapshot file is created (the UFS label (ROOTFS) is not present and there is some garbage at the beginning). The real and sparse sizes of the file are very near those of a normal snapshot file.
truss: begins showing info, then hangs before being useful. mksnap_ffs is in running / runnable state, eating 100% CPU in kernel mode, 0% in user mode.
systat: hangs.
top: still running correctly; 15 to 25% CPU in interrupt SWI4: CLOCK (the CPU has 2 cores).
strace: only for i386 :-(
ktrace: blocks before showing valuable info, even on remote NFS.
Regular processes hang on suspfs.

Hard power cycle. After a normal reboot and the regular SUJ fixup, I got a panic at the login prompt (bg_fsck not started):

    panic: ffs_sync: rofs mod

(It's a physical machine, so no screenshots.) The backtrace shows ffs_write_suspend+0x... before the ffs_sync.

So I tried rebooting with the 9 RC1 CD in live mode: disable SUJ, disable SU, fsck, re-enable SU and SUJ, mount the fs without doing anything on it, take a snapshot (still in live mode). This time the snapshot was OK even with SUJ. So I wrongly concluded that touching the root fs in single-user mode is not as good as touching it from a live CD. But after returning to normal operation, the race is still there.

After various tracking tests, and rebooting in normal mode after the standard SUJ recovery, I sometimes got a double panic after the login prompt:

    panic: ffs_blkfree_cg

and just after, the backtrace softdep_process_worklist ... ->

    panic: bufwrite: bufwrite is not busy.

I also saw, when there is more I/O activity while taking the snapshot, a kernel panic saying:

    panic: softdep_deallocate_dependencies: dangling deps

Sure something wrong in this setup, because SUJ snapsho
Re: FS hang when creating snapshots on a UFS SU+J setup
> Date: Mon, 9 Jan 2012 18:30:51 +0100
> From: Yamagi Burmeister
> To: j...@freebsd.org, mckus...@freebsd.org
> Cc: freebsd-current@freebsd.org, br...@bryce.net
> Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
>
> Hello,
>
> I'm sorry to bother you, but you may not be aware of this thread and
> this problem. We are several people experiencing deadlocks, kernel
> panics and other problems when creating snapshots on file systems
> with SU+J. It would be nice to get some feedback, e.g. how can we
> help debugging and / or fixing this problem.
>
> Thank you,
> Yamagi

First step in debugging is to find out if the problem is SU+J specific. To find out, turn off SU+J but leave SU. This change is done by running:

    umount
    tunefs -j disable
    mount
    cd
    rm .sujournal

You may want to run `fsck -f' on the filesystem while you have it unmounted just to be sure that it is clean. Then run your snapshot request to see if it still fails. If it works, then we have narrowed the problem down to something related to SU+J. If it fails then we have a broader issue to deal with.

If you wish to go back to using SU+J after the test, you can reenable SU+J by running:

    umount
    tunefs -j enable
    mount

When responding to me, it is best to use my email as I tend to read it more regularly.

Kirk McKusick
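The procedure above lists the commands without their arguments. As a rough sketch, the following dry-run script prints one possible full sequence for a given device and mount point. The function names and the example values `/dev/gpt/var` and `/var` are hypothetical placeholders, and the placement of the optional `fsck -f` step between `tunefs` and `mount` is an assumption; nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the steps for switching SU+J off
# while keeping soft updates, and for switching it back on afterwards.

suj_disable_plan() {
    dev=$1   # device holding the file system (assumed name)
    mnt=$2   # its mount point (assumed path)
    echo "umount $mnt"
    echo "tunefs -j disable $dev"
    echo "fsck -f $dev"          # optional check while unmounted
    echo "mount $dev $mnt"
    echo "rm $mnt/.sujournal"    # journal file left in the file system root
}

suj_enable_plan() {
    dev=$1
    mnt=$2
    echo "umount $mnt"
    echo "tunefs -j enable $dev"
    echo "mount $dev $mnt"
}

# Print the plan for review before typing anything by hand.
suj_disable_plan /dev/gpt/var /var
```

A root file system cannot simply be unmounted on a running system; another report in this thread did the same steps from single-user mode.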
Re: FS hang when creating snapshots on a UFS SU+J setup
Hello,

I'm sorry to bother you, but you may not be aware of this thread and this problem. We are several people experiencing deadlocks, kernel panics and other problems when creating snapshots on file systems with SU+J. It would be nice to get some feedback, e.g. how can we help debugging and / or fixing this problem.

Thank you,
Yamagi

On Mon, 2 Jan 2012 23:27:57 -0600 Bryce Edwards wrote:

> I have a RELENG_9 machine that hangs when a snapshot is created on the
> root fs (UFS, with SU+J). More accurately, all the processes show a
> state of "suspfs" (with ^T) and no fs activity is completed from then
> on. A hard reboot (power cycle) was the only way to proceed.
>
> Here's some reference info - let me know what else I should provide.
>
> $uname -a
> FreeBSD xxx.xxx.net 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Dec
> 25 05:04:37 UTC 2011 r...@xxx.xxx.net:/usr/obj/usr/src/sys/GENERIC
> amd64
>
> csup was run just before build[world|kernel] so you have reference on
> the version information.
>
> $mount
> /dev/gpt/root on / (ufs, local, journaled soft-updates)
> devfs on /dev (devfs, local, multilabel)
> linprocfs on /compat/linux/proc (linprocfs, local)
> { zfs info removed }
>
> $df -h
> Filesystem       Size    Used   Avail Capacity  Mounted on
> /dev/gpt/root    454G    9.1G    409G     2%    /
> devfs            1.0k    1.0k      0B   100%    /dev
> linprocfs        4.0k    4.0k      0B   100%    /compat/linux/proc
> { zfs info removed }
>
> After the hard reset, there was a snapshot file listed in /.snap and
> it was ~465 GB, iirc. Unfortunately, I needed to get things going
> again so I was not able to debug or diagnose further. I may be able
> to schedule a time that I could recreate the issue and diagnose
> better, but I wanted to get your input on what data points and/or
> command you would be interested in.
>
> Thanks in advance,
>
> Bryce
>

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: FS hang when creating snapshots on a UFS SU+J setup
On Tue, Jan 03, 2012 at 12:55:36PM +, Alain BRAUNER wrote:
>
> Maybe I overlooked something, but I can confirm the two previous reports
> and PR kern/163310: I have the same freeze when trying to take a snapshot
> of the root fs when SUJ is on.
>

I confirm seeing this problem on my box.

$ uname -a
FreeBSD linbox 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Fri Dec 30 19:49:47 IST 2011 root@linbox:/usr/obj/usr/src/sys/GENERIC i386

The source was csupped from 9-STABLE after the Christmas advisories, so it doesn't have the commits after that. This is the GENERIC kernel.

> I have never been able to create a snapshot when SUJ is activated.

In my case, I am trying to take a backup using dump. I was able to, e.g., take a backup of /, but it failed with /var. Since I use tmux, I know that mksnap_ffs had taken over the machine: the box was only slightly interactive. I could type ps axl, but did not get any output. CPU utilisation was at 100%, and the only way I could get out of it was to hit the reset button on the box.

>
> Also no problems when SUJ is disabled.

+1, I have switched SUJ off and now just have SU on, like in 8-STABLE, and am seeing no problems with my backups.

Is this a known issue with SUJ, and is SUJ not yet ready to be used on 9-STABLE?

Cheers,
Gautam
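The exact dump invocation is not given in the message above; elsewhere in this thread it is reported as `dump -0L`. As a sketch, the helper below only builds and prints a level-0 dump command that uses a live snapshot. The function name `dump_cmd`, the `-a` (auto-size) flag and the output path `/backup/var.dump` are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch: print a level-0 dump command that takes a snapshot of a live
# file system (-L) and writes to a file (-f). Nothing is executed.
dump_cmd() {
    fs=$1      # mounted file system to back up, e.g. /var
    out=$2     # destination file (hypothetical path)
    echo "dump -0 -L -a -f $out $fs"
}

dump_cmd /var /backup/var.dump
```

Printing the command first makes it easy to run the same backup once with SU only and once with SU+J, and to compare the results.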
Re: FS hang when creating snapshots on a UFS SU+J setup
Bryce Edwards writes:

> I have a RELENG_9 machine that hangs when a snapshot is created on the
> root fs (UFS, with SU+J). More accurately, all the processes show a
> state of "suspfs" (with ^T) and no fs activity is completed from then
> on. A hard reboot (power cycle) was the only way to proceed.
>
> Here's some reference info - let me know what else I should provide.
>
> $uname -a
> FreeBSD xxx.xxx.net 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Dec
> 25 05:04:37 UTC 2011 root xxx.xxx.net:/usr/obj/usr/src/sys/GENERIC
> amd64
>
> csup was run just before build[world|kernel] so you have reference on
> the version information.
>
> $mount
> /dev/gpt/root on / (ufs, local, journaled soft-updates)
> devfs on /dev (devfs, local, multilabel)
> linprocfs on /compat/linux/proc (linprocfs, local)
> { zfs info removed }
>
> $df -h
> Filesystem       Size    Used   Avail Capacity  Mounted on
> /dev/gpt/root    454G    9.1G    409G     2%    /
> devfs            1.0k    1.0k      0B   100%    /dev
> linprocfs        4.0k    4.0k      0B   100%    /compat/linux/proc
> { zfs info removed }
>
> After the hard reset, there was a snapshot file listed in /.snap and
> it was ~465 GB, iirc. Unfortunately, I needed to get things going
> again so I was not able to debug or diagnose further. I may be able
> to schedule a time that I could recreate the issue and diagnose
> better, but I wanted to get your input on what data points and/or
> command you would be interested in.
>
> Thanks in advance,
>
> Bryce
>

Hi,

Maybe I overlooked something, but I can confirm the two previous reports and PR kern/163310: I have the same freeze when trying to take a snapshot of the root fs when SUJ is on, with both 9-PRERELEASE and 10-CURRENT.

There was an old closed PR that may or may not be related to this problem:
http://www.freebsd.org/cgi/query-pr.cgi?pr=160662

I have never been able to create a snapshot when SUJ is activated.

I use the stock GENERIC kernel (system built from the official RC ISO or from make world, no special make.conf).

This problem occurs on several kinds of hardware and also in a VM under VBox4.

After the freeze I need to halt the system by pressing the power switch for 5 seconds.

Sometimes the SUJ recovery is not enough and I get a panic with DUP ALLOC; when I then run a full fsck -yf in single-user mode, some files are reconnected in lost+found and there are a few rare recovery messages.

To reproduce: before taking the snapshot, I fully checked the integrity of the fs with fsck in single-user mode. Then I just ran:

    mksnap_ffs /.snap/backup

(dump -L may also suffer from this.)

My setup (no ZFS, 4 GB, Core 2 Duo, SATA 7.2k in AHCI mode):

FreeBSD test.test.test 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Jan 1 13:35:33 CET 2012 r...@test.test.test:/usr/obj/usr/src/sys/GENERIC amd64

/dev/ufs/ROOTFS on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)

Note that there is nearly no fs activity while this snapshot is being taken.

Also, there are no problems when SUJ is disabled.

Anyway, thanks so much for your wonderful and heavy work. It would be great to merge SUJ into 8.3 RELEASE when things get stable.

Best wishes of happiness and success for this new year!

Alain from Paris.
In love with FreeBSD since 386BSD 0.1 :-)
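Every report in this thread identifies an affected file system by the "journaled soft-updates" option in the `mount` output. As a small sketch, the filter below lists the mount points that carry that option. The function name `suj_mounts` is a hypothetical helper, and it assumes the output format shown in the report above and mount points without spaces.

```shell
#!/bin/sh
# Sketch: read `mount` output on stdin and print the mount point (third
# field) of every file system mounted with journaled soft updates.
suj_mounts() {
    awk '/journaled soft-updates/ { print $3 }'
}

# Sample taken from the report above; on a live system: mount | suj_mounts
suj_mounts <<'EOF'
/dev/ufs/ROOTFS on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)
EOF
```

A file system mounted with plain soft updates shows "soft-updates" without the "journaled" prefix and is therefore not listed.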
Re: FS hang when creating snapshots on a UFS SU+J setup
Hi,

I've seen this too (and other problems with SU+J and snapshots) and was able to reproduce it fairly easily. I wrote a PR:
http://www.freebsd.org/cgi/query-pr.cgi?pr=163310

Never received any feedback until now...

On Mon, 2 Jan 2012 23:27:57 -0600 Bryce Edwards wrote:

> I have a RELENG_9 machine that hangs when a snapshot is created on the
> root fs (UFS, with SU+J). More accurately, all the processes show a
> state of "suspfs" (with ^T) and no fs activity is completed from then
> on. A hard reboot (power cycle) was the only way to proceed.
>
> Here's some reference info - let me know what else I should provide.
>
> $uname -a
> FreeBSD xxx.xxx.net 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Dec
> 25 05:04:37 UTC 2011 r...@xxx.xxx.net:/usr/obj/usr/src/sys/GENERIC
> amd64
>
> csup was run just before build[world|kernel] so you have reference on
> the version information.
>
> $mount
> /dev/gpt/root on / (ufs, local, journaled soft-updates)
> devfs on /dev (devfs, local, multilabel)
> linprocfs on /compat/linux/proc (linprocfs, local)
> { zfs info removed }
>
> $df -h
> Filesystem       Size    Used   Avail Capacity  Mounted on
> /dev/gpt/root    454G    9.1G    409G     2%    /
> devfs            1.0k    1.0k      0B   100%    /dev
> linprocfs        4.0k    4.0k      0B   100%    /compat/linux/proc
> { zfs info removed }
>
> After the hard reset, there was a snapshot file listed in /.snap and
> it was ~465 GB, iirc. Unfortunately, I needed to get things going
> again so I was not able to debug or diagnose further. I may be able
> to schedule a time that I could recreate the issue and diagnose
> better, but I wanted to get your input on what data points and/or
> command you would be interested in.
>
> Thanks in advance,
>
> Bryce
>

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB