Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-20 Thread Dale Scott
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
>
>   umount
>   tunefs -j disable
>   mount
>   cd
>   rm .sujournal

Success! Thanks Mr. McKusick.

 

I posted about this problem on the FreeBSD forum
(http://forums.freebsd.org/showthread.php?t=25787), but wanted to emphasize
that of two VirtualBox VMs created in exactly the same way, the dump issue
did not occur on the absolutely fresh FreeBSD-9.0 install (not even portsnap
run yet), but it did occur on the system where I had installed some
ports (an Apache/MySQL/Python stack, a few additional GNU build tools,
and some other miscellaneous ports). I don't know if this means anything;
I'm just hoping it might help, since presumably SU+J would be a good thing.  ;)

 

Regards,

Dale

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-12 Thread Gautam Mani
On Wed, Jan 11, 2012 at 11:12:35PM +0530, Gautam Mani wrote:
> 
> Do let me know if I can try something further.
> 
I reproduced this again and here is the core.txt crash summary if it
helps. 

http://pastebin.com/hTGMXX6A

Thanks
Gautam


Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-11 Thread Gautam Mani
On Wed, Jan 11, 2012 at 10:30:39AM +0100, Yamagi Burmeister wrote:
> Hello,
> I've done some tests to verify that the problem only occurs when SU+J
> is used, but not SU without J. In fact, I did run the following two
> loops on different TTYs in parallel:

I can also confirm this using a similar technique. The panic is only seen
with SU+J and not with just SU.

I did a similar cp -R /root /var/tmp ; rm -rf /var/tmp/root, and the
panic was triggered by dump -0L...
I got the panic (again less than a minute after issuing the dump command),
and I also got the "giving up on dirty" kind of message.

I took a picture of the screen; I am not sure if that helps!

http://picpaste.com/11012012519-LF0sWlpw.jpg

> Since it's much more likely that the problems described above arise
> when the filesystem is loaded (for example by the first loop) while
> taking the snapshot this looks like some kind of race condition or
> something like that. 
> 

Earlier I have seen this happen with dump without any high load, or at
least with very little: again on /var, because some logs were being
written or a cron job was running and writing to it. That did not panic,
as I indicated in my previous email; it hogged the CPU and forced a
power-cycle.

Do let me know if I can try something further.

Thanks
Gautam




Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-11 Thread Yamagi Burmeister
Hello,
I've done some tests to verify that the problem only occurs when SU+J
is used, but not SU without J. In fact, I did run the following two
loops on different TTYs in parallel:

while 1
 cp -r /usr/src /root
 rm -Rf /root/src
end

while 1
 mksnap_ffs / /.snap/snap
 rm -f /.snap/snap
end
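
For anyone who wants to script this reproduction, here is a minimal sketch of the same two loops in POSIX sh, bounded by an iteration count instead of running forever. The `run` helper, the `DRY_RUN` variable and the function names are made up for illustration and are not part of any FreeBSD tool. With `DRY_RUN` left at its default of 1 the commands are only printed, not executed.

```sh
#!/bin/sh
# Sketch only: sh version of the two csh loops above, with an iteration limit.
# DRY_RUN is a hypothetical switch: 1 = print the commands, 0 = run them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Loop 1: generate file system load. $1 = number of iterations.
load_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        run cp -r /usr/src /root
        run rm -Rf /root/src
        i=$((i + 1))
    done
}

# Loop 2: create and remove a snapshot. $1 = number of iterations.
snap_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        run mksnap_ffs / /.snap/snap
        run rm -f /.snap/snap
        i=$((i + 1))
    done
}
```

To mimic the two TTYs, one could set `DRY_RUN=0`, start `load_loop 100 &` in the background and then call `snap_loop 100`. The count of 100 is arbitrary. On an affected SU+J system this is expected to hang or panic the machine, so it only makes sense on a test box.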

With SU without J the system survives this for at least 1 hour. But as
soon as SU+J is used, it most likely deadlocks or even panics within the
first 1 or 2 minutes. What exactly happens seems to vary... In most
cases the system just deadlocks, sometimes as al...@bsdgate.org
describes and sometimes it is completely unresponsive to any input.
I've seen kernel messages like "fsync: giving up on dirty".

Several times the system panicked, in most cases printing the generic
"panic: page fault while in kernel mode" and one time printing
"panic: snapacct_ufs2: bad block". I've never seen the same
backtrace twice. One time the system suddenly rebooted, as if a triple
fault or something like that had happened.

Since it's much more likely that the problems described above arise
when the filesystem is loaded (for example by the first loop) while
taking the snapshot, this looks like some kind of race condition or
something like that.

Some more information from an older debug session can be found at:
http://deponie.yamagi.org/freebsd/debug/snapshots_panic/

On Tue, 10 Jan 2012 10:30:13 -0800
Kirk McKusick wrote:

> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister 
> > To: j...@freebsd.org, mckus...@freebsd.org
> > Cc: freebsd-current@freebsd.org, br...@bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> > 
> > Hello,
> > 
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. We are several people experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. how can we
> > help debugging and / or fixing this problem.
> > 
> > Thank you,
> > Yamagi
> 
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
> 
>   umount 
>   tunefs -j disable 
>   mount 
>   cd 
>   rm .sujournal
> 
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
> 
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
> 
>   umount 
>   tunefs -j enable 
>   mount 
> 
> When responding to me, it is best to use my 
> email as I tend to read it more regularly.
> 
>   Kirk McKusick
> 


-- 
Homepage:  www.yamagi.org
XMPP:  yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB




Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-10 Thread alain




On Tue 10/01/12 19:30, "Kirk McKusick" mckus...@mckusick.com wrote:
> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister
> > To: jeff@freebsd.org, mckusick@freebsd.org
> > Cc: freebsd-curr...@freebsd.org, bryce@bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> >
> > Hello,
> >
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. We are several people experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. how can we
> > help debugging and / or fixing this problem.
> >
> > Thank you,
> > Yamagi
>
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
>
>   umount
>   tunefs -j disable
>   mount
>   cd
>   rm .sujournal
>
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
>
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
>
>   umount
>   tunefs -j enable
>   mount
>
> When responding to me, it is best to use my k...@mckusick.com
> email as I tend to read it more regularly.
>
> Kirk McKusick

Hi,

I agree that I had not disabled journaling completely before doing a clean
full fsck.

After taking the requested actions, I was not able to get rid of this race
condition with SUJ, but snapshots are still OK with only SU.

So here are a few investigations I have made (sorry for the length):

This test system was freshly installed from the 9.0 RC1 ISO (18 October / after
the fix) and is csup'ed to RELENG_9
(40G avail); a very basic setup, with just dovecot running, on GENERIC.

Operations and results:

Since it's the root fs:
Clean shutdown, boot into single-user mode, disable SUJ, mount read-write,
remove the .sujournal and the bad snapshot file, clean halt.

I rebooted into single-user mode again, then ran fsck_ufs -y /dev/ufs/ROOTFS.
I got some very minor fixups: free block count wrong, summary information
bad, and blocks missing in bitmaps.

After a normal reboot, I took a successful snapshot with soft updates only,
without journaling.

I rebooted into single-user mode again, reactivated SUJ, then rebooted
normally.

Taking a snapshot: again mksnap_ffs eats all the CPU and can be neither
suspended nor killed.

So I tried to figure out what was going on with systat -v / gstat / top -SCHP,
and with strace / truss / ktrace on ramfs and NFS to track mksnap_ffs.

Here are some results:

gstat: 26 seconds of intense I/O activity, like a normal snapshot.
A bad sparse snapshot file is created (the UFS label (ROOTFS) is not present,
and there is some garbage at the beginning).
The real and sparse sizes of the file are very near those of a normal snapshot
file.

truss: begins showing info, then hangs before being useful.
mksnap_ffs is in running / runnable state, eating 100% CPU in kernel mode, 0% in
user mode.
systat: hangs.
top: still running correctly; 15 to 25% CPU in interrupt SWI4: CLOCK (2-core
CPU).

strace: only for i386 :-(
ktrace: blocks before showing valuable info, even on remote NFS.
Regular processes hang on suspfs.
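
The "hanging on suspfs" observation can be checked with a small filter over ps output. This is only a sketch: the function name `suspfs_procs` is made up, and it assumes input in the three-column form produced by `ps -axo pid,wchan,comm`, with one header line. As noted elsewhere in this thread, ps itself may not return once the file system is suspended, so this is most useful from a session started before the snapshot.

```sh
#!/bin/sh
# Sketch: list processes whose wait channel is "suspfs".
# Reads "PID WCHAN COMMAND" lines (preceded by one header line) on stdin
# and prints "PID COMMAND" for every match.
suspfs_procs() {
    awk 'NR > 1 && $2 == "suspfs" { print $1, $3 }'
}

# Intended use on the affected machine (not executed here):
#   ps -axo pid,wchan,comm | suspfs_procs
```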

Hard power cycle.

After a normal reboot and the regular SUJ fixup,
I got a panic at the login prompt (bg_fsck not started):

panic: ffs_sync: rofs mod (it's a physical machine, so no screenshots)

The backtrace shows ffs_write_suspend+0x... before the ffs_sync.

So I retried: I booted the 9 RC1 CD in live mode, disabled SUJ, disabled SU,
ran fsck, re-enabled SU and SUJ,
mounted the fs and, without doing anything on it, took a snapshot (still in
live mode),
and this time the snapshot was OK even with SUJ.

So I wrongly concluded that touching the root fs in single-user mode is not as
good as touching it from a live CD.

But after returning to normal operation, the race is still there.

After various tracing tests, and rebooting normally after the standard SUJ
recovery:

I sometimes got a double panic after the login prompt:

panic: ffs_blkfree_cg
and just after the backtrace, softdep_process_worklist ...
-> panic: bufwrite: bufwrite is not busy.

I also saw, when there was more I/O activity while taking the snapshot, a
kernel panic saying:

panic: softdep_deallocate_dependencies: dangling deps

Sure something wrong in this setup, because SUJ snapsho

Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-10 Thread Kirk McKusick
> Date: Mon, 9 Jan 2012 18:30:51 +0100
> From: Yamagi Burmeister 
> To: j...@freebsd.org, mckus...@freebsd.org
> Cc: freebsd-current@freebsd.org, br...@bryce.net
> Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> 
> Hello,
> 
> I'm sorry to bother you, but you may not be aware of this thread and
> this problem. We are several people experiencing deadlocks, kernel
> panics and other problems when creating snapshots on file systems
> with SU+J. It would be nice to get some feedback, e.g. how can we
> help debugging and / or fixing this problem.
> 
> Thank you,
> Yamagi

First step in debugging is to find out if the problem is SU+J
specific. To find out, turn off SU+J but leave SU. This change
is done by running:

umount 
tunefs -j disable 
mount 
cd 
rm .sujournal

You may want to run `fsck -f' on the filesystem while you have
it unmounted just to be sure that it is clean. Then run your
snapshot request to see if it still fails. If it works, then
we have narrowed the problem down to something related to SU+J.
If it fails then we have a broader issue to deal with.

If you wish to go back to using SU+J after the test, you can
reenable SU+J by running:

umount 
tunefs -j enable 
mount 
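
The two command sequences above can be sketched as a pair of sh functions. This is an illustration under assumptions, not a tested procedure: the function names, the `run` helper and the `DRY_RUN` variable are made up, and the mount point passed as `$1` is a placeholder that is assumed to be listed in /etc/fstab. By default the commands are only printed. The optional `fsck -f` is placed while the file system is unmounted, as suggested above.

```sh
#!/bin/sh
# Sketch only: wraps the steps for switching SU+J off and back on.
# DRY_RUN is a hypothetical switch: 1 = print the commands, 0 = run them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Turn off journaling but leave soft updates. $1 = mount point (placeholder).
suj_disable() {
    run umount "$1"
    run tunefs -j disable "$1"
    run fsck -f "$1"            # optional: full check while unmounted
    run mount "$1"
    run rm "$1/.sujournal"      # same effect as "cd <mount point>; rm .sujournal"
}

# Turn journaling back on after the test. $1 = mount point (placeholder).
suj_enable() {
    run umount "$1"
    run tunefs -j enable "$1"
    run mount "$1"
}
```

Calling `suj_disable /mnt` with the default settings prints the five commands in order, which makes it easy to review them before running anything. The root file system cannot be unmounted this way while the system is running multi-user.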

When responding to me, it is best to use my 
email as I tend to read it more regularly.

Kirk McKusick


Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-09 Thread Yamagi Burmeister
Hello,
I'm sorry to bother you, but you may not be aware of this thread and
this problem. We are several people experiencing deadlocks, kernel 
panics and other problems when creating snapshots on file systems
with SU+J. It would be nice to get some feedback, e.g. how can we
help debugging and / or fixing this problem.

Thank you,
Yamagi

On Mon, 2 Jan 2012 23:27:57 -0600
Bryce Edwards wrote:

> I have a RELENG_9 machine that hangs when a snapshot is created on the
> root fs (UFS, with SU+J).  More accurately, all the processes show a
> state of "suspfs" (with ^T) and no fs activity is completed from then
> on.  A hard reboot (power cycle) was the only way to proceed.
> 
> Here's some reference info - let me know what else I should provide.
> 
> $uname -a
> FreeBSD xxx.xxx.net 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Dec
> 25 05:04:37 UTC 2011     r...@xxx.xxx.net:/usr/obj/usr/src/sys/GENERIC
> amd64
> 
> csup was run just before build[world|kernel] so you have reference on
> the version information.
> 
> $mount
> /dev/gpt/root on / (ufs, local, journaled soft-updates)
> devfs on /dev (devfs, local, multilabel)
> linprocfs on /compat/linux/proc (linprocfs, local)
> { zfs info removed }
> 
> $df -h
> Filesystem                  Size    Used   Avail Capacity  Mounted on
> /dev/gpt/root               454G    9.1G    409G     2%    /
> devfs                       1.0k    1.0k      0B   100%    /dev
> linprocfs                   4.0k    4.0k      0B   100%    /compat/linux/proc
> { zfs info removed }
> 
> After the hard reset, there was a snapshot file listed in /.snap and
> it was ~465 GB, iirc.  Unfortunately, I needed to get things going
> again so I was not able to debug or diagnose further.  I may be able
> to schedule a time that I could recreate the issue and diagnose
> better, but I wanted to get your input on what data points and/or
> command you would be interested in.
> 
> Thanks in advance,
> 
> Bryce
> 


-- 
Homepage:  www.yamagi.org
XMPP:  yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB




Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-08 Thread Gautam Mani

On Tue, Jan 03, 2012 at 12:55:36PM +, Alain BRAUNER wrote:
> 
> May be i overlooked something but i can confirm the two precedents reports
> and PR kern/163310, i have the same freeze when trying to issue snapshot on
> the root fs when SUJ is ON.
> 

I confirm seeing this problem on my box. 

$ uname -a
FreeBSD linbox 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Fri Dec 30
19:49:47 IST 2011 root@linbox:/usr/obj/usr/src/sys/GENERIC  i386

The source was csupped from 9-STABLE after the Christmas advisories, so
it doesn't have the commits after that. This is the GENERIC kernel.

> I never be able to create a snapshot when SUJ is activated.

In my case, I am trying to take a backup using dump. I was able to, e.g.,
take a backup of /, but it failed with /var. Since I use tmux, I
know that mksnap_ffs had taken over the machine: the box was only
slightly interactive; I could type ps axl, but did not get any output.
CPU utilisation was at 100% and the only way I could get out of it was
to hit the reset button on the box.

> 
> Also no problems when SUJ is disable.

+1, I have switched SUJ off and now just have SU on like in 8-STABLE,
and am seeing no problems with my backups. 

Is this a known issue with SUJ -- and is SUJ not yet ready to be used on
9-STABLE?

Cheers,
Gautam



Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-03 Thread Alain BRAUNER
Bryce Edwards writes:

> 
> I have a RELENG_9 machine that hangs when a snapshot is created on the
> root fs (UFS, with SU+J).  More accurately, all the processes show a
> state of "suspfs" (with ^T) and no fs activity is completed from then
> on.  A hard reboot (power cycle) was the only way to proceed.
> 
> Here's some reference info - let me know what else I should provide.
> 
> $uname -a
> FreeBSD xxx.xxx.net 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Dec
> 25 05:04:37 UTC 2011     root@xxx.xxx.net:/usr/obj/usr/src/sys/GENERIC
> amd64
> 
> csup was run just before build[world|kernel] so you have reference on
> the version information.
> 
> $mount
> /dev/gpt/root on / (ufs, local, journaled soft-updates)
> devfs on /dev (devfs, local, multilabel)
> linprocfs on /compat/linux/proc (linprocfs, local)
> { zfs info removed }
> 
> $df -h
> Filesystem                  Size    Used   Avail Capacity  Mounted on
> /dev/gpt/root               454G    9.1G    409G     2%    /
> devfs                       1.0k    1.0k      0B   100%    /dev
> linprocfs                   4.0k    4.0k      0B   100%    /compat/linux/proc
> { zfs info removed }
> 
> After the hard reset, there was a snapshot file listed in /.snap and
> it was ~465 GB, iirc.  Unfortunately, I needed to get things going
> again so I was not able to debug or diagnose further.  I may be able
> to schedule a time that I could recreate the issue and diagnose
> better, but I wanted to get your input on what data points and/or
> command you would be interested in.
> 
> Thanks in advance,
> 
> Bryce


Hi,

Maybe I overlooked something, but I can confirm the two previous reports
and PR kern/163310: I have the same freeze when trying to take a snapshot of
the root fs when SUJ is on.

This happens with 9-PRERELEASE and 10-CURRENT.

There is an old closed PR that may or may not be related to this problem:

http://www.freebsd.org/cgi/query-pr.cgi?pr=160662

I have never been able to create a snapshot when SUJ is activated.

I use the stock GENERIC kernel (system built from the official RC ISO or from
make world, with no special make.conf).

This problem occurs on several hardware configurations and also in a VM under
VBox4.

After the freeze I need to halt the system by pressing the power switch for 5
seconds.
Sometimes the SUJ recovery is not enough: I get a panic with DUP ALLOC.
When I run a full fsck -yf in single-user mode, I get some files reconnected in
lost+found and some rare recovery messages.

To reproduce:

Before taking the snapshot, I fully checked the integrity of the fs with fsck
in single-user mode.

Then just run: mksnap_ffs /.snap/backup
(dump -L may also suffer from this)

My setup (no ZFS / 4 GB / Core 2 Duo / SATA 7.2k in AHCI mode):

FreeBSD test.test.test 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Jan  1
13:35:33 CET 2012 r...@test.test.test:/usr/obj/usr/src/sys/GENERIC  amd64

/dev/ufs/ROOTFS on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)

Note that there is almost no fs activity while taking this snapshot.

Also, there are no problems when SUJ is disabled.
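
Whether a file system is currently mounted with SU+J can be read off the mount output, where it shows up as "journaled soft-updates". A minimal sketch of a filter for this follows. The function name `suj_mounts` is made up, and the filter assumes the "device on mountpoint (options)" line format shown in this message and mount points without spaces.

```sh
#!/bin/sh
# Sketch: print the mount points of file systems mounted with SU+J.
# Reads mount(8)-style lines on stdin, e.g.
#   /dev/ufs/ROOTFS on / (ufs, local, journaled soft-updates)
suj_mounts() {
    awk '/journaled soft-updates/ { print $3 }'
}

# Intended use (not executed here):
#   mount | suj_mounts
```

An empty result after `tunefs -j disable` and a remount would suggest that journaling is off for every mounted file system.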

Anyway, thanks so much for your wonderful and hard work.

It would be great to merge SUJ into 8.3-RELEASE once things are stable.

Best wishes for happiness and success in this new year!

Alain from Paris.
In love with FreeBSD since 386BSD 0.1 :-)





Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-03 Thread Yamagi Burmeister
Hi,
I've seen this too (and other problems with SU+J and snapshots) and was
able to reproduce it fairly easily. I filed a PR:

 http://www.freebsd.org/cgi/query-pr.cgi?pr=163310

I have not received any feedback so far...

On Mon, 2 Jan 2012 23:27:57 -0600
Bryce Edwards wrote:

> I have a RELENG_9 machine that hangs when a snapshot is created on the
> root fs (UFS, with SU+J).  More accurately, all the processes show a
> state of "suspfs" (with ^T) and no fs activity is completed from then
> on.  A hard reboot (power cycle) was the only way to proceed.
> 
> Here's some reference info - let me know what else I should provide.
> 
> $uname -a
> FreeBSD xxx.xxx.net 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #0: Sun Dec
> 25 05:04:37 UTC 2011     r...@xxx.xxx.net:/usr/obj/usr/src/sys/GENERIC
> amd64
> 
> csup was run just before build[world|kernel] so you have reference on
> the version information.
> 
> $mount
> /dev/gpt/root on / (ufs, local, journaled soft-updates)
> devfs on /dev (devfs, local, multilabel)
> linprocfs on /compat/linux/proc (linprocfs, local)
> { zfs info removed }
> 
> $df -h
> Filesystem                  Size    Used   Avail Capacity  Mounted on
> /dev/gpt/root               454G    9.1G    409G     2%    /
> devfs                       1.0k    1.0k      0B   100%    /dev
> linprocfs                   4.0k    4.0k      0B   100%    /compat/linux/proc
> { zfs info removed }
> 
> After the hard reset, there was a snapshot file listed in /.snap and
> it was ~465 GB, iirc.  Unfortunately, I needed to get things going
> again so I was not able to debug or diagnose further.  I may be able
> to schedule a time that I could recreate the issue and diagnose
> better, but I wanted to get your input on what data points and/or
> command you would be interested in.
> 
> Thanks in advance,
> 
> Bryce
> 


-- 
Homepage:  www.yamagi.org
XMPP:  yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB

