Re: RELENG_5, snapshots and disk lock time

2005-04-01 Thread Kris Kennaway
On Thu, Mar 31, 2005 at 06:32:27PM -0800, Dave Knight wrote:

 That document also says:
 As is detailed in the operational information below, snapshots are 
 definitely alpha-test code and are NOT yet ready for production use.
 Is this the current opinion of snapshots?

Not really, you just have to be aware of the inbuilt limitations, as
you are.

Kris




Re: RELENG_5, snapshots and disk lock time

2005-03-31 Thread Dave Knight
On Mon, Mar 07, 2005 at 11:58:02AM -0500, Paul Mather wrote:
On Mon, 2005-03-07 at 15:21 +0300, Dmitry Morozovsky wrote:
 Dear colleagues,

 dumping the snapshot of a 140G ufs2 file system under contemporary
 RELENG_5, I found that during mksnap_ffs the file system is
 unresponsive even for reading for more than 3 minutes (it's on a
 modern SATA disk with 50+ MBps linear transfer).
 Is it normal?

 Oddly enough, this happened to me last night on a RELENG_5 system. In
 my case, things were so bad that mksnap_ffs appeared to wedge
 everything, meaning I'll have to make a trek in to where the machine
 is located and press the ol' reset button to get things going again. 
 :-(

I am investigating using snapshots for backup purposes and am running
into similar difficulties: on a 1TB FS it takes over an hour to create
a snapshot, during which time an errant ls or two can lock up the
system. Reading through the list archives suggests that the time it
takes to create the snapshot is not going to go away, that the issue of
an ls in the .snap directory during snapshot creation lacks a fix, and
that best current practise is 'try to avoid that'.

 Yes, this is normal.  See the documentation about the snapshots
 implementation (a README in the kernel source tree, I think, and a
 paper written by Kirk).
That document also says:
As is detailed in the operational information below, snapshots are 
definitely alpha-test code and are NOT yet ready for production use.
Is this the current opinion of snapshots?

 The machine in question makes and mounts snapshots of all its
 filesystems for backup each night via Tivoli TSM.  This has worked
 flawlessly for many months.  Last night, I had many BitTorrent
 sessions active on the filesystem that wedged.  I guess the activity
 broke the snapshot mechanism. :-(  The odd thing is that it survived
 the night before, when there were also BitTorrent sessions active.

 It's possible there are still deadlock conditions in the snapshot
 code.  Some familiarity with DDB would help to diagnose this (see the
 chapter on kernel debugging in the developers' handbook).  You'd need
 to work with Kirk to debug these, if you're willing.

 I wonder how much activity mksnap_ffs can take?

 I don't think this is the issue, directly.




RELENG_5, snapshots and disk lock time

2005-03-07 Thread Dmitry Morozovsky
Dear colleagues,

dumping the snapshot of a 140G ufs2 file system under contemporary RELENG_5,
I found that during mksnap_ffs the file system is unresponsive even for
reading for more than 3 minutes (it's on a modern SATA disk with 50+ MBps
linear transfer).
Is it normal?

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: RELENG_5, snapshots and disk lock time

2005-03-07 Thread Dmitry Morozovsky
On Mon, 7 Mar 2005, Xin LI wrote:

XL  dumping the snapshot of a 140G ufs2 file system under contemporary RELENG_5,
XL  I found that during mksnap_ffs the file system is unresponsive even for
XL  reading for more than 3 minutes (it's on a modern SATA disk with 50+ MBps
XL  linear transfer).
XL  Is it normal?
XL 
XL mksnap_ffs is expected to suspend your write access, but I think 3
XL minutes is too long for a 140G file system.  Would you please send the
XL dumpfs output of the said file system?

Well, as I said, it was even for read access. I checked this with the simple 
shell script

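#!/bin/sh

# Every 5 seconds, print a timestamp and list the snapshot directory.
# A long gap between timestamps means ls was blocked.
while true; do
  sleep 5
  date
  ls /lh/.snap
done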

When dump -L executes mksnap_ffs for /lh, there is a 3:20 pause between dates.

dumpfs output is available at http://woozle.hole.ru/misc/dumpfs-lh.gz (83k)
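
A possible refinement of the probe above, offered as a sketch rather than as what was actually run: time each ls and report only the calls that stall. The function names time_ls and probe, and the threshold, count and interval parameters, are illustrative assumptions. Resolution is whole seconds because only date +%s is used.

```sh
#!/bin/sh

# time_ls DIR: run ls on DIR and print how many whole seconds it took.
# (Hypothetical helper; not part of the original script.)
time_ls() {
  start=$(date +%s)
  ls "$1" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# probe DIR THRESHOLD COUNT INTERVAL: call time_ls COUNT times, sleeping
# INTERVAL seconds between calls, and print a line whenever a single ls
# took THRESHOLD seconds or more.
probe() {
  dir=$1
  threshold=$2
  count=$3
  interval=$4
  i=0
  while [ "$i" -lt "$count" ]; do
    elapsed=$(time_ls "$dir")
    if [ "$elapsed" -ge "$threshold" ]; then
      echo "$(date) ls $dir blocked for ${elapsed}s"
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}
```

For example, `probe /lh/.snap 10 720 5` would watch the snapshot directory for roughly an hour and log any ls that took 10 seconds or more.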

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: RELENG_5, snapshots and disk lock time

2005-03-07 Thread Dmitry Morozovsky
Following up to myself

DM XL  dumping the snapshot of a 140G ufs2 file system under contemporary
DM XL  RELENG_5, I found that during mksnap_ffs the file system is unresponsive
DM XL  even for reading for more than 3 minutes (it's on a modern SATA disk
DM XL  with 50+ MBps linear transfer).
DM XL  Is it normal?
DM XL 
DM XL mksnap_ffs is expected to suspend your write access, but I think 3
DM XL minutes is too long for a 140G file system.  Would you please send the
DM XL dumpfs output of the said file system?
DM 
DM Well, as I said, it was even for read access. I checked this with the
DM simple shell script
DM 
DM #!/bin/sh
DM 
DM while true; do
DM sleep 5
DM date
DM ls /lh/.snap
DM done

It seems that access to the snapshot directory is blocked for much longer than
access to other parts of the fs. Changing the ls line to ``ls /lh'' leads to

Mon Mar  7 18:02:58 MSK 2005
backup  homelocal   pgsql   ports   src
Mon Mar  7 18:03:41 MSK 2005

so the file system was locked for approx 45 seconds. This seems more
reasonable, but still a bit too long to me.
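
The interval can be checked from the two timestamps in the output. This is a minimal sketch, assuming a helper named to_seconds (not part of the original script) that converts a two-digit HH:MM:SS string to seconds since midnight:

```sh
#!/bin/sh

# to_seconds HH:MM:SS: print seconds since midnight (hypothetical helper).
# Expects two-digit fields, as printed by date.
to_seconds() {
  h=${1%%:*}; rest=${1#*:}
  m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values such as 08 are not read as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $((h * 3600 + m * 60 + s))
}

# Timestamps taken from the output quoted above.
gap=$(( $(to_seconds 18:03:41) - $(to_seconds 18:02:58) ))
echo "gap between dates: ${gap}s"
echo "gap minus the 5s sleep: $((gap - 5))s"
```

The gap between the two dates is 43 seconds and includes the 5-second sleep, so the ls itself was blocked for somewhat less than that, which is in line with the rough estimate above.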


Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: RELENG_5, snapshots and disk lock time

2005-03-07 Thread Xin LI
 On 2005-03-07 at 18:00 +0300, Dmitry Morozovsky wrote:
 On Mon, 7 Mar 2005, Xin LI wrote:
 
 Well, as I said, it was even for read access. I checked this with the simple 
 shell script
 
 #!/bin/sh
 
 while true; do
   sleep 5
   date
   ls /lh/.snap
 done
 
 When dump -L executes mksnap_ffs for /lh, there is a 3:20 pause between dates.
 
 dumpfs output is available at http://woozle.hole.ru/misc/dumpfs-lh.gz (83k)

The dumpfs output seems quite normal.  Tomorrow I'll try to figure out what
was happening, using some large-volume equipment.

Cheers,
-- 
Xin LI delphij delphij net  http://www.delphij.net/




Re: RELENG_5, snapshots and disk lock time

2005-03-07 Thread Paul Mather
On Mon, 2005-03-07 at 15:21 +0300, Dmitry Morozovsky wrote:
 Dear colleagues,
 
 dumping the snapshot of a 140G ufs2 file system under contemporary RELENG_5,
 I found that during mksnap_ffs the file system is unresponsive even for
 reading for more than 3 minutes (it's on a modern SATA disk with 50+ MBps
 linear transfer).
 Is it normal?

Oddly enough, this happened to me last night on a RELENG_5 system.  In
my case, things were so bad that mksnap_ffs appeared to wedge
everything, meaning I'll have to make a trek in to where the machine is
located and press the ol' reset button to get things going again. :-(

The machine in question makes and mounts snapshots of all its
filesystems for backup each night via Tivoli TSM.  This has worked
flawlessly for many months.  Last night, I had many BitTorrent sessions
active on the filesystem that wedged.  I guess the activity broke the
snapshot mechanism. :-(  The odd thing is that it survived the night
before, when there were also BitTorrent sessions active.

I wonder how much activity mksnap_ffs can take?

Cheers,

Paul.

PS: The problematic file system was not low on space, which could be an
issue for snapshot creation.
-- 
e-mail: [EMAIL PROTECTED]

Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid.
--- Frank Vincent Zappa


Re: RELENG_5, snapshots and disk lock time

2005-03-07 Thread Kris Kennaway
On Mon, Mar 07, 2005 at 11:58:02AM -0500, Paul Mather wrote:
 On Mon, 2005-03-07 at 15:21 +0300, Dmitry Morozovsky wrote:
  Dear colleagues,
  
  dumping the snapshot of a 140G ufs2 file system under contemporary RELENG_5,
  I found that during mksnap_ffs the file system is unresponsive even for
  reading for more than 3 minutes (it's on a modern SATA disk with 50+ MBps
  linear transfer).
  Is it normal?
 
 Oddly enough, this happened to me last night on a RELENG_5 system.  In
 my case, things were so bad that mksnap_ffs appeared to wedge
 everything, meaning I'll have to make a trek in to where the machine is
 located and press the ol' reset button to get things going again. :-(

Yes, this is normal.  See the documentation about the snapshots
implementation (a README in the kernel source tree, I think, and a
paper written by Kirk).

 The machine in question makes and mounts snapshots of all its
 filesystems for backup each night via Tivoli TSM.  This has worked
 flawlessly for many months.  Last night, I had many BitTorrent sessions
 active on the filesystem that wedged.  I guess the activity broke the
 snapshot mechanism. :-(  The odd thing is that it survived the night
 before, when there were also BitTorrent sessions active.

It's possible there are still deadlock conditions in the snapshot
code.  Some familiarity with DDB would help to diagnose this (see the
chapter on kernel debugging in the developers' handbook).  You'd need
to work with Kirk to debug these, if you're willing.

 I wonder how much activity mksnap_ffs can take?

I don't think this is the issue, directly.

Kris