Re: Still a regression with jails/IPv6/pf?

2013-09-02 Thread Tim Bishop
Hi,

On Mon, Sep 02, 2013 at 12:22:11PM +0200, Ruben van Staveren wrote:
 On 31 Aug 2013, at 21:49, Tim Bishop <t...@bishnet.net> wrote:
  This is regarding kern/170070 and these two threads from last year:
  
  http://lists.freebsd.org/pipermail/freebsd-stable/2012-July/068987.html
  http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069043.html
  
  I'm running stable/9 r255017 and I'm seeing the same issue, even with
  the fix Bjoern committed in r238876.
 
 This is still with modulate state in some rules that also hit ipv6
 traffic ?

No, I'm not using modulate state. Only keep state.

 It almost looks like doing this kind of traffic alteration is
 considered harmful for IPv6
 http://forums.freebsd.org/showthread.php?t=36595

So it doesn't look like that's the same problem. It's certainly similar
(IPv6 and pf), but doesn't involve the rdr rule or jails. IPv6 is
otherwise working fine through pf.

Tim.

 If that is the case, then this should be applicable only to ipv4
 traffic, without requiring specific knowledge from the user
 
  
  My setup is a dual stack one (IPv6 is done through an IPv4 tunnel) and
  the problem is only with IPv6. I have jails with both IPv4 and IPv6
  addresses, and I use pf to rdr certain ports to certain jails. With IPv6
  I'm seeing failed checksums on the packets coming back out of my system,
  both with UDP and TCP.
  
  If I connect over IPv6 to the jail host it works fine. If I connect over
  IPv6 to a jail directly (they have routable addresses, but I prefer them
  to all be masked behind the single jail host normally), it works fine.
  So the only failure case is when it goes through a rdr rule in pf.
  
  This system replaces a previous one running stable/8 which worked fine
  with the same pf config file.
  
  Has anyone got any suggestions on what I can do to fix this or to debug
  it further?
  
  Thanks,
  
  Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55





Still a regression with jails/IPv6/pf?

2013-08-31 Thread Tim Bishop
Hi all,

This is regarding kern/170070 and these two threads from last year:

http://lists.freebsd.org/pipermail/freebsd-stable/2012-July/068987.html
http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069043.html

I'm running stable/9 r255017 and I'm seeing the same issue, even with
the fix Bjoern committed in r238876.

My setup is a dual stack one (IPv6 is done through an IPv4 tunnel) and
the problem is only with IPv6. I have jails with both IPv4 and IPv6
addresses, and I use pf to rdr certain ports to certain jails. With IPv6
I'm seeing failed checksums on the packets coming back out of my system,
both with UDP and TCP.

If I connect over IPv6 to the jail host it works fine. If I connect over
IPv6 to a jail directly (they have routable addresses, but I prefer them
to all be masked behind the single jail host normally), it works fine.
So the only failure case is when it goes through a rdr rule in pf.
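For illustration, the rdr setup is along these lines (a simplified sketch, not my exact config; the interface name and jail address here are placeholders):

```
# pf.conf fragment (placeholder interface/address, not the real config)
ext_if  = "em0"
jail_v6 = "2001:db8::80"    # jail's routable IPv6 address

# redirect inbound IPv6 traffic on port 80 to the jail
rdr pass on $ext_if inet6 proto tcp to ($ext_if) port 80 -> $jail_v6
```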

This system replaces a previous one running stable/8 which worked fine
with the same pf config file.

Has anyone got any suggestions on what I can do to fix this or to debug
it further?

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55





MFC misc/124164 (Add SHA-256/512 hash algorithm to crypt(3)) to stable/8?

2012-02-08 Thread Tim Bishop
Are there any committers willing to merge PR misc/124164 to stable/8
before the 8.3 release freeze? It's already in HEAD and stable/9 so it's
had some testing.

misc/124164 adds support for SHA256/512 to crypt(3). This is something
we make use of on Linux and FreeBSD 9, and it'd be great to have the
same support on FreeBSD 8.

http://www.freebsd.org/cgi/query-pr.cgi?pr=124164

SVN Revs: 220496 220497

I've tried markm@ already and had no response.
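For anyone unfamiliar with the format, these are the $5$ (SHA-256) and $6$ (SHA-512) modular crypt schemes. A quick way to see what such a hash looks like, using openssl's passwd command as a stand-in (the salt and password here are made up):

```shell
# Generate a SHA-512 crypt(3)-style hash ($6$ prefix).  openssl is only a
# stand-in to show the format; the PR adds the same scheme to libcrypt.
openssl passwd -6 -salt examplesalt examplepassword
# Output has the form $6$examplesalt$<86-character base64 digest>
```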

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 8.1R ZFS almost locking up system

2010-09-01 Thread Tim Bishop
On Tue, Aug 31, 2010 at 10:58:29AM -0500, Dan Nelson wrote:
 In the last episode (Aug 31), Tim Bishop said:
  On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
   In the last episode (Aug 21), Tim Bishop said:
A few items from top, including zfskern:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
    5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
91775 70          1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs

At some point during this process my zfs snapshots have been failing to
complete:

root        5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
root     8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
root    11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
root    14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
root    17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
root    20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d
   
   procstat -k on some of these processes might help to pinpoint what part of
   the zfs code they're all waiting in.
  
  It happened again this Saturday (clearly something in the weekly
  periodic run is triggering the issue). procstat -kk shows the following
  for processes doing something zfs related (where zfs related means the
  string 'zfs' in the procstat -kk output):
  
  0 100084 kernel   zfs_vn_rele_task mi_switch+0x16f 
  sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 
  fork_trampoline+0xe 
  5 100031 zfskern  arc_reclaim_thre mi_switch+0x16f 
  sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 
  fork_exit+0x118 fork_trampoline+0xe 
  5 100032 zfskern  l2arc_feed_threa mi_switch+0x16f 
  sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be 
  fork_exit+0x118 fork_trampoline+0xe 
  5 100085 zfskern  txg_thread_enter mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 
  txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe 
  5 100086 zfskern  txg_thread_enter mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea 
  spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe 
 17 100040 syncer   -mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 
  zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 
  fork_exit+0x118 fork_trampoline+0xe 
   2210 100156 syslogd  -mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 
  VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 
  writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
   3500 100177 syslogd  -mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 
  VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 
  writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
   3783 100056 syslogd  -mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 
  VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 
  writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
   4064 100165 mysqld   initial thread   mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
  zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
  vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d 
  syscall+0x1e7 Xfast_syscall+0xe1 
   4441 100224 python2.6initial thread   mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
  zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
  null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f 
  vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
    100227 python2.6initial thread   mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
  zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
  null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f 
  vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
   4445 100228 python2.6initial thread   mi_switch+0x16f 
  sleepq_wait+0x42 _cv_wait+0x111

Re: 8.1R ZFS almost locking up system

2010-08-31 Thread Tim Bishop
On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
 In the last episode (Aug 21), Tim Bishop said:
  A few items from top, including zfskern:
  
    PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
      5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
  91775 70          1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
  39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
  14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
  11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs
  
  At some point during this process my zfs snapshots have been failing to
  complete:
  
  root        5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
  root     8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
  root    11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
  root    14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
  root    17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
  root    20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d
 
 procstat -k on some of these processes might help to pinpoint what part of
 the zfs code they're all waiting in.

It happened again this Saturday (clearly something in the weekly
periodic run is triggering the issue). procstat -kk shows the following
for processes doing something zfs related (where zfs related means the
string 'zfs' in the procstat -kk output):

0 100084 kernel   zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 
_sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe 
5 100031 zfskern  arc_reclaim_thre mi_switch+0x16f 
sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 
fork_exit+0x118 fork_trampoline+0xe 
5 100032 zfskern  l2arc_feed_threa mi_switch+0x16f 
sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be 
fork_exit+0x118 fork_trampoline+0xe 
5 100085 zfskern  txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 
fork_trampoline+0xe 
5 100086 zfskern  txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 
txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe 
   17 100040 syncer   -mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 zfs_sync+0xa6 
sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 fork_exit+0x118 
fork_trampoline+0xe 
 2210 100156 syslogd  -mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 
vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 
Xfast_syscall+0xe1 
 3500 100177 syslogd  -mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 
vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 
Xfast_syscall+0xe1 
 3783 100056 syslogd  -mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 
vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 
Xfast_syscall+0xe1 
 4064 100165 mysqld   initial thread   mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 
zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc vn_close+0xa1 
vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d syscall+0x1e7 
Xfast_syscall+0xe1 
 4441 100224 python2.6initial thread   mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 
zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc 
vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc 
vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
  100227 python2.6initial thread   mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 
zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc 
vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc 
vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
 4445 100228 python2.6initial thread   mi_switch+0x16f sleepq_wait+0x42 
_cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 
zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc 
vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc 
vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
 4446 100229 python2.6initial thread
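To make the pattern in listings like the above easier to see, the stacks can be counted by the txg function they're sleeping in. A sketch (the heredoc holds a trimmed sample of the output above; on the live system pipe `procstat -kk -a` in instead):

```shell
# Count blocked threads by the txg_* function in their stack.  The sample
# lines are trimmed from the procstat -kk output above.
cat <<'EOF' | grep -o 'txg_wait_[a-z]*' | sort | uniq -c | sort -rn
   17 100040 syncer    - mi_switch sleepq_wait _cv_wait txg_wait_synced zil_commit
 2210 100156 syslogd   - mi_switch sleepq_wait _cv_wait txg_wait_open zfs_freebsd_write
 3500 100177 syslogd   - mi_switch sleepq_wait _cv_wait txg_wait_open zfs_freebsd_write
 4064 100165 mysqld    - mi_switch sleepq_wait _cv_wait txg_wait_open dmu_tx_assign
EOF
```

For this sample it counts three threads in txg_wait_open and one in txg_wait_synced, i.e. almost everything is waiting for the next transaction group to open.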

Re: 8.1R ZFS almost locking up system

2010-08-23 Thread Tim Bishop
On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
 In the last episode (Aug 21), Tim Bishop said:
  I've had a problem on a FreeBSD 8.1R system for a few weeks. It seems
  that ZFS gets in to an almost unresponsive state. Last time it did it
  (two weeks ago) I couldn't even log in, although the system was up, this
  time I could manage a reboot but couldn't stop any applications (they
  were likely hanging on I/O).
 
 Could your pool be very close to full?  Zfs will throttle itself when it's
 almost out of disk space.  I know it's saved me from filling up my
 filesystems a couple times :)

It's not close to full, so I don't think that's the issue.
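For anyone else wanting to rule this out quickly, capacity can be checked per pool. A sketch (the printf stands in for real `zpool list` output; the 87% figure is a sample value, and the 90% threshold is just an illustrative cut-off):

```shell
# Flag any pool above 90% capacity, where ZFS starts throttling writes.
# The printf stands in for: zpool list -H -o name,capacity
printf 'pool0\t87%%\n' |
awk -F'\t' '{ cap = $2; sub(/%/, "", cap)
              if (cap + 0 > 90) print $1 " is " cap "% full" }'
```

With the 87% shown here nothing is printed; a pool over the threshold would be named.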

  A few items from top, including zfskern:
  
    PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
      5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
  91775 70          1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
  39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
  14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
  11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs
  
  At some point during this process my zfs snapshots have been failing to
  complete:
  
  root        5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
  root     8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
  root    11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
  root    14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
  root    17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
  root    20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d
 
 procstat -k on some of these processes might help to pinpoint what part of
 the zfs code they're all waiting in.

I'll do that. Thanks for the pointer :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Re: 8.1R ZFS almost locking up system

2010-08-23 Thread Tim Bishop
On Tue, Aug 24, 2010 at 06:49:23AM +1000, Peter Jeremy wrote:
 On 2010-Aug-21 23:04:35 +0100, Tim Bishop <t...@bishnet.net> wrote:
 I've had a problem on a FreeBSD 8.1R system for a few weeks. It seems
 that ZFS gets in to an almost unresponsive state. Last time it did it
 (two weeks ago) I couldn't even log in, although the system was up, this
 time I could manage a reboot but couldn't stop any applications (they
 were likely hanging on I/O).
 
 Unless you have a ZFS-only system, it's possible you are running out
 of free memory (see the "free" entry in top(1) or 'systat -v') - in
 which case r211581 (and r211599 which fixes a mismerge) should help.
 Your very high kstat.zfs.misc.arcstats.memory_throttle_count suggests
 this is your problem.

Thanks. At the time I had a reasonable amount free (~450MB from 3GB),
but it had dropped lower than that at some points previously.

I'll take a closer look at that next time, and look at that patch (or
upgrade to 8-STABLE).
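A sketch of the kind of check I mean for next time (the two counter values are made up; on the real system they'd come from `sysctl -n kstat.zfs.misc.arcstats.memory_throttle_count` taken before and after the weekly periodic run):

```shell
# Compare memory_throttle_count across the weekly periodic run; a rising
# count means ZFS throttled allocations due to memory pressure.
before=120   # made-up sample value
after=175    # made-up sample value
if [ "$after" -gt "$before" ]; then
    echo "ARC memory throttle fired $((after - before)) more times"
fi
```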

And the system has a UFS root, but all the apps/data are stored in ZFS.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


8.1R ZFS almost locking up system

2010-08-21 Thread Tim Bishop
I've had a problem on a FreeBSD 8.1R system for a few weeks. It seems
that ZFS gets in to an almost unresponsive state. Last time it did it
(two weeks ago) I couldn't even log in, although the system was up, this
time I could manage a reboot but couldn't stop any applications (they
were likely hanging on I/O).

Here's some details I collected prior to reboot.

The zpool output, including iostat and gstat for the disks:

# zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool0       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4s3   ONLINE       0     0     0
            ad6s3   ONLINE       0     0     0

errors: No known data errors

# zpool iostat -v 5
...

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool0        117G  16.7G    248    114   865K   269K
  mirror     117G  16.7G    248    114   865K   269K
    ad4s3       -      -     43     56  2.47M   269K
    ad6s3       -      -     39     56  2.41M   269K
----------  -----  -----  -----  -----  -----  -----

# gstat
...
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1     48     48   3042    9.8      0      0    0.0   47.6| ad4
    0     38     38   2406   10.5      0      0    0.0   39.5| ad6
    0      0      0      0    0.0      0      0    0.0    0.0| ad4s1
    0      0      0      0    0.0      0      0    0.0    0.0| ad4s2
    1     48     48   3042    9.8      0      0    0.0   47.6| ad4s3
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s2
    0     38     38   2406   11.8      0      0    0.0   44.4| ad6s3

I've seen this before when I've had poor ZFS performance. There's more
I/O on the disks than on the pool itself. It's not particularly busy
though.

A few items from top, including zfskern:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
    5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
91775 70          1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs

At some point during this process my zfs snapshots have been failing to
complete:

root        5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
root     8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
root    11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
root    14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
root    17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
root    20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d

It all seems to point at ZFS getting to the point of being almost
unresponsive. It's been exactly two weeks since the last time this
happened and therefore the last reboot, so it'll be interesting to see
if the same happens again after the same period of time.

I noticed this given in a few other ZFS related messages:

vfs.worklist_len: 15

I have attached all (hopefully) ZFS-related sysctl output.

Finally, the reboot log:

Aug 21 22:13:06 server kernel: Aug 21 22:13:06 server reboot: rebooted by tdb
Aug 21 22:19:47 server kernel: Waiting (max 60 seconds) for system process 
`vnlru' to stop...done
Aug 21 22:19:47 server kernel: Waiting (max 60 seconds) for system process 
`bufdaemon' to stop...
Aug 21 22:19:48 server kernel: done
Aug 21 22:19:48 server kernel: Waiting (max 60 seconds) for system process 
`syncer' to stop...
Aug 21 22:20:03 server kernel:
Aug 21 22:20:03 server kernel: Syncing disks, vnodes remaining...14
Aug 21 22:20:48 server kernel: timed out
Aug 21 22:21:55 server kernel: Waiting (max 60 seconds) for system process 
`vnlru' to stop...
Aug 21 22:22:39 server kernel: 1
Aug 21 22:22:55 server kernel: timed out
Aug 21 22:22:55 server kernel: Waiting (max 60 seconds) for system process 
`bufdaemon' to stop...

I've undoubtedly missed some important information, so please let me
know if there's anything more useful I can collect next time (I'm quite
sure it'll happen again).

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 40245248
vfs.zfs.mfu_ghost_metadata_lsize: 87331328
vfs.zfs.mfu_ghost_size: 127576576
vfs.zfs.mfu_data_lsize: 99885056
vfs.zfs.mfu_metadata_lsize: 146944
vfs.zfs.mfu_size: 101330432
vfs.zfs.mru_ghost_data_lsize: 181200896
vfs.zfs.mru_ghost_metadata_lsize: 25819648

Re: System deadlock when using mksnap_ffs

2008-11-13 Thread Tim Bishop
Jeremy,

On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
 On Thu, Nov 13, 2008 at 12:41:02AM +, Tim Bishop wrote:
  On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
   On Wed, Nov 12, 2008 at 05:58:26PM +, Tim Bishop wrote:
I run the mksnap_ffs command to take the snapshot and some time later
the system completely freezes up:

paladin# cd /u2/.snap/
paladin# mksnap_ffs /u2 test.1
   
   You need to provide information described in the
   http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
   and especially
   http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
  
  Ok, I've done that, and removed the patch that seemed to fix things.
  
  The first thing I notice after doing this on the console is that I can
  still ctrl+t the process:
  
  load: 0.14  cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
  
  But the top and ps I left running on other ttys have all stopped
  responding.
 
 Then in my book, the patch didn't fix anything.  :-)  The system is
 still deadlocking; snapshot generation **should not** wedge the system
 hard like this.

You missed the part where I said I removed the patch. I did that so I
could provide details with it wedged.

I agree that there's still some fundamental speed issues with
snapshotting though. And I'm sure the FS itself will still be locked out
for a while during the snapshot. But with the patch at least the whole
thing doesn't lock up.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


System deadlock when using mksnap_ffs

2008-11-12 Thread Tim Bishop
I've been playing around with snapshots lately but I've got a problem on
one of my servers running 7-STABLE amd64:

FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 
GMT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PALADIN  amd64

I run the mksnap_ffs command to take the snapshot and some time later
the system completely freezes up:

paladin# cd /u2/.snap/
paladin# mksnap_ffs /u2 test.1

It only happens on this one filesystem, though, which might be to do
with its size. It's not over the 2TB marker, but it's pretty close. It's
also backed by a hardware RAID system, although a smaller filesystem on
the same RAID has no issues.

Filesystem    1K-blocks      Used     Avail Capacity  Mounted on
/dev/da0s1a  2078881084 921821396 990749202    48%    /u2

To clarify "completely freezes up": unresponsive to all services over
the network, except ping. On the console I can switch between the ttys,
but none of them respond. The only way out is to hit the reset button.

Any advice? I'm happy to help debug this further to get to the bottom of
it.

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Re: System deadlock when using mksnap_ffs

2008-11-12 Thread Tim Bishop
On Wed, Nov 12, 2008 at 05:58:26PM +, Tim Bishop wrote:
 I run the mksnap_ffs command to take the snapshot and some time later
 the system completely freezes up:
 
 paladin# cd /u2/.snap/
 paladin# mksnap_ffs /u2 test.1

Someone (not named because they chose not to reply to the list) gave me
the following patch:

--- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
+++ sys/ufs/ffs/ffs_snapshot.c  Mon Nov 20 14:59:13 2006
@@ -282,6 +282,8 @@ restart:
if (error)
goto out;
bawrite(nbp);
+   if (cg % 10 == 0)
+   ffs_syncvnode(vp, MNT_WAIT);
}
/*
 * Copy all the cylinder group maps. Although the
@@ -303,6 +305,8 @@ restart:
goto out;
error = cgaccount(cg, vp, nbp, 1);
bawrite(nbp);
+   if (cg % 10 == 0)
+   ffs_syncvnode(vp, MNT_WAIT);
if (error)
goto out;
}

With the description:

"What can happen is on a big file system it will fill up the buffer
cache with I/O and then run out.  When the buffer cache fills up then no
more disk I/O can happen :-(  When you do a sync, it flushes that out to
disk so things don't hang."

It seems to work too. But it seems more like a workaround than a fix?

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Re: System deadlock when using mksnap_ffs

2008-11-12 Thread Tim Bishop
On Wed, Nov 12, 2008 at 08:10:50PM +0200, David Peall wrote:
  FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10
  20:49:51 GMT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PALADIN  amd64
  
  I run the mksnap_ffs command to take the snapshot and some time later
  the system completely freezes up:
 
 If the file system is UFS2 it's a known problem but should have been
 fixed.
 http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
 
 ident /boot/kernel/kernel | grep subr_sleepqueue
 
 version should be greater than 1.39.2.3?

Yes it's UFS2, and yes it's greater than 1.39.2.3:

$FreeBSD: src/sys/kern/subr_sleepqueue.c,v 1.39.2.5 2008/09/16 20:01:57 jhb Exp 
$

Are you sure the problem referenced on that page is the same? It talks
about dog slow snapshotting, which I see on other filesystems and
machines. But in this particular case the system is dead, and does not
recover.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Re: System deadlock when using mksnap_ffs

2008-11-12 Thread Tim Bishop
On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
 On Wed, Nov 12, 2008 at 05:58:26PM +, Tim Bishop wrote:
  I've been playing around with snapshots lately but I've got a problem on
  one of my servers running 7-STABLE amd64:
  
  FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 
  20:49:51 GMT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PALADIN  amd64
  
  I run the mksnap_ffs command to take the snapshot and some time later
  the system completely freezes up:
  
  paladin# cd /u2/.snap/
  paladin# mksnap_ffs /u2 test.1
  
  It only happens on this one filesystem, though, which might be to do
  with its size. It's not over the 2TB marker, but it's pretty close. It's
  also backed by a hardware RAID system, although a smaller filesystem on
  the same RAID has no issues.
  
  Filesystem    1K-blocks      Used     Avail Capacity  Mounted on
  /dev/da0s1a  2078881084 921821396 990749202    48%    /u2
  
  To clarify "completely freezes up": unresponsive to all services over
  the network, except ping. On the console I can switch between the ttys,
  but none of them respond. The only way out is to hit the reset button.
 
 You need to provide information described in the
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
 and especially
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Ok, I've done that, and removed the patch that seemed to fix things.

The first thing I notice after doing this on the console is that I can
still ctrl+t the process:

load: 0.14  cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k

But the top and ps I left running on other ttys have all stopped
responding.

Also the following kernel message came out:

Expensive timeout(9) function: 0x802ce380(0xff000677ca50) 
0.006121001 s

There is also still some disk I/O.

Dropping to ddb worked, but I don't have a serial console so I can't
paste the output.

ps shows mksnap_ffs in newbuf, as we already saw. A trace of mksnap_ffs
looks like this:

Tracing pid 2603 tid 100214 td 0xff0006a0e370
sched_switch() at sched_switch+0x2a1
mi_switch() at mi_switch+0x233
sleepq_switch() at sleepq_switch+0xe9
sleepq_wait() at sleepq_wait+0x44
_sleep() at _sleep+0x351
getnewbuf() at getnewbuf+0x2e1
getblk() at getblk+0x30d
setup_allocindir_phase2() at setup_allocindir_phase2+0x338
softdep_setup_allocindir_page() at softdep_setup_allocindir_page+0xa7
ffs_balloc_ufs2() at ffs_balloc_ufs2+0x121e
ffs_snapshot() at ffs_snapshot+0xc52
ffs_mount() at ffs_mount+0x735
vfs_donmount() at vfs_donmount+0xeb5
kernel_mount() at kernel_mount+0xa1
ffs_cmount() at ffs_cmount+0x92
mount() at mount+0x1cc
syscall() at syscall+0x1f6
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (21, FreeBSD ELF64, mount), rip = 0x80068636c, rsp = 
0x7fffe518, rbp = 0x8008447a0 ---

show pcpu shows cpuid 3 (quad core machine) in thread swi6: Giant taskq.
All the other cpus are idle.

show locks shows:

exclusive sleep mutex Giant r = 0 (0x806ae040) locked @ 
/usr/src/sys/kern/kern_intr.c:1087

There are two other locks shown by show all locks, one for sshd and one
for mysqld, both in kern/uipc_sockbuf.c.

show lockedvnods shows mksnap_ffs has a lock on da0s1a with ffs_vget at
the top of the stack.

Sorry for any typos. I'll sort out a serial cable if more is needed :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Re: /etc/rc.d after cvsup yesterday?

2007-05-25 Thread Tim Bishop
On Fri, May 25, 2007 at 07:42:10AM -0500, JD Bronson wrote:
 I noticed after cvsup'ing the other day (6.2-stable) that /etc/rc.d 
 now has some issues:
 
 # rcorder /usr/src/etc/rc.d/*
 rcorder: requirement `zfs' in file `/usr/src/etc/rc.d/FILESYSTEMS' 
 has no providers.
 
 (removing 'zfs' in FILESYSTEMS fixes this)

Looks like that bit was MFCed in error. zfs only exists on CURRENT.

 and then this later on with rcorder:
 ..
 ...
 /usr/src/etc/rc.d/routed
 rcorder: Circular dependency on provision `mountcritremote' in file 
 `/usr/src/etc/rc.d/archdep'.
 
 
 So to fix this (for now) I just changed the REQUIRE from 
 'mountcritremote' to 'routed' and that seems to be OK for me.
 
 Anyone else notice this?

Not sure about that one though :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Re: buildworld broken?

2007-05-24 Thread Tim Bishop
On Thu, May 24, 2007 at 11:57:46PM +0300, Abdullah Ibn Hamad Al-Marri wrote:
 /usr/src/sys/modules/procfs/../../fs/procfs/procfs_regs.c: In function
 `procfs_doprocregs':
 /usr/src/sys/modules/procfs/../../fs/procfs/procfs_regs.c:96: warning:
 implicit declaration of function `PROC_ASSERT_HELD'
 /usr/src/sys/modules/procfs/../../fs/procfs/procfs_regs.c:96: warning:
 nested extern declaration of `PROC_ASSERT_HELD'
 *** Error code 1

It looks like this has already been fixed. See Dag-Erling Smørgrav's
email a short while ago. Wait an hour or so, then update your sources
and try again.
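The update-and-retry cycle is roughly this (a sketch; the supfile path is
the stock example file and an assumption for your setup):

```shell
# Re-sync /usr/src once the fix has reached the mirrors, then rebuild.
csup /usr/share/examples/cvsup/stable-supfile
cd /usr/src
make buildworld
```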

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Kernel Dumps on to gmirror device?

2007-04-01 Thread Tim Bishop
I've got a 6.1 server that's panicking and I'd like to debug it. The
problem is that I can't get a kernel dump on to my gmirror device.
This appears to be supported as of 6.1 (it says so in the release
notes). In my rc.conf I have:

dumpdev="/dev/mirror/gm0s1b"

Which is my swap partition. On booting it says:

kernel dumps on /dev/mirror/gm0s1b

But when it panics it says:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x14
fault code  = supervisor write, page not present
instruction pointer = 0x20:0xc057ff9f
stack pointer   = 0x28:0xe2df3c44
frame pointer   = 0x28:0xe2df3c4c
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13 (swi4: clock sio)
trap number = 12
panic: page fault
Uptime: 1d4h15m55s
GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed.
GEOM_MIRROR: Device gm0 destroyed.
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...

I'm guessing it's because the GEOM_MIRROR device is destroyed just
before the kernel wants to dump? Any suggestions on a way to get a
dump out?
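For what it's worth, one workaround would be to point the dump device at
one of the mirror's raw components instead of the gmirror provider, since
the component device outlives the mirror at panic time (a sketch; the
device names here are assumptions based on my layout):

```shell
# Dump to the underlying component rather than mirror/gm0s1b; the raw
# device is still present when GEOM_MIRROR tears the mirror down at panic.
dumpon /dev/ad0s1b
# Or persistently, in /etc/rc.conf:
#   dumpdev="/dev/ad0s1b"
# savecore(8) then recovers the dump from swap on the next boot.
```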

Thanks,
Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984




Re: vinum to gvinum help

2006-06-27 Thread Tim Bishop
On Mon, Jun 26, 2006 at 12:22:07PM -0400, Sven Willenberger wrote:
 I have an i386 system currently running 5.2.1-RELEASE with a vinum
 mirror array (2 drives comprising /usr ). I want to upgrade this to
 5.5-RELEASE which, if I understand correctly, no longer supports vinum
 arrays. Would simply changing /boot/loader.conf to read gvinum_load
 instead of vinum_load work or would the geom layer prevent this from
 working properly? If not, is there a recommended way of upgrading a
 vinum array to a gvinum or gmirror array?

I did this upgrade not long ago (and later to 6.1). The process of
switching from vinum to gvinum is pretty easy, although the specifics
escape me now. Changing loader.conf and fstab are the main ones, but
assuming you have console access you should easily be able to fix
anything else that crops up.
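From memory, the main changes were along these lines (a hedged sketch;
the volume names and fstab details are assumptions, so adjust for your
own layout):

```shell
# /boot/loader.conf: swap the old vinum module for the GEOM-based one.
#   before:  vinum_load="YES"
#   after:   geom_vinum_load="YES"
#
# /etc/fstab: gvinum exposes volumes under /dev/gvinum, not /dev/vinum.
#   before:  /dev/vinum/usr   /usr  ufs  rw  2  2
#   after:   /dev/gvinum/usr  /usr  ufs  rw  2  2
```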

As Mark said, there were shared library version changes between these
versions, so you'll end up needing to rebuild all your apps. I did this
with portupgrade, but I got everything updated and working on 5.2.1
first so I wouldn't have to worry about that during the upgrade.

A word of warning, though: I'm currently left with no RAID because
somewhere along the line vinum/gvinum corrupted the metadata. This
only happened after a disk failure and after switching to gvinum
(which could be coincidence), but it has left me looking elsewhere
for a RAID system; gmirror looks good. See the archives of this list
for details; the summary is that the kernel module locks up when loading.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


Problem with geom_vinum on 5.5 and 6.1

2006-05-31 Thread Tim Bishop
I'm running 6.1-RELEASE (and previously 5.5) with gvinum to mirror
two internal root disks. At the time of this problem the second
disk was physically disconnected.

[with 5.5]

Whilst copying data off the first disk, from a gvinum volume, I had
a single disk error. This put the volume in the down state. I
rebooted the machine (probably not the best move in hindsight!),
and when booting it said the following:

ad0: 78167MB Maxtor 6Y080P0/YAR41BW0 [158816/16/63] at ata0-master UDMA133
GEOM_VINUM: subdisk swap.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk root.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk var.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk usr.p1.s0 state change: down -> stale

And then it completely hangs. I would have expected the gvinum volumes
to be unavailable and to be given the choice of which root fs to mount.

I've currently got round this by booting a different kernel which
stops geom_vinum.ko from being loaded, and consequently allows me
to choose a root fs. The filesystems are now mounted directly from
/dev/ad0s1x.

If I do 'gvinum start' in single-user mode it locks up too.

[now with 6.1]

After an upgrade to 6.1, if I do 'gvinum start' in single-user mode I
get the same GEOM_VINUM lines as above, and gvinum hangs and becomes
uninterruptible. Unlike with 5.5, the machine is still vaguely
responsive and a ctrl+alt+del forces a reboot fine.

Maybe my on-disk configuration is corrupt, or something like that?

Has anyone got any ideas, or should I maybe just start from scratch with
a new gvinum config?

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984

- End forwarded message -


Cheers,
Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


ataraid - RAID5 in RELENG_6?

2005-07-12 Thread Tim Bishop
I'm having a fiddle with RELENG_6 and while setting up a RAID1 system
disk I noticed that atacontrol now lets you create a RAID5 device.

I gave it a whirl and it seemed to work - I have a device I can use.
But is this working properly? I don't have a hardware raid card, just a
plain old SATA card.

Having searched the archives, I noticed that Søren said it wasn't
handling parity. Has this been fixed since?

Anyway - the bottom line is this. Can I create an entirely software
RAID5 setup using ataraid? I'm personally not finding gvinum to be that
polished...

Cheers,
Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
