Re: Still a regression with jails/IPv6/pf?
Hi,

On Mon, Sep 02, 2013 at 12:22:11PM +0200, Ruben van Staveren wrote:
> On 31 Aug 2013, at 21:49, Tim Bishop <t...@bishnet.net> wrote:
> > This is regarding kern/170070 and these two threads from last year:
> >
> > http://lists.freebsd.org/pipermail/freebsd-stable/2012-July/068987.html
> > http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069043.html
> >
> > I'm running stable/9 r255017 and I'm seeing the same issue, even with
> > the fix Bjoern committed in r238876.
>
> This is still with modulate state in some rules that also hit ipv6
> traffic?

No, I'm not using modulate state. Only keep state.

> It almost looks like doing this kind of traffic alteration is
> considered harmful for IPv6:
>
> http://forums.freebsd.org/showthread.php?t=36595

So it doesn't look like that's the same problem. It's certainly similar
(IPv6 and pf), but it doesn't involve the rdr rule or jails. IPv6 is
otherwise working fine through pf.

Tim.

> If that is the case, then this should be applicable only to ipv4
> traffic, without requiring specific knowledge from the user.
>
> > My setup is a dual stack one (IPv6 is done through an IPv4 tunnel)
> > and the problem is only with IPv6. I have jails with both IPv4 and
> > IPv6 addresses, and I use pf to rdr certain ports to certain jails.
> >
> > With IPv6 I'm seeing failed checksums on the packets coming back out
> > of my system, both with UDP and TCP. If I connect over IPv6 to the
> > jail host it works fine. If I connect over IPv6 to a jail directly
> > (they have routable addresses, but I prefer them to all be masked
> > behind the single jail host normally), it works fine. So the only
> > failure case is when it goes through a rdr rule in pf.
> >
> > This system replaces a previous one running stable/8 which worked
> > fine with the same pf config file.
> >
> > Has anyone got any suggestions on what I can do to fix this or to
> > debug it further?

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55
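For readers following the thread: the distinction Ruben is asking about
is between pf's two TCP state options. "modulate state" does everything
"keep state" does but additionally rewrites TCP initial sequence
numbers, so it actively alters packets; "keep state" only tracks them.
A generic pf.conf sketch of the two rule forms - the $ext_if macro is a
hypothetical stand-in, not taken from Tim's ruleset:

    # state tracking only - what Tim reports using
    pass out on $ext_if proto tcp from any to any keep state

    # state tracking plus ISN randomisation - what Ruben is asking about
    pass out on $ext_if proto tcp from any to any modulate state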
Still a regression with jails/IPv6/pf?
Hi all,

This is regarding kern/170070 and these two threads from last year:

http://lists.freebsd.org/pipermail/freebsd-stable/2012-July/068987.html
http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069043.html

I'm running stable/9 r255017 and I'm seeing the same issue, even with
the fix Bjoern committed in r238876.

My setup is a dual stack one (IPv6 is done through an IPv4 tunnel) and
the problem is only with IPv6. I have jails with both IPv4 and IPv6
addresses, and I use pf to rdr certain ports to certain jails.

With IPv6 I'm seeing failed checksums on the packets coming back out of
my system, both with UDP and TCP. If I connect over IPv6 to the jail
host it works fine. If I connect over IPv6 to a jail directly (they
have routable addresses, but I prefer them to all be masked behind the
single jail host normally), it works fine. So the only failure case is
when it goes through a rdr rule in pf.

This system replaces a previous one running stable/8 which worked fine
with the same pf config file.

Has anyone got any suggestions on what I can do to fix this or to debug
it further?

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55
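For context, a minimal sketch of the kind of configuration being
described - the interface name, jail address, and port below are
hypothetical stand-ins, not Tim's actual ruleset; tcpdump's -vv flag is
one way to see the bad checksums on replies leaving the host:

    # /etc/pf.conf (sketch)
    ext_if   = "em0"
    web_jail = "2001:db8::10"      # a jail's routable IPv6 address
    rdr pass on $ext_if inet6 proto tcp to port 80 -> $web_jail
    pass in on $ext_if inet6 proto tcp to $web_jail port 80 keep state

    # watch the outbound replies for checksum errors:
    # tcpdump -ni em0 -vv ip6 and tcp port 80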
MFC misc/124164 (Add SHA-256/512 hash algorithm to crypt(3)) to stable/8?
Are there any committers willing to merge PR misc/124164 to stable/8
before the 8.3 release freeze? It's already in HEAD and stable/9, so
it's had some testing.

misc/124164 adds support for SHA-256/512 to crypt(3). This is something
we make use of on Linux and FreeBSD 9, and it'd be great to have the
same support on FreeBSD 8.

http://www.freebsd.org/cgi/query-pr.cgi?pr=124164

SVN revs: 220496 220497

I've tried markm@ already and had no response.

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
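Once merged, the new hashes are selected the same way as the existing
ones, via the passwd_format capability in login.conf(5). A sketch,
assuming the default login class (the user name is a placeholder); the
$6$ prefix marks a SHA-512 crypt hash:

    # /etc/login.conf (fragment)
    default:\
            :passwd_format=sha512:\
            ...

    # rebuild the capability db, then reset a password to pick it up:
    # cap_mkdb /etc/login.conf
    # passwd someuser     (hash in master.passwd now starts with $6$)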
Re: 8.1R ZFS almost locking up system
On Tue, Aug 31, 2010 at 10:58:29AM -0500, Dan Nelson wrote:
> In the last episode (Aug 31), Tim Bishop said:
> > On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> > > In the last episode (Aug 21), Tim Bishop said:
> > > > A few items from top, including zfskern:
> > > >
> > > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> > > >     5 root        4  -8    -     0K    60K zio-i   0  54:38  3.47% zfskern
> > > > 91775 70          1  44    0 53040K 31144K tx-tx   1   2:11  0.00% postgres
> > > > 39661 tdb         1  44    0 55776K 32968K tx-tx   0   0:39  0.00% mutt
> > > > 14828 root        1  47    0 14636K  1572K tx-tx   1   0:03  0.00% zfs
> > > > 11188 root        1  51    0 14636K  1572K tx-tx   0   0:03  0.00% zfs
> > > >
> > > > At some point during this process my zfs snapshots have been
> > > > failing to complete:
> > > >
> > > > root     5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
> > > > root  8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
> > > > root 11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
> > > > root 14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
> > > > root 17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
> > > > root 20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d
> > >
> > > procstat -k on some of these processes might help to pinpoint what
> > > part of the zfs code they're all waiting in.
> >
> > It happened again this Saturday (clearly something in the weekly
> > periodic run is triggering the issue). procstat -kk shows the
> > following for processes doing something zfs related (where "zfs
> > related" means the string 'zfs' in the procstat -kk output):
> >
> >     0 100084 kernel           zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe
> >     5 100031 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
> >     5 100032 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
> >     5 100085 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
> >     5 100086 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe
> >    17 100040 syncer           -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 fork_exit+0x118 fork_trampoline+0xe
> >  2210 100156 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1
> >  3500 100177 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1
> >  3783 100056 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1
> >  4064 100165 mysqld           initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d syscall+0x1e7 Xfast_syscall+0xe1
> >  4441 100224 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
> >       100227 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
> >  4445 100228 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111
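The "zfs related" filter described above amounts to one pipeline:
procstat's -a covers every process, doubling -k prints symbolic kernel
stacks, and grep does the selection. A sketch of the collection step:

    # procstat -kka | grep -i zfs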
Re: 8.1R ZFS almost locking up system
On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> In the last episode (Aug 21), Tim Bishop said:
> > A few items from top, including zfskern:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >     5 root        4  -8    -     0K    60K zio-i   0  54:38  3.47% zfskern
> > 91775 70          1  44    0 53040K 31144K tx-tx   1   2:11  0.00% postgres
> > 39661 tdb         1  44    0 55776K 32968K tx-tx   0   0:39  0.00% mutt
> > 14828 root        1  47    0 14636K  1572K tx-tx   1   0:03  0.00% zfs
> > 11188 root        1  51    0 14636K  1572K tx-tx   0   0:03  0.00% zfs
> >
> > At some point during this process my zfs snapshots have been failing
> > to complete:
> >
> > root     5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
> > root  8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
> > root 11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
> > root 14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
> > root 17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
> > root 20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d
>
> procstat -k on some of these processes might help to pinpoint what
> part of the zfs code they're all waiting in.

It happened again this Saturday (clearly something in the weekly
periodic run is triggering the issue). procstat -kk shows the following
for processes doing something zfs related (where "zfs related" means
the string 'zfs' in the procstat -kk output):

    0 100084 kernel           zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe
    5 100031 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
    5 100032 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
    5 100085 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
    5 100086 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe
   17 100040 syncer           -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 fork_exit+0x118 fork_trampoline+0xe
 2210 100156 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1
 3500 100177 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1
 3783 100056 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1
 4064 100165 mysqld           initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d syscall+0x1e7 Xfast_syscall+0xe1
 4441 100224 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
      100227 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
 4445 100228 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
 4446 100229 python2.6        initial thread
Re: 8.1R ZFS almost locking up system
On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> In the last episode (Aug 21), Tim Bishop said:
> > I've had a problem on a FreeBSD 8.1R system for a few weeks. It
> > seems that ZFS gets into an almost unresponsive state. Last time it
> > did it (two weeks ago) I couldn't even log in, although the system
> > was up; this time I could manage a reboot but couldn't stop any
> > applications (they were likely hanging on I/O).
>
> Could your pool be very close to full? Zfs will throttle itself when
> it's almost out of disk space. I know it's saved me from filling up
> my filesystems a couple times :)

It's not close to full, so I don't think that's the issue.

> > A few items from top, including zfskern:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >     5 root        4  -8    -     0K    60K zio-i   0  54:38  3.47% zfskern
> > 91775 70          1  44    0 53040K 31144K tx-tx   1   2:11  0.00% postgres
> > 39661 tdb         1  44    0 55776K 32968K tx-tx   0   0:39  0.00% mutt
> > 14828 root        1  47    0 14636K  1572K tx-tx   1   0:03  0.00% zfs
> > 11188 root        1  51    0 14636K  1572K tx-tx   0   0:03  0.00% zfs
> >
> > At some point during this process my zfs snapshots have been failing
> > to complete:
> >
> > root     5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
> > root  8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
> > root 11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
> > root 14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
> > root 17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
> > root 20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d
>
> procstat -k on some of these processes might help to pinpoint what
> part of the zfs code they're all waiting in.

I'll do that. Thanks for the pointer :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
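A sketch of what that looks like in practice, using the pids of the
wedged `zfs snapshot` processes from the ps listing quoted above:

    # for pid in 8265 11188 14828 17862 20986; do
    >   procstat -kk $pid
    > done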
Re: 8.1R ZFS almost locking up system
On Tue, Aug 24, 2010 at 06:49:23AM +1000, Peter Jeremy wrote:
> On 2010-Aug-21 23:04:35 +0100, Tim Bishop <t...@bishnet.net> wrote:
> > I've had a problem on a FreeBSD 8.1R system for a few weeks. It
> > seems that ZFS gets into an almost unresponsive state. Last time it
> > did it (two weeks ago) I couldn't even log in, although the system
> > was up; this time I could manage a reboot but couldn't stop any
> > applications (they were likely hanging on I/O).
>
> Unless you have a ZFS-only system, it's possible you are running out
> of free memory (see the free entry in top(1) or 'systat -v') - in
> which case r211581 (and r211599 which fixes a mismerge) should help.
> Your very high kstat.zfs.misc.arcstats.memory_throttle_count suggests
> this is your problem.

Thanks. At the time I had a reasonable amount free (~450MB from 3GB),
but it had dropped lower than that at some points previously. I'll take
a closer look at that next time, and look at that patch (or upgrade to
8-STABLE).

And the system has a UFS root, but all the apps/data are stored in ZFS.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
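The counter Peter mentions, and the free-memory picture around it, can
be checked directly. A sketch (top's -b gives a single batch-mode
snapshot whose Mem: line includes the Free figure):

    # sysctl kstat.zfs.misc.arcstats.memory_throttle_count
    # sysctl vfs.zfs.arc_max vfs.zfs.arc_min
    # top -b | head -8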
8.1R ZFS almost locking up system
I've had a problem on a FreeBSD 8.1R system for a few weeks. It seems
that ZFS gets into an almost unresponsive state. Last time it did it
(two weeks ago) I couldn't even log in, although the system was up;
this time I could manage a reboot but couldn't stop any applications
(they were likely hanging on I/O).

Here are some details I collected prior to reboot. The zpool output,
including iostat, and gstat for the disks:

# zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool0       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4s3   ONLINE       0     0     0
            ad6s3   ONLINE       0     0     0

errors: No known data errors

# zpool iostat -v 5
...
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool0        117G  16.7G    248    114   865K   269K
  mirror     117G  16.7G    248    114   865K   269K
    ad4s3       -      -     43     56  2.47M   269K
    ad6s3       -      -     39     56  2.41M   269K
----------  -----  -----  -----  -----  -----  -----

# gstat
...
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1     48     48   3042    9.8      0      0    0.0    47.6| ad4
    0     38     38   2406   10.5      0      0    0.0    39.5| ad6
    0      0      0      0    0.0      0      0    0.0     0.0| ad4s1
    0      0      0      0    0.0      0      0    0.0     0.0| ad4s2
    1     48     48   3042    9.8      0      0    0.0    47.6| ad4s3
    0      0      0      0    0.0      0      0    0.0     0.0| ad6s1
    0      0      0      0    0.0      0      0    0.0     0.0| ad6s2
    0     38     38   2406   11.8      0      0    0.0    44.4| ad6s3

I've seen this before when I've had poor ZFS performance. There's more
I/O on the disks than on the pool itself. It's not particularly busy
though.

A few items from top, including zfskern:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
    5 root        4  -8    -     0K    60K zio-i   0  54:38  3.47% zfskern
91775 70          1  44    0 53040K 31144K tx-tx   1   2:11  0.00% postgres
39661 tdb         1  44    0 55776K 32968K tx-tx   0   0:39  0.00% mutt
14828 root        1  47    0 14636K  1572K tx-tx   1   0:03  0.00% zfs
11188 root        1  51    0 14636K  1572K tx-tx   0   0:03  0.00% zfs

At some point during this process my zfs snapshots have been failing to
complete:

root     5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
root  8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
root 11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
root 14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
root 17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
root 20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d

It all seems to point at ZFS getting to the point of being almost
unresponsive. It's been exactly two weeks since the last time this
happened, and therefore since the last reboot, so it'll be interesting
to see if the same happens again after the same period of time.

I noticed this given in a few other ZFS related messages:

vfs.worklist_len: 15

I have attached all (hopefully) ZFS-related sysctl output.

Finally, the reboot log:

Aug 21 22:13:06 server kernel:
Aug 21 22:13:06 server reboot: rebooted by tdb
Aug 21 22:19:47 server kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
Aug 21 22:19:47 server kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...
Aug 21 22:19:48 server kernel: done
Aug 21 22:19:48 server kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
Aug 21 22:20:03 server kernel:
Aug 21 22:20:03 server kernel: Syncing disks, vnodes remaining...14
Aug 21 22:20:48 server kernel: timed out
Aug 21 22:21:55 server kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...
Aug 21 22:22:39 server kernel: 1
Aug 21 22:22:55 server kernel: timed out
Aug 21 22:22:55 server kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...

I've undoubtedly missed some important information, so please let me
know if there's anything more useful I can collect next time (I'm quite
sure it'll happen again).

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984

vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 40245248
vfs.zfs.mfu_ghost_metadata_lsize: 87331328
vfs.zfs.mfu_ghost_size: 127576576
vfs.zfs.mfu_data_lsize: 99885056
vfs.zfs.mfu_metadata_lsize: 146944
vfs.zfs.mfu_size: 101330432
vfs.zfs.mru_ghost_data_lsize: 181200896
vfs.zfs.mru_ghost_metadata_lsize: 25819648
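Since the hang recurs fortnightly, one option is to script the
collection in advance. A sketch only, with an arbitrary output path,
gathering the same diagnostics shown above so they can be captured
while the box is still partly responsive:

    #!/bin/sh
    # collect ZFS hang diagnostics into a timestamped file
    out=/var/tmp/zfs-hang.$(date +%Y%m%d-%H%M)
    exec > "$out" 2>&1
    zpool status
    zpool iostat -v 5 2     # two 5-second samples
    iostat -x 5 2           # extended per-disk stats for comparison
    ps auxww
    procstat -kka           # kernel stacks of every process
    sysctl vfs.zfs kstat.zfs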
Re: System deadlock when using mksnap_ffs
Jeremy,

On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
> On Thu, Nov 13, 2008 at 12:41:02AM +0000, Tim Bishop wrote:
> > On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
> > > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > > > I run the mksnap_ffs command to take the snapshot and some time
> > > > later the system completely freezes up:
> > > >
> > > > paladin# cd /u2/.snap/
> > > > paladin# mksnap_ffs /u2 test.1
> > >
> > > You need to provide information described in the
> > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> > > and especially
> > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> >
> > Ok, I've done that, and removed the patch that seemed to fix things.
> > The first thing I notice after doing this on the console is that I
> > can still ctrl+t the process:
> >
> > load: 0.14  cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
> >
> > But the top and ps I left running on other ttys have all stopped
> > responding.
>
> Then in my book, the patch didn't fix anything. :-) The system is
> still deadlocking; snapshot generation **should not** wedge the
> system hard like this.

You missed the part where I said I removed the patch. I did that so I
could provide details with it wedged.

I agree that there are still some fundamental speed issues with
snapshotting, though. And I'm sure the FS itself will still be locked
out for a while during the snapshot. But with the patch at least the
whole thing doesn't lock up.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
System deadlock when using mksnap_ffs
I've been playing around with snapshots lately but I've got a problem
on one of my servers running 7-STABLE amd64:

FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PALADIN amd64

I run the mksnap_ffs command to take the snapshot and some time later
the system completely freezes up:

paladin# cd /u2/.snap/
paladin# mksnap_ffs /u2 test.1

It only happens on this one filesystem, though, which might be to do
with its size. It's not over the 2TB mark, but it's pretty close. It's
also backed by a hardware RAID system, although a smaller filesystem on
the same RAID has no issues.

Filesystem   1K-blocks      Used     Avail Capacity  Mounted on
/dev/da0s1a 2078881084 921821396 990749202    48%    /u2

To clarify "completely freezes up": unresponsive to all services over
the network, except ping. On the console I can switch between the ttys,
but none of them respond. The only way out is to hit the reset button.

Any advice? I'm happy to help debug this further to get to the bottom
of it.

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
Re: System deadlock when using mksnap_ffs
On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> I run the mksnap_ffs command to take the snapshot and some time later
> the system completely freezes up:
>
> paladin# cd /u2/.snap/
> paladin# mksnap_ffs /u2 test.1

Someone (not named because they chose not to reply to the list) gave me
the following patch:

--- sys/ufs/ffs/ffs_snapshot.c.orig	Wed Mar 22 09:42:31 2006
+++ sys/ufs/ffs/ffs_snapshot.c	Mon Nov 20 14:59:13 2006
@@ -282,6 +282,8 @@ restart:
 		if (error)
 			goto out;
 		bawrite(nbp);
+		if (cg % 10 == 0)
+			ffs_syncvnode(vp, MNT_WAIT);
 	}
 	/*
 	 * Copy all the cylinder group maps. Although the
@@ -303,6 +305,8 @@ restart:
 			goto out;
 		error = cgaccount(cg, vp, nbp, 1);
 		bawrite(nbp);
+		if (cg % 10 == 0)
+			ffs_syncvnode(vp, MNT_WAIT);
 		if (error)
 			goto out;
 	}

With the description:

  What can happen is on a big file system it will fill up the buffer
  cache with I/O and then run out. When the buffer cache fills up then
  no more disk I/O can happen :-( When you do a sync, it flushes that
  out to disk so things don't hang.

It seems to work too. But it seems more like a workaround than a fix?

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
Re: System deadlock when using mksnap_ffs
On Wed, Nov 12, 2008 at 08:10:50PM +0200, David Peall wrote:
> > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PALADIN amd64
> >
> > I run the mksnap_ffs command to take the snapshot and some time
> > later the system completely freezes up:
>
> If the file system is UFS2 it's a known problem but should have been
> fixed.
>
> http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
>
> ident /boot/kernel/kernel | grep subr_sleepqueue
>
> version should be greater than 1.39.2.3?

Yes, it's UFS2, and yes, it's greater than 1.39.2.3:

$FreeBSD: src/sys/kern/subr_sleepqueue.c,v 1.39.2.5 2008/09/16 20:01:57 jhb Exp $

Are you sure the problem referenced on that page is the same? It talks
about dog-slow snapshotting, which I see on other filesystems and
machines. But in this particular case the system is dead, and does not
recover.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
Re: System deadlock when using mksnap_ffs
On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
> On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > I've been playing around with snapshots lately but I've got a
> > problem on one of my servers running 7-STABLE amd64:
> >
> > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PALADIN amd64
> >
> > I run the mksnap_ffs command to take the snapshot and some time
> > later the system completely freezes up:
> >
> > paladin# cd /u2/.snap/
> > paladin# mksnap_ffs /u2 test.1
> >
> > It only happens on this one filesystem, though, which might be to do
> > with its size. It's not over the 2TB mark, but it's pretty close.
> > It's also backed by a hardware RAID system, although a smaller
> > filesystem on the same RAID has no issues.
> >
> > Filesystem   1K-blocks      Used     Avail Capacity  Mounted on
> > /dev/da0s1a 2078881084 921821396 990749202    48%    /u2
> >
> > To clarify "completely freezes up": unresponsive to all services
> > over the network, except ping. On the console I can switch between
> > the ttys, but none of them respond. The only way out is to hit the
> > reset button.
>
> You need to provide information described in the
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> and especially
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Ok, I've done that, and removed the patch that seemed to fix things.

The first thing I notice after doing this on the console is that I can
still ctrl+t the process:

load: 0.14  cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k

But the top and ps I left running on other ttys have all stopped
responding. Also the following kernel message came out:

Expensive timeout(9) function: 0x802ce380(0xff000677ca50) 0.006121001 s

There is also still some disk I/O.

Dropping to ddb worked, but I don't have a serial console so I can't
paste the output. ps shows mksnap_ffs in "newbuf", as we already saw. A
trace of mksnap_ffs looks like this:

Tracing pid 2603 tid 100214 td 0xff0006a0e370
sched_switch() at sched_switch+0x2a1
mi_switch() at mi_switch+0x233
sleepq_switch() at sleepq_switch+0xe9
sleepq_wait() at sleepq_wait+0x44
_sleep() at _sleep+0x351
getnewbuf() at getnewbuf+0x2e1
getblk() at getblk+0x30d
setup_allocindir_phase2() at setup_allocindir_phase2+0x338
softdep_setup_allocindir_page() at softdep_setup_allocindir_page+0xa7
ffs_balloc_ufs2() at ffs_balloc_ufs2+0x121e
ffs_snapshot() at ffs_snapshot+0xc52
ffs_mount() at ffs_mount+0x735
vfs_donmount() at vfs_donmount+0xeb5
kernel_mount() at kernel_mount+0xa1
ffs_cmount() at ffs_cmount+0x92
mount() at mount+0x1cc
syscall() at syscall+0x1f6
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (21, FreeBSD ELF64, mount), rip = 0x80068636c, rsp = 0x7fffe518, rbp = 0x8008447a0 ---

"show pcpu" shows cpuid 3 (quad core machine) in thread "swi6: Giant
taskq". All the other cpus are idle.

"show locks" shows:

exclusive sleep mutex Giant r = 0 (0x806ae040) locked @ /usr/src/sys/kern/kern_intr.c:1087

There are two other locks shown by "show alllocks", one for sshd and
one for mysqld, both in kern/uipc_sockbuf.c.

"show lockedvnods" shows mksnap_ffs has a lock on da0s1a with ffs_vget
at the top of the stack.

Sorry for any typos. I'll sort out a serial cable if more is needed :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
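For the archive: the deadlock chapter linked above boils down to a
short ddb session, roughly the sequence used here (with output copied
by hand in the absence of a serial console):

    db> ps                  # find wedged processes (mksnap_ffs in "newbuf")
    db> trace 2603          # stack of a specific pid
    db> show pcpu           # what the current CPU is running
    db> show allpcpu        # the same for every CPU
    db> show locks          # witness-tracked locks of the current thread
    db> show alllocks       # locks held by all threads
    db> show lockedvnods    # locked vnodes and their owners
    db> alltrace            # stack traces for everything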
Re: /etc/rc.d after cvsup yesterday?
On Fri, May 25, 2007 at 07:42:10AM -0500, JD Bronson wrote:
> I noticed after cvsup'ing the other day (6.2-stable) that /etc/rc.d
> now has some issues:
>
> # rcorder /usr/src/etc/rc.d/*
> rcorder: requirement `zfs' in file `/usr/src/etc/rc.d/FILESYSTEMS' has no providers.
>
> (removing 'zfs' in FILESYSTEMS fixes this)

Looks like that bit was MFCed in error. zfs only exists on CURRENT.

> and then this later on with rcorder:
>
> ...
> /usr/src/etc/rc.d/routed
> rcorder: Circular dependency on provision `mountcritremote' in file `/usr/src/etc/rc.d/archdep'.
>
> So to fix this (for now) I just changed the REQUIRE from
> 'mountcritremote' to 'routed' and that seems to be OK for me.
>
> Anyone else notice this?

Not sure about that one though :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
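The requirement in question lives in the rcorder(8) comment block at
the top of each script, so checking and applying the interim fix are
both one-liners. A sketch - the REQUIRE contents shown are hypothetical
and the exact 6.2 line may differ:

    $ grep '^# REQUIRE' /usr/src/etc/rc.d/FILESYSTEMS
    # REQUIRE: root mountcritlocal zfs
    $ sed -i '' 's/ zfs//' /usr/src/etc/rc.d/FILESYSTEMS
    $ rcorder /usr/src/etc/rc.d/* > /dev/null    # silent once it's happy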
Re: buildworld broken?
On Thu, May 24, 2007 at 11:57:46PM +0300, Abdullah Ibn Hamad Al-Marri wrote:
> /usr/src/sys/modules/procfs/../../fs/procfs/procfs_regs.c: In function `procfs_doprocregs':
> /usr/src/sys/modules/procfs/../../fs/procfs/procfs_regs.c:96: warning: implicit declaration of function `PROC_ASSERT_HELD'
> /usr/src/sys/modules/procfs/../../fs/procfs/procfs_regs.c:96: warning: nested extern declaration of `PROC_ASSERT_HELD'
> *** Error code 1

It looks like this has already been fixed. See Dag-Erling Smørgrav's
email a short while ago. Wait an hour or so, then update your sources
and try again.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
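The resync-and-retry loop being suggested, as a sketch (the supfile
path is hypothetical; any stable supfile will do):

    # csup -L 2 /root/stable-supfile
    # cd /usr/src && make buildworld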
Kernel Dumps on to gmirror device?
I've got a 6.1 server that's panicking and I'd like to debug it. The
problem is that I can't get a kernel dump on to my gmirror device. It
looks like this has been supported since 6.1 (it says so in the release
notes).

In my rc.conf I have:

dumpdev="/dev/mirror/gm0s1b"

which is my swap partition. On booting it says:

kernel dumps on /dev/mirror/gm0s1b

But when it panics it says:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x14
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc057ff9f
stack pointer           = 0x28:0xe2df3c44
frame pointer           = 0x28:0xe2df3c4c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (swi4: clock sio)
trap number             = 12
panic: page fault
Uptime: 1d4h15m55s
GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed.
GEOM_MIRROR: Device gm0 destroyed.
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...

I'm guessing it's because the GEOM_MIRROR device has been destroyed
just before it wants to dump?

Any suggestions on a way forward to getting a dump out?

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
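One workaround to try - a sketch only, not verified here - is to point
the dump device at one of the mirror's raw components, which still
exists after gm0 is torn down; ad0s1b below is a hypothetical component
name, so substitute the real underlying swap slice:

    # /etc/rc.conf (sketch)
    dumpdev="/dev/ad0s1b"

    # or switch it on a running system:
    # dumpon /dev/ad0s1b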
Re: vinum to gvinum help
On Mon, Jun 26, 2006 at 12:22:07PM -0400, Sven Willenberger wrote:
> I have an i386 system currently running 5.2.1-RELEASE with a vinum
> mirror array (2 drives comprising /usr). I want to upgrade this to
> 5.5-RELEASE which, if I understand correctly, no longer supports vinum
> arrays. Would simply changing /boot/loader.conf to read gvinum_load
> instead of vinum_load work, or would the geom layer prevent this from
> working properly? If not, is there a recommended way of upgrading a
> vinum array to a gvinum or gmirror array?

I did this upgrade not long ago (and later to 6.1). The process of
switching from vinum to gvinum is pretty easy, although the specifics
escape me now. Changing loader.conf and fstab are the main ones, but
assuming you have console access you should easily be able to fix
anything else that crops up.

As Mark said, there were shared library version changes between these
versions, so you'll end up needing to rebuild all your apps. I did this
with portupgrade, but I got everything updated and working on 5.2.1
first so I wouldn't have to worry about that during the upgrade.

A word of warning though. I'm currently left with no RAID because
somewhere along the line vinum/gvinum corrupted the metadata. This only
happened after a disk failure and after switching to gvinum (could be
coincidence), but has left me looking elsewhere - gmirror looks good -
for a RAID system. See the archives of this list for details - the
summary is that the kernel module locks up when loading.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
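The two edits being discussed, sketched out - the volume name "usr" is
hypothetical; device nodes move from /dev/vinum/* to /dev/gvinum/*:

    # /boot/loader.conf
    #vinum_load="YES"          # old userland vinum: remove
    geom_vinum_load="YES"      # gvinum replacement

    # /etc/fstab: change lines like
    #   /dev/vinum/usr   /usr  ufs  rw  2  2
    # to
    #   /dev/gvinum/usr  /usr  ufs  rw  2  2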
Problem with geom_vinum on 5.5 and 6.1
I'm running 6.1-RELEASE (and previously 5.5) with gvinum to mirror two
internal root disks. At the time of this problem the second disk is
physically disconnected.

[with 5.5]

Whilst copying data off the first disk, from a gvinum volume, I had a
single disk error. This put the volume in the "down" state. I rebooted
the machine (probably not the best move in hindsight!), and when
booting it said the following:

ad0: 78167MB <Maxtor 6Y080P0/YAR41BW0> [158816/16/63] at ata0-master UDMA133
GEOM_VINUM: subdisk swap.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk root.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk var.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk usr.p1.s0 state change: down -> stale

And then it completely hangs. I would have expected the gvinum volumes
to be unavailable and to be given the choice of which root fs to mount.
I've currently got round this by booting a different kernel which stops
geom_vinum.ko from being loaded, and consequently allows me to choose a
root fs. The filesystems are now mounted directly from /dev/ad0s1x. If
I do "gvinum start" in single user it locks up too.

[now with 6.1]

After an upgrade to 6.1, if I do "gvinum start" in single user I get
the same GEOM_VINUM lines as above, and gvinum hangs and becomes
uninterruptible. Unlike with 5.5, the machine is still vaguely
responsive and a ctrl+alt+del forces a reboot fine.

Maybe my configuration on disk is corrupt? Or something like that? Has
anyone got any ideas, or should I maybe just start from scratch with a
new gvinum config?

Thanks,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984

----- End forwarded message -----

Cheers,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
ataraid - RAID5 in RELENG_6?
I'm having a fiddle with RELENG_6, and while setting up a RAID1 system
disk I noticed that atacontrol now lets you create a RAID5 device. I
gave it a whirl and it seemed to work - I have a device I can use.

But is this working properly? I don't have a hardware RAID card, just a
plain old SATA card. Having searched the archives I noticed that Søren
said it wasn't handling parity. Maybe this was fixed?

Anyway - the bottom line is this: can I create an entirely software
RAID5 setup using ataraid? I'm personally not finding gvinum to be that
polished...

Cheers,

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
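For reference, the command form in question - a sketch with
hypothetical disk names; given Søren's comment about parity, any
resulting array should be treated as experimental rather than trusted
with real data:

    # atacontrol create RAID5 ad4 ad6 ad8
    # atacontrol status ar0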