Re: FreeBSD 9.1-RC1 Available...
Jim Pingle li...@pingle.org writes:

> On 8/23/2012 11:43 AM, Ian Lepore wrote:
>> On Thu, 2012-08-23 at 11:17 -0400, Ken Menzel wrote:
>>> I found two good primers:
>>>
>>> http://mebsd.com/configure-freebsd-servers/update-freebsd-source-tree-using-subversion-svn.html
>>> http://www.freebsd.org/doc/en/articles/committers-guide/article.html#SUBVERSION-PRIMER
>>>
>>> The second primer in the committer handbook seems to indicate that it is difficult to run an SVN mirror. This appears to me to be the biggest drawback. I have been using CVS and perforce for years, but subversion is new to me.
>>
>> It may be difficult to run an svn mirror that allows you to commit locally and get those changes back to the project, but running a read-only mirror is trivial. The script I run nightly from cron to sync my local mirror is:
>>
>>     #!/bin/sh
>>     #
>>     # svnsync to pull in changes from FreeBSD to my local mirror.
>>     #
>>     svnsync sync file:///local/vc/svn/base
>>
>> I can't remember how I initially created and populated the mirror, but it's likely I grabbed a snapshot of the mirror at work and brought it home on a thumb drive (just to avoid initial network DL time).
>
> I spent a little time today setting up an SVN mirror after reading this thread and wrote up a how-to for those looking to do the same.
>
> http://www.pingle.org/2012/08/24/freebsd-svn-mirror
>
> Comments/Flames/Corrections welcome...

thanx; works out of the box for me (using the svnserve_enable path).

That said: I glanced at a diff of a stable/8 checkout from both the /home/ncvs repo and the new /home/freebsd-svn one, and saw a (maybe well-known ..) 'feature':

    diff ./src/contrib/amd/include/am_defs.h /raid1/bsd/8/src/contrib/amd/include/am_defs.h
    42c42
    <  * $FreeBSD: stable/8/contrib/amd/include/am_defs.h 174299 2007-12-05 16:03:52Z obrien $
    ---
    >  * $FreeBSD: src/contrib/amd/include/am_defs.h,v 1.15.2.1 2009/08/03 08:13:06 kensmith Exp $

I wondered why the date (and committer ...) in the expansion were different (from the svn log):

    r196045 | kensmith | 2009-08-03 10:13:06 +0200 (Mon, 03 Aug 2009) | 4 lines
    Copy head to stable/8 as part of 8.0 Release cycle.
    Approved by: re (Implicit)

    r174299 | obrien | 2007-12-05 17:03:52 +0100 (Wed, 05 Dec 2007) | 3 lines

So the 'Copy head' chain does not update the $FreeBSD tag, whereas the subsequent svn-to-cvs chain does.

FYI,

Arno

> Jim

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
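To compare keyword expansions like the two shown in the diff above without reading them by eye, the svn-style tag can be split into its fields. The following is only a sketch: the function name `freebsd_tag_fields` is made up for illustration, and the pattern assumes the svn-style layout seen in this message (path, revision, date, time ending in `Z`, author). CVS-style tags do not match and produce no output.

```sh
#!/bin/sh
# freebsd_tag_fields (hypothetical helper): read lines on stdin and, for each
# svn-style $FreeBSD$ expansion, print "REVISION AUTHOR DATE".
# Assumed layout: $FreeBSD: <path> <rev> <yyyy-mm-dd> <hh:mm:ss>Z <author> $
freebsd_tag_fields() {
    sed -n 's/.*\$FreeBSD: \([^ ]*\) \([0-9][0-9]*\) \([0-9-]*\) [0-9:]*Z \([^ ]*\) \$.*/\2 \4 \3/p'
}
```

For example, `grep -h 'FreeBSD:' am_defs.h | freebsd_tag_fields` on the svn checkout would give the revision to look up with `svn log -r`.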
Re: nfs-bug when server for 9-Stable becomes client as well ?
Vincent Hoffman vi...@unsane.co.uk writes:

> On 06/07/2012 18:51, Arno J. Klaassen wrote:
>> Vincent Hoffman vi...@unsane.co.uk writes:
>>> On 06/07/2012 14:19, Arno J. Klaassen wrote:
>>>> Hello,
>>>>
>>>> looks like I discovered a probable bug in the nfs-code, very easy to reproduce in my setup:
>>>>
>>>> Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)
>>>> Machine-2 : 8-stable as of April the 10th exporting /raid1
>>>>
>>>> On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) and start a script on this mount looping something like:
>>>>
>>>>     dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
>>>>     cp -fp BIG BIG2
>>>>     cmp -x BIG BIG2
>>>>
>>>> I let this run for 24 hours (from time to time stressing Machine-1 with other scripts, including provoking heavy swapping), no problem at all.
>>>>
>>>> However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) on Machine-2, and *immediately* the above loop on Machine-1 fails:
>>>>
>>>>     Copying file ...cp: BIG: Permission denied
>>>>
>>>> No console messages this time, last time I got
>>>>
>>>>     kernel: nfs_getpages: error 13
>>>>     kernel: vm_fault: pager read error, pid 87803 (cmp)
>>>>
>>>> on Machine-1.
>>>>
>>>> I repeated this scenario by replacing Machine-2 with a good old 6-4-stable one, same outcome.
>>>>
>>>> Please tell me what I could do to nail this down a bit more.
>>>
>>> It's possible (although not definite) that you have hit a mountd bug as documented in PRs
>>> kern/131342
>>> kern/136865
>>
>> especially kern/131342 looks similar and quite old; funny I never hit this before, I basically do the same tests since 'ages' on each new box. Could be that faster network/cpu reveals some race condition; I notice as well that this server is the first (IIRC) that uses 3 different IRQs for network interrupts (em(4) Intel(R) PRO/1000).
>
> Certainly possible and seems reasonable enough.
>
>> just my $0.02, I glanced at kern/131342, looks like the culprit should be something like a 'non-atomic' operation in between invalidating the old /etc/exports and validating the new /etc/exports.
>> Wonder if just verifying that /var/run/mountd.pid is newer than /etc/exports, and if true just skipping that operation, would be an acceptable band-aid (if I understood correctly, a rewrite of mountd correcting this (amongst others) is close to hitting -current (?))
>>
>>> I've recently asked on -CURRENT about this and had a patch to try from Rick. I'm testing it now but it doesn't seem to fix it for me, just improve it, although I'm trying to get enough runs to be a valid sample.
>>> (see http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current )
>>> What I did for my production nas was edit mount.c so it didn't send a SIGHUP to mountd as suggested by Rick, as it was easy to do and non intrusive.
>>
>> hmm, this means I should patch each fbsd-client, no? May be easier to patch mountd to ignore SIGHUP and use some non-standard signal to force re-init?
>
> No, just patch /sbin/mount on the nfs server so it doesn't send the SIGHUP to mountd.

[In my case] it's the mount on a client which causes the server to fail, I don't see how patching /sbin/mount on the nfs server should fix this?

As I don't remember if it's possible to discriminate a -1 signal sent from a process against one sent from a terminal: if so, as another band-aid, could one sent from a process be ignored altogether?

Merci

Arno

> you can manually HUP mountd if needed.
>
>> Arno
>
> Vince
>
>>>> Thanx in advance,
>>>>
>>>> Best,
>>>>
>>>> Arno

--
Arno J. Klaassen
SCITO S.A.
8 rue des Haies
F-75020 Paris, France
http://scito.com
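The band-aid floated in this message (skip the reload when /var/run/mountd.pid is newer than /etc/exports) can be sketched as a small wrapper check. This is only an illustration of the idea, not what mountd or mount does: the function names are hypothetical, and the sketch prints the decision instead of signalling anything. Note one limitation of the idea as stated: the pid file is presumably written only when mountd starts, so once /etc/exports has been edited the check would keep answering "reload" until mountd is restarted.

```sh
#!/bin/sh
# exports_newer_than_pid EXPORTS PIDFILE (hypothetical helper)
# Succeeds only if EXPORTS exists and was modified after PIDFILE.
exports_newer_than_pid() {
    [ -n "$(find "$1" -newer "$2" 2>/dev/null)" ]
}

# maybe_reload EXPORTS PIDFILE (hypothetical helper)
# Prints the action a wrapper would take instead of performing it.
maybe_reload() {
    if exports_newer_than_pid "$1" "$2"; then
        echo "reload"   # a real wrapper would run: kill -HUP "$(cat "$2")"
    else
        echo "skip"     # exports unchanged since mountd wrote its pid file
    fi
}
```

On the server this would be called as `maybe_reload /etc/exports /var/run/mountd.pid`.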
nfs-bug when server for 9-Stable becomes client as well ?
Hello,

looks like I discovered a probable bug in the nfs-code, very easy to reproduce in my setup:

Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)
Machine-2 : 8-stable as of April the 10th exporting /raid1

On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) and start a script on this mount looping something like:

    dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
    cp -fp BIG BIG2
    cmp -x BIG BIG2

I let this run for 24 hours (from time to time stressing Machine-1 with other scripts, including provoking heavy swapping), no problem at all.

However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) on Machine-2, and *immediately* the above loop on Machine-1 fails:

    Copying file ...cp: BIG: Permission denied

No console messages this time, last time I got

    kernel: nfs_getpages: error 13
    kernel: vm_fault: pager read error, pid 87803 (cmp)

on Machine-1.

I repeated this scenario by replacing Machine-2 with a good old 6-4-stable one, same outcome.

Please tell me what I could do to nail this down a bit more.

Thanx in advance,

Best,

Arno
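The write/copy/compare loop described in this message can be wrapped in a function that stops at the first failing step, which makes it easier to see which operation broke. This is a sketch under assumptions, not the script actually used: the function name, its arguments and the return codes are invented for illustration, it reads /dev/urandom, and it uses plain `cmp` where the original uses `cmp -x`.

```sh
#!/bin/sh
# nfs_copy_loop DIR SIZE_MB ITERATIONS (hypothetical helper)
# Returns 0 if every round succeeds, 1 on write failure, 2 on copy failure,
# 3 if the copy differs from the source.
nfs_copy_loop() {
    dir=$1
    size=$2
    iterations=$3
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Write SIZE_MB mebibytes of random data to the mount under test.
        dd if=/dev/urandom of="$dir/BIG" bs=1048576 count="$size" 2>/dev/null || return 1
        cp -fp "$dir/BIG" "$dir/BIG2" || return 2
        cmp "$dir/BIG" "$dir/BIG2" || return 3
        i=$((i + 1))
    done
    return 0
}
```

Running `nfs_copy_loop /raid1 1024 100; echo "exit=$?"` on the NFS mount would then distinguish a failed `cp` (exit 2, as with the "Permission denied" above) from data corruption (exit 3).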
Re: nfs-bug when server for 9-Stable becomes client as well ?
Vincent Hoffman vi...@unsane.co.uk writes:

> On 06/07/2012 14:19, Arno J. Klaassen wrote:
>> Hello,
>>
>> looks like I discovered a probable bug in the nfs-code, very easy to reproduce in my setup:
>>
>> Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)
>> Machine-2 : 8-stable as of April the 10th exporting /raid1
>>
>> On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) and start a script on this mount looping something like:
>>
>>     dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
>>     cp -fp BIG BIG2
>>     cmp -x BIG BIG2
>>
>> I let this run for 24 hours (from time to time stressing Machine-1 with other scripts, including provoking heavy swapping), no problem at all.
>>
>> However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) on Machine-2, and *immediately* the above loop on Machine-1 fails:
>>
>>     Copying file ...cp: BIG: Permission denied
>>
>> No console messages this time, last time I got
>>
>>     kernel: nfs_getpages: error 13
>>     kernel: vm_fault: pager read error, pid 87803 (cmp)
>>
>> on Machine-1.
>>
>> I repeated this scenario by replacing Machine-2 with a good old 6-4-stable one, same outcome.
>>
>> Please tell me what I could do to nail this down a bit more.
>
> It's possible (although not definite) that you have hit a mountd bug as documented in PRs
> kern/131342
> kern/136865

especially kern/131342 looks similar and quite old; funny I never hit this before, I basically do the same tests since 'ages' on each new box. Could be that faster network/cpu reveals some race condition; I notice as well that this server is the first (IIRC) that uses 3 different IRQs for network interrupts (em(4) Intel(R) PRO/1000).

> I've recently asked on -CURRENT about this and had a patch to try from Rick. I'm testing it now but it doesn't seem to fix it for me, just improve it, although I'm trying to get enough runs to be a valid sample.
> (see http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current )
> What I did for my production nas was edit mount.c so it didn't send a SIGHUP to mountd as suggested by Rick, as it was easy to do and non intrusive.

hmm, this means I should patch each fbsd-client, no? May be easier to patch mountd to ignore SIGHUP and use some non-standard signal to force re-init?

Arno

> Vince
>
>> Thanx in advance,
>>
>> Best,
>>
>> Arno
9-STABLE and Iphone modem (tethering), anyone succeed ?
Hello,

has anyone succeeded in using an iPhone as a modem on 9-STABLE (sources as of March 16)?

I followed the instructions from 'http://forums.freebsd.org/showthread.php?t=19995' using 'usbmuxd' and 'libimobiledevice' from ports. When I start 'usbmuxd' I indeed see in dmesg(1):

    ipheth0: Apple Inc. iPhone, class 0/0, rev 2.00/0.01, addr 3 on usbus1
    ue0: USB Ethernet on ipheth0
    ue0: bpf attached
    ue0: Ethernet address: XXX

I did not find 'ipheth-pair' (or something equivalent) in ports, so I built it from the sources as indicated in the forum post, but it fails with:

    # ./ipheth-pair -v
    ./ipheth-pair: -14: cannot get lockdown

The corresponding log from 'usbmuxd -v -v' says (stripped):

    [16:29:20.490][3] usbmuxd v1.0.7 starting up
    [16:29:20.491][4] Creating socket
    [16:29:20.491][5] client_init
    [16:29:20.491][5] device_init
    [16:29:20.491][4] Initializing USB
    [16:29:20.491][5] usb_init for linux / libusb 1.0
    [16:29:20.491][4] Found new device with v/p 05ac:1297 at 1-3
    [16:29:20.491][4] Found interface 1 with endpoints 04/85 for device 1-3
    [16:29:20.495][4] Using wMaxPacketSize=512 for device 1-3
    [16:29:20.495][3] Connecting to new device on location 0x10003 as ID 1
    [16:29:20.495][4] 1 device detected
    [16:29:20.495][3] Initialization complete
    [16:29:20.495][5] usb polling enable: 0
    [16:29:20.496][3] Connected to v1.0 device 1 on location 0x10003 with serial number XXX
    [16:29:20.496][5] client_device_add: id 1, location 0x10003, serial XXX
    [16:29:46.428][4] New client on fd 9
    [16:29:46.428][5] Client command in fd 9 len 16 ver 0 msg 3 tag 1
    [16:29:46.428][5] send_pkt fd 9 tag 1 msg 1 payload_length 4
    [16:29:46.428][5] Client 9 now LISTENING
    [16:29:46.428][5] Enlarging client 9 reply buffer 1024 - 1308 to make space for device notifications
    [16:29:46.428][5] send_pkt fd 9 tag 0 msg 4 payload_length 268
    [16:29:47.437][4] Client 9 connection closed
    [16:29:47.437][4] Disconnecting client fd 9
    [16:29:47.437][4] New client on fd 9
    [16:29:47.437][5] Client command in fd 9 len 24 ver 0 msg 2 tag 2
    [16:29:47.437][5] Client 9 connection request to device 1 port 62078
    [16:29:47.437][5] [OUT] dev=1 sport=1 dport=62078 seq=0 ack=0 flags=0x2 window=131072[512] len=0
    [16:29:47.439][5] [IN] dev=1 sport=62078 dport=1 seq=0 ack=1 flags=0x12 window=131072[512] len=0
    [16:29:47.439][5] [OUT] dev=1 sport=1 dport=62078 seq=1 ack=1 flags=0x10 window=131072[512] len=0
    [16:29:47.440][5] send_pkt fd 9 tag 2 msg 1 payload_length 4
    [16:29:47.440][5] Client 9 switching to CONNECTED state
    [16:29:47.442][5] [OUT] dev=1 sport=1 dport=62078 seq=1 ack=1 flags=0x10 window=131072[512] len=4
    ... (all having 'flags=0x10')
    [16:29:47.499][5] [IN] dev=1 sport=62078 dport=1 seq=3502 ack=14410 flags=0x10 window=131072[512] len=279
    [16:29:47.501][5] [IN] dev=1 sport=62078 dport=1 seq=3781 ack=14410 flags=0x4 window=0[0] len=32
    [16:29:47.501][5] RST reason:
    [16:29:47.501][4] Connection reset by device 1 (1-62078)
    [16:29:47.501][5] connection_teardown dev 1 sport 1 dport 62078
    [16:29:47.501][4] Disconnecting client fd 9
    [16:29:47.501][4] client_process: fd 9 not found in client list

I hope someone reading this has had more success ;-).

Thanx, Arno

NB 1, iPhone not 'jailbroken'
NB 2, yes, 'it works' under Windows
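When reading the `flags=` values in a log like the one above, it can help to decode them into names. The sketch below assumes the values follow the standard TCP header flag bits (SYN 0x02, RST 0x04, ACK 0x10); the log is consistent with that reading, since the exchange opens with 0x2, 0x12, 0x10 and the 0x4 packet is followed by the "RST reason" line. The function name `decode_flags` is made up for this example.

```sh
#!/bin/sh
# decode_flags VALUE (hypothetical helper)
# Prints a comma-separated list of the flag names set in VALUE, which may be
# written in hex (0x12) or decimal. Only SYN, RST and ACK are decoded.
decode_flags() {
    flags=$(( $1 ))
    out=
    [ $(( flags & 0x02 )) -ne 0 ] && out="${out:+$out,}SYN"
    [ $(( flags & 0x04 )) -ne 0 ] && out="${out:+$out,}RST"
    [ $(( flags & 0x10 )) -ne 0 ] && out="${out:+$out,}ACK"
    echo "${out:-none}"
}
```

Read this way, the device completes the handshake on port 62078, exchanges data for about 60 ms, and then resets the connection itself, which is the point at which ipheth-pair reports that it cannot get lockdown.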
9-stable: one-device ZFS fails [was: 9-stable : geli + one-disk ZFS fails]
a followup to myself Hello, Martin Simmons mar...@lispworks.com writes: Some random ideas: 1) Can you dd the whole of ada0s3.eli without errors? 2) If you scrub a few more times, does it find the same number of errors each time and are they always in that XNAT.tar file? 3) Can you try zfs without geli? yeah, and it seems to rule out geli : [ splitted original /dev/ada0s3 in equally sized /dev/ada0s3 and /dev/ada0s4 ] geli init /dev/ada0s3 geli attach /dev/ada0s3 zpool create zgeli /dev/ada0s3.eli zfs create zgeli/home zfs create zgeli/home/arno zfs create zgeli/home/arno/.priv zfs create zgeli/home/arno/.scito zfs set copies=2 zgeli/home/arno/.priv zfs set atime=off zgeli [put some files on it, wait a little : ] [root@cc ~]# zpool status -v pool: zgeli state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub in progress since Sat Feb 18 17:46:54 2012 425M scanned out of 2.49G at 85.0M/s, 0h0m to go 0 repaired, 16.64% done config: NAME STATE READ WRITE CKSUM zgeli ONLINE 0 0 1 ada0s3.eli ONLINE 0 0 2 errors: Permanent errors have been detected in the following files: /zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso [root@cc ~]# zpool scrub -s zgeli [root@cc ~]# [then idem directly on next partition ] zpool create zgpart /dev/ada0s4 zfs create zgpart/home zfs create zgpart/home/arno zfs create zgpart/home/arno/.priv zfs create zgpart/home/arno/.scito zfs set copies=2 zgpart/home/arno/.priv zfs set atime=off zgpart [put some files on it, wait a little : ] pool: zgpart state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. 
see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012 config: NAMESTATE READ WRITE CKSUM zgpart ONLINE 0 0 1 ada0s4ONLINE 0 0 2 errors: Permanent errors have been detected in the following files: /zgpart/home/arno/.scito/ [root@cc ~]# I tested a bit more this afternoon : - zpool create zgpart /dev/ada0s4d = KO - split ada0s4 in two equally sized partitions and then zpool create zgpart mirror /dev/ada0s4d /dev/ada0s4e = works like a charm . ( [root@cc /zgpart]# zpool status -v zgpart pool: zgpart state: ONLINE scan: scrub repaired 0 in 0h36m with 0 errors on Sun Feb 19 17:20:34 2012 config: NAME STATE READ WRITE CKSUM zgpart ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ada0s4d ONLINE 0 0 0 ada0s4e ONLINE 0 0 0 errors: No known data errors ) FYI, best, Arno I still do not particuliarly suspect the disk since I cannot reproduce similar behaviour on UFS. That said, this disk is supposed to be 'hybrid-SSD', maybe something special ZFS doesn't like ??? : ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: ST95005620AS SD23 ATA-8 SATA 2.x device ada0: Serial Number 5YX0J5YD ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) ada0: Previously was known as ad4 GEOM: new disk ada0 Please let me know what information to provide more. Best, Arno 4) Is the slice/partition layout definitely correct? __Martin On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said: hello, to eventually gain interest in this issue : I updated to today's -stable, tested with vfs.zfs.debug=1 and vfs.zfs.prefetch_disable=0, no difference. 
I also tested to read the raw partition : [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096 conv=noerror 103746636+0 records in 103746636+0 records out 424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec) [root@cc /usr/ports]# Disk is brand new, looks ok, either my setup is not good or there is a bug somewhere; I can play around with this box for some more time, please feel free to provide me with some hints what to do to be useful for you. Best, Arno Arno J. Klaassen a...@heho.snv.jussieu.fr writes: Hello, I finally decided to 'play' a bit with ZFS on a notebook, some years old, but I installed a brand new disk and memtest
Re: 9-stable : geli + one-disk ZFS fails
Hello, Martin Simmons mar...@lispworks.com writes: Some random ideas: 1) Can you dd the whole of ada0s3.eli without errors? 2) If you scrub a few more times, does it find the same number of errors each time and are they always in that XNAT.tar file? 3) Can you try zfs without geli? yeah, and it seems to rule out geli : [ splitted original /dev/ada0s3 in equally sized /dev/ada0s3 and /dev/ada0s4 ] geli init /dev/ada0s3 geli attach /dev/ada0s3 zpool create zgeli /dev/ada0s3.eli zfs create zgeli/home zfs create zgeli/home/arno zfs create zgeli/home/arno/.priv zfs create zgeli/home/arno/.scito zfs set copies=2 zgeli/home/arno/.priv zfs set atime=off zgeli [put some files on it, wait a little : ] [root@cc ~]# zpool status -v pool: zgeli state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub in progress since Sat Feb 18 17:46:54 2012 425M scanned out of 2.49G at 85.0M/s, 0h0m to go 0 repaired, 16.64% done config: NAME STATE READ WRITE CKSUM zgeli ONLINE 0 0 1 ada0s3.eli ONLINE 0 0 2 errors: Permanent errors have been detected in the following files: /zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso [root@cc ~]# zpool scrub -s zgeli [root@cc ~]# [then idem directly on next partition ] zpool create zgpart /dev/ada0s4 zfs create zgpart/home zfs create zgpart/home/arno zfs create zgpart/home/arno/.priv zfs create zgpart/home/arno/.scito zfs set copies=2 zgpart/home/arno/.priv zfs set atime=off zgpart [put some files on it, wait a little : ] pool: zgpart state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. 
see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012 config: NAMESTATE READ WRITE CKSUM zgpart ONLINE 0 0 1 ada0s4ONLINE 0 0 2 errors: Permanent errors have been detected in the following files: /zgpart/home/arno/.scito/ [root@cc ~]# I still do not particuliarly suspect the disk since I cannot reproduce similar behaviour on UFS. That said, this disk is supposed to be 'hybrid-SSD', maybe something special ZFS doesn't like ??? : ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: ST95005620AS SD23 ATA-8 SATA 2.x device ada0: Serial Number 5YX0J5YD ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) ada0: Previously was known as ad4 GEOM: new disk ada0 Please let me know what information to provide more. Best, Arno 4) Is the slice/partition layout definitely correct? __Martin On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said: hello, to eventually gain interest in this issue : I updated to today's -stable, tested with vfs.zfs.debug=1 and vfs.zfs.prefetch_disable=0, no difference. I also tested to read the raw partition : [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096 conv=noerror 103746636+0 records in 103746636+0 records out 424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec) [root@cc /usr/ports]# Disk is brand new, looks ok, either my setup is not good or there is a bug somewhere; I can play around with this box for some more time, please feel free to provide me with some hints what to do to be useful for you. Best, Arno Arno J. Klaassen a...@heho.snv.jussieu.fr writes: Hello, I finally decided to 'play' a bit with ZFS on a notebook, some years old, but I installed a brand new disk and memtest passes OK. I installed base+ports on partition 2, using 'classical' UFS. 
I crypted partition 3 and created a single zpool on it containing 4 Z-file-systems : [root@cc ~]# zfs list NAME USED AVAIL REFER MOUNTPOINT zfiles 10.7G 377G 152K /zfiles zfiles/home 10.6G 377G 119M /zfiles/home zfiles/home/arno 10.5G 377G 2.35G /zfiles/home/arno zfiles/home/arno/.priv192K 377G 192K /zfiles/home/arno/.priv zfiles/home/arno/.scito 8.18G 377G 8.18G /zfiles/home/arno/.scito I export the ZFS's via nfs and rsynced on the other machine some backup of my current note-book (geli + UFS, (almost) same 9-stable version, no problem) to the ZFS's. Quite fast, I see on the notebook : [root@cc /usr/temp
Re: 9-stable : geli + one-disk ZFS fails
Hello, Martin Simmons mar...@lispworks.com writes: Some random ideas: 1) Can you dd the whole of ada0s3.eli without errors? [root@cc ~]# dd if=/dev/ada0s3.eli of=/dev/null bs=4096 conv=noerror 103746635+0 records in 103746635+0 records out 424946216960 bytes transferred in 18773.796016 secs (22635072 bytes/sec) [root@cc ~]# 2) If you scrub a few more times, does it find the same number of errors each time and are they always in that XNAT.tar file? Looks like each scrub worsens the situation : [root@cc ~]# zpool status -v pool: zfiles state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub repaired 148K in 0h14m with 26 errors on Mon Feb 13 18:54:33 2012 config: NAME STATE READ WRITE CKSUM zfilesONLINE 0 026 ada0s3.eli ONLINE 0 087 errors: Permanent errors have been detected in the following files: [ 11 files ] [root@cc ~]# zpool status -v pool: zfiles state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub in progress since Wed Feb 15 14:36:52 2012 17.7G scanned out of 28.7G at 72.1M/s, 0h2m to go 0 repaired, 61.56% done config: NAME STATE READ WRITE CKSUM zfilesONLINE 0 054 ada0s3.eli ONLINE 0 0 143 errors: Permanent errors have been detected in the following files: [ 11 files ] # [root@cc ~]# zpool status -v pool: zfiles state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. 
see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub repaired 4K in 0h7m with 70 errors on Wed Feb 15 14:43:57 2012 config: NAME STATE READ WRITE CKSUM zfilesONLINE 0 096 ada0s3.eli ONLINE 0 0 228 errors: Permanent errors have been detected in the following files: [ 25 files (cannot quickly see iff it contains all old 11 files) ] [root@cc ~]# [root@cc ~]# zpool status -v pool: zfiles state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub repaired 0 in 0h6m with 70 errors on Wed Feb 15 15:19:28 2012 config: NAME STATE READ WRITE CKSUM zfilesONLINE 0 0 166 ada0s3.eli ONLINE 0 0 368 errors: Permanent errors have been detected in the following files: [ 25 files ] [root@cc ~]# 3) Can you try zfs without geli? 4) Is the slice/partition layout definitely correct? __Martin On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said: hello, to eventually gain interest in this issue : I updated to today's -stable, tested with vfs.zfs.debug=1 and vfs.zfs.prefetch_disable=0, no difference. I also tested to read the raw partition : [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096 conv=noerror 103746636+0 records in 103746636+0 records out 424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec) [root@cc /usr/ports]# Disk is brand new, looks ok, either my setup is not good or there is a bug somewhere; I can play around with this box for some more time, please feel free to provide me with some hints what to do to be useful for you. Best, Arno Arno J. Klaassen a...@heho.snv.jussieu.fr writes: Hello, I finally decided to 'play' a bit with ZFS on a notebook, some years old, but I installed a brand new disk and memtest passes OK. I installed base+ports on partition 2, using 'classical' UFS. 
I crypted partition 3 and created a single zpool on it containing 4 Z-file-systems : [root@cc ~]# zfs list NAME USED AVAIL REFER MOUNTPOINT zfiles 10.7G 377G 152K /zfiles zfiles/home 10.6G 377G 119M /zfiles/home zfiles/home/arno 10.5G 377G 2.35G /zfiles/home/arno zfiles/home/arno/.priv192K 377G 192K /zfiles/home/arno/.priv zfiles/home/arno/.scito 8.18G 377G 8.18G /zfiles/home/arno/.scito I export the ZFS's via nfs and rsynced on the other machine some backup of my
Re: 9-stable : geli + one-disk ZFS fails
Hallo Aleksandr,

> Hello, Arno J. Klaassen!
>
> On Sat, Feb 11, 2012 at 04:53:10PM +0100 a...@heho.snv.jussieu.fr wrote about "9-stable : geli + one-disk ZFS fails":
>> Hello,
>>
>> I finally decided to 'play' a bit with ZFS on a notebook, some years old, but I installed a brand new disk and memtest passes OK.
>>
>> I installed base+ports on partition 2, using 'classical' UFS.
>>
>> I encrypted partition 3 and created a single zpool on it containing 4 Z-file-systems:
>>
>>     [root@cc ~]# zfs list
>>     NAME                      USED  AVAIL  REFER  MOUNTPOINT
>>     zfiles                   10.7G   377G   152K  /zfiles
>>     zfiles/home              10.6G   377G   119M  /zfiles/home
>>     zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
>>     zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
>>     zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
>>
>> I export the ZFS's via nfs and rsynced on the other machine some backup of my current note-book (geli + UFS, (almost) same 9-stable version, no problem) to the ZFS's.
>>
>> Quite fast, I see on the notebook:
>>
>>     [root@cc /usr/temp]# zpool status -v
>>       pool: zfiles
>>      state: ONLINE
>>     status: One or more devices has experienced an error resulting in data
>>             corruption. Applications may be affected.
>>     action: Restore the file in question if possible. Otherwise restore the
>>             entire pool from backup.
>>        see: http://www.sun.com/msg/ZFS-8000-8A
>>       scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
>>     config:
>>
>>             NAME          STATE     READ WRITE CKSUM
>>             zfiles        ONLINE       0     0    11
>>               ada0s3.eli  ONLINE       0     0    23
>>
>>     errors: Permanent errors have been detected in the following files:
>>
>>             /zfiles/home/arno/.scito/contrib/XNAT.tar
>>     [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
>>     md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
>>     [root@cc /usr/temp]#
>>
>> As said, memtest is OK, nothing is logged to the console, UFS on the same disk works OK (I did some tests copying and comparing random data) and smartctl as well seems to trust the disk:
>>
>>     SMART Self-test log structure revision number 1
>>     Num  Test_Description    Status                   Remaining  LifeTime(hours)
>>     # 1  Extended offline    Completed without error        00%              388
>>     # 2  Short offline       Completed without error        00%              387
>>
>> Am I doing something wrong? Please let me know what I could provide as extra info to try to solve this (dmesg.boot at the end of this mail).
>>
>> Thanx a lot in advance,
>>
>> best, Arno
>
> Arno, you forgot to say how you created the geli partition. It is important.

    geli init /dev/ada0s3

(should I have used ' -s 4096 ' ???)

I added later:

    geli attach -k /tmp/ifmemoryfails.key1 -p /dev/ada0s3

In fact, on my regular laptop, on which I now use UFS on top of GELI, I use /dev/ada0s3f, not the whole partition.

> Hope this helps ;-)

thanx, best,

Arno
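To follow how the checksum counters in `zpool status` output like the one quoted above change from one scrub to the next, the CKSUM column can be pulled out of the config table. This is a sketch only: the function name `cksum_counts` is invented, and the parsing assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout shown in this thread.

```sh
#!/bin/sh
# cksum_counts (hypothetical helper): read `zpool status` text on stdin and
# print "NAME CKSUM" for each row of the config table.
cksum_counts() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }
        # Table rows have exactly five fields; print name and CKSUM count.
        in_config && NF == 5 && $3 ~ /^[0-9]+$/ { print $1, $5 }
        # Anything else (blank line, "errors:") ends the table.
        in_config && NF != 5 { in_config = 0 }
    '
}
```

For instance, `zpool status -v zfiles | cksum_counts` after each scrub gives one line per pool and device, which can be appended to a file together with `date` for comparison.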
Re: 9-stable : geli + one-disk ZFS fails
Hi, Martin Simmons mar...@lispworks.com writes: Some random ideas: 1) Can you dd the whole of ada0s3.eli without errors? I just started it; will take some hours 2) If you scrub a few more times, does it find the same number of errors each time and are they always in that XNAT.tar file? I deleted the XNAT.tar; I also copied files by 'ssh tar -c | tar -xp' to rule out NFS, same type of errors; Looks like multiple scrubs give the same files but not the same number of chksum errors (to be confirmed) 3) Can you try zfs without geli? sure, I will split the place in one partition with geli and one without 4) Is the slice/partition layout definitely correct? I (still ???) use sysinstall to do the dirty computations in my place. This is what gpart says (looks OK (to me ...) : [root@cc ~]# gpart list ada0 Geom name: ada0 modified: false state: OK fwheads: 16 fwsectors: 63 last: 976773167 first: 63 entries: 4 scheme: MBR Providers: 1. Name: ada0s1 Mediasize: 40802001408 (38G) Sectorsize: 512 Stripesize: 0 Stripeoffset: 32256 Mode: r0w0e0 rawtype: 7 length: 40802001408 offset: 32256 type: ntfs index: 1 end: 79691471 start: 63 2. Name: ada0s2 Mediasize: 34359607296 (32G) Sectorsize: 512 Stripesize: 0 Stripeoffset: 2147328000 Mode: r3w3e5 attrib: active rawtype: 165 length: 34359607296 offset: 40802033664 type: freebsd index: 2 end: 146800079 start: 79691472 3. Name: ada0s3 Mediasize: 424946221056 (395G) Sectorsize: 512 Stripesize: 0 Stripeoffset: 2147196928 Mode: r1w1e1 rawtype: 165 length: 424946221056 offset: 75161640960 type: freebsd index: 3 end: 976773167 start: 146800080 Consumers: 1. Name: ada0 Mediasize: 500107862016 (465G) Sectorsize: 512 Mode: r4w4e10 Merci, Arno __Martin On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said: hello, to eventually gain interest in this issue : I updated to today's -stable, tested with vfs.zfs.debug=1 and vfs.zfs.prefetch_disable=0, no difference. 
I also tested reading the raw partition :

[root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096 conv=noerror
103746636+0 records in
103746636+0 records out
424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
[root@cc /usr/ports]#

The disk is brand new and looks OK, so either my setup is not good or there is a bug somewhere. I can play around with this box for some more time; please feel free to give me some hints about what I can do to be useful to you.

Best, Arno

Arno J. Klaassen a...@heho.snv.jussieu.fr writes:

Hello,

I finally decided to 'play' a bit with ZFS on a notebook, some years old, but I installed a brand new disk and memtest passes OK.

I installed base+ports on partition 2, using 'classical' UFS.

I encrypted partition 3 and created a single zpool on it containing 4 ZFS file systems :

[root@cc ~]# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zfiles                   10.7G   377G   152K  /zfiles
zfiles/home              10.6G   377G   119M  /zfiles/home
zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito

I exported the ZFS file systems via NFS and, from the other machine, rsynced a backup of my current notebook (geli + UFS, (almost) the same 9-stable version, no problem) to them.

Quite soon, I see on the notebook :

[root@cc /usr/temp]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    11
          ada0s3.eli  ONLINE       0     0    23

errors: Permanent errors have been detected in the following files:

        /zfiles/home/arno/.scito/contrib/XNAT.tar

[root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
[root@cc /usr/temp]#

As said, memtest is OK, nothing is logged to the console, UFS on the same disk works OK (I did some tests copying and comparing random data) and smartctl as well seems to trust the disk :

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)
# 1  Extended offline    Completed without error        00%              388
# 2  Short offline       Completed without error        00%              387
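
A side note on the figures posted in this thread: the gpart and dd numbers can be cross-checked with a few lines of arithmetic. The sketch below only re-uses values copied from the output above; the variable names are made up for illustration, and the 4096-byte alignment test is included only because the question of 'geli init -s 4096' came up. It says nothing about the cause of the checksum errors.

```python
# Values copied from the "gpart list ada0" and dd output in this thread.
SECTOR = 512                       # Sectorsize reported for ada0s3
start, end = 146800080, 976773167  # first and last sector of ada0s3
mediasize = 424946221056           # Mediasize of ada0s3 in bytes
dd_records, dd_bs = 103746636, 4096
dd_seconds = 13226.346738

length = (end - start + 1) * SECTOR  # slice length derived from the sector range
offset = start * SECTOR              # byte offset of the slice on ada0

print("length matches Mediasize:", length == mediasize)
print("offset matches gpart    :", offset == 75161640960)
print("dd read the whole slice :", dd_records * dd_bs == mediasize)
print("offset is 4K-aligned    :", offset % 4096 == 0)
print("size is a multiple of 4K:", mediasize % 4096 == 0)
print("dd throughput in MB/s   :", round(mediasize / dd_seconds / 1e6, 1))
```

If all checks print True, the slice layout is at least self-consistent and the raw read covered every byte of ada0s3, which fits the observation that the disk itself reads back without errors.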
9-stable : geli + one-disk ZFS fails
Hello,

I finally decided to 'play' a bit with ZFS on a notebook, some years old, but I installed a brand new disk and memtest passes OK.

I installed base+ports on partition 2, using 'classical' UFS.

I encrypted partition 3 and created a single zpool on it containing 4 ZFS file systems :

[root@cc ~]# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zfiles                   10.7G   377G   152K  /zfiles
zfiles/home              10.6G   377G   119M  /zfiles/home
zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito

I exported the ZFS file systems via NFS and, from the other machine, rsynced a backup of my current notebook (geli + UFS, (almost) the same 9-stable version, no problem) to them.

Quite soon, I see on the notebook :

[root@cc /usr/temp]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012 config: NAME STATE READ WRITE CKSUM zfilesONLINE 0 011 ada0s3.eli ONLINE 0 023 errors: Permanent errors have been detected in the following files: /zfiles/home/arno/.scito/contrib/XNAT.tar [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error [root@cc /usr/temp]# As said, memtest is OK, nothing is logged to the console, UFS on the same disk works OK (I did some tests copying and comparing random data) and smartctl as well seems to trust the disk : SMART Self-test log structure revision number 1 Num Test_DescriptionStatus Remaining LifeTime(hours) # 1 Extended offlineCompleted without error 00% 388 # 2 Short offline Completed without error 00% 387 Am I doing something wrong and/or let me know what I could provide as extra info to try to solve this (dmesg.boot at the end of this mail). Thanx a lot in advance, best, Arno ### demsg.boot ### Table 'FACP' at 0xbdd90200 Table 'APIC' at 0xbdd90390 APIC: Found table at 0xbdd90390 APIC: Using the MADT enumerator. MADT: Found CPU APIC ID 0 ACPI ID 1: enabled SMP: Added CPU 0 (AP) MADT: Found CPU APIC ID 1 ACPI ID 2: enabled SMP: Added CPU 1 (AP) MADT: Found CPU APIC ID 130 ACPI ID 3: disabled MADT: Found CPU APIC ID 131 ACPI ID 4: disabled Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.0-STABLE #0: Fri Feb 3 22:48:57 CET 2012 toor@cc:/usr/obj/raid1/bsd/9/src/sys/VR603 amd64 Preloaded elf kernel /boot/kernel/kernel at 0x80bba000. Preloaded /boot/zfs/zpool.cache /boot/zfs/zpool.cache at 0x80bba200. Calibrating TSC clock ... 
TSC clock: 2161296371 Hz CPU: Intel(R) Pentium(R) Dual CPU T3400 @ 2.16GHz (2161.30-MHz K8-class CPU) Origin = GenuineIntel Id = 0x6fd Family = 6 Model = f Stepping = 13 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE Features2=0xe39dSSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM AMD Features=0x20100800SYSCALL,NX,LM AMD Features2=0x1LAHF TSC: P-state invariant, performance statistics real memory = 3221225472 (3072 MB) Physical memory chunk(s): 0x1000 - 0x00095fff, 610304 bytes (149 pages) 0x0010 - 0x001f, 1048576 bytes (256 pages) 0x00be9000 - 0xb8402fff, 3078725632 bytes (751642 pages) avail memory = 3057152000 (2915 MB) Event timer LAPIC quality 400 ACPI APIC Table: MSI_NB MEGABOOK INTR: Adding local APIC 1 as a target FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs FreeBSD/SMP: 1 package(s) x 2 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 x86bios: IVT 0x00-0x0004ff at 0xfe00 x86bios: SSEG 0x001000-0x001fff at 0xff800021 x86bios: EBDA 0x099000-0x09 at 0xfe099000 x86bios: ROM 0x0a-0x0fefff at 0xfe0a APIC: CPU 0 has ACPI ID 1 APIC: CPU 1 has ACPI ID 2 ULE: setup cpu 0 ULE: setup cpu 1 ACPI: RSDP 0xf9420 00014 (v00 ACPIAM) ACPI: RSDT 0xbdd9 00048 (v01 MSI_NB MEGABOOK 20091013 MSFT 0097) ACPI: FACP 0xbdd90200 00084 (v01 MSI_NB MEGABOOK 20091013 MSFT 0097) ACPI: DSDT 0xbdd905c0 072D3 (v01 1ADTS 1ADTS012 0012 INTL 20051117) ACPI: FACS 0xbdd9e000 00040 ACPI: APIC 0xbdd90390 0006C (v01
Re: 8.2-PRERELEASE freezing on reboot (-current OK)
Andriy Gapon a...@freebsd.org writes:

on 14/12/2010 02:38 Jeremy Chadwick said the following:

1) [snip] Also try dropping to the debugger via serial console (serial break) or VGA (Ctrl-Alt-Esc).

This is good advice.

Maybe ;-) but the box really freezes: there is no way to drop into the debugger, neither via serial (ALT_BREAK_TO_DEBUGGER is compiled in as well) nor via VGA/PS2 ... really frozen.

Arno
Re: 8.2-PRERELEASE freezing on reboot (-current OK)
Jeremy Chadwick free...@jdc.parodius.com writes:

On Tue, Dec 14, 2010 at 11:24:52PM +0100, Arno J. Klaassen wrote:

Andriy Gapon a...@freebsd.org writes:

on 14/12/2010 02:38 Jeremy Chadwick said the following:

1) [snip] Also try dropping to the debugger via serial console (serial break) or VGA (Ctrl-Alt-Esc).

This is good advice.

Maybe ;-) but the box really freezes: there is no way to drop into the debugger, neither via serial (ALT_BREAK_TO_DEBUGGER is compiled in as well) nor via VGA/PS2 ... really frozen.

Bummer. It sounds like it's a regression of some kind, since your original mail stated this problem has been happening for you on RELENG_8 (8.x) for quite some time. (I have to assume it didn't happen for you at all on 7.3 or prior).

Yep, I built this box with some 7.X. It's a spare/development/test one, so it gets neither much attention when problems show up nor frequent reboots. I noticed this problem at least some months ago (I track -stable on it when I have the time and feel like it) and adopted the anti-Murphy attitude (wait for -RELEASE and recheck).

Would it be possible for you to dedicate some time narrowing down when the problem was introduced? A good starting point might be to try 8.0-RELEASE and then 8.1-RELEASE (just download + burn + boot livefs images and try that). If it happens on 8.1-RELEASE but not 8.0, then we've at least narrowed down the timeframe.

I answered Attilio Rao atti...@freebsd.org in private about a patch he sent me in order to get at least more info. I will test it and let you, and the list, know the results.

It might also be worth trying the same with 7.3-RELEASE vs. 7.4-BETA1 (which just came out) to see if the same bug was introduced there as the result of an MFC. Otherwise, numerous csups with different date= strings in your supfiles and multiple buildworld/kernel/installs would be required. I would probably pick intervals of 4 months at first.

I am quite sure many 8-XXX worlds worked great on this box, including reboots; I will find the date window when things went wrong.
Merci, Arno
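
To illustrate the suggestion of trying csup checkouts at intervals of 4 months: the sketch below merely generates candidate date= tags for a supfile. It assumes the dotted yyyy.mm.dd.hh.mm.ss form for date= values, and the start and end dates are placeholders chosen for the example, not actual release dates; the function name csup_dates is hypothetical as well.

```python
from datetime import date

def csup_dates(start, end, step_months=4):
    """Return csup-style date= tags from start to end, one every step_months."""
    tags = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        tags.append(f"date={year:04d}.{month:02d}.01.00.00.00")
        month += step_months
        year += (month - 1) // 12     # carry overflow months into the year
        month = (month - 1) % 12 + 1  # fold the month back into 1..12
    return tags

# Placeholder window; substitute the dates between which the regression appeared.
for tag in csup_dates(date(2009, 12, 1), date(2010, 12, 1)):
    print(tag)
```

Once two neighbouring dates are found where one world reboots fine and the next one freezes, the same function can be called again on that narrower window with a smaller step_months.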
Re: 8.2-PRERELEASE freezing on reboot (-current OK)
Hello,

Jeremy Chadwick free...@jdc.parodius.com writes:

On Fri, Dec 10, 2010 at 10:37:32AM +0100, Arno J. Klaassen wrote:

just FYI that on an 8-way Tyan S3992-E based box, a reboot under 8.2-PRERELEASE (in fact, 8-stable since quite a while) makes the box freeze, whilst the same thing under -current works OK.

Try toggling these two sysctls on the 8.2-PRERELEASE box. Be sure to check what the defaults are before toggling them, and only mess with one at a time.

hw.acpi.handle_reboot
hw.acpi.disable_on_reboot

Nope, no difference. The defaults are 0 for both sysctls, on both -current and 8-stable.

I noticed that -current prints 'cpu_reset: Stopping other CPUs' at the very end, whereas 8-stable doesn't.

Thanx for your answer, best,

Arno
8.2-PRERELEASE freezing on reboot (-current OK)
Hello, just FYI that on an 8-way Tyan S3992-E based box, a reboot under 8.2-PRERELEASE (in fact, 8-stable since quite a while) makes the box freeze, whilst the same thing under -current works OK. For info the end of console output in both cases as well as dmesg.boot for -current. Feel free to contact me for more info or test patches. Best, Arno ### console log ### -current : [r...@siamesetwins ~]# reboot Dec 10 10:12:03 siamesetwins reboot: rebooted by toor Dec 10 10:12:03 siamesetwins syslogd: exiting on signal 15 Waiting (max 60 seconds) for system process `vnlru' to stop...done ts_to_ct(1291972331.482314452) = [2010-12-10 09:12:11] Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining...3 3 3 2 2 1 0 0 0 done All buffers synced. Swap device aacd0s1b removed. Uptime: 7d13h4m34s bge0: link DOWN pcib1: wake_prep disabled wake for \_SB_.PCI0.P0P1 (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.P0P1.P1P2.SL2X (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.P0P1.P1P2.SL3X (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.USB0 (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.USB1 (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.USB2 (S5) atkbdc0: wake_prep disabled wake for \_SB_.PCI0.SBRG.PS2K (S5) psmcpnp0: wake_prep disabled wake for \_SB_.PCI0.SBRG.PS2M (S5) pcib3: wake_prep disabled wake for \_SB_.PCI0.BR14 (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.BR14.SL4X (S5) pcib4: wake_prep disabled wake for \_SB_.PCI0.BR1E (S5) bge0: wake_prep disabled wake for \_SB_.PCI0.BR1E.GBE1 (S5) bge1: wake_prep disabled wake for \_SB_.PCI0.BR1E.GBE2 (S5) pcib5: wake_prep disabled wake for \_SB_.PCI0.BR28 (S5) pcib8: wake_prep disabled wake for \_SB_.PCI0.BR32 (S5) pcib9: wake_prep disabled wake for \_SB_.PCI0.BR3C (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.SL1X (S5) unknown: wake_prep disabled wake for \_SB_.PCI0.MBE1 (S5) aac0: 
shutting down controller...done Rebooting... cpu_reset: Stopping other CPUs 8.2-PRE : # reboot Dec 10 10:18:21 siamesetwins reboot: rebooted by root Dec 10 10:18:21 siamesetwins syslogd: exiting on signal 15 Waiting (max 60 seconds) for system process `vnlru' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop...Syncing disks, vnodes remaining...1 1 0 1 0 0 0 0 done Waiting (max 60 seconds) for system process `bufdaemon' to stop...done All buffers synced. lock order reversal: 1st 0xff004b2747e8 ufs (ufs) @ /raid1/bsd/8/src/sys/kern/vfs_mount.c:1204 2nd 0xff004b27e308 syncer (syncer) @ /raid1/bsd/8/src/sys/kern/vfs_subr.c:2231 KDB: stack backtrace: db_trace_self_wrapper() at 0x801d623a = db_trace_self_wrapper+0x2a kdb_backtrace() at 0x802e4d27 = kdb_backtrace+0x37 _witness_debugger() at 0x802f8645 = _witness_debugger+0x65 witness_checkorder() at 0x802f98f3 = witness_checkorder+0x833 __lockmgr_args() at 0x8029cd05 = __lockmgr_args+0xd75 vop_stdlock() at 0x803386c9 = vop_stdlock+0x39 VOP_LOCK1_APV() at 0x8054a38b = VOP_LOCK1_APV+0x9b _vn_lock() at 0x80355308 = _vn_lock+0x68 vputx() at 0x8034b595 = vputx+0x315 dounmount() at 0x80340adb = dounmount+0x2ab vfs_unmountall() at 0x8034851c = vfs_unmountall+0x4c boot() at 0x802b2fd6 = boot+0x7b6 reboot() at 0x802b32f8 = reboot+0x68 syscallenter() at 0x802f190f = syscallenter+0xef syscall() at 0x804fc230 = syscall+0x60 Xfast_syscall() at 0x804e4312 = Xfast_syscall+0xe2 --- syscall (55, FreeBSD ELF64, reboot), rip = 0x80078db3c, rsp = 0x7fffecf8, rbp = 0 --- lock order reversal: 1st 0xff004b2747e8 ufs (ufs) @ /raid1/bsd/8/src/sys/kern/vfs_mount.c:1204 2nd 0xff0007faf7e8 devfs (devfs) @ /raid1/bsd/8/src/sys/ufs/ffs/ffs_vfsops.c:1244 KDB: stack backtrace: db_trace_self_wrapper() at 0x801d623a = db_trace_self_wrapper+0x2a kdb_backtrace() at 0x802e4d27 = kdb_backtrace+0x37 _witness_debugger() at 0x802f8645 = _witness_debugger+0x65 witness_checkorder() at 0x802f98f3 = witness_checkorder+0x833 __lockmgr_args() 
at 0x8029cd05 = __lockmgr_args+0xd75 vop_stdlock() at 0x803386c9 = vop_stdlock+0x39 VOP_LOCK1_APV() at 0x8054a38b = VOP_LOCK1_APV+0x9b _vn_lock() at 0x80355308 = _vn_lock+0x68 ffs_flushfiles() at 0x804a29e5 = ffs_flushfiles+0xc5 ffs_unmount() at 0x804a33ec = ffs_unmount+0x6c dounmount() at 0x80340b16 = dounmount+0x2e6 vfs_unmountall() at 0x8034851c = vfs_unmountall+0x4c boot() at 0x802b2fd6 = boot+0x7b6 reboot() at 0x802b32f8 = reboot+0x68 syscallenter() at 0x802f190f = syscallenter+0xef syscall() at 0x804fc230 = syscall+0x60 Xfast_syscall() at 0x804e4312 = Xfast_syscall+0xe2 --- syscall (55, FreeBSD ELF64, reboot), rip = 0x80078db3c, rsp = 0x7fffecf8, rbp = 0 --- Swap device
Re: 7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze
Jung-uk Kim j...@freebsd.org writes:

On Friday 12 December 2008 04:26 pm, Jung-uk Kim wrote:
On Friday 12 December 2008 03:36 pm, Arno J. Klaassen wrote:
cpghost cpgh...@cordula.ws writes:
On Fri, Dec 12, 2008 at 12:01:29AM +0100, Arno J. Klaassen wrote:

yet another powerd SOS : on an ASUS M3A78-EM MB with Phenom 9750 and 8 gig memory, starting powerd freezes the box after slowing down a bit cpu frequency.

(... snip ...)

I forgot there is a PR with the latest driver:
http://www.freebsd.org/cgi/query-pr.cgi?pr=128575

yes, this works better :

kldload cpufreq :

hwpstate0: Cool`n'Quiet 2.0 on cpu0
hwpstate0: SVI mode
hwpstate0: you have 2 P-state.
hwpstate0: freq=2400MHz volts=1300mV
hwpstate0: freq=1200MHz volts=1050mV
hwpstate0: Now P0-state.
hwpstate1: Cool`n'Quiet 2.0 on cpu1
hwpstate1: SVI mode
hwpstate1: you have 2 P-state.
hwpstate1: freq=2400MHz volts=1300mV
hwpstate1: freq=1200MHz volts=1050mV
hwpstate1: Now P0-state.
hwpstate2: Cool`n'Quiet 2.0 on cpu2
hwpstate2: SVI mode
hwpstate2: you have 2 P-state.
hwpstate2: freq=2400MHz volts=1300mV
hwpstate2: freq=1200MHz volts=1050mV
hwpstate2: Now P0-state.
hwpstate3: Cool`n'Quiet 2.0 on cpu3
hwpstate3: SVI mode
hwpstate3: you have 2 P-state.
hwpstate3: freq=2400MHz volts=1300mV
hwpstate3: freq=1200MHz volts=1050mV
hwpstate3: Now P0-state.

however, I need to disable acpi_throttle; standard, I get :

[r...@m34 ~]# sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 599/-1 299/-1
[r...@m34 ~]# kldload cpufreq
[r...@m34 ~]# sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 2400/-1 2100/-1 1800/-1 1500/-1 1200/-1 1050/-1 900/-1 750/-1 600/-1 450/-1 300/-1 150/-1
[r...@m34 ~]# powerd -v
powerd: unable to determine AC line status
idle time 90%, decreasing clock speed from 2398 MHz to -1 MHz
powerd: error setting CPU frequency -1: Invalid argument
idle time 90%, decreasing clock speed from 2398 MHz to -1 MHz
powerd: error setting CPU frequency -1: Invalid argument

rebooting with hint.acpi_throttle.0.disabled=1 gives :

[r...@m34 ~]# sysctl dev.cpu.0.freq_levels
sysctl: unknown oid 'dev.cpu.0.freq_levels'
[r...@m34 ~]# kldload cpufreq
[r...@m34 ~]# sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 2400/-1 1200/-1
[r...@m34 ~]# powerd -v
powerd: unable to determine AC line status
idle time 90%, decreasing clock speed from 2400 MHz to 1200 MHz

Thanx, Arno
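
The freq_levels lists above look like the product of the two P-states with eight throttling steps. The sketch below tests that reading against the posted numbers. The model (each level equals a base frequency times k/8, rounded down) is an assumption inferred from the figures in this thread, not something taken from the cpufreq sources, and both function names are invented for the example.

```python
def parse_freq_levels(value):
    """Parse a dev.cpu.0.freq_levels string like '2400/-1 1200/-1' into MHz values."""
    return [int(item.split("/")[0]) for item in value.split()]

def throttle_levels(pstates_mhz, steps=8):
    """Assumed model: every base frequency combined with k/steps duty cycles."""
    levels = {f * k // steps for f in pstates_mhz for k in range(1, steps + 1)}
    return sorted(levels, reverse=True)

# Lists as posted: acpi_throttle alone, then hwpstate together with acpi_throttle.
throttle_only = parse_freq_levels(
    "2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 599/-1 299/-1"
)
combined = parse_freq_levels(
    "2400/-1 2100/-1 1800/-1 1500/-1 1200/-1 1050/-1 "
    "900/-1 750/-1 600/-1 450/-1 300/-1 150/-1"
)

print(throttle_levels([2398]) == throttle_only)
print(throttle_levels([2400, 1200]) == combined)
```

If both comparisons hold, only 2400 and 1200 MHz are real P-states and every other entry is a throttled variant of one of them, which fits the two-entry list seen once acpi_throttle is disabled.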
Re: 7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze
cpghost cpgh...@cordula.ws writes: On Fri, Dec 12, 2008 at 12:01:29AM +0100, Arno J. Klaassen wrote: yet another powerd SOS : on an ASUS M3A78-EM MB with Phenom 9750 and 8 gig memory, starting powerd freezes the box after slowing down a bit cpu frequency. (... snip ...) dev.cpu.0.freq_levels: 2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 599/-1 299/-1 further : - I set debug.cpufreq.lowest superior to 1500 : system remains up but only when pushing really slightly - I set debug.cpufreq.lowest inferior to 1100 : freeze garantueed Same here. Running with debug.cpufreq.lowest=1240 in /boot/loader.conf to prevent freezes. This is a FreeBSD 7.1-PRERELEASE #0: Sat Nov 8 14:18:05 CET 2008 r...@textbox:/usr/obj/usr/src/sys/GENERIC running in amd64 and i386 mode with ACPI enabled (default): CPU: AMD Phenom(tm) 9350e Quad-Core Processor (2000.08-MHz K8-class CPU) Origin = AuthenticAMD Id = 0x100f23 Stepping = 3 Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR, PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT Features2=0x802009SSE3,MON,CX16,b23 AMD Features=0xee500800SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM, 3DNow!+,3DNow! AMD Features2=0x7ffLAHF,CMP,SVM,ExtAPIC,CR8,b5,b6,b7, Prefetch,b9,b10 Cores per package: 4 using an MSI board with SB600 chipset and newest BIOS. No idea why the system freezes below approx 1200 MHz. But apparently, this bug is quite common and affects a lot of systems with Phenoms. :( do Phenoms not support powernow? I am a bit puzzled by the differnce with two X2 boards I have around here : FreeBSD 7.1-PRERELEASE #0: Tue Dec 2 20:09:28 ... CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ (2992.52-MHz K8-class CPU) Origin = AuthenticAMD Id = 0x40f33 Stepping = 3 Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT Features2=0x2001SSE3,CX16 AMD Features=0xea500800SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow! 
AMD Features2=0x1fLAHF,CMP,SVM,ExtAPIC,CR8 Cores per package: 2 ... cpu0: ACPI CPU on acpi0 powernow0: PowerNow! K8 on cpu0 cpu1: ACPI CPU on acpi0 powernow1: PowerNow! K8 on cpu1 FreeBSD 7.1-PRERELEASE #1: Mon Nov 17 14:40:26 ... CPU: AMD Turion(tm) 64 X2 Mobile Technology TL-62 (2109.70-MHz K8-class CPU) Origin = AuthenticAMD Id = 0x60f82 Stepping = 2 Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT Features2=0x2001SSE3,CX16 AMD Features=0xea500800SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow! AMD Features2=0x11fLAHF,CMP,SVM,ExtAPIC,CR8,Prefetch Cores per package: 2 ... cpu0: ACPI CPU on acpi0 acpi_throttle0: ACPI CPU Throttling on cpu0 powernow0: PowerNow! K8 on cpu0 cpu1: ACPI CPU on acpi0 acpi_throttle1: ACPI CPU Throttling on cpu1 acpi_throttle1: failed to attach P_CNT device_attach: acpi_throttle1 attach returned 6 powernow1: PowerNow! K8 on cpu1 whereas the Phenom says : CPU: AMD Phenom(tm) 9750 Quad-Core Processor (2410.66-MHz K8-class CPU) Origin = AuthenticAMD Id = 0x100f23 Stepping = 3 Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT Features2=0x802009SSE3,MON,CX16,b23 AMD Features=0xee500800SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow AMD Features2=0x7ffLAHF,CMP,SVM,ExtAPIC,CR8,b5,b6,b7,Prefetch,b9,b1 ... cpu0: ACPI CPU on acpi0 acpi_throttle0: ACPI CPU Throttling on cpu0 cpu1: ACPI CPU on acpi0 cpu2: ACPI CPU on acpi0 cpu3: ACPI CPU on acpi0 my conclusion : acpi_throttle attaches a X4 (why not) and not at X2 (thought the Turion seems to detect it but fails to attach), powernow does not seem to attach to X4 ... Best regards, Arno - I define hint.acpi_throttle.0.disabled=1 in loader.conf then no dev.cpu.0.freq is showing up ... (as if only acpi_throttle is attaching and not powernow) Let me know what I can test further. Best, Arno Regards, -cpghost. -- Cordula's Web. 
http://www.cordula.ws/
Re: 7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze
Jung-uk Kim j...@freebsd.org writes:

On Friday 12 December 2008 04:26 pm, Jung-uk Kim wrote:
On Friday 12 December 2008 03:36 pm, Arno J. Klaassen wrote:
cpghost cpgh...@cordula.ws writes:
On Fri, Dec 12, 2008 at 12:01:29AM +0100, Arno J. Klaassen wrote:

[ .. stuff deleted .. ]

do Phenoms not support powernow?

[SNIP]

Phenom is a 10H family processor and it has Cool`n'Quiet 2.0. Someone wrote a driver for it and it was posted on freebsd-current in September:

http://lists.freebsd.org/pipermail/freebsd-current/2008-September/088330.html
http://lists.freebsd.org/pipermail/freebsd-current/2008-September/088803.html
http://lists.freebsd.org/pipermail/freebsd-current/2008-September/088806.html

I forgot there is a PR with the latest driver:

http://www.freebsd.org/cgi/query-pr.cgi?pr=128575

Ah, I see. Thank you very much. I'll give it a try this weekend.

Best, Arno
7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze
hello,

yet another powerd SOS : on an ASUS M3A78-EM MB with Phenom 9750 and 8 gig memory, starting powerd freezes the box after slowing down the cpu frequency a bit.

[IMHO] useful bits of info :

FreeBSD m34.scito.local 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Dec 11 14:24:39 CET 2008 r...@m34.scito.local:/usr/obj/raid1/bsd/src7/sys/M3A78-EM amd64

CPU: AMD Phenom(tm) 9750 Quad-Core Processor (2410.66-MHz K8-class CPU)
  Origin = AuthenticAMD Id = 0x100f23 Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,b23>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,b5,b6,b7,Prefetch,b9,b10>
  Cores per package: 4
usable memory = 8547172352 (8151 MB)
avail memory = 8268722176 (7885 MB)
ACPI APIC Table: 102408 APIC2239
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
ioapic0 Version 2.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: 102408 XSDT2239 on motherboard
...
cpu0: ACPI CPU on acpi0
acpi_throttle0: ACPI CPU Throttling on cpu0
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
Timecounter HPET frequency 14318180 Hz quality 900

dev.cpu.0.freq_levels: 2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 599/-1 299/-1

further :

- I set debug.cpufreq.lowest above 1500 : the system remains up, but only when pushed really lightly
- I set debug.cpufreq.lowest below 1100 : freeze guaranteed
- I define hint.acpi_throttle.0.disabled=1 in loader.conf : then no dev.cpu.0.freq shows up ... (as if only acpi_throttle is attaching and not powernow)

Let me know what I can test further.
Best, Arno
Re: 7.1-PRERELEASE : bad network performance (nfe0)
Robert Watson [EMAIL PROTECTED] writes:

On Mon, 29 Sep 2008, Arno J. Klaassen wrote:

However, the request/response tests are awful for my notebook (test repeated on the notebook for the sake of conviction) :

Is it possible to rerun these tests with a 7.0 kernel of the same general configuration? That would help us determine if it's a regression between 7.0 and 7.1,

A 7.0-RELEASE-p4 kernel (with a 7.1 world) as well as the 7.0-RELEASE live CD give the same results : great streaming, very poor request/response.

or perhaps a more general issue between 6.x and 7.x.

nve(4) does not recognise this chip. If someone has a bootable 6-stable .iso with a backported nfe(4) ... or emails if_nfe.ko to me, I will test under 6-stable.

For now I will test the patches Pyun and Luigi sent me and let you know.

Best, arno
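
To put a number on "very poor request/response": the sketch below divides the 6-stable-x86 transaction rate by the mean of the three notebook runs for each netperf request/response test. All figures are copied from the netperf results reported in this thread; the dictionary and function names are only illustrative, and averaging three noisy runs is a simplification, not a proper benchmark analysis.

```python
# Transactions per second as reported in this thread.
baseline = {"TCP_RR": 9801.58, "TCP_CRR": 4520.98, "UDP_RR": 9473.20}  # 6-stable-x86 (sk0)
notebook = {                                                           # 7-stable-x64 (nfe0)
    "TCP_RR": [137.61, 89.35, 102.29],
    "TCP_CRR": [7.00, 8.10, 18.49],
    "UDP_RR": [9.60, 0.90, 0.10],
}

def slowdown(test):
    """Baseline rate divided by the mean notebook rate for one netperf test."""
    runs = notebook[test]
    return baseline[test] / (sum(runs) / len(runs))

for test in baseline:
    print(f"{test}: roughly {slowdown(test):.0f}x fewer transactions per second")
```

For scale, a rate near 100 transactions per second means about 10 ms per round trip, against roughly 0.1 ms at the 6-stable rate of almost 10000 per second.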
Re: 7.1-PRERELEASE : bad network performance (nfe0)
Dear Pyun, thanx for your prompt answer (as usual). Pyun YongHyeon [EMAIL PROTECTED] writes: On Sat, Sep 27, 2008 at 11:21:00PM +0200, Arno J. Klaassen wrote: Hello, I've serious network performance problems on a HP Turion X2 based brand new notebook; I only used a 7-1Beta CD and 7-STABLE on this thing. Scp-ing ports.tgz from a rock-stable 7-STABLE server to it gives : # scp -p ports.tgz [EMAIL PROTECTED]:/tmp/ ports.tgz 100% 98MB 88.7KB/s 18:49 (doing the same thing by copy from an nfs-mounted disk even takes mores than an hour ...) Doing a top(1) aside, just shows the box 100% idle : PID USERNAME PRI NICE SIZERES STATE C TIME WCPU COMMAND 12 root 171 ki31 0K16K CPU0 0 38:55 100.00% idle: cpu0 11 root 171 ki31 0K16K RUN1 38:55 100.00% idle: cpu1 13 root -32- 0K16K WAIT 0 0:02 0.00% swi4: clock sio 29 root -68- 0K16K - 0 0:00 0.00% nfe0 taskq 34 root -64- 0K16K WAIT 1 0:00 0.00% irq23: atapci1 1853 root 80 7060K 1920K wait 0 0:00 0.00% sh 878 nono 440 8112K 2288K CPU1 1 0:00 0.00% top 884 root 8- 0K16K - 1 0:00 0.00% nfsiod 0 4 root -8- 0K16K - 1 0:00 0.00% g_down 16 root -16- 0K16K - 1 0:00 0.00% yarrow 46 root 20- 0K16K syncer 0 0:00 0.00% syncer 3 root -8- 0K16K - 0 0:00 0.00% g_up 30 root -68- 0K16K - 0 0:00 0.00% fw0_taskq I tested : Update Bios ULE /4BSD PREEMPTION on/off PREEMPTION + IPI_PREEMPTION hw.nfe.msi[x]_disable=1 ^^^ This has no effect as MCP65 lacks MSI/MSI-X capability. All don't seem to matter to the problem. I put two tcpdumps (server and client during another scp(1) ) on http://bare.snv.jussieu.fr/temp/tcpdump-s1518.server http://bare.snv.jussieu.fr/temp/tcpdump-s1518.client I'm far from an expert on TCP/IP, but wireshark expert info shows lots of sequences like : TCP Previous segment lost TCP Duplicate ACK 1 TCP Window update TCP Duplicate ACK 2 TCP Duplicate ACK 3 TCP Duplicate ACK 4 TCP Duplicate ACK 5 TCP Fast retransmission (suspected) TCP ... TCP Out-of-Order segment TCP ... As usual, feel free to contact me for further info/tests. 
AFAIK it seems that you're the first one to report a poor performance issue with MCP65.

Someone must be ;) No kidding, I am not convinced this is (only) a driver issue (cf. the bad NFS/UDP performance thread on -hackers). I just have no experience with this notebook, so I can't say it worked great before, and the only other 7-stable-amd64 box I have does not show the problems, having a cheap re0 *and* being UP.

MCP65 has no checksum offload/TSO capability so nfe(4) never tries to take advantage of the hardware capability. So you should have no checksum offload/TSO related issue here. Also note, checking network performance with scp(1) wouldn't show real numbers as scp(1) may involve other system activities. Use one of the network benchmark programs in ports (e.g. benchmarks/netperf) to measure network performance.

Quite funny (to be taken with lots of salt, since the LAN is used for normal work in parallel as well, but the differences are rather significant) : I test against the same server (7-stable-amd64 from Jun 7, using nfe0 as well btw, but another chip), either from a 6-stable-x86 box (Jul 14, sk0) or from the notebook (7-stable-x64 below), using

for i in SOME-TESTS ; do echo $i; /usr/local/bin/netperf -H push -i 4,2 -I 95,10 -t $i; echo; done

Streaming results are OK for both :

TCP_STREAM    Throughput 10^6bits/sec
6-stable-x86  349.57
7-stable-x64  939.47

UDP_STREAM    Throughput 10^6bits/sec
6-stable-x86  388.45
7-stable-x64  947.89

However, the request/response tests are awful for my notebook (test repeated on the notebook for the sake of conviction) :

TCP_RR        Trans. Rate per sec
6-stable-x86  9801.58
7-stable-x64   137.61
7-stable-x64    89.35
7-stable-x64   102.29

TCP_CRR       Trans. Rate per sec
6-stable-x86  4520.98
7-stable-x64     7.00
7-stable-x64     8.10
7-stable-x64    18.49

UDP_RR        Trans. Rate per sec
6-stable-x86  9473.20
7-stable-x64     9.60
7-stable-x64     0.90
7-stable-x64     0.10

I can send you complete results if wanted.

Other possible cause of issue could be link
7.1-PRERELEASE : bad network performance (nfe0)
Hello, I've serious network performance problems on a HP Turion X2 based brand new notebook; I only used a 7-1Beta CD and 7-STABLE on this thing. Scp-ing ports.tgz from a rock-stable 7-STABLE server to it gives : # scp -p ports.tgz [EMAIL PROTECTED]:/tmp/ ports.tgz 100% 98MB 88.7KB/s 18:49 (doing the same thing by copy from an nfs-mounted disk even takes mores than an hour ...) Doing a top(1) aside, just shows the box 100% idle : PID USERNAME PRI NICE SIZERES STATE C TIME WCPU COMMAND 12 root 171 ki31 0K16K CPU0 0 38:55 100.00% idle: cpu0 11 root 171 ki31 0K16K RUN1 38:55 100.00% idle: cpu1 13 root -32- 0K16K WAIT 0 0:02 0.00% swi4: clock sio 29 root -68- 0K16K - 0 0:00 0.00% nfe0 taskq 34 root -64- 0K16K WAIT 1 0:00 0.00% irq23: atapci1 1853 root 80 7060K 1920K wait 0 0:00 0.00% sh 878 nono 440 8112K 2288K CPU1 1 0:00 0.00% top 884 root 8- 0K16K - 1 0:00 0.00% nfsiod 0 4 root -8- 0K16K - 1 0:00 0.00% g_down 16 root -16- 0K16K - 1 0:00 0.00% yarrow 46 root 20- 0K16K syncer 0 0:00 0.00% syncer 3 root -8- 0K16K - 0 0:00 0.00% g_up 30 root -68- 0K16K - 0 0:00 0.00% fw0_taskq I tested : Update Bios ULE /4BSD PREEMPTION on/off PREEMPTION + IPI_PREEMPTION hw.nfe.msi[x]_disable=1 All don't seem to matter to the problem. I put two tcpdumps (server and client during another scp(1) ) on http://bare.snv.jussieu.fr/temp/tcpdump-s1518.server http://bare.snv.jussieu.fr/temp/tcpdump-s1518.client I'm far from an expert on TCP/IP, but wireshark expert info shows lots of sequences like : TCP Previous segment lost TCP Duplicate ACK 1 TCP Window update TCP Duplicate ACK 2 TCP Duplicate ACK 3 TCP Duplicate ACK 4 TCP Duplicate ACK 5 TCP Fast retransmission (suspected) TCP ... TCP Out-of-Order segment TCP ... As usual, feel free to contact me for further info/tests. 
Thanx, Arno # uname -a FreeBSD mv 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON amd64 # pciconf -lcv (bits) [EMAIL PROTECTED]:0:6:0:class=0x02 card=0x30cf103c chip=0x045010de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP65 Ethernet' class = network subclass = ethernet cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0 # dmesg -a Copyright (c) 1992-2008 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON Timecounter i8254 frequency 1193250 Hz quality 0 CPU: AMD Turion(tm) 64 X2 Mobile Technology TL-62 (2109.70-MHz K8-class CPU) Origin = AuthenticAMD Id = 0x60f82 Stepping = 2 Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT Features2=0x2001SSE3,CX16 AMD Features=0xea500800SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow! 
AMD Features2=0x11fLAHF,CMP,SVM,ExtAPIC,CR8,Prefetch Cores per package: 2 usable memory = 3210813440 (3062 MB) avail memory = 3104542720 (2960 MB) ACPI APIC Table: HP APIC FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0 Version 1.1 irqs 0-23 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) acpi0: HPQOEM SLIC-MPC on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) ACPI Error (dsopcode-0671): Field [I9MN] at 544 exceeds Buffer [IORT] size 464 (bits) [20070320] ACPI Error (psparse-0626): Method parse/execution failed [\\_SB_.PCI0.LPC0.PMIO._CRS] (Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT ACPI Error (uteval-0309): Method execution failed [\\_SB_.PCI0.LPC0.PMIO._CRS] (Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT can't fetch resources for \\_SB_.PCI0.LPC0.PMIO - AE_AML_BUFFER_LIMIT Timecounter ACPI-fast frequency 3579545 Hz quality 1000 acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0 acpi_ec0: Embedded Controller: GPE 0x10 port 0x62,0x66 on acpi0 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0 Timecounter HPET frequency 2500 Hz quality 900 acpi_acad0: AC Adapter on acpi0 battery0: ACPI Control Method Battery on acpi0 acpi_lid0: Control Method Lid Switch on acpi0 pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0 pci0: ACPI PCI bus on pcib0 pci0: memory, RAM at device 0.0 (no driver attached) isab0: PCI-ISA bridge port
cpufreq for Opteron quad-core (2354)
Hello, apparently powernow on Opteron quad-core is not recognised; when I kldload cpufreq (leaving it out of kernel) I get : pci0: driver added pci1: driver added pci2: driver added pci3: driver added pci4: driver added pci5: driver added pci6: driver added found- vendor=0x9005, dev=0x0285, revid=0x00 domain=0, bus=6, slot=14, func=0 class=01-04-00, hdrtype=0x00, mfdev=0 cmdreg=0x0196, statreg=0x0230, cachelnsz=16 (dwords) lattimer=0xf8 (7440 ns), mingnt=0x01 (250 ns), maxlat=0x01 (250 ns) intpin=a, irq=28 powerspec 2 supports D0 D1 D3 current D0 MSI supports 2 messages, 64 bit pci0:6:14:0: reprobing on driver added pci7: driver added pci8: driver added pci9: driver added pci10: driver added but no dev.cpu.0.freq* showing up. When I dig up the by me so beloved good old acpi_ppc it says : cpu0: Px state: P0, 2200MHz, 28000mW, 19us, 19us cpu0: Px state: P1, 2000MHz, 26250mW, 19us, 19us cpu0: Px state: P2, 1700MHz, 23750mW, 19us, 19us cpu0: Px state: P3, 1400MHz, 21250mW, 19us, 19us cpu0: Px state: P4, 1100MHz, 18750mW, 19us, 19us cpu0: Px method: Unknown, disabled This box will probably stay at my office for a while and I'd be glad to provide more information. Best, Arno ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
[nvidia | shared irq] umass disconnects [was: panic dd-ing from a USB disk ]
Mikhail Teterin [EMAIL PROTECTED] writes: Hello! I had some troubles mounting the filesystem from: da0 at umass-sim0 bus 0 target 0 lun 0 da0: MATSHITA DMC-FX12 0100 Removable Direct Access SCSI-2 device da0: 1.000MB/s transfers da0: 3886MB (7959552 512 byte sectors: 255H 63S/T 495C) and decided to just dd the entire da0 to a file, so that the camera can be disconnected: dd if=/dev/da0 of=/home/mi/da0.dd bs=16384 The dd-ing was proceeding slowly (600Kb/s) and I stopped watching it... The machine paniced about an hour later (at 0:52). The timestamp on /home/mi/da0.dd was 23:45, it was only about 500Mb in size. The stack is below. Would anybody like to look at the complete vmcore dump? The hardware is a quad Opteron with 8Gb RAM. Only 4Gb of these are used, because it runs 7.x/i386 from April 5th (without PAE) -- for the sake of NVidia's card. I can easily produce a similar panic on a dual Opteron 185 with 3G of RAM and running 7-stable-amd64 on a (cheap) nvidia-based MB. It runs gmirror on atapci1 and I attach a geli-encrypted disk via usb. Both share irq 23. Under heavy load (periodic security is enough ) it panics after having disconnected umass0 ( kgdb trace below ) : Unread portion of the kernel message buffer: umass0: at uhub1 port 1 (addr 2) disconnected (da1:umass-sim0:0:0:0): lost device (pass1:umass-sim0:0:0:0): lost device (pass1:umass-sim0:0:0:0): removing device entry I'd be happy to provide more info. Best, Arno Please, advise. Thanks! -mi [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as i386-marcel-freebsd. 
There is no member named pathname. Reading symbols from /boot/modules/nvidia.ko...done. Loaded symbols for /boot/modules/nvidia.ko Reading symbols from /opt/modules/fuse.ko...done. Loaded symbols for /opt/modules/fuse.ko Unread portion of the kernel message buffer: umass0: at uhub0 port 6 (addr 2) disconnected (da0:umass-sim0:0:0:0): lost device Fatal trap 12: page fault while in kernel mode cpuid = 3; apic id = 03 fault virtual address = 0x0 fault code= supervisor write, page not present instruction pointer = 0x20:0xc0449702 stack pointer = 0x28:0xeb74b8bc frame pointer = 0x28:0xeb74b8dc code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 13989 (dd) trap number = 12 panic: page fault cpuid = 3 Uptime: 12d10h52m16s (da0:dead_sim0:0:0:0): Synchronize cache failed, status == 0x34, scsi status == 0xc8 Physical memory: 3054 MB Dumping 334 MB: (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15 #0 doadump () at pcpu.h:195 195 __asm __volatile(movl %%fs:0,%0 : =r (td)); (kgdb) #0 doadump () at pcpu.h:195 #1 0xc0599f7b in boot (howto=260) at /ibm/src/sys/kern/kern_shutdown.c:418 #2 0xc059a449 in panic (fmt=0x104 Address 0x104 out of bounds) at /ibm/src/sys/kern/kern_shutdown.c:572 #3 0xc077f60d in trap_fatal (frame=0xeb74b87c, eva=40) at /ibm/src/sys/i386/i386/trap.c:899 #4 0xc077f9aa in trap_pfault (frame=0xeb74b87c, usermode=0, eva=0) at /ibm/src/sys/i386/i386/trap.c:812 #5 0xc078035c in trap (frame=0xeb74b87c) at /ibm/src/sys/i386/i386/trap.c:490 #6 0xc076637b in calltrap () at /ibm/src/sys/i386/i386/exception.s:139 #7 0xc0449702 in xpt_done (done_ccb=0xc690a000) at /ibm/src/sys/cam/cam_xpt.c:4856 #8 0xc044b15c in xpt_action (start_ccb=0xc690a000) at /ibm/src/sys/cam/cam_xpt.c:3057 #9 0xc04462b6 in 
cam_periph_runccb (ccb=0xc690a000, error_routine=0, camflags=CAM_FLAG_NONE, sense_flags=1, ds=0xc6aea690) at /ibm/src/sys/cam/cam_periph.c:878 #10 0xc0453aa1 in daclose (dp=0xcc862600) at /ibm/src/sys/cam/scsi/scsi_da.c:714 #11 0xc0549b2e in g_disk_access (pp=0xc7e12680, r=0, w=0, e=Variable e is not available.) at /ibm/src/sys/geom/geom_disk.c:152 #12 0xc054ec4d in g_access (cp=0xc8a90380, dcr=-1, dcw=0, dce=0) at /ibm/src/sys/geom/geom_subr.c:748 #13 0xc05490f3 in g_dev_close (dev=0xca1dad00, flags=Variable flags is not available.) at /ibm/src/sys/geom/geom_dev.c:217 #14 0xc0531f69 in devfs_close (ap=0xeb74ba94) at /ibm/src/sys/fs/devfs/devfs_vnops.c:372 #15 0xc0623e86 in
nfs buildworld blocked by rpc.lockd ?
Hello,

my buildworld on a 7-stable-amd64 blocks on the following line :

  TERM=dumb TERMCAP=dumb: ex - /files/bsd/src7/share/termcap/termcap.src < /files/bsd/src7/share/termcap/reorder

ex(1) stays in lockd state, and is unkillable, either by Ctl-C or kill -9

/files/bsd is nfs-mounted as follows :

  push:/raid1/bsd /files/bsd nfs rw,bg,soft,nfsv3,intr,noconn,noauto,-r=32768,-w=32768 0 0

I can provide tcpdumps on server and client if helpful.

Thanx, Arno
Re: nfs-server silent data corruption
Hello,

[ .. stuff deleted .. ]

> > I have recompiled the kernel with ULE, and it seems fine as well. I ran 160 iterations of a 300MB file and there was no corruption. Same process - copy a junk random file over nfs mount, unmount the nfs mount, remount it, copy it back, compare the files.
>
> Let me summarise my investigations till now :

[ .. more stuff deleted .. ]

> - it does *not* seem to depend on :
>   - the interface : I could produce it using nfe0, nfe1 and re0 using some netgear pci-card
>   - the distribution of the 4Gig memory : installing 4G at CPU1 or 1G at CPU1 and 2G at CPU2 produces same results (NB, all memory passed memtest.iso in both situations for a complete run)
>   - the frequency control method : easier to produce with cpufreq/powerd, but finally I can reproduce the corruption as well using acpi_ppc
>   - the nfs-client and options (not exhaustively tested, but different tests include i386-releng6, amd64-releng6 and linux, and quite a set of different try-and-see mount_nfs options)
>
> I am testing right now with a fixed frequency of 1Ghz.

I cannot reproduce it at fixed cpu-frequency with cpufreq loaded (I ran my test for three days without prob, normally a couple of hours was enough).
But I looked again at the corrupted copies :

  # for i in raid5/xps/SAVE/1 raid5/pxe/SAVE/1 raid5/pxe/SAVE/2 raid5/pxe/SAVE/3 raid5/blockhead/SAVE/1 scsi/pxe/SAVE/1 scsi/blockhead/SAVE/1 scsi/blockhead/SAVE/2 scsi/blockhead/SAVE/3 scsi/blockhead/SAVE/4; do ls -l $i/BIG; cmp -x $i/BIG $i/BIG2; echo; done

  -rw-r--r-- 1 root wheel 144703488 Apr 26 16:06 raid5/xps/SAVE/1/BIG
  004fd908 18 00
  02c9e6c8 11 00
  034ab6c8 90 00
  037e4648 09 00
  039e85c8 91 01
  04484408 00 09
  06115cc8 00 81
  06e5d148 01 91
  07016048 18 00
  074307c8 08 19
  07aa45c8 29 20
  080bfb88 00 11

  -rw-r--r-- 1 root wheel 144703488 Apr 20 14:07 raid5/pxe/SAVE/1/BIG
  03869a48 09 00

  -rw-r--r-- 1 root wheel 144703488 Apr 20 14:47 raid5/pxe/SAVE/2/BIG
  05209d88 09 00

  -rw-r--r-- 1 root wheel 39845888 Apr 20 15:17 raid5/pxe/SAVE/3/BIG
  01777148 09 00

  -rw-r--r-- 1 root wheel 144703488 Apr 20 14:54 raid5/blockhead/SAVE/1/BIG
  00f10f88 09 00

  -rw-r--r-- 1 root wheel 39845888 Apr 20 16:08 scsi/pxe/SAVE/1/BIG
  01f4c4c8 11 00

  -rw-r--r-- 1 root wheel 144703488 Apr 20 15:38 scsi/blockhead/SAVE/1/BIG
  06c3d6c8 11 00

  -rw-r--r-- 1 root wheel 144703488 Apr 20 16:11 scsi/blockhead/SAVE/2/BIG
  0725ca48 18 00

  -rw-r--r-- 1 root wheel 144703488 Apr 20 17:32 scsi/blockhead/SAVE/3/BIG
  01608008 09 00

  -rw-r--r-- 1 root wheel 144703488 Apr 23 19:26 scsi/blockhead/SAVE/4/BIG
  00f3b888 18 00

The output for raid5/xps/SAVE/1/BIG was produced after installing the box at a lab with, without doubt, more sophisticated switches than I use, and it is the first case I was able to produce with more than just one byte corrupted, but still with the same pattern : the position always looks like 2^3 * 'something without a factor of two', e.g.

  factor(hex2dec('00f10f88')) = 2 2 2 809 2441
  factor(hex2dec('01f4c4c8')) = 2 2 2 317 12941

and the corruption is one out of the following half-byte transitions :

  1 -> 0
  8 -> 0
  9 -> 0
  0 -> 1
  0 -> 8
  0 -> 9
  8 -> 9

Maybe this gives a hint to someone ...
Best, Arno
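The pattern hunt in this message (offset structure, which bits change) can be repeated mechanically on any saved `cmp -x` output. The sketch below is an illustration only: the function name `analyse_line` is made up, and the four sample lines are copied from the saved cases in this thread. For each line it prints the offset modulo 64 and the XOR of the two byte values. For these samples the offset is 8 modulo 64 every time and the XOR always has exactly two bits set, which is one possible reading of the "2^3 times something odd" observation, not a diagnosis.

```shell
#!/bin/sh
# analyse_line OFFSET OLD NEW : all three arguments are hex strings without
# a 0x prefix, as printed by `cmp -x`. The function name is hypothetical.
analyse_line() {
    # $1 % 64 shows the position inside a 64-byte block,
    # OLD xor NEW shows which bits differ between the two copies.
    printf '%s mod64=%d xor=0x%02x\n' "$1" $(( 0x$1 % 64 )) $(( 0x$2 ^ 0x$3 ))
}

# A few of the lines saved in this thread.
while read -r off old new; do
    analyse_line "$off" "$old" "$new"
done <<'EOF'
03869a48 09 00
01f4c4c8 11 00
0725ca48 18 00
07aa45c8 29 20
EOF
```

Feeding it the complete list of saved lines, instead of the four samples, is the obvious next step before drawing any conclusion.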
Re: nfs-server silent data corruption
Hello,

Mike Tancsa [EMAIL PROTECTED] writes:

> At 02:35 PM 4/22/2008, Arno J. Klaassen wrote:
> > > Also, you are using ULE or the 4BSD scheduler ? I still have 4BSD on the box I am testing on.
> >
> > Interesting, this is with ULE. I didn't really test 4BSD on this box (I believed those who said SMP needs ULE *and* am quite satisfied with overall performance). I'll try 4BSD though time is getting short; I promised to deliver this box next thursday but will still have some days for on-site testing.
>
> I have recompiled the kernel with ULE, and it seems fine as well. I ran 160 iterations of a 300MB file and there was no corruption. Same process - copy a junk random file over nfs mount, unmount the nfs mount, remount it, copy it back, compare the files.

Let me summarise my investigations till now :

 - in all failing cases just *one* byte is corrupted, 4 or all 8 bits set to zero *and* the original value is one out of the limited subset {1, 8, 9}

   here is the output of `cmp -x $i/BIG $i/BIG2` for some failing cases I saved :

     03869a48 09 00
     05209d88 09 00
     01777148 09 00
     00f10f88 09 00
     01f4c4c8 11 00
     06c3d6c8 11 00
     0725ca48 18 00
     01608008 09 00
     00f3b888 18 00
     07aa45c8 29 20

 - it does *not* seem to depend on :
   - the interface : I could produce it using nfe0, nfe1 and re0 using some netgear pci-card
   - the distribution of the 4Gig memory : installing 4G at CPU1 or 1G at CPU1 and 2G at CPU2 produces same results (NB, all memory passed memtest.iso in both situations for a complete run)
   - the frequency control method : easier to produce with cpufreq/powerd, but finally I can reproduce the corruption as well using acpi_ppc
   - the nfs-client and options (not exhaustively tested, but different tests include i386-releng6, amd64-releng6 and linux, and quite a set of different try-and-see mount_nfs options)

I am testing right now with a fixed frequency of 1Ghz.
I am not so inclined to test 4BSD, since reboot possibilities are limited for me now on this box, but next week I will set up a similar board (S3992e) (iff I can find quad-core socket F over here ...) and in a certain sense hope I can reproduce it on that board as well.

Best, Arno
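The copy-and-compare process described in this message (write a random file, copy it over the NFS mount, read it back, compare) can be sketched as a small function. The following is an illustration under assumptions, not the script actually used in the thread: the name `roundtrip_check`, its argument order and the use of cmp(1) instead of md5(1) are choices made here, and the remount step is only indicated in a comment because it needs root and a real mount point.

```shell
#!/bin/sh
# roundtrip_check DIR SIZE_KB COUNT   (hypothetical helper)
# Writes a random file of SIZE_KB kilobytes, copies it into DIR and compares
# the copy with the original, COUNT times. Returns 1 at the first mismatch.
roundtrip_check() {
    dir=$1 size_kb=$2 count=$3
    src=$(mktemp "${TMPDIR:-/tmp}/junk.XXXXXX") || return 2
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        dd if=/dev/urandom of="$src" bs=1024 count="$size_kb" 2>/dev/null
        cp -p "$src" "$dir/junk.copy"
        # On a real NFS test, umount and mount "$dir" at this point so the
        # copy is read back from the server and not from the client cache.
        if ! cmp -s "$src" "$dir/junk.copy"; then
            echo "copy not ok on iteration $i"
            rm -f "$src"
            return 1
        fi
    done
    rm -f "$src" "$dir/junk.copy"
    echo "all $count copies ok"
}

# Example: roundtrip_check /backup 81920 150
```

Keeping the mismatching pair on disk (the function leaves `junk.copy` in place on failure) is what makes the later `cmp -x` inspection possible.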
Re: nfs-server silent data corruption
Hello, Mike Tancsa [EMAIL PROTECTED] writes: At 05:57 PM 4/21/2008, Arno J. Klaassen wrote: Hi, How long does it take for the problem to show up ? Less than an hour in general (running the same client script simultanuously on a 100Mbps linux box and 1Gbps bds6-x86) I am running my nic at gig speeds only... I recompiled the kernel this morning to include cpufreq as well as made sure the coolquiet was enabled in the BIOS. for info, I test with args '38 999' (38M, try 999 times) on linux (slightly adapted script BTW) and '138 999' on bsd. The best 'score' I got was 'still 871 iterations to go' So far I have done 150 loops with an 80MB file and no issues and 200 loopswith a 160MB file. My nfe nic does not support MSI and has its own interrupt # vmstat -i interrupt total rate irq1: atkbd0 5 0 irq4: sio0 3049 1 irq16: twe0 327046164 irq19: bge0 385147194 irq21: atapci1976355492 irq23: nfe0 11876726 5986 cpu0: timer 3966420 1999 cpu1: timer 3964392 1998 # vmstat -i interrupt total rate irq1: atkbd0 4 0 irq14: ata0 69 0 irq20: nfe0 11650955 5283 irq24: atapci194 0 irq28: atapci2 178 0 irq29: ahd0 355704161 cpu0: timer 4409020 1999 cpu1: timer 4391646 1991 cpu2: timer 4391643 1991 cpu3: timer 4391641 1991 I have powerd started up with powerd_enable=YES powerd_flags=-a adaptive -b adaptive -n adaptive slightly different, I mostly use -b adaptive -i 90 -n adaptive -r 80 but the problem shows up without flags as well. With the sleep in my test script, powerd does seem to be fiddling with frequencies as well during the inactivity. I most often provoke slight swapping for randomizing frequency changes and a burnK7 or similar to psuh up and down by hand # sysctl dev. 
| grep -i fre dev.cpu.0.freq: 1800 dev.cpu.0.freq_levels: 2200/11 2000/105600 1800/89100 1000/49000 dev.powernow.0.freq_settings: 2200/11 2000/105600 1800/89100 1000/49000 dev.powernow.1.freq_settings: 2200/11 2000/105600 1800/89100 1000/49000 dev.cpufreq.0.%driver: cpufreq dev.cpufreq.0.%parent: cpu0 dev.cpufreq.1.%driver: cpufreq dev.cpufreq.1.%parent: cpu1 funny, when I do that : # sysctl dev. | grep -i fre dev.cpu.0.freq: 995 dev.cpu.0.freq_levels: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100 dev.powernow.0.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100 dev.powernow.1.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100 dev.powernow.2.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100 dev.powernow.3.freq_settings: 6747/95000 6228/90300 5709/76200 5190/63800 4671/53200 2595/36100 dev.cpufreq.0.%driver: cpufreq dev.cpufreq.0.%parent: cpu0 dev.cpufreq.1.%driver: cpufreq dev.cpufreq.1.%parent: cpu1 dev.cpufreq.2.%driver: cpufreq dev.cpufreq.2.%parent: cpu2 dev.cpufreq.3.%driver: cpufreq dev.cpufreq.3.%parent: cpu3 especially the dev.powernow.3.freq_settings look weird ... that said, I once more dug up the old acpi_ppc.c and slightly adapted it for fbsd7 (basically some name changes and using read_cpu_time() i.s.o. cp_time) and the problem disappears ... the algo of acpi_ppc makes it somewhat harder to push up frequencies, though I doubt that matters. I tried as well with hint.acpi_throttle.0.disabled=1 in loader.conf with no luck (using powerd). I'm out of office tomorrow but will try to find time tommorow evening to test with another NIC. Best, Arno ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
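The odd `dev.powernow.3.freq_settings` values noted in this message (roughly 2.6 times those of the other cores, e.g. 6747 against 2587) can be spotted automatically by comparing every core's settings against the first one listed. A minimal sketch, assuming the sysctl output format shown in the message; the function name `flag_odd_settings` is made up for illustration.

```shell
#!/bin/sh
# flag_odd_settings : reads sysctl output on stdin and prints the name of
# every freq_settings entry whose value differs from the first one seen.
flag_odd_settings() {
    awk -F': ' '
        /freq_settings/ {
            if (ref == "") ref = $2          # first core is the reference
            else if ($2 != ref) print $1     # differing core: report its name
        }'
}

# Example on a FreeBSD box with powernow attached:
#   sysctl dev.powernow | flag_odd_settings
```

No output means all cores report the same table. A reported name only says that the core differs from the first one, not which of the two is wrong.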
Re: nfs-server silent data corruption
Hello,

Peter Jeremy [EMAIL PROTECTED] writes:

> On Mon, Apr 21, 2008 at 08:30:48PM +0200, Arno J. Klaassen wrote:
> > NB, (CC to kris@ for this) why is memtest86 port marked as i386-only?
>
> Basically because it's a bootable i386 binary image.

yop, but building it could be allowed on more archs (at least amd64 imho)

but no hard feelings! just a thought

Best, Arno
Re: nfs-server silent data corruption
Mike Tancsa [EMAIL PROTECTED] writes:

> At 01:38 PM 4/22/2008, Arno J. Klaassen wrote:
> > I'm out of office tomorrow but will try to find time tomorrow evening to test with another NIC.
>
> Are you using the latest RELENG_7, or at least the latest version of nfe that's in RELENG_7 ?

Think so :

  # cvs status if_nfe.c
  ===
  File: if_nfe.c          Status: Up-to-date
    Working revision:     1.21.2.5  Sat Apr 19 14:27:41 2008
    Repository revision:  1.21.2.5  /home/ncvs/src/sys/dev/nfe/if_nfe.c,v
    Sticky Tag:           RELENG_7 (branch: 1.21.2)
    Sticky Date:          (none)
    Sticky Options:       (none)

++, Arno

PS, finally the memory seems not involved : populating 4G in CPU1 or 2G in CPU1 and 2G in CPU2 does not make a difference
Re: nfs-server silent data corruption
re,

Mike Tancsa [EMAIL PROTECTED] writes:

> At 02:00 PM 4/22/2008, Arno J. Klaassen wrote:
> > > Are you using the latest RELENG_7, or at least the latest version of nfe that's in RELENG_7 ?
> >
> > Think so :
>
> OK, and it is the latest RELENG_7 ?

from saturday (but I didn't see any RELENG_7 commit possibly related to this since)

> Also, you are using ULE or the 4BSD scheduler ? I still have 4BSD on the box I am testing on.

Interesting, this is with ULE. I didn't really test 4BSD on this box (I believed those who said SMP needs ULE *and* am quite satisfied with overall performance). I'll try 4BSD though time is getting short; I promised to deliver this box next thursday but will still have some days for on-site testing.

++, Arno
Re: nfs-server silent data corruption
Kris Kennaway [EMAIL PROTECTED] writes: On Mon, Apr 21, 2008 at 01:02:33AM +0200, Arno J. Klaassen wrote: I didn't stress-test this MB for a while, but last time I did was with 7-PRELEASE/RC?/CANTremember-exactly-but-close-to-release and all worked great I did add 2G ECC to the 2nd CPU since, though I doubt that interferes with NFS. Uh, you're getting server-side data corruption, it could definitely be because of the memory you added. yop, though I'm still not convinced the memory is bad (the very same Kingston ECC as the 2*1G in use for about half a year already) : I added it directly to the 2nd CPU (diagram on page 9 of http://www.tyan.com/manuals/m_s2895_101.pdf) and the problem seems to be the interaction between nfe0 and powerd : - if I stop powerd, problems go away - I let run powerd but turn of txcsum and tso4 on the interface, the problem is a lot harder to produce (if ever this gives a hint to anyone) Device is : [EMAIL PROTECTED]:0:10:0: class=0x068000 card=0x289510f1 chip=0x005710de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'nForce4 Ultra NVidia Network Bus Enumerator' class = bridge cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0 (this is with the default BIOS setting LAN Bridge Enabled, disabling that setting makes pciconf say class = network but does not influence my problem) I will restart my tests now by populating all 4G to only CPU1 and say whether that matters. Best, Arno ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: nfs-server silent data corruption
Hello, Jeremy Chadwick [EMAIL PROTECTED] writes: On Mon, Apr 21, 2008 at 04:52:55PM +0200, Arno J. Klaassen wrote: Kris Kennaway [EMAIL PROTECTED] writes: Uh, you're getting server-side data corruption, it could definitely be because of the memory you added. [ .. stuff deleted; I'll answer in more detail later ..] Can you boot the machine in verbose mode, and put the dmesg up somewhere? attached. More in a moment. Best, Arno Copyright (c) 1992-2008 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.0-STABLE #1: Sun Apr 20 19:17:47 CEST 2008 [EMAIL PROTECTED]:/usr/obj/files/here/bsd/src7/sys/S2895 Preloaded elf kernel /boot/kernel/kernel at 0x807dc000. Preloaded elf obj module /boot/kernel/iicsmb.ko at 0x807dc210. Preloaded elf obj module /boot/kernel/iicbus.ko at 0x807dc738. Preloaded elf obj module /boot/kernel/smbus.ko at 0x807dcbe0. Preloaded elf obj module /boot/kernel/smb.ko at 0x807dd048. Preloaded elf obj module /boot/kernel/nfsmb.ko at 0x807dd4f0. Calibrating clock(s) ... i8254 clock: 1193107 Hz CLK_USE_I8254_CALIBRATION not specified - using default frequency Timecounter i8254 frequency 1193182 Hz quality 0 Calibrating TSC clock ... TSC clock: 2612050515 Hz CPU: Dual Core AMD Opteron(tm) Processor 285 (2612.05-MHz K8-class CPU) Origin = AuthenticAMD Id = 0x20f12 Stepping = 2 Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT Features2=0x1SSE3 AMD Features=0xe2500800SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow! 
AMD Features2=0x3LAHF,CMP Cores per package: 2 L1 2MB data TLB: 8 entries, fully associative L1 2MB instruction TLB: 8 entries, fully associative L1 4KB data TLB: 32 entries, fully associative L1 4KB instruction TLB: 32 entries, fully associative L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative L2 2MB unified TLB: 0 entries, disabled/not present L2 4KB data TLB: 512 entries, 4-way associative L2 4KB instruction TLB: 512 entries, 4-way associative L2 unified cache: 1024 kbytes, 64 bytes/line, 1 lines/tag, 16-way associative usable memory = 4285255680 (4086 MB) Physical memory chunk(s): 0x1000 - 0x00099fff, 626688 bytes (153 pages) 0x008dc000 - 0x761a5fff, 1972150272 bytes (481482 pages) 0x8000 - 0xaff7, 804782080 bytes (196480 pages) 0x0001 - 0x00014ffe, 1342111744 bytes (327664 pages) avail memory = 4108218368 (3917 MB) ACPI APIC Table: PTLTD APIC INTR: Adding local APIC 1 as a target INTR: Adding local APIC 2 as a target INTR: Adding local APIC 3 as a target FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 APIC: CPU 0 has ACPI ID 0 APIC: CPU 1 has ACPI ID 1 APIC: CPU 2 has ACPI ID 2 APIC: CPU 3 has ACPI ID 3 ULE: setup cpu group 0 ULE: setup cpu 0 ULE: adding cpu 0 to group 0: cpus 1 mask 0x1 ULE: setup cpu group 1 ULE: setup cpu 1 ULE: adding cpu 1 to group 1: cpus 1 mask 0x2 ULE: setup cpu group 2 ULE: setup cpu 2 ULE: adding cpu 2 to group 2: cpus 1 mask 0x4 ULE: setup cpu group 3 ULE: setup cpu 3 ULE: adding cpu 3 to group 3: cpus 1 mask 0x8 ACPI: RSDP @ 0x0xf78c0/0x0014 (v 0 PTLTD ) ACPI: RSDT @ 0x0x7ff8b110/0x003C (v 1 PTLTDRSDT 0x0604 LTP 0x) ACPI: FACP @ 0x0x7ff909c2/0x0074 (v 1 NVIDIA CK8S 0x0604 PTL_ 0x000F4240) ACPI: DSDT @ 0x0x7ff8b14c/0x5876 (v 1 NVIDIA CK8 0x0604 MSFT 0x010E) ACPI: FACS @ 0x0x7ff91fc0/0x0040 ACPI: SPCR @ 0x0x7ff90a36/0x0050 (v 1 PTLTD $UCRTBL$ 
0x0604 PTL 0x0001) ACPI: MCFG @ 0x0x7ff90a86/0x003C (v 1 PTLTDMCFG 0x0604 0x) ACPI: APIC @ 0x0x7ff90ac2/0x009E (v 1 PTLTD APIC 0x0604 LTP 0x) ACPI: BOOT @ 0x0x7ff90b60/0x0028 (v 1 PTLTD $SBFTBL$ 0x0604 LTP 0x0001) ACPI: SSDT @ 0x0x7ff90b88/0x0478 (v 1 PTLTD POWERNOW 0x0604 LTP 0x0001) MADT: Found IO APIC ID 4, Interrupt 0 at 0xfec0 ioapic0: Routing external 8259A's - intpin 0 MADT: Found IO APIC ID 5, Interrupt 24 at 0xd000 MADT: Found IO APIC ID 6, Interrupt 28 at 0xd0001000 MADT: Found IO APIC ID 7, Interrupt 32 at 0xd0a0 MADT: Interrupt override: source 9, irq 9 ioapic0: intpin 9 trigger: level ioapic0: intpin 9 polarity: low lapic0: Routing NMI - LINT1 lapic0: LINT1 trigger: edge lapic0: LINT1 polarity: high lapic1: Routing NMI - LINT1 lapic1: LINT1 trigger: edge lapic1: LINT1 polarity: high lapic2: Routing NMI - LINT1 lapic2: LINT1 trigger: edge lapic2: LINT1 polarity: high lapic3: Routing NMI - LINT1 lapic3: LINT1 trigger: edge lapic3: LINT1 polarity
Re: nfs-server silent data corruption
yet another quick partial answer :

Jeremy Chadwick [EMAIL PROTECTED] writes:

> On Mon, Apr 21, 2008 at 04:52:55PM +0200, Arno J. Klaassen wrote:
> > Kris Kennaway [EMAIL PROTECTED] writes:
> > > Uh, you're getting server-side data corruption, it could definitely be because of the memory you added.
> >
> > yop, though I'm still not convinced the memory is bad (the very same Kingston ECC as the 2*1G in use for about half a year already) :
>
> Can you download and run memtest86 on this system, with the added 2G ECC installed? memtest86 doesn't guarantee showing signs of memory problems, but in most cases it'll start spewing errors almost immediately.

It has been running for 15 minutes now without any warning; I'll let it run while cooking a meal [ with 2*1G mem for each CPU to be clear ].

NB, (CC to kris@ for this) why is memtest86 port marked as i386-only? It only seems to install floppy.bin and memtest.iso, but alas (maybe I should leave one box dedicated to freebsd-i386 for things like this ;) )

Best, Arno
Re: nfs-server silent data corruption
re, Jeremy Chadwick [EMAIL PROTECTED] writes: On Mon, Apr 21, 2008 at 04:52:55PM +0200, Arno J. Klaassen wrote: Kris Kennaway [EMAIL PROTECTED] writes: Uh, you're getting server-side data corruption, it could definitely be because of the memory you added. yop, though I'm still not convinced the memory is bad (the very same Kingston ECC as the 2*1G in use for about half a year already) : Can you download and run memtest86 on this system, with the added 2G ECC insalled? memtest86 doesn't guarantee showing signs of memory problems, but in most cases it'll start spewing errors almost immediately. it finished in a bit less than 3 hours without a single error/warning I feel pretty confident all memory is fine One thing I did notice in the motherboard manual below is something called Hammer Configuration. It appears to default to 800MHz, but there's an Auto choice. Does using Auto fix anything? Nope I added it directly to the 2nd CPU (diagram on page 9 of http://www.tyan.com/manuals/m_s2895_101.pdf) and the problem seems to be the interaction between nfe0 and powerd : That board is the weirdest thing I've seen in years. ;) I agree I lifted (?) my eye-brows the first time I saw that diagram Two separate CPUs using a single (shared) memory controller, two separate (and different!) nVidia chipsets, a SMSC I/O controller probably used for serial and parallel I/O, two separate nVidia NICs with Marvell PHYs (yet somehow you can bridge the two NICs and PHYs?), two separate PCI-e busses (each associated with a separate nVidia chipset), two separate PCI-X busses... the list continues. some may say it's just four wheels, an engine and a steer, she looks different compared to most others I know you don't need opinions at this point, but what a behemoth. I can't imagine that thing running reliably. though it does ;) (till the day I decided she deserved a -stable upgrade and 2 more gigs ...) 
- if I stop powerd, problems go away This would imply that clock frequency stepping is somehow attributing itself to the corruption. I don't see any BIOS options for controlling things related to AMD's Cool-n-Quiet or PowerNow! feature, which is usually what handles this. you can turn it on/off; anyway, the problem *seems* easy to reproduce when freq drops quickly form 2600Mhz to 1000Mhz I just inspected a few corrupted copies, but out of 10-200Mbytes just 1 byte was 0 iso \t - I let run powerd but turn of txcsum and tso4 on the interface, the problem is a lot harder to produce (if ever this gives a hint to anyone) Possibly shared interrupts are causing problems? don't think so; I first had two Promise TX4 cards in this box iso the Marvell 8port card; since I had problems with TX4 some time ago I first suspected them. The board is still running memtest86, but from the dmesg I posted I don't see a shared irq. MSI/MSI-X doing something odd? Have you tried disabling MSI/MSI-X and see if it makes a difference? MSI is disabled as is PCI-e Error reporting (or something like that) I think you mean MAC LAN Bridge, according to the motherboard manual. I'm not even sure what that really does; somehow trunks the two NICs together to give you the equivalent of 2000mbit of traffic? I don't know. probably; I never tried ;) I need the second NIC for a seperate subnet Does the corruption you see go away if you install a separate NIC (e.g. an Intel NIC) in a PCI or PCI-e slot, and disable the onboard NICs (should be MAC LAN: Disable on both the primary and slave)? Don't have one available right now (for a 2U server). I will test if I do not find another solution. Thanx, Arno ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: nfs-server silent data corruption
Hello,

Mike Tancsa [EMAIL PROTECTED] writes:

> At 10:52 AM 4/21/2008, Arno J. Klaassen wrote:
> > Device is :
> >
> > [EMAIL PROTECTED]:0:10:0: class=0x068000 card=0x289510f1 chip=0x005710de rev=0xa3 hdr=0x00
> >     vendor     = 'Nvidia Corp'
> >     device     = 'nForce4 Ultra NVidia Network Bus Enumerator'
> >     class      = bridge
> >     cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0
> >
> > (this is with the default BIOS setting LAN Bridge Enabled, disabling
> > that setting makes pciconf say class = network but does not
> > influence my problem)
> >
> > I will restart my tests now by populating all 4G to only CPU1 and
> > say whether that matters.
>
> Hi,
> How long does it take for the problem to show up ?

Less than an hour in general (running the same client script
simultaneously on a 100Mbps linux box and 1Gbps bsd6-x86)

> I have what appears to be a very similar Tyan board (I have a Socket
> 939 X2 cpu) with the same NIC, but this one is running RELENG_7 from
> April 17th. There have been a few fixes for the nfe driver since 7.0
>
> I am running this small script below on a nfs client (em nic) against
> the server (nfe)
>
> ( mount options on the client
> 192.168.245.1:/backup /backup nfs rw,-r=32768,-w=32768,tcp,noauto )
>
>     #!/bin/sh
>     i=0
>     while true
>     do
>         i=`expr $i + 1`
>         dd if=/dev/urandom of=/tmp/junk.txt bs=1024 count=81920 > /dev/null 2>&1
>         cp -p /tmp/junk.txt /backup/
>         orig=`md5 -q /tmp/junk.txt`
>         umount /backup
>         sleep 2
>         mount /backup
>         copy=`md5 -q /backup/junk.txt`
>         echo $orig and $copy on $i
>         if [ $orig != $copy ]; then
>             echo \a copy not ok on $i
>             exit 255
>         fi
>     done

quite the same as what I do (apart from the umount/sleep/mount and I use
same partition for write and copy) :

    SIZE=$1
    COUNTER=${2:-20}
    until [ $COUNTER -lt 1 ]; do
        echo "Still $COUNTER iterations to go ***"
        echo
        echo -n "Creating random file of $SIZE MBytes ... "
        dd if=/dev/random of=BIG bs=1048576 count=${SIZE} > /dev/null 2>&1
        echo Done
        echo -n "Calculating md5 checksum ... "
        CS1=`md5 -q BIG`
        echo Done
        echo -n "Copying file ... "
        cp -fp BIG BIG2
        echo Done
        echo -n "Calculating md5 checksum ... "
        CS2=`md5 -q BIG2`
        echo Done
        if [ ${CS1} != ${CS2} ]; then
            echo "CHECKSUM MISMATCH"
            exit -1
        else
            echo
        fi
        let COUNTER-=1
    done

for info, I test with args '38 999' (38M, try 999 times) on linux
(slightly adapted script BTW) and '138 999' on bsd. The best 'score' I
got was 'still 871 iterations to go'

> On the server, I have
>
> [EMAIL PROTECTED]:0:10:0: class=0x068000 card=0x286510f1 chip=0x005710de rev=0xa3 hdr=0x00
>     vendor     = 'Nvidia Corp'
>     device     = 'nForce4 Ultra NVidia Network Bus Enumerator'
>     class      = bridge
>     cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0

idem

> # ifconfig nfe0
> nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=10b<RXCSUM,TXCSUM,VLAN_MTU,TSO4>
>         ether 00:e0:81:58:91:6a
>         inet 192.168.245.1 netmask 0xff00 broadcast 192.168.245.255
>         media: Ethernet autoselect (1000baseTX <full-duplex,flag0,flag1>)
>         status: active

idem

> How long does it take for the problem to come up ?

as said : approximately half an hour; never more than 4 hours

Best,

Arno
nfs-server silent data corruption
Hello,

I've a strange problem with a box I'm setting up as nfs-server under
7-stable :

 - tyan S2895 MB, 2*285 Dualcore Opteron, 4G-ECC, ahd-scsi, nfe-network
 - stripped GENERIC as kernel
 - sources as of last saturday afternoon (European time)

I removed everything from /boot/loader.conf and /etc/sysctl.conf, still
I easily get data corruption when exporting ahd-scsi over nfs (NB
exporting geom_raid5 gives same data corruption)

Testing with the following pseudo code :

    while checksum1 == checksum2
    do
        create random file of $1 MBytes
        calculate md5 checksum1
        copy
        calculate md5 checksum2 on copy

Tested on both (as nfs-client) a 6-stable-i386 from a couple of weeks
ago as well as a linux 2.6.15-gentoo-r1 of about two years ago : within
half an hour the copy will be different ;(

I played with nfs-options on client side (nfs[23], conn, intr,
[udp|tcp], -r=, -w= ) but none seem to matter.

Start/Stop of rpc.lockd/rpc.statd on server/client just provoked some :

    cp: utimes: BIG2: No such file or directory
    cp: chown: BIG2: Stale NFS file handle
    cp: chmod: BIG2: Stale NFS file handle
    cp: chflags: BIG2: Operation not supported
    cp: BIG2: Stale NFS file handle
    cp: setting permissions for `BIG2': Stale NFS file handle
    cp: closing `BIG2': Stale NFS file handle

[and then the while loop continued ... as if the NFS handle were not
that stale ..]

Anyway, I'll try to nail this down more (e.g. nfs-write performance is
horrible ... (nfsd falling down to 0% cpu and then after a while 'waking
up' and being at around 3-6% again))

I didn't stress-test this MB for a while, but the last time I did was
with 7-PRERELEASE/RC?/CANTremember-exactly-but-close-to-release and all
worked great

I did add 2G ECC to the 2nd CPU since, though I doubt that interferes
with NFS.

In short, if anyone has a suggestion (I will try a downgrade to
RELENG_7_0 iff no one has a new suggestion for RELENG_7, but I'd like to
go forward and test some maybe suspect recent MFC or other suggestion)

Thanx in advance, best,

Arno
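The pseudo code in this message can be sketched as a small POSIX sh script. This is only an illustration under stated assumptions: the function names (`checksum`, `make_random_file`, `copy_and_compare`, `run_test`) are made up here, the file names BIG and BIG2 are borrowed from the cp errors quoted in the message, and the fallback from `md5 -q` to `md5sum` or `cksum` is an addition so the same file runs on both the FreeBSD and the Linux client.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not taken from the thread.

# Print a checksum of file $1 with whatever tool the client has.
checksum() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"                          # FreeBSD
    elif command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{print $1}'       # Linux
    else
        cksum "$1" | awk '{print $1 "-" $2}' # last resort, not md5
    fi
}

# Create file $1 holding $2 MBytes of random data.
make_random_file() {
    dd if=/dev/urandom of="$1" bs=1048576 count="$2" >/dev/null 2>&1
}

# One iteration in directory $1 with a file of $2 MBytes.
# Returns 0 if the copy matches, 1 on mismatch, 2 on an I/O error.
copy_and_compare() {
    src="$1/BIG"
    dst="$1/BIG2"
    make_random_file "$src" "$2" || return 2
    cs1=$(checksum "$src")
    cp -fp "$src" "$dst" || return 2
    cs2=$(checksum "$dst")
    [ "$cs1" = "$cs2" ]
}

# Repeat the iteration $3 times in directory $1, stop at the first problem.
run_test() {
    n=$3
    while [ "$n" -gt 0 ]; do
        copy_and_compare "$1" "$2"
        case $? in
            0) ;;
            1) echo "CHECKSUM MISMATCH with $n iterations to go"; return 1 ;;
            *) echo "I/O error with $n iterations to go"; return 2 ;;
        esac
        n=$((n - 1))
    done
    echo "all copies identical"
}
```

Run against an NFS-mounted directory, for instance `run_test /mnt/nfs 38 999` (the mount point is a placeholder). Note that both the write and the read-back go through the client cache unless the file system is remounted between the copy and the second checksum.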
more cpufreq woes
Hi,

I once again have a freeze with cpufreq, this time on a Tyan S3950 MB +
X2 BE 2400 proc;

    dev.cpu.0.freq_levels: 2277/10 2178/91708 1980/76426 1782/62805 990/30193

Same proc works OK with Asus M2N32 WS Pro ...

Same Tyan MB works OK with X2 BE 2350 which shows

    dev.cpu.0.freq_levels: 2079/10 1980/91311 1782/75334 990/40013

With 'sysctl debug.cpufreq.lowest=1000' it works OK, but that's not
really what I'd like to do.

This is on RELENG_6.

Best,

Arno
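To see which levels remain once a floor such as debug.cpufreq.lowest=1000 is applied, the freq_levels string can be split in plain sh. A minimal sketch, assuming the usual "MHz/power" pair format of dev.cpu.N.freq_levels; the function name `freqs_at_or_above` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: freqs_at_or_above is an illustrative name.

# freqs_at_or_above LEVELS FLOOR
# LEVELS is the value of dev.cpu.0.freq_levels ("MHz/power MHz/power ...").
# Prints, one per line, every frequency that is >= FLOOR (in MHz).
freqs_at_or_above() {
    for level in $1; do
        mhz=${level%%/*}            # keep the part before the slash
        if [ "$mhz" -ge "$2" ]; then
            printf '%s\n' "$mhz"
        fi
    done
}
```

On a live system this would be called as `freqs_at_or_above "$(sysctl -n dev.cpu.0.freq_levels)" 1000`; with the first list above it drops only the 990 MHz level.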
comconsole trouble on ASUS A8VE-deluxe
Hello,

I can't seem to get comconsole to work on an ASUS A8VE-Deluxe MB :

 - I get the boot-menu, can escape to loader prompt and type, but no
   output once the kernel starts booting

 - I tried (almost) all possible combinations of hint.sio.0.flags but no
   change, though 0x8, to recover sooner from lost output interrupts,
   *sometimes* gives blurbs of output

 - even pulling out the graphics card does not help

 - when in multi-user a good old kermit over cuad0 works OK

 - these are the relevant dmesg lines :

    sio0: configured irq 4 not in bitmap of probed irqs 0
    sio0: port may not be enabled
    sio0: irq maps: 0 0 0 0
    sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x30038 on acpi0
    sio0: type ST16650A, console
    ioapic0: routing intpin 4 (ISA IRQ 4) to vector 55

Anyone an idea of what to try next? I tried uart(4) instead of sio(4)
and hint.uart.0.flags=0x10, no change, but I'm not quite sure this is
supposed to work on amd64-stable.

Thanx a lot in advance.

Arno
Re: [summary] Re: burncd 'blank' not terminating ?
Luigi Rizzo [EMAIL PROTECTED] writes:

> summary: there was some discussion on how to fix the problem, in 6.x,
> with
>
>     burncd -f /dev/acd0 -v blank
>
> getting stuck with this message
>
>     blanking CD, please wait..
>
> This used to work on 4.x.

[ .. stuff deleted .. ]

> Patches below (to be improved to make CDIOCRESET unconditional).
> Does this satisfy all ?

great! Works for me. (and even cdrecord now works).

Thanx a lot.

Arno

P.S. this fixes, for real,
http://www.freebsd.org/cgi/query-pr.cgi?pr=94426
witness_checkorder panic
Hello,

I just got this on a box I'm testing before installation. It has clean
RELENG_6 from about two weeks ago with only some small if_bge.c-patches
Bruce Evans sent me for testing performance/hang problems.

Since I doubt this panic is related to that, I just post it here in case
someone is interested in more info :

[sorry, no serial console attached ... just copied from the screen, but
I will leave the box in the debugger for the weekend ]

    struct mount mtx (struct mount mtx) @ /files/bsd/src6/sys/ufs/ufs/ufs_vnops.c:138
    KDB: stack backtrace :
    witness_checkorder()
    _mtx_lock_flags()
    ufs_itimes()
    ufs_getattr()
    VOP_GETATTR_APV()
    filt_vfsread()
    knote()
    VOP_WRITE_APV()
    vn_write()
    dofilewrite()
    kern_writev()
    write()
    syscall()
    Xfast_syscall()
    --- syscall (4, FreeBSD ELF64, write), rip = 0x4363dc, rsp = 0X7fffdd78, rbp = 0x2f6 ---
    KDB: enter: witness_checkorder
    [thread pid 3987 tid 100133 ]

Kernel config is stripped GENERIC +

    options AHC_ALLOW_MEMIO
    options TCP_DROP_SYNFIN
    options KDB
    options KDB_TRACE
    options DDB
    options KTRACE
    options INVARIANTS
    options INVARIANT_SUPPORT
    options DDB_NUMSYM
    options BREAK_TO_DEBUGGER
    options INVARIANTS
    options INVARIANT_SUPPORT
    options WITNESS
    options WITNESS_KDB
    options DEBUG_LOCKS
    options DEBUG_VFS_LOCKS
    options DIAGNOSTIC
    options MUTEX_PROFILING
    options MUTEX_DEBUG
    options SLEEPQUEUE_PROFILING
    options TURNSTILE_PROFILING
    options DEBUG_MEMGUARD

The box was doing (/usr/src nfs-mounted):

    nohup time make -j 2 -DNO_CLEAN buildworld > /tmp/bw_alone.log 2>&1

It panicked shortly after I started 'tail -f /tmp/bw_alone.log' in
another window, and /tmp is mfs.

Arno

--
Arno J. Klaassen
SCITO S.A.
8 rue des Haies
F-75020 Paris, France
http://scito.com
Re: Watchdog Timeout - bge device - 6.2-PRERELEASE
John Marshall [EMAIL PROTECTED] writes:

> rwsrv05 dmesg | grep bge
> bge0: <Broadcom BCM5705 A3, ASIC rev. 0x3003> mem 0xe820-0xe820 irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:70:19
> bge0: link state changed to UP
> bge0: watchdog timeout -- resetting

I have a Tyan S2850 with the same (dual) LAN-chip; I increased
BGE_TIMEOUT to 50 (due to reboot problems on a good-old 3com
100Mbps-hub which occasionally gave me :

    bge1: firmware handshake timed out
    bge1: RX CPU self-diagnostics failed!
)

This box occasionally freezes under heavy load; with the above change
AND compiling in DEVICE_POLLING but not enabling it, I do not have any
problem for the time being (though the freeze is very hard to
reproduce).

Arno
Re: nfs-client reveals MFC-if_re-probs (or vice-versa) ?
/me wrote:

> I have a curious problem which at first sight seems related to the
> end-June MFC of if_re :
>
>  - I 'mount -o nfsv3,intr,noconn,-r=32768,-w=32768 -stable-server:/files/bsd /files/bsd '
>
>  - (/usr/ports and /usr/src are symlinks to /files/bsd/*)
>    quickly after a portinstall/portversion etc. I get :
>
>      nfs server -stable-server: not responding
>
>    (and the corresponding process stuck in 'bo_wwa' according to top(1) )

for info:

    #define RE_CSUM_FEATURES 0

in otherwise up to date if_re.c solves the problem.

Best regards,

Arno
nfs-client reveals MFC-if_re-probs (or vice-versa) ?
Hello,

I have a curious problem which at first sight seems related to the
end-June MFC of if_re :

 - I 'mount -o nfsv3,intr,noconn,-r=32768,-w=32768 -stable-server:/files/bsd /files/bsd '

 - (/usr/ports and /usr/src are symlinks to /files/bsd/*)
   quickly after a portinstall/portversion etc. I get :

     nfs server -stable-server: not responding

   (and the corresponding process stuck in 'bo_wwa' according to top(1) )

 - though I still can 'ping -stable-server' and even
   'ssh me@-stable-server-IP'

 - -stable-server works ok with two other -stable clients (using if_bge)
   and all are compiled from the very same source-base (and
   -stable-server works fine as well with a linux-client), which seems
   to exclude nfsd-probs

 - a kernel from June the 11th works ok

 - downgrading if_re.c to revision 1.46.2.14 and if_rlreg.h to revision
   1.51.2.3 makes the problem disappear

 - this is on my demo-notebook, so I can test network stuff without much
   limitation; I just use nfs on it for upgrading world and ports.

NB, same behaviour on amd64-stable and i386-stable (multi-boot same
hardware)

I can file a PR if requested, or feel free to contact me for further
testing.

Best regards,

Arno

PS: relevant pciconf info :

    [EMAIL PROTECTED]:8:0: class=0x02 card=0x47011558 chip=0x816910ec rev=0x10 hdr=0x00
        vendor   = 'Realtek Semiconductor'
        device   = 'RTL8169 Gigabit Ethernet Adapter'
        class    = network
        subclass = ethernet

otherwise standard kernel conf with stripped unneeded drivers and extra :

    device cpufreq
    device atapicam
    device sound
    options TCP_DROP_SYNFIN (hint??)

--
Arno J. Klaassen
SCITO S.A.
8 rue des Haies
F-75020 Paris, France
http://scito.com
NFS : mount option update is unknown
Hello,

I updated two amd64-servers to -stable as of today, and I now get the
following dmesg when mounting nfs :

    mount option update is unknown
    mount option update is unknown
    mount option update is unknown
    mount option update is unknown
    mount option update is unknown
    May 31 01:54:18 accuracy mountd[443]: can't delete exports for /users/angora/u4: Invalid argument
    May 31 01:54:18 accuracy mountd[443]: can't delete exports for /data/angora/d1: Invalid argument
    May 31 01:54:18 accuracy mountd[443]: can't delete exports for /data/tabarnac/d2: Invalid argument
    May 31 01:54:18 accuracy mountd[443]: can't delete exports for /data/angora/db: Invalid argument
    May 31 01:54:18 accuracy mountd[443]: can't delete exports for /data/charlotte/da: Invalid argument

They seem harmless and maybe related to MFC: 1.208 of
./kern/vfs_mount.c, though I don't understand the mountd messages : all
mentioned filesystems are nfsclient fs and, though /etc/exports exists,
it only has one local fs which isn't mounted anywhere else anyway (while
testing).

FYI,

Arno

--
Arno J. Klaassen
SCITO S.A.
8 rue des Haies
F-75020 Paris, France
http://scito.com
Re: kmem leak in tmpmfs?
Hello,

thanx to all who responded. Setting

    tmpmfs_flags="-S -o async"

survived a nightly started locate script and a day of intensive 'normal'
load.

YMMV, but again, merci!

Arno
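For reference, the setting above lives in /etc/rc.conf next to the other tmpmfs knobs. A minimal fragment, assuming the stock rc.conf(5) variables tmpmfs, tmpsize and tmpmfs_flags; the 1024m size is only an example value, not a recommendation from this message.

```shell
# /etc/rc.conf fragment (sketch; the size is an example value)
tmpmfs="YES"                  # mount a memory file system on /tmp at boot
tmpsize="1024m"               # size handed to mdmfs(8)
tmpmfs_flags="-S -o async"    # extra mdmfs(8) flags, as reported to help here
```

rc.conf is read as sh assignments, so the values must be quoted when they contain spaces.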
kmem leak in tmpmfs?
Hello, I get a very easy to reproduce panic on 6.1-STABLE : /etc/periodic/weekly/310.locate panics with panic: kmem_malloc(4096): kmem_map too small: 335544320 total allocated (kgdb) where #0 doadump () at pcpu.h:165 #1 0xc0577574 in boot (howto=260) at /files/bsd/src6/sys/kern/kern_shutdown.c:409 #2 0xc05778a6 in panic ( fmt=0xc078dc1d kmem_malloc(%ld): kmem_map too small: %ld total allocated) at /files/bsd/src6/sys/kern/kern_shutdown.c:565 #3 0xc06df1ab in kmem_malloc (map=0xc10430c0, size=4096, flags=258) at /files/bsd/src6/sys/vm/vm_kern.c:299 #4 0xc06d49a7 in page_alloc (zone=0xc1035700, bytes=0, pflag=0x0, wait=0) at /files/bsd/src6/sys/vm/uma_core.c:958 #5 0xc06d43db in slab_zalloc (zone=0xc1035700, wait=258) at /files/bsd/src6/sys/vm/uma_core.c:823 #6 0xc06d60f6 in uma_zone_slab (zone=0xc1035700, flags=2) at /files/bsd/src6/sys/vm/uma_core.c:2025 #7 0xc06d635f in uma_zalloc_bucket (zone=0xc1035700, flags=2) at /files/bsd/src6/sys/vm/uma_core.c:2134 #8 0xc06d5f39 in uma_zalloc_arg (zone=0xc1035700, udata=0x0, flags=2) at /files/bsd/src6/sys/vm/uma_core.c:1942 #9 0xc05d17ff in cache_enter (dvp=0xc8bf1110, vp=0xc8dd4110, cnp=0xfe14bbbc) at uma.h:275 #10 0xc06c77c4 in ufs_lookup (ap=0xfe14ba40) at /files/bsd/src6/sys/ufs/ufs/ufs_lookup.c:583 #11 0xc0756073 in VOP_CACHEDLOOKUP_APV (vop=0x0, a=0x0) at vnode_if.c:150 #12 0xc05d1dfa in vfs_cache_lookup (ap=0x0) at vnode_if.h:82 #13 0xc0755fe8 in VOP_LOOKUP_APV (vop=0xc07c8a60, a=0xfe14baec) at vnode_if.c:99 #14 0xc05d71fb in lookup (ndp=0xfe14bb94) at vnode_if.h:56 #15 0xc05d6998 in namei (ndp=0xfe14bb94) at /files/bsd/src6/sys/kern/vfs_lookup.c:203 #16 0xc05e865f in kern_lstat (td=0xc6b29780, path=0x0, pathseg=UIO_USERSPACE, sbp=0x0) at /files/bsd/src6/sys/kern/vfs_syscalls.c:2125 #17 0xc05e85df in lstat (td=0x0, uap=0xfe14bd04) at /files/bsd/src6/sys/kern/vfs_syscalls.c:2109 #18 0xc073e672 in syscall (frame= {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134664008, tf_esi = 134663936, tf_ebp = -1077941544, tf_isp 
= -32195228, tf_ebx = 672511016, tf_edx = 134663936, tf_ecx = 134561792, tf_eax = 190, tf_trapno = 0, tf_err = 2, tf_eip = 672396855, tf_cs = 51, tf_eflags = 582, tf_esp = -1077941700, tf_ss = 59}) at /files/bsd/src6/sys/i386/i386/trap.c:981
#19 0xc072b21f in Xint0x80_syscall () at /files/bsd/src6/sys/i386/i386/exception.s:200
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

This box has nothing particular, apart from maybe a large number of
stamp-file based test-databases (with a lot of zero-sized files named
.key=value).

Producing this bug is easy :

 - set tmpmfs=YES and set tmpsize greater than around 220m
 - start /etc/periodic/weekly/310.locate (and nothing else!)
 - wait two-three hours and bang

Last test is with tmpfs=1024m and I monitored df -h /tmp and vmstat -zm
every minute; when the system panics, the last output is :

    Filesystem    Size    Used   Avail Capacity  Mounted on
    /dev/md0      989M    219M    691M    24%    /var/tmp

    vmstat -zm | fgrep md0
    md0: 512,0, 453257, 15, 453437

I'm not quite an expert, but it looks to me as if md0 use stays almost
100% in kmem and is never swapped (as it is supposed to do by default
according to the man-page).

While here, and being struck as well by the nfsd-bug, at least
vfs_lookup.c seems common to both problems.

Full vmstat -zm logs available.

Thanx,

Arno

--
Arno J. Klaassen
SCITO S.A.
8 rue des Haies
F-75020 Paris, France
http://scito.com
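The once-a-minute monitoring described in this message, and the comparison of the md0 zone against the df figure, can be sketched as follows. The function names `zone_bytes` and `monitor` are made up for illustration, and the field layout assumed for `vmstat -z` lines is "name: size, limit, used, free, requests", as in the sample line quoted above.

```shell
#!/bin/sh
# Sketch only: zone_bytes and monitor are illustrative names.

# Read `vmstat -z` style lines on stdin and print size * used (bytes held
# in kernel memory) for the zone named $1.
zone_bytes() {
    awk -v zone="$1" -F'[:,]' '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == zone { print $2 * $4 }'
}

# Append a timestamped snapshot to log file $2 every $1 seconds.
# Runs until interrupted; uses the same commands as in the message.
monitor() {
    while :; do
        date
        df -h /tmp
        vmstat -zm | fgrep md0
        sleep "$1"
    done >> "$2"
}
```

For the last sample before the panic, 512 * 453257 is about 221 MBytes, which matches the 219M that df reports as used and supports the reading that the whole md0 contents sit in kmem.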
RELENG_6 linux emulation problem on amd64
Hello, I get an easy to reproduce panic on recent RELENG_6/amd64 : -su-2.05b# /compat/linux/bin/bash bash-2.05b# cd /dev bash-2.05b# ls panic : kmem_malloc: entry not found or misaligned Setup is as follows : /dev/ad0s3d mounted on / /dev/ad0s4d mount on /files /usr is a symlink to /files/amd64/usr if ever that might be of importance (the rest of ad0s3 is RELENG_5/i386) uname -a : FreeBSD demo 6.0-RC1 FreeBSD 6.0-RC1 #1: Sat Oct 29 17:04:50 CEST 2005 [EMAIL PROTECTED]:/files/amd64/obj/files/bsd/src6/sys/D470K amd64 generic config-file with outcommented non-needed drivers and extra options : device cpufreq device tap device atapicam device sound device smbus device iicbus device iicsmb options NTFS options TCP_DROP_SYNFIN linux_base-8-8.0_7 installed. NB, please respond preferentially to list; i still need a good solution to filter important email from my flooding misc procmail-filter output ;( Arno # kgdb trace : (kgdb) where #0 doadump () at /files/bsd/src6/sys/kern/kern_shutdown.c:234 #1 0x8030c10b in boot (howto=260) at /files/bsd/src6/sys/kern/kern_shutdown.c:399 #2 0x8030c5de in panic ( fmt=0x805cdea8 kmem_malloc: entry not found or misaligned) at /files/bsd/src6/sys/kern/kern_shutdown.c:555 #3 0x804ed2cf in kmem_malloc (map=0xff003e0b0160, size=0, flags=258) at /files/bsd/src6/sys/vm/vm_kern.c:382 #4 0x804e00a2 in page_alloc (zone=0x0, bytes=0, pflag=0xa7aba5e7 \002\200\202®-, wait=258) at /files/bsd/src6/sys/vm/uma_core.c:957 #5 0x804e3bbb in uma_large_malloc (size=0, wait=258) at /files/bsd/src6/sys/vm/uma_core.c:2711 #6 0x802fc503 in malloc (size=0, mtp=0x80706880, flags=258) at /files/bsd/src6/sys/kern/kern_malloc.c:327 #7 0x802fc6fe in realloc (addr=0x0, size=18446744073709549576, mtp=0x80706880, flags=258) at /files/bsd/src6/sys/kern/kern_malloc.c:416 #8 0x80398412 in vfs_read_dirent (ap=0xa7aba790, dp=0xffe16298, off=0) at /files/bsd/src6/sys/kern/vfs_subr.c:3877 #9 0x80290f56 in devfs_readdir (ap=0xa7aba790) at 
/files/bsd/src6/sys/fs/devfs/devfs_vnops.c:828 #10 0x805815ec in VOP_READDIR_APV (vop=0x806fc480, a=0xa7aba790) at vnode_if.c:1427 #11 0x8056f559 in VOP_READDIR (vp=0xff0002f6e000, uio=0xa7abaab0, cred=0xff002e93c700, eofflag=0xa7aba854, ncookies=0xa7aba834, cookies=0xa7aba840) at vnode_if.h:747 #12 0x8056efe6 in getdents_common (td=0xff002ff22be0, args=0xa7abab90, is64bit=1) at /files/bsd/src6/sys/compat/linux/linux_file.c:328 #13 0x8056f612 in linux_getdents64 (td=0xff002ff22be0, args=0xa7abab90) at /files/bsd/src6/sys/compat/linux/linux_file.c:476 #14 0x80564f54 in ia32_syscall (frame= {tf_rdi = 3, tf_rsi = 0, tf_rdx = 4096, tf_rcx = 134598592, tf_r8 = 0, tf_r9 = 0, tf_rax = 220, tf_rbx = 3, tf_rbp = 4294958168, tf_r10 = 0, tf_r11 = 0, tf_r12 = 0, tf_r13 = 0, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 134602692, tf_flags = 0, tf_err = 2, tf_rip = 672250937, tf_cs = 27, tf_rflags = 582, tf_rsp = 4294958092, tf_ss = 35}) at /files/bsd/src6/sys/amd64/ia32/ia32_syscall.c:186 #15 0x8050c1ad in Xint0x80_syscall () at ia32_exception.S:64 #16 0x2811bc39 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) -- Arno J. Klaassen SCITO S.A. 8 rue des Haies F-75020 Paris, France http://scito.com ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Memory requirements between releases
hello,

> The installation notes for 5.4 and 6 (the floppies README.TXT) say
> FreeBSD for the i386 requires ...at least 24 MB of RAM.
> [ .. ]
> I have an old tosh 110CT laptop with 24mb memory I want to set up as a
> wireless router/NAT box but would prefer to use 6 or 5.4.

I've run 5.X for about a year on a Pentium60 with 16M as ethernet
router/NAT; flawless, excellent perf (until it died a couple of weeks
ago).

net-booting via PXE though, no idea whether you can *install* with less
than 24M, running only seems OK

Arno
Re: TIMEOUT - WRITE_DMA - A possible FIX! turn off ACPI
Joe Koberg [EMAIL PROTECTED] writes:

> Zsolt Kúti wrote:
> > My system produces these messages that I already know well from this
> > list (as well ;):
> >
> > ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=213249674
>
> Like many people I was confronted with TIMEOUT - READ_DMA and TIMEOUT -
> WRITE_DMA errors on my drives. I was frustrated. But I found a
> workaround: Turning off ACPI.

dunno, I'd more suspect ACPI-APIC issues : until now I only had problems
on nForce based systems, but today I installed a brand new VIA based
A7VT mini-server and there they are again, the XXX_DMA errors (and
accompanying severe system slow-down).

(Disk swapped from the old PII-233 minimalist-server; worked OK there;
disabling APIC (in BIOS and/or config and/or hints) made the XXX_DMA
messages disappear (and gave me my network connection back ;) ) whilst
ACPI was still enabled).

FYI,

Arno
Re: Continuing ahc problems - also cause fxp failure
I see an identical problem with and without this diff applied on an ASUS motherboard with onboard SCSI. No onboard Ethernet. same here; ASUS MB with onboard SCSI, offboard xl0 Ethernet. kernel 4.3-STABLE #0: Wed Jul 11 Arno To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-stable in the body of the message
Re: stable status.... still hosed (no more)
Arnout Boer [EMAIL PROTECTED] writes:

> On Thu, Aug 31, 2000 at 10:07:12AM +0100, Steve O'Hara-Smith wrote:
> > > Checking procedure is simple: load kernel, boot, then telnet from
> > > outside.
> >
> > ssh from outside will do it too (as I discovered this morning).
>
> Any network connection will do - even samba!

make sure you cvsup sys/kern/uipc_socket2.c version 1.55.2.6 committed
this morning, and everything works fine again -- even Samba.

A Ciao,

Arno