Re: RELENG_8 -- NFSv3 credentials/permissions issue
On Sun, Feb 21, 2010 at 09:25:45AM +0200, Daniel Braniss wrote: > > I'm willing to bet this is something simple I've overlooked, but I'm out > > of ideas. Client is 8.0-RELEASE i386, server is 8.0-STABLE amd64 > > (kernel/world 2010/01/16). NFS version used is v3. Server filesystem > > is UFS2. > at boot time, the NFS is V2!, if the server is FreeBSD it can be upgraded > later in the boot progress to V3 > > > > Client configuration is off-kilter: it's a PXE booted machine. Initial > > PXE booting uses TFTP, then switches to NFS to load the kernel and > > kernel modules. The TFTP part works, with a caveat[1], but the NFS > > portion fails. > TFTP is as old as the Internet, so it mostly works, and security was in > dipers, > so the T for trivial also means un-secure :-) > > > > With NFS, I'm forced to change permissions on all the exported > > files/directories to be 0644/0755 (specifically, setting other/global > > read/write access) otherwise the client gets back "Permission denied". > > The nfsd(8) man page implies that this shouldn't be necessary; adding > > -mapall=nobody:nobody or -maproot=nobody doesn't fix things either. > > > why not use -maproot=root? > by adding -ro, the client will be able to read but not modify. > That's what we do here, the /etc is mounted via unionfs to a md, but > that is yet another solution. I'll have to try that (shouldn't take me long), but I remember messing with -maproot and -mapall both and wasn't able to get anywhere. I'll try again and report back. > > Configuration data, tcpdump validation (client=192.168.1.140, > > server=192.168.1.51), and syslog data is below. > > > > Ideas? > > > > [1]: TFTP works as long as the file its trying to request (in this case > > /usr/local/freebsd8/boot/pxeboot) has its other/global read bit set, > > otherwise EACCESS is returned; I had to look in the tftpd source to > > figure this out. 
I'm not sure what the justification is there, given
> > that use of -s and/or -u switches credentials to user/group nobody...
> >
> only root can read a file with mode 0, so you need to set the read bit for
> any non root user.

I'm not sure if you're referring to NFS here, or my TFTP comment.

My TFTP comment should be discussed elsewhere -- it's broken/odd behaviour, but the workaround for TFTP (to set the file permissions to 0644 for read) I'm fine with -- it's TFTP! :-)

With regards to NFS: none of the files below are mode 0. The request made via NFS should have gotten "translated" to being done by nobody:nobody on the NFS server, since there's no -mapall or -maproot line in the exports; user nobody has read access to everything shown below, so "Permission denied" makes no sense.

> > Permissions
> > ===========
> > drwxr-xr-x  22 root    wheel        512 Feb  6 12:25 /
> > drwxr-xr-x  17 root    wheel        512 Feb 12 03:38 /usr
> > drwxr-xr-x  15 root    wheel        512 Feb 19 10:41 /usr/local
> > drwx------   5 nobody  nobody       512 Feb 19 10:42 /usr/local/freebsd8
> > drwx------   7 nobody  nobody      1024 Nov 21 08:11 /usr/local/freebsd8/boot
> > drwx------   2 nobody  nobody     12800 Nov 21 08:11 /usr/local/freebsd8/boot/kernel
> > -r--------   1 nobody  nobody  11492703 Nov 21 07:48 /usr/local/freebsd8/boot/kernel/kernel
> >
> > tcpdump
> > =======
> > {...snipping TFTP portion...}
> > 10:57:20.601313 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:30:48:71:60:6b, length 548
> > 10:57:20.601442 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, length 323
> > 10:57:20.601688 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:30:48:71:60:6b, length 548
> > 10:57:20.601782 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, length 323
> > 10:57:20.613056 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76
> > 10:57:20.613369 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28
> > 10:57:20.613556 IP 192.168.1.140.1023 > 192.168.1.51.947: UDP, length 84
> >
10:57:20.613921 IP 192.168.1.51.947 > 192.168.1.140.1023: UDP, length 60 > > 10:57:20.614055 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76 > > 10:57:20.614291 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28 > > 10:57:20.614432 IP 192.168.1.140.4 > 192.168.1.51.2049: 100 lookup fh > > 1197,150310/6618112 "boot" > > 10:57:20.614458 IP 192.168.1.51.2049 > 192.168.1.140.4: reply ok 28 lookup > > ERROR: Permission denied > > 10:57:20.615436 IP 192.168.1.140.1022 > 192.168.1.51.947: UDP, length 84 > > 10:57:20.615677 IP 192.168.1.51.947 > 192.168.1.140.1022: UDP, length 60 > > 10:57:20.615806 IP 192.168.1.140.6 > 192.168.1.51.2049: 100 lookup fh > > 1197,150310/6618112 "boot" > > 10:57:20.615824 IP 192.168.1.51.2049 > 192.168.1.140.6: reply ok 28 lookup > > ERROR: Permission denied > > 10:57:20.615929 IP 192.168.1.140.1021 > 192.168.1.51.947: UDP, length 84 > > 10:57:20.616164 IP 192.168.1.51.947 > 192.168.1.140.1021: UDP, length 60 > > 10:57:
Re: RELENG_8 -- NFSv3 credentials/permissions issue
> I'm willing to bet this is something simple I've overlooked, but I'm out
> of ideas. Client is 8.0-RELEASE i386, server is 8.0-STABLE amd64
> (kernel/world 2010/01/16). NFS version used is v3. Server filesystem
> is UFS2.

At boot time, the NFS is V2! If the server is FreeBSD, it can be upgraded to V3 later in the boot process.

>
> Client configuration is off-kilter: it's a PXE booted machine. Initial
> PXE booting uses TFTP, then switches to NFS to load the kernel and
> kernel modules. The TFTP part works, with a caveat[1], but the NFS
> portion fails.

TFTP is as old as the Internet, so it mostly works, and security was in diapers, so the T for trivial also means insecure :-)

>
> With NFS, I'm forced to change permissions on all the exported
> files/directories to be 0644/0755 (specifically, setting other/global
> read/write access) otherwise the client gets back "Permission denied".
> The nfsd(8) man page implies that this shouldn't be necessary; adding
> -mapall=nobody:nobody or -maproot=nobody doesn't fix things either.
>

Why not use -maproot=root? By adding -ro, the client will be able to read but not modify. That's what we do here; /etc is mounted via unionfs to an md, but that is yet another solution.

> In the absence of -maproot and -mapall options, remote accesses by root
> will result in using a credential of -2:-2. All other users will be
> mapped to their remote credential. If a -maproot option is given, remote
> access by root will be mapped to that credential instead of -2:-2. If a
> -mapall option is given, all users (including root) will be mapped to
> that credential in place of their own.
>
> Configuration data, tcpdump validation (client=192.168.1.140,
> server=192.168.1.51), and syslog data is below.
>
> Ideas?
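The mapping rule in the quoted man page text can be written down as a small Python sketch. This is only an illustration of the quoted wording, not how mountd or the kernel implement it. The names `Cred` and `map_credential` are hypothetical, and the choice to let -mapall win when both options are given is an assumption, since the quoted text does not cover that case.

```python
from typing import NamedTuple, Optional


class Cred(NamedTuple):
    """A uid:gid pair as seen by the NFS server (hypothetical helper type)."""
    uid: int
    gid: int


# Credential the quoted text gives for remote root when no option is set.
DEFAULT_ROOT_CRED = Cred(-2, -2)


def map_credential(remote: Cred,
                   maproot: Optional[Cred] = None,
                   mapall: Optional[Cred] = None) -> Cred:
    """Return the credential the server would use, per the quoted rule."""
    # -mapall: every user, root included, is mapped to the given credential.
    # Assumption: it takes precedence if -maproot is also present.
    if mapall is not None:
        return mapall
    # Remote root: -maproot if given, otherwise -2:-2.
    if remote.uid == 0:
        return maproot if maproot is not None else DEFAULT_ROOT_CRED
    # Everyone else keeps their remote credential.
    return remote


if __name__ == "__main__":
    root = Cred(0, 0)
    print(map_credential(root))                      # no options in exports
    print(map_credential(root, maproot=Cred(0, 0)))  # -maproot=root
```

With an exports line that has neither option, as in the configuration posted in this thread, the sketch maps a root request from the PXE client to -2:-2 rather than to root.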
> > [1]: TFTP works as long as the file its trying to request (in this case > /usr/local/freebsd8/boot/pxeboot) has its other/global read bit set, > otherwise EACCESS is returned; I had to look in the tftpd source to > figure this out. I'm not sure what the justification is there, given > that use of -s and/or -u switches credentials to user/group nobody... > only root can read a file with mode 0, so you need to set the read bit for any non root user. > -- > | Jeremy Chadwick j...@parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, USA | > | Making life hard for others since 1977. PGP: 4BD6C0CB | > > > Relevant server configuration bits: > > /etc/rc.conf > == > rpcbind_enable="yes" > rpcbind_flags="-l" > mountd_enable="yes" > mountd_flags="-r -l" > nfs_server_enable="yes" > > /etc/exports > == > /usr/local/freebsd8 -network 192.168.1 -mask 255.255.255.0 > > Permissions > = > drwxr-xr-x 22 rootwheel512 Feb 6 12:25 / > drwxr-xr-x 17 rootwheel512 Feb 12 03:38 /usr > drwxr-xr-x 15 rootwheel512 Feb 19 10:41 /usr/local > drwx-- 5 nobody nobody 512 Feb 19 10:42 /usr/local/freebsd8 > drwx-- 7 nobody nobody 1024 Nov 21 08:11 /usr/local/freebsd8/boot > drwx-- 2 nobody nobody 12800 Nov 21 08:11 > /usr/local/freebsd8/boot/kernel > -r 1 nobody nobody 11492703 Nov 21 07:48 > /usr/local/freebsd8/boot/kernel/kernel > > tcpdump > = > {...snipping TFTP portion...} > 10:57:20.601313 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, Request > from 00:30:48:71:60:6b, length 548 > 10:57:20.601442 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, > length 323 > 10:57:20.601688 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, Request > from 00:30:48:71:60:6b, length 548 > 10:57:20.601782 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, > length 323 > 10:57:20.613056 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76 > 10:57:20.613369 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28 > 
10:57:20.613556 IP 192.168.1.140.1023 > 192.168.1.51.947: UDP, length 84 > 10:57:20.613921 IP 192.168.1.51.947 > 192.168.1.140.1023: UDP, length 60 > 10:57:20.614055 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76 > 10:57:20.614291 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28 > 10:57:20.614432 IP 192.168.1.140.4 > 192.168.1.51.2049: 100 lookup fh > 1197,150310/6618112 "boot" > 10:57:20.614458 IP 192.168.1.51.2049 > 192.168.1.140.4: reply ok 28 lookup > ERROR: Permission denied > 10:57:20.615436 IP 192.168.1.140.1022 > 192.168.1.51.947: UDP, length 84 > 10:57:20.615677 IP 192.168.1.51.947 > 192.168.1.140.1022: UDP, length 60 > 10:57:20.615806 IP 192.168.1.140.6 > 192.168.1.51.2049: 100 lookup fh > 1197,150310/6618112 "boot" > 10:57:20.615824 IP 192.168.1.51.2049 > 192.168.1.140.6: reply ok 28 lookup > ERROR: Permission denied > 10:57:20.615929 IP 192.168.1.140.1021 > 192.168.1.51.947: UDP, length 84 > 10:57:20.616164 IP 192.168.1.51.9
Re: ntpd struggling to keep up - how to fix?
On 2010-Feb-20 22:32:01 +0100, Torfinn Ingolfsen wrote:
>On Sat, 20 Feb 2010 12:53:51 +1100
>Peter Jeremy wrote:
>
>> Looks reasonable. Let us know the results. I'd be interested in
>> the output from "ntpdc -c loopi -c sysi".
>
>Ok, here we go (the server panic'ed again last night):
>r...@kg-f2# uptime
>10:28PM up 2:26, 3 users, load averages: 0.00, 0.00, 0.00
>r...@kg-f2# sysctl machdep.acpi_timer_freq
>machdep.acpi_timer_freq: 3577045
>r...@kg-f2# tvlm
>Feb 20 20:06:41 kg-f2 ntpd[942]: kernel time sync status change 2001
>Feb 20 20:21:49 kg-f2 ntpd[942]: time reset +1.118880 s
>Feb 20 20:37:53 kg-f2 ntpd[942]: time reset +1.188538 s
>Feb 20 20:53:03 kg-f2 ntpd[942]: time reset +1.121903 s
>Feb 20 21:09:00 kg-f2 ntpd[942]: time reset +1.179924 s
>Feb 20 21:24:57 kg-f2 ntpd[942]: time reset +1.178490 s
>Feb 20 21:39:58 kg-f2 ntpd[942]: time reset +1.110647 s
>Feb 20 21:55:53 kg-f2 ntpd[942]: time reset +1.177292 s
>Feb 20 22:11:44 kg-f2 ntpd[942]: time reset +1.172358 s
>Feb 20 22:26:48 kg-f2 ntpd[942]: time reset +1.114350 s

That's definitely not good - though it's marginally better than before. I have checked on a local machine and the timecounter frequency definitely needs to be adjusted in the opposite direction to the ntpd drift.

I think I see the problem: I suggested 3579545Hz - 2500ppm, which gives an ACPI frequency of 3570596Hz. There was some miscommunication and you have set an ACPI frequency of 3577045Hz, which is 2500Hz (or 698ppm) lower than the nominal 3579545Hz. The drift reported by the time resets has gone from +1930ppm (14.5s in 2:05:17) to +1233ppm (10.4s in 2:20:06), a change of 697ppm - fairly close to the change you made. (The PLL is running at +500ppm, so the actual clock offset is 500ppm more than the "time reset" reports suggest.)

Having re-checked my maths, using both your "time reset" results, can you please try:

sysctl machdep.acpi_timer_freq=3570847

That should result in a drift of close to zero (well within NTP's lock range of +/- 300ppm).
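For anyone wanting to re-check the arithmetic, here is a rough Python sketch using the "time reset" values from the quoted log. It assumes, following the statement above, that a positive ntpd drift is cancelled by lowering the timecounter frequency by the same fraction. The function names are made up for the illustration.

```python
# Sketch of the timecounter correction arithmetic from this message.
# Assumption: new_freq = current_freq * (1 - total_ppm / 1e6), i.e. the
# frequency is adjusted in the opposite direction to the observed drift.

# "time reset" steps (seconds) from the quoted log, 20:21:49 to 22:26:48.
RESETS_S = [1.118880, 1.188538, 1.121903, 1.179924, 1.178490,
            1.110647, 1.177292, 1.172358, 1.114350]
ELAPSED_S = 2 * 3600 + 20 * 60 + 6   # 2:20:06, as stated in the message
PLL_PPM = 500.0                      # frequency reported by "ntpdc -c loopi"
CURRENT_HZ = 3577045                 # machdep.acpi_timer_freq currently set


def drift_ppm(resets, elapsed_s):
    """Drift implied by the accumulated steps over the interval, in ppm."""
    return sum(resets) / elapsed_s * 1e6


def corrected_freq(current_hz, total_ppm):
    """Lower the timecounter frequency by total_ppm parts per million."""
    return current_hz * (1.0 - total_ppm / 1e6)


step_ppm = drift_ppm(RESETS_S, ELAPSED_S)
total_ppm = step_ppm + PLL_PPM       # steps plus what the PLL already absorbs
new_hz = corrected_freq(CURRENT_HZ, total_ppm)

if __name__ == "__main__":
    print(f"drift from steps: {step_ppm:.0f} ppm")
    print(f"total offset:     {total_ppm:.0f} ppm")
    print(f"suggested freq:   {new_hz:.0f} Hz")
```

With these inputs the result lands within about a hertz of the 3570847 suggested above, and `corrected_freq(3579545, 2500)` reproduces the earlier 3570596Hz figure.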
>frequency: 500.000 ppm

And this is definitely not good.

>Not synced at all. Not good. :-/
>Perhaps I should give it more time?

No. Once ntpd decides to continuously step, something is broken.

I've done some double-checking and

On 2010-Feb-20 22:55:21 +0100, Torfinn Ingolfsen wrote:
>This output looks ... wrong ... somehow to my eyes:
...
>Shouldn't ntpq and ntpdc be in agreement?

I'm not sure which particular bits you are concerned about but ntpq reports delay/offset/jitter in msec whilst ntpdc reports them in sec.

Note that I can't explain why the loopi offset is zero - ntpdc(8) states that this is the "last offset given to the loop filter by the packet processing code". For me it's non-zero but doesn't quite match the offset reported by 'ntpq -p'.

-- 
Peter Jeremy
Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
On Sun, Feb 21, 2010 at 12:35 AM, Jeremy Chadwick wrote:
> We can safely rule out the Silicon Image controller (otherwise "ataX"
> wouldn't be involved), which leaves the AMD SB700 SATA controller and
> the AMD SB700 PATA controller.
>
> What exact disks (e.g. adX) are attached to ata5 and ata6? You haven't
> provided dmesg output in any of your posts, and atacontrol/pciconf is
> not sufficient (I should really improve atacontrol by printing this
> information. I'll work on that in a few minutes).
>
> Some Linux users have reported AHCI-related issues with the SB600
> southbridge, but the core of the problem turned out to be MSI on certain
> AMD northbridges (specifically RS480, RS400, and RS200). By disabling
> MSI entirely they were able to achieve stability. The FreeBSD
> equivalent would be to set the following in loader.conf and reboot:
>
> hw.pci.enable_msix="0"
> hw.pci.enable_msi="0"
>
> The Linux quirk fix for this:
>
> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.21/pci-quirks-disable-msi-on-rs400-200-and-rs480.patch;hb=05ab505f2909acf3a614d3e6a32271c4c1f8a69d
>
> Your board has an AMD 740G northbridge, but it might be worth trying the
> MSI disable trick anyway. If it doesn't fix the problem then definitely
> re-enable MSI. Isn't hardware fun? ;-)
>

Just one more data point. I have a machine with similar hardware: an MSI K9A2GM-FIH motherboard:

http://eu.msi.com/index.php?func=proddesc&maincat_no=1&prod_no=1436

with an AMD 780G northbridge and AMD SB700 SATA controller, and I experienced freezes after switching to AHCI. Those freezes happened e.g. after some sustained random disk activity, followed by starting 'dvdisaster'. Then the HDD LED started to blink slowly, every 10 seconds on, then off, and on again. No more disk activity on the ahci0 controller was possible. The system remained responsive as long as it didn't involve disk activity (i.e. pings, mouse, keyboard etc., but not starting new processes).
I'm pretty sure it started happening (very sporadically) only after I switched to an AHCI setup. It didn't freeze before under the same load pattern.

I can't test disabling MSI right now on that box, but will try it on a similar test machine in a few days (where I hope to reproduce this). Thanks for the hint!

ahci0: port 0xc000-0xc007,0xb000-0xb003,0xa000-0xa007,0x9000-0x9003,0x8000-0x800f mem 0xfe7ff800-0xfe7ffbff irq 22 at device 17.0 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2: at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3: at channel 3 on ahci0
ahcich3: [ITHREAD]
ahcich4: at channel 4 on ahci0
ahcich4: [ITHREAD]
ahcich5: at channel 5 on ahci0
ahcich5: [ITHREAD]
ACPI Warning: \_SB_.PCI0.SBRG.FDC_._FDE: Return type mismatch - found Package, expected Buffer 20090521 nspredef-1051
(aprobe0:ahcich0:0:0:0): SIGNATURE:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA/ATAPI-7 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO size 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ntpd struggling to keep up - how to fix?
On Sat, Feb 20, 2010 at 10:55:21PM +0100, Torfinn Ingolfsen wrote: > On Sat, 20 Feb 2010 22:32:01 +0100 > Torfinn Ingolfsen wrote: > > > This output looks ... wrong ... somehow to my eyes: > r...@kg-f2# date > Sat Feb 20 22:51:24 CET 2010 > r...@kg-f2# ntpq -p > remote refid st t when poll reach delay offset jitter > == > *kg-omni1.kg4.no 129.240.64.3 3 u 62 64 3770.244 597.314 360.123 > r...@kg-f2# ntpdc -c loopi -c sysi > offset: 0.00 s > frequency:500.000 ppm > poll adjust: 4 > watchdog timer: 549 s > system peer: kg-omni1.kg4.no > system peer mode: client > leap indicator: 11 > stratum: 16 > precision:-18 > root distance:0.0 s > root dispersion: 0.00822 s > reference ID: [10.1.10.1] > reference time: . Thu, Feb 7 2036 7:28:16.000 > system flags: auth monitor ntp kernel stats > jitter: 0.360107 s > stability:0.000 ppm > broadcastdelay: 0.003998 s > authdelay:0.00 s > > Shouldn't ntpq and ntpdc be in agreement? ntpq and ntpdc output data in slightly different formats, depending on what arguments you give them. I'm not familiar with the loopi or sysi commands; Peter should be able to help here. For sake of example -- look at ntpq's "delay" column for each peer, and then look at the same column but for ntpdc. 
You'll see that for ntpdc they're divided by 1000 (presumably kern.hz rate):

$ ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+clock-a.develoo 204.123.2.72     2 u  476  512  377   25.287   -0.852   0.550
-enigma.wiredgoa 209.81.9.7       2 u  185  512  377   14.754    0.284   0.688
+mtnlion.com     139.78.135.14    2 u  208  512  377   30.788   -0.233   0.160
*ntp1.phoenixpub .LCL.            1 u  179  512  377   36.322   -0.552   0.522
-ntp-1.gw.uiuc.e 128.174.38.133   2 u  141  512  377   77.321   -5.381   0.328
-tick.jrc.us     172.21.0.14      2 u  149  512  377  112.424   -8.110   1.440

$ ntpdc -c peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
*mailserv1.phoen 192.168.1.51     1  512  377 0.03632 -0.000552 0.09666
=clock-a.develoo 192.168.1.51     2  512  377 0.02528 -0.000852 0.08611
=tick.jrc.us     192.168.1.51     2  512  377 0.11241 -0.008110 0.08615
=enigma.wiredgoa 192.168.1.51     2  512  377 0.01474  0.000284 0.11473
=mtnlion.com     192.168.1.51     2  512  377 0.03078 -0.000233 0.09665
=ntp-1.gw.uiuc.e 192.168.1.51     2  512  377 0.07732 -0.005381 0.10612

-- 
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
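The factor of 1000 between the two peer listings above can be checked with a short Python sketch. It assumes, as pointed out elsewhere in this thread, that ntpq prints delay and offset in milliseconds while ntpdc prints them in seconds. The dictionaries and the `agree` helper are invented for the illustration, and the values are copied from the two listings.

```python
# (delay, offset) pairs copied from the two peer listings above.
# Assumption: ntpq values are milliseconds, ntpdc values are seconds.
NTPQ_MS = {
    "clock-a.develoo": (25.287, -0.852),
    "tick.jrc.us": (112.424, -8.110),
}
NTPDC_S = {
    "clock-a.develoo": (0.02528, -0.000852),
    "tick.jrc.us": (0.11241, -0.008110),
}


def ms_to_s(value_ms):
    """Convert a millisecond reading to seconds."""
    return value_ms / 1000.0


def agree(peer, tol_s=5e-5):
    """True if both tools report the same delay and offset for this peer,
    allowing for ntpdc printing fewer significant digits."""
    delay_ms, offset_ms = NTPQ_MS[peer]
    delay_s, offset_s = NTPDC_S[peer]
    return (abs(ms_to_s(delay_ms) - delay_s) <= tol_s
            and abs(ms_to_s(offset_ms) - offset_s) <= tol_s)


if __name__ == "__main__":
    for peer in NTPQ_MS:
        print(peer, "consistent" if agree(peer) else "differs")
```

The tolerance only covers the truncated digits in the ntpdc column. It is not meant to allow for real differences between polls.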
Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
On Sat, Feb 20, 2010 at 10:49:59PM +0100, Torfinn Ingolfsen wrote: > On Sat, 20 Feb 2010 11:37:18 -0800 > Jeremy Chadwick wrote: > > > Can you re-run smartctl -a instead of -H? Some of the SMART attributes > > may help determine what's going on, or there may be related errors in > > the SMART error log. > > smartctl -a output attached. Test sequence: ad4 - ad12, ada0. Most of your disks look to be in decent shape. Well, that is to say, all of them should be working fine; I don't see anything that's of major, or even minor concern. Others might focus on Attributes 191 or 195, but neither of those are absurdly high given the number of hours these disks have been in use (see Attribute 9). > > Otherwise I'd say what's happening is a SATA controller lock-up of some > > sort, since it happens on any of your channels. Could be a quirk of > > some kind in the SATA->CAM stuff (unless it also happens when using pure > > ata(4)). > > I am running a quite recent 8.0-stable: > r...@kg-f2# uname -a > FreeBSD kg-f2.kg4.no 8.0-STABLE FreeBSD 8.0-STABLE #2: Sun Jan 31 18:39:17 > CET 2010 r...@kg-f2.kg4.no:/usr/obj/usr/src/sys/GENERIC amd64 > > Perhaps I should upgrade. > > > What controller are these disks hooked to again? > > Six of the disks (ad4, ad6, ad8, ad10, ad12) are connected to the SATA ports > on the motherboard: > r...@kg-f2# pciconf -lv | grep ata -A 4 > atap...@pci0:0:17:0: class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 > hdr=0x00 > vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.' > device = 'SB700 SATA Controller [AHCI mode]' > class = mass storage > subclass = SATA Let's backtrack a bit. I've gone back and read through all of your previous posts on this matter, and so far all the problems are happening on ata5 and ata6. No timeouts or anomalies have appeared on any other ports -- just those two. 
The kernel error messages indicate that commands submitted to the controller took longer than 10 seconds to get a response, so the OS does a force-reset of the ports in an attempt to get things working again.

We can safely rule out the Silicon Image controller (otherwise "ataX" wouldn't be involved), which leaves the AMD SB700 SATA controller and the AMD SB700 PATA controller.

What exact disks (e.g. adX) are attached to ata5 and ata6? You haven't provided dmesg output in any of your posts, and atacontrol/pciconf is not sufficient (I should really improve atacontrol by printing this information. I'll work on that in a few minutes).

Some Linux users have reported AHCI-related issues with the SB600 southbridge, but the core of the problem turned out to be MSI on certain AMD northbridges (specifically RS480, RS400, and RS200). By disabling MSI entirely they were able to achieve stability. The FreeBSD equivalent would be to set the following in loader.conf and reboot:

hw.pci.enable_msix="0"
hw.pci.enable_msi="0"

The Linux quirk fix for this:

http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.21/pci-quirks-disable-msi-on-rs400-200-and-rs480.patch;hb=05ab505f2909acf3a614d3e6a32271c4c1f8a69d

Your board has an AMD 740G northbridge, but it might be worth trying the MSI disable trick anyway. If it doesn't fix the problem then definitely re-enable MSI. Isn't hardware fun? ;-)

-- 
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ZFS and sh(1) panic: spin lock [lock addr] (smp rendezvous) held by [sh(1) proc tid] too long
2010/1/27 Brandon Gooch :
> The machine, a Dell Optiplex 755, has been locking up recently. The
> situation usually occurs while using VirtualBox (running a 64-bit
> Windows 7 instance) and doing anything else in another xterm (such as
> rebuilding a port). I've been unable to reliably reproduce it (I'm in
> an X session and the machine will not panic "properly").
>
> However, while rebuilding Xorg today at ttyv0 and running
> VBoxHeadless on ttyv1, I managed to trigger what I believe is the
> lockup.
>
> I've attached a textdump in hopes that someone may be able to take a
> look and provide clues or instruction on debugging this.

I think that jhb@ saw a similar problem while working on the nVidia driver or the like. I'm not sure if he made any progress debugging this.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
Re: ntpd struggling to keep up - how to fix?
On Sat, 20 Feb 2010 22:32:01 +0100
Torfinn Ingolfsen wrote:

This output looks ... wrong ... somehow to my eyes:
r...@kg-f2# date
Sat Feb 20 22:51:24 CET 2010
r...@kg-f2# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*kg-omni1.kg4.no 129.240.64.3     3 u   62   64  377    0.244  597.314 360.123
r...@kg-f2# ntpdc -c loopi -c sysi
offset:               0.00 s
frequency:            500.000 ppm
poll adjust:          4
watchdog timer:       549 s
system peer:          kg-omni1.kg4.no
system peer mode:     client
leap indicator:       11
stratum:              16
precision:            -18
root distance:        0.0 s
root dispersion:      0.00822 s
reference ID:         [10.1.10.1]
reference time:       . Thu, Feb 7 2036 7:28:16.000
system flags:         auth monitor ntp kernel stats
jitter:               0.360107 s
stability:            0.000 ppm
broadcastdelay:       0.003998 s
authdelay:            0.00 s

Shouldn't ntpq and ntpdc be in agreement?
-- 
Torfinn
Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
On Sat, 20 Feb 2010 11:37:18 -0800 Jeremy Chadwick wrote: > Can you re-run smartctl -a instead of -H? Some of the SMART attributes > may help determine what's going on, or there may be related errors in > the SMART error log. smartctl -a output attached. Test sequence: ad4 - ad12, ada0. > Otherwise I'd say what's happening is a SATA controller lock-up of some > sort, since it happens on any of your channels. Could be a quirk of > some kind in the SATA->CAM stuff (unless it also happens when using pure > ata(4)). I am running a quite recent 8.0-stable: r...@kg-f2# uname -a FreeBSD kg-f2.kg4.no 8.0-STABLE FreeBSD 8.0-STABLE #2: Sun Jan 31 18:39:17 CET 2010 r...@kg-f2.kg4.no:/usr/obj/usr/src/sys/GENERIC amd64 Perhaps I should upgrade. > What controller are these disks hooked to again? Six of the disks (ad4, ad6, ad8, ad10, ad12) are connected to the SATA ports on the motherboard: r...@kg-f2# pciconf -lv | grep ata -A 4 atap...@pci0:0:17:0:class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.' device = 'SB700 SATA Controller [AHCI mode]' class = mass storage subclass = SATA -- atap...@pci0:0:20:1:class=0x01018a card=0x50021458 chip=0x439c1002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.' device = 'PATA 133 Controller (SB7xx)' class = mass storage subclass = ATA (There is nothing connected to the PATA ports). 
The last disk (ada0) is connected to a PCI card: o...@kg-f2# pciconf -lv | grep siis -A 3 si...@pci0:2:0:0: class=0x018000 card=0x35311095 chip=0x35311095 rev=0x01 hdr=0x00 vendor = 'Silicon Image Inc (Was: CMD Technology Inc)' device = 'SiI 3531 SATA Controller' class = mass storage Hardware info about the machine here: http://sites.google.com/site/tingox/ga-ma74gm-s2h HTH -- Torfinn smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build) Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: SAMSUNG SpinPoint F1 DT series Device Model: SAMSUNG HD252HJ Serial Number:S17HJ9BSA04283 Firmware Version: 1AC01118 User Capacity:250,059,350,016 bytes Device is:In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 3b Local Time is:Sat Feb 20 22:44:11 2010 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (3651) seconds. Offline data collection capabilities:(0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities:(0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability:(0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time:( 2) minutes. 
Extended self-test routine recommended polling time:( 62) minutes. Conveyance self-test routine recommended polling time:( 8) minutes. SCT capabilities: (0x003f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 100 051Pre-fail Always - 0 3 Spin_Up_Time0x0007 092 092 011Pre-fail Always - 3260 4 Start_Stop_Count0x0032 100 100 000O
Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)
On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:

> Normally you should not have any FCS errors, it could be related
> with signal quality and these errors might not be correctly
> counted.

I can't check the cable and switch counters on bge1 before Feb 24.

> > 3. packets don't lost on sources at Aug'09
>
> Since I don't have BCM5704 hardware it's hard to find which
> revision may affect to this issue. Could you narrow down which
> revision number started showing the issue?

I didn't update the sources between Aug '09 and Feb 16.

4. Packets aren't lost immediately after a reboot.

PS: I got a kernel panic.

===
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x802eb3b7
stack pointer           = 0x28:0xff80001c66e0
frame pointer           = 0x28:0xff80001c6740
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 724 (named)
[thread pid 724 tid 100051 ]
Stopped at m_copym+0x37: movl 0x18(%r12),%eax
db> panic
panic: from debugger
cpuid = 0
Uptime: 1d5h55m33s
Physical memory: 2039 MB
Dumping 1448 MB: 1433 1417 1401
Re: ntpd struggling to keep up - how to fix?
On Sat, 20 Feb 2010 12:53:51 +1100
Peter Jeremy wrote:

> Looks reasonable. Let us know the results. I'd be interested in
> the output from "ntpdc -c loopi -c sysi".

Ok, here we go (the server panic'ed again last night):
r...@kg-f2# uptime
10:28PM up 2:26, 3 users, load averages: 0.00, 0.00, 0.00
r...@kg-f2# sysctl machdep.acpi_timer_freq
machdep.acpi_timer_freq: 3577045
r...@kg-f2# tvlm
Feb 20 20:06:41 kg-f2 ntpd[942]: kernel time sync status change 2001
Feb 20 20:21:49 kg-f2 ntpd[942]: time reset +1.118880 s
Feb 20 20:37:53 kg-f2 ntpd[942]: time reset +1.188538 s
Feb 20 20:53:03 kg-f2 ntpd[942]: time reset +1.121903 s
Feb 20 21:09:00 kg-f2 ntpd[942]: time reset +1.179924 s
Feb 20 21:24:57 kg-f2 ntpd[942]: time reset +1.178490 s
Feb 20 21:39:58 kg-f2 ntpd[942]: time reset +1.110647 s
Feb 20 21:55:53 kg-f2 ntpd[942]: time reset +1.177292 s
Feb 20 22:11:44 kg-f2 ntpd[942]: time reset +1.172358 s
Feb 20 22:26:48 kg-f2 ntpd[942]: time reset +1.114350 s
r...@kg-f2# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 kg-omni1.kg4.no 129.240.64.3     3 u    8   64    7    0.176  133.306  77.731
r...@kg-f2# ntpdc -c loopi -c sysi
offset:               0.00 s
frequency:            500.000 ppm
poll adjust:          4
watchdog timer:       194 s
system peer:          0.0.0.0
system peer mode:     unspec
leap indicator:       11
stratum:              16
precision:            -18
root distance:        0.0 s
root dispersion:      0.00290 s
reference ID:         [83.84.69.80]
reference time:       . Thu, Feb 7 2036 7:28:16.000
system flags:         auth monitor ntp kernel stats
jitter:               0.358109 s
stability:            0.000 ppm
broadcastdelay:       0.003998 s
authdelay:            0.00 s

Not synced at all. Not good. :-/
Perhaps I should give it more time?
-- 
Torfinn
Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
On Sat, Feb 20, 2010 at 08:21:08PM +0100, Torfinn Ingolfsen wrote:
> Another day, another crash.
> From /var/log/messages:
> Feb 20 08:52:26 kg-f2 ntpd[58609]: time reset +1.169751 s
> Feb 20 08:54:57 kg-f2 kernel: ata5: port is not ready (timeout 1ms) tfd = 007f
> Feb 20 08:54:57 kg-f2 kernel: ata5: hardware reset timeout
> Feb 20 19:18:51 kg-f2 syslogd: kernel boot file is /boot/kernel/kernel
>
> The drives are as follows:
> r...@kg-f2# atacontrol list;camcontrol devlist
> ATA channel 0:
>     Master: no device present
>     Slave: no device present
> ATA channel 2:
>     Master: ad4 SATA revision 2.x
>     Slave: no device present
> ATA channel 3:
>     Master: ad6 SATA revision 2.x
>     Slave: no device present
> ATA channel 4:
>     Master: ad8 SATA revision 2.x
>     Slave: no device present
> ATA channel 5:
>     Master: ad10 SATA revision 2.x
>     Slave: no device present
> ATA channel 6:
>     Master: ad12 SATA revision 2.x
>     Slave: no device present
> ATA channel 7:
>     Master: ad14 SATA revision 2.x
>     Slave: no device present
> at scbus0 target 0 lun 0 (pass0,ada0)
>
> Smartctl is happy, too:
> r...@kg-f2# smartctl -H /dev/ad4
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> r...@kg-f2# smartctl -H /dev/ad6
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> r...@kg-f2# smartctl -H /dev/ad8
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> r...@kg-f2# smartctl -H /dev/ad10
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> r...@kg-f2# smartctl -H /dev/ad12
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> r...@kg-f2# smartctl -H /dev/ada0
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> Maybe the hardware is just plain broken.

Can you re-run smartctl -a instead of -H? Some of the SMART attributes
may help determine what's going on, or there may be related errors in
the SMART error log.

Otherwise I'd say what's happening is a SATA controller lock-up of some
sort, since it happens on any of your channels. Could be a quirk of
some kind in the SATA->CAM stuff (unless it also happens when using pure
ata(4)). What controller are these disks hooked to again?

-- 
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
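For the smartctl -a request in the reply above, something along these lines would save typing the command once per drive. It is only a rough sketch: the function name collect_smart, the SMARTCTL override variable and the one-file-per-device layout are made up for illustration and are not from the thread; only smartctl -a and the device names are.

```shell
#!/bin/sh
# Sketch: save the full "smartctl -a" report for each drive into one file
# per device, so attributes and the SMART error log can be compared.
#
# collect_smart OUTDIR DEV...   (hypothetical helper, not from the thread)
#   OUTDIR  directory that receives DEV.txt for every device
#   DEV     device name without the /dev/ prefix, e.g. ad4 or ada0
#
# SMARTCTL may be set to another command (assumption: plain "smartctl"
# in PATH is used when it is unset).
collect_smart() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        # smartctl exits non-zero when it finds problems; keep going so
        # that every drive still gets a report.
        ${SMARTCTL:-smartctl} -a "/dev/$dev" > "$outdir/$dev.txt" 2>&1 || true
    done
    return 0
}
```

With the drives listed in this thread it would be called as root, for example: collect_smart /tmp/smart ad4 ad6 ad8 ad10 ad12 ada0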
Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
Another day, another crash.
From /var/log/messages:
Feb 20 08:52:26 kg-f2 ntpd[58609]: time reset +1.169751 s
Feb 20 08:54:57 kg-f2 kernel: ata5: port is not ready (timeout 1ms) tfd = 007f
Feb 20 08:54:57 kg-f2 kernel: ata5: hardware reset timeout
Feb 20 19:18:51 kg-f2 syslogd: kernel boot file is /boot/kernel/kernel

The drives are as follows:
r...@kg-f2# atacontrol list;camcontrol devlist
ATA channel 0:
    Master: no device present
    Slave: no device present
ATA channel 2:
    Master: ad4 SATA revision 2.x
    Slave: no device present
ATA channel 3:
    Master: ad6 SATA revision 2.x
    Slave: no device present
ATA channel 4:
    Master: ad8 SATA revision 2.x
    Slave: no device present
ATA channel 5:
    Master: ad10 SATA revision 2.x
    Slave: no device present
ATA channel 6:
    Master: ad12 SATA revision 2.x
    Slave: no device present
ATA channel 7:
    Master: ad14 SATA revision 2.x
    Slave: no device present
at scbus0 target 0 lun 0 (pass0,ada0)

Smartctl is happy, too:
r...@kg-f2# smartctl -H /dev/ad4
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad6
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad8
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad10
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad12
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ada0
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Maybe the hardware is just plain broken.
-- 
Torfinn