Re: RELENG_8 -- NFSv3 credentials/permissions issue

2010-02-20 Thread Jeremy Chadwick
On Sun, Feb 21, 2010 at 09:25:45AM +0200, Daniel Braniss wrote:
> > I'm willing to bet this is something simple I've overlooked, but I'm out
> > of ideas.  Client is 8.0-RELEASE i386, server is 8.0-STABLE amd64
> > (kernel/world 2010/01/16).  NFS version used is v3.  Server filesystem
> > is UFS2.
> At boot time, NFS is v2! If the server is FreeBSD, it can be upgraded
> to v3 later in the boot process.
> > 
> > Client configuration is off-kilter: it's a PXE booted machine.  Initial
> > PXE booting uses TFTP, then switches to NFS to load the kernel and
> > kernel modules.  The TFTP part works, with a caveat[1], but the NFS
> > portion fails.
> TFTP is as old as the Internet, so it mostly works, and security was in
> diapers, so the T for trivial also means insecure :-)
> > 
> > With NFS, I'm forced to change permissions on all the exported
> > files/directories to be 0644/0755 (specifically, setting other/global
> > read/write access) otherwise the client gets back "Permission denied".
> > The nfsd(8) man page implies that this shouldn't be necessary; adding
> > -mapall=nobody:nobody or -maproot=nobody doesn't fix things either.
> > 
> Why not use -maproot=root?
> By adding -ro, the client will be able to read but not modify.
> That's what we do here; /etc is mounted via unionfs to an md, but
> that is yet another solution.

I'll have to try that (shouldn't take me long), but I remember messing
with -maproot and -mapall both and wasn't able to get anywhere.  I'll
try again and report back.

> > Configuration data, tcpdump validation (client=192.168.1.140,
> > server=192.168.1.51), and syslog data is below.
> > 
> > Ideas?
> > 
> > [1]: TFTP works as long as the file it's trying to request (in this case
> > /usr/local/freebsd8/boot/pxeboot) has its other/global read bit set,
> > otherwise EACCES is returned; I had to look in the tftpd source to
> > figure this out.  I'm not sure what the justification is there, given
> > that use of -s and/or -u switches credentials to user/group nobody...
> > 
> Only root can read a file with mode 0, so you need to set the read bit for
> any non-root user.

I'm not sure if you're referring to NFS here, or my TFTP comment.  My
TFTP comment should be discussed elsewhere -- it's broken/odd behaviour,
but the workaround for TFTP (to set the file permissions to 0644 for
read) I'm fine with -- it's TFTP!  :-)

With regards to NFS: none of the files below are mode 0000.  The request
made via NFS should have gotten "translated" to being done by
nobody:nobody on the NFS server, since there's no -mapall or -maproot
line in the exports; user nobody has read access to everything shown
below, so "Permission denied" makes no sense.
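
For reference, the credential mapping rules described in the man page text quoted in this thread can be sketched in Python.  This is only an illustration of those documented rules, not the actual mountd/nfsd code; the function name `map_credential` and the (uid, gid) tuple representation are hypothetical.

```python
from typing import Optional, Tuple

Cred = Tuple[int, int]  # (uid, gid); hypothetical representation

# Default credential for remote root, per the quoted man page text.
DEFAULT_ROOT_CRED: Cred = (-2, -2)


def map_credential(remote: Cred,
                   maproot: Optional[Cred] = None,
                   mapall: Optional[Cred] = None) -> Cred:
    """Return the credential the server would use for a remote request."""
    if mapall is not None:
        # -mapall: every user, root included, is mapped to this credential.
        return mapall
    if remote[0] == 0:
        # Remote root: the -maproot credential if given, otherwise -2:-2.
        return maproot if maproot is not None else DEFAULT_ROOT_CRED
    # All other users keep their remote credential.
    return remote
```

With no options, `map_credential((0, 0))` gives `(-2, -2)`, while a non-root request such as `(1001, 1001)` passes through unchanged.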

> > Permissions
> > =
> > drwxr-xr-x  22 root    wheel       512 Feb  6 12:25 /
> > drwxr-xr-x  17 root    wheel       512 Feb 12 03:38 /usr
> > drwxr-xr-x  15 root    wheel       512 Feb 19 10:41 /usr/local
> > drwx------   5 nobody  nobody      512 Feb 19 10:42 /usr/local/freebsd8
> > drwx------   7 nobody  nobody     1024 Nov 21 08:11 /usr/local/freebsd8/boot
> > drwx------   2 nobody  nobody    12800 Nov 21 08:11 /usr/local/freebsd8/boot/kernel
> > -r--------   1 nobody  nobody 11492703 Nov 21 07:48 /usr/local/freebsd8/boot/kernel/kernel
> > 
> > tcpdump
> > =
> > {...snipping TFTP portion...}
> > 10:57:20.601313 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, 
> > Request from 00:30:48:71:60:6b, length 548
> > 10:57:20.601442 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, 
> > length 323
> > 10:57:20.601688 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, 
> > Request from 00:30:48:71:60:6b, length 548
> > 10:57:20.601782 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, 
> > length 323
> > 10:57:20.613056 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76
> > 10:57:20.613369 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28
> > 10:57:20.613556 IP 192.168.1.140.1023 > 192.168.1.51.947: UDP, length 84
> > 10:57:20.613921 IP 192.168.1.51.947 > 192.168.1.140.1023: UDP, length 60
> > 10:57:20.614055 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76
> > 10:57:20.614291 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28
> > 10:57:20.614432 IP 192.168.1.140.4 > 192.168.1.51.2049: 100 lookup fh 
> > 1197,150310/6618112 "boot"
> > 10:57:20.614458 IP 192.168.1.51.2049 > 192.168.1.140.4: reply ok 28 lookup 
> > ERROR: Permission denied
> > 10:57:20.615436 IP 192.168.1.140.1022 > 192.168.1.51.947: UDP, length 84
> > 10:57:20.615677 IP 192.168.1.51.947 > 192.168.1.140.1022: UDP, length 60
> > 10:57:20.615806 IP 192.168.1.140.6 > 192.168.1.51.2049: 100 lookup fh 
> > 1197,150310/6618112 "boot"
> > 10:57:20.615824 IP 192.168.1.51.2049 > 192.168.1.140.6: reply ok 28 lookup 
> > ERROR: Permission denied
> > 10:57:20.615929 IP 192.168.1.140.1021 > 192.168.1.51.947: UDP, length 84
> > 10:57:20.616164 IP 192.168.1.51.947 > 192.168.1.140.1021: UDP, length 60
> > 10:57:

Re: RELENG_8 -- NFSv3 credentials/permissions issue

2010-02-20 Thread Daniel Braniss
> I'm willing to bet this is something simple I've overlooked, but I'm out
> of ideas.  Client is 8.0-RELEASE i386, server is 8.0-STABLE amd64
> (kernel/world 2010/01/16).  NFS version used is v3.  Server filesystem
> is UFS2.
At boot time, NFS is v2! If the server is FreeBSD, it can be upgraded
to v3 later in the boot process.
> 
> Client configuration is off-kilter: it's a PXE booted machine.  Initial
> PXE booting uses TFTP, then switches to NFS to load the kernel and
> kernel modules.  The TFTP part works, with a caveat[1], but the NFS
> portion fails.
TFTP is as old as the Internet, so it mostly works, and security was in diapers,
so the T for trivial also means insecure :-)
> 
> With NFS, I'm forced to change permissions on all the exported
> files/directories to be 0644/0755 (specifically, setting other/global
> read/write access) otherwise the client gets back "Permission denied".
> The nfsd(8) man page implies that this shouldn't be necessary; adding
> -mapall=nobody:nobody or -maproot=nobody doesn't fix things either.
> 
Why not use -maproot=root?
By adding -ro, the client will be able to read but not modify.
That's what we do here; /etc is mounted via unionfs to an md, but
that is yet another solution.

>   In the absence of -maproot and -mapall options, remote accesses by root
>   will result in using a credential of -2:-2.  All other users will be
>   mapped to their remote credential.  If a -maproot option is given, remote
>   access by root will be mapped to that credential instead of -2:-2.  If a
>   -mapall option is given, all users (including root) will be mapped to
>   that credential in place of their own.  
> 
> Configuration data, tcpdump validation (client=192.168.1.140,
> server=192.168.1.51), and syslog data is below.
> 
> Ideas?
> 
> [1]: TFTP works as long as the file it's trying to request (in this case
> /usr/local/freebsd8/boot/pxeboot) has its other/global read bit set,
> otherwise EACCES is returned; I had to look in the tftpd source to
> figure this out.  I'm not sure what the justification is there, given
> that use of -s and/or -u switches credentials to user/group nobody...
> 
Only root can read a file with mode 0, so you need to set the read bit for
any non-root user.

> -- 
> | Jeremy Chadwick   j...@parodius.com |
> | Parodius Networking   http://www.parodius.com/ |
> | UNIX Systems Administrator  Mountain View, CA, USA |
> | Making life hard for others since 1977.  PGP: 4BD6C0CB |
> 
> 
> Relevant server configuration bits:
> 
> /etc/rc.conf
> ==
> rpcbind_enable="yes"
> rpcbind_flags="-l"
> mountd_enable="yes"
> mountd_flags="-r -l"
> nfs_server_enable="yes"
> 
> /etc/exports
> ==
> /usr/local/freebsd8   -network 192.168.1 -mask 255.255.255.0
> 
> Permissions
> =
> drwxr-xr-x  22 root    wheel       512 Feb  6 12:25 /
> drwxr-xr-x  17 root    wheel       512 Feb 12 03:38 /usr
> drwxr-xr-x  15 root    wheel       512 Feb 19 10:41 /usr/local
> drwx------   5 nobody  nobody      512 Feb 19 10:42 /usr/local/freebsd8
> drwx------   7 nobody  nobody     1024 Nov 21 08:11 /usr/local/freebsd8/boot
> drwx------   2 nobody  nobody    12800 Nov 21 08:11 /usr/local/freebsd8/boot/kernel
> -r--------   1 nobody  nobody 11492703 Nov 21 07:48 /usr/local/freebsd8/boot/kernel/kernel
> 
> tcpdump
> =
> {...snipping TFTP portion...}
> 10:57:20.601313 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, Request 
> from 00:30:48:71:60:6b, length 548
> 10:57:20.601442 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, 
> length 323
> 10:57:20.601688 IP 192.168.1.140.68 > 255.255.255.255.67: BOOTP/DHCP, Request 
> from 00:30:48:71:60:6b, length 548
> 10:57:20.601782 IP 192.168.1.51.67 > 192.168.1.140.68: BOOTP/DHCP, Reply, 
> length 323
> 10:57:20.613056 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76
> 10:57:20.613369 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28
> 10:57:20.613556 IP 192.168.1.140.1023 > 192.168.1.51.947: UDP, length 84
> 10:57:20.613921 IP 192.168.1.51.947 > 192.168.1.140.1023: UDP, length 60
> 10:57:20.614055 IP 192.168.1.140.1023 > 192.168.1.51.111: UDP, length 76
> 10:57:20.614291 IP 192.168.1.51.111 > 192.168.1.140.1023: UDP, length 28
> 10:57:20.614432 IP 192.168.1.140.4 > 192.168.1.51.2049: 100 lookup fh 
> 1197,150310/6618112 "boot"
> 10:57:20.614458 IP 192.168.1.51.2049 > 192.168.1.140.4: reply ok 28 lookup 
> ERROR: Permission denied
> 10:57:20.615436 IP 192.168.1.140.1022 > 192.168.1.51.947: UDP, length 84
> 10:57:20.615677 IP 192.168.1.51.947 > 192.168.1.140.1022: UDP, length 60
> 10:57:20.615806 IP 192.168.1.140.6 > 192.168.1.51.2049: 100 lookup fh 
> 1197,150310/6618112 "boot"
> 10:57:20.615824 IP 192.168.1.51.2049 > 192.168.1.140.6: reply ok 28 lookup 
> ERROR: Permission denied
> 10:57:20.615929 IP 192.168.1.140.1021 > 192.168.1.51.947: UDP, length 84
> 10:57:20.616164 IP 192.168.1.51.9

Re: ntpd struggling to keep up - how to fix?

2010-02-20 Thread Peter Jeremy
On 2010-Feb-20 22:32:01 +0100, Torfinn Ingolfsen wrote:
>On Sat, 20 Feb 2010 12:53:51 +1100
>Peter Jeremy  wrote:
>
>> Looks reasonable.  Let us know the results.  I'd be interested in
>> the output from "ntpdc -c loopi -c sysi".
>
>Ok, here we go (the server panic'ed again last night):
>r...@kg-f2# uptime
>10:28PM  up  2:26, 3 users, load averages: 0.00, 0.00, 0.00
>r...@kg-f2# sysctl machdep.acpi_timer_freq
>machdep.acpi_timer_freq: 3577045
>r...@kg-f2# tvlm
>Feb 20 20:06:41 kg-f2 ntpd[942]: kernel time sync status change 2001
>Feb 20 20:21:49 kg-f2 ntpd[942]: time reset +1.118880 s
>Feb 20 20:37:53 kg-f2 ntpd[942]: time reset +1.188538 s
>Feb 20 20:53:03 kg-f2 ntpd[942]: time reset +1.121903 s
>Feb 20 21:09:00 kg-f2 ntpd[942]: time reset +1.179924 s
>Feb 20 21:24:57 kg-f2 ntpd[942]: time reset +1.178490 s
>Feb 20 21:39:58 kg-f2 ntpd[942]: time reset +1.110647 s
>Feb 20 21:55:53 kg-f2 ntpd[942]: time reset +1.177292 s
>Feb 20 22:11:44 kg-f2 ntpd[942]: time reset +1.172358 s
>Feb 20 22:26:48 kg-f2 ntpd[942]: time reset +1.114350 s

That's definitely not good - though it's marginally better than before.
I have checked on a local machine and the timecounter frequency definitely
needs to be adjusted in the opposite direction to the ntpd drift.

I think I see the problem: I suggested 3579545Hz - 2500ppm, which
gives an ACPI frequency of 3570596Hz.  There was some miscommunication
and you have set an ACPI frequency of 3577045Hz which is 2500Hz (or
698ppm) lower.  The drift reported by the time resets has gone from
+1930ppm (14.5s in 2:05:17) to +1233ppm (8.4s in 2:20:06) - which is
697ppm - fairly close to the change you made.  (The PLL is running
at +500ppm so the actual clock offset is 500ppm more than the "time
reset" reports suggest.)

Having re-checked my maths, using both your "time reset" results, can
you please try:
  sysctl machdep.acpi_timer_freq=3570847
That should result in a drift of close to zero (well within NTP's
lock range of +/- 300ppm).
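
The arithmetic behind these figures can be reproduced roughly as follows.  This is a sketch under stated assumptions: the measuring interval is taken to run from the first ntpd log line (20:06:41) to the last reset (22:26:48), and the +500ppm PLL correction is simply added to the drift seen in the resets.  The variable names are illustrative.

```python
from datetime import datetime

NOMINAL_HZ = 3579545   # nominal ACPI timer frequency quoted above
CURRENT_HZ = 3577045   # value currently set via sysctl
PLL_PPM = 500.0        # ntpd's frequency correction, pinned at +500 ppm

# Originally intended setting: nominal frequency minus 2500 ppm.
intended_hz = round(NOMINAL_HZ * (1 - 2500e-6))
# The value actually set is 2500 Hz below nominal; express that in ppm.
applied_ppm = (NOMINAL_HZ - CURRENT_HZ) / NOMINAL_HZ * 1e6

# "time reset" steps (seconds) from the log, 20:21:49 through 22:26:48.
steps = [1.118880, 1.188538, 1.121903, 1.179924, 1.178490,
         1.110647, 1.177292, 1.172358, 1.114350]

# Assumption: measure from the first log line to the last reset.
fmt = "%H:%M:%S"
interval = (datetime.strptime("22:26:48", fmt)
            - datetime.strptime("20:06:41", fmt)).total_seconds()

step_ppm = sum(steps) / interval * 1e6   # drift visible in the resets
total_ppm = step_ppm + PLL_PPM           # plus what the PLL already absorbs

# The clock runs slow, so the real timer frequency is lower than configured.
suggested_hz = round(CURRENT_HZ * (1 - total_ppm * 1e-6))

print(f"intended {intended_hz} Hz, applied change {applied_ppm:.0f} ppm")
print(f"step drift {step_ppm:.0f} ppm, total {total_ppm:.0f} ppm")
print(f"suggested machdep.acpi_timer_freq = {suggested_hz}")
```

Under those assumptions the result lands within about a hertz of the 3570847 suggested above.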

>frequency:500.000 ppm

And this is definitely not good.

>Not synced at all. Not good. :-/
>Perhaps I should give it more time?

No.  Once ntpd decides to continuously step, something is broken.

I've done some double-checking and 
On 2010-Feb-20 22:55:21 +0100, Torfinn Ingolfsen wrote:
>This output looks ... wrong ... somehow to my eyes:
...
>Shouldn't ntpq and ntpdc be in agreement?

I'm not sure which particular bits you are concerned about but ntpq
reports delay/offset/jitter in msec whilst ntpdc reports them in sec.

Note that I can't explain why the loopi offset is zero - ntpdc(8)
states that this is the "last offset given to the loop filter by the
packet processing code".  For me it's non-zero but doesn't quite
match the offset reported by 'ntpq -p'.

-- 
Peter Jeremy




Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64

2010-02-20 Thread C. P. Ghost
On Sun, Feb 21, 2010 at 12:35 AM, Jeremy Chadwick wrote:

> We can safely rule out the Silicon Image controller (otherwise "ataX"
> wouldn't be involved), which leaves the AMD SB700 SATA controller and
> the AMD SB700 PATA controller.
>
> What exact disks (e.g. adX) are attached to ata5 and ata6?  You haven't
> provided dmesg output in any of your posts, and atacontrol/pciconf is
> not sufficient (I should really improve atacontrol by printing this
> information.  I'll work on that in a few minutes).
>
> Some Linux users have reported AHCI-related issues with the SB600
> southbridge, but the core of the problem turned out to be MSI on certain
> AMD northbridges (specifically RS480, RS400, and RS200).  By disabling
> MSI entirely they were able to achieve stability.  The FreeBSD
> equivalent would be to set the following in loader.conf and reboot:
>
> hw.pci.enable_msix="0"
> hw.pci.enable_msi="0"
>
> The Linux quirk fix for this:
>
>
> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.21/pci-quirks-disable-msi-on-rs400-200-and-rs480.patch;hb=05ab505f2909acf3a614d3e6a32271c4c1f8a69d
>
> Your board has an AMD 740G northbridge, but it might be worth trying the
> MSI disable trick anyway.  If it doesn't fix the problem then definitely
> re-enable MSI.  Isn't hardware fun?  ;-)
>

Just one more data point.

I have a machine with similar hardware: an MSI K9A2GM-FIH motherboard:
  http://eu.msi.com/index.php?func=proddesc&maincat_no=1&prod_no=1436
with an AMD 780G northbridge and AMD SB700 SATA controller,
and I experienced freezes after switching to AHCI.

Those freezes happened e.g. after some sustained random disk activity,
followed by starting 'dvdisaster'. Then the HDD LED started to blink
slowly, every 10 seconds on, then off, and again. No more disk
activity on the ahci0 controller was possible. The system remained
responsive as long as it didn't involve disk activity (i.e. pings, mouse,
keyboard etc., but not starting new processes).

I'm pretty sure it started happening (very sporadically) only after I
switched to an AHCI setup. It didn't freeze before under the same load
pattern.

I can't test disabling MSI right now on that box, but will try it on a
similar test machine in a few days (where I hope to reproduce this).
Thanks for the hint!

ahci0:  port 0xc000-0xc007,0xb000-0xb003,0xa000-0xa007,0x9000-0x9003,0x8000-0x800f mem 0xfe7ff800-0xfe7ffbff irq 22 at device 17.0 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported
ahcich0:  at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1:  at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2:  at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3:  at channel 3 on ahci0
ahcich3: [ITHREAD]
ahcich4:  at channel 4 on ahci0
ahcich4: [ITHREAD]
ahcich5:  at channel 5 on ahci0
ahcich5: [ITHREAD]

ACPI Warning: \\_SB_.PCI0.SBRG.FDC_._FDE: Return type mismatch - found
Package, expected Buffer 20090521 nspredef-1051

(aprobe0:ahcich0:0:0:0): SIGNATURE: 
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0:  ATA/ATAPI-7 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO size 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ntpd struggling to keep up - how to fix?

2010-02-20 Thread Jeremy Chadwick
On Sat, Feb 20, 2010 at 10:55:21PM +0100, Torfinn Ingolfsen wrote:
> On Sat, 20 Feb 2010 22:32:01 +0100
> Torfinn Ingolfsen  wrote:
> 
> 
> This output looks ... wrong ... somehow to my eyes:
> r...@kg-f2# date
> Sat Feb 20 22:51:24 CET 2010
> r...@kg-f2# ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *kg-omni1.kg4.no 129.240.64.3     3 u   62   64  377    0.244  597.314 360.123
> r...@kg-f2# ntpdc -c loopi -c sysi
> offset:   0.00 s
> frequency:500.000 ppm
> poll adjust:  4
> watchdog timer:   549 s
> system peer:  kg-omni1.kg4.no
> system peer mode: client
> leap indicator:   11
> stratum:  16
> precision:-18
> root distance:0.0 s
> root dispersion:  0.00822 s
> reference ID: [10.1.10.1]
> reference time:   .  Thu, Feb  7 2036  7:28:16.000
> system flags: auth monitor ntp kernel stats 
> jitter:   0.360107 s
> stability:0.000 ppm
> broadcastdelay:   0.003998 s
> authdelay:0.00 s
> 
> Shouldn't ntpq and ntpdc be in agreement?

ntpq and ntpdc output data in slightly different formats, depending on
what arguments you give them.  I'm not familiar with the loopi or sysi
commands; Peter should be able to help here.

For sake of example -- look at ntpq's "delay" column for each peer, and
then look at the same column but for ntpdc.  You'll see that for ntpdc
they're divided by 1000 (ntpq reports milliseconds, ntpdc reports seconds):

$ ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+clock-a.develoo 204.123.2.72     2 u  476  512  377   25.287   -0.852   0.550
-enigma.wiredgoa 209.81.9.7       2 u  185  512  377   14.754    0.284   0.688
+mtnlion.com     139.78.135.14    2 u  208  512  377   30.788   -0.233   0.160
*ntp1.phoenixpub .LCL.            1 u  179  512  377   36.322   -0.552   0.522
-ntp-1.gw.uiuc.e 128.174.38.133   2 u  141  512  377   77.321   -5.381   0.328
-tick.jrc.us     172.21.0.14      2 u  149  512  377  112.424   -8.110   1.440

$ ntpdc -c peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
*mailserv1.phoen 192.168.1.51     1  512  377 0.03632 -0.000552 0.09666
=clock-a.develoo 192.168.1.51     2  512  377 0.02528 -0.000852 0.08611
=tick.jrc.us     192.168.1.51     2  512  377 0.11241 -0.008110 0.08615
=enigma.wiredgoa 192.168.1.51     2  512  377 0.01474  0.000284 0.11473
=mtnlion.com     192.168.1.51     2  512  377 0.03078 -0.000233 0.09665
=ntp-1.gw.uiuc.e 192.168.1.51     2  512  377 0.07732 -0.005381 0.10612
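
As a quick sanity check of the two listings above, here is a small Python sketch that converts the ntpq offsets to seconds and compares them with the ntpdc values.  It assumes, as Peter notes elsewhere in this thread, that ntpq prints milliseconds and ntpdc prints seconds; the helper name `ms_to_s` is made up for the example, and only peers that appear under the same name in both listings are included.

```python
# peer -> (ntpq offset in ms, ntpdc offset in s), copied from the listings
offsets = {
    "clock-a.develoo": (-0.852, -0.000852),
    "enigma.wiredgoa": (0.284, 0.000284),
    "mtnlion.com": (-0.233, -0.000233),
    "ntp-1.gw.uiuc.e": (-5.381, -0.005381),
    "tick.jrc.us": (-8.110, -0.008110),
}


def ms_to_s(ms: float) -> float:
    """Convert milliseconds to seconds."""
    return ms / 1000.0


for peer, (ntpq_ms, ntpdc_s) in offsets.items():
    # Allow a tiny tolerance for floating-point rounding.
    agree = abs(ms_to_s(ntpq_ms) - ntpdc_s) < 1e-9
    print(f"{peer:16s} {ntpq_ms:8.3f} ms -> {ms_to_s(ntpq_ms):9.6f} s  match={agree}")
```

The delay columns agree the same way to within the last printed digit, since the two tools print different numbers of decimals.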

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64

2010-02-20 Thread Jeremy Chadwick
On Sat, Feb 20, 2010 at 10:49:59PM +0100, Torfinn Ingolfsen wrote:
> On Sat, 20 Feb 2010 11:37:18 -0800
> Jeremy Chadwick  wrote:
> 
> > Can you re-run smartctl -a instead of -H?  Some of the SMART attributes
> > may help determine what's going on, or there may be related errors in
> > the SMART error log.
> 
> smartctl -a output attached. Test sequence: ad4 - ad12, ada0.

Most of your disks look to be in decent shape.  Well, that is to say,
all of them should be working fine; I don't see anything that's of
major, or even minor concern.  Others might focus on Attributes 191 or
195, but neither of those are absurdly high given the number of hours
these disks have been in use (see Attribute 9).

> > Otherwise I'd say what's happening is a SATA controller lock-up of some
> > sort, since it happens on any of your channels.  Could be a quirk of
> > some kind in the SATA->CAM stuff (unless it also happens when using pure
> > ata(4)).
> 
> I am running a quite recent 8.0-stable:
> r...@kg-f2# uname -a
> FreeBSD kg-f2.kg4.no 8.0-STABLE FreeBSD 8.0-STABLE #2: Sun Jan 31 18:39:17 
> CET 2010 r...@kg-f2.kg4.no:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> Perhaps I should upgrade.
> 
> > What controller are these disks hooked to again?
> 
> Six  of the disks (ad4, ad6, ad8, ad10, ad12) are connected to the SATA ports 
> on the motherboard:
> r...@kg-f2# pciconf -lv | grep ata -A 4
> atap...@pci0:0:17:0:  class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 
> hdr=0x00
> vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
> device = 'SB700 SATA Controller [AHCI mode]'
> class  = mass storage
> subclass   = SATA

Let's backtrack a bit.  I've gone back and read through all of your
previous posts on this matter, and so far all the problems are happening
on ata5 and ata6.  No timeouts or anomalies have appeared on any other
ports -- just those two.  The kernel error messages indicate that
commands submitted to the controller took longer than 10 seconds to get a
response, so the OS does a force-reset of the ports in an attempt to get
things working again.

We can safely rule out the Silicon Image controller (otherwise "ataX"
wouldn't be involved), which leaves the AMD SB700 SATA controller and
the AMD SB700 PATA controller.

What exact disks (e.g. adX) are attached to ata5 and ata6?  You haven't
provided dmesg output in any of your posts, and atacontrol/pciconf is
not sufficient (I should really improve atacontrol by printing this
information.  I'll work on that in a few minutes).

Some Linux users have reported AHCI-related issues with the SB600
southbridge, but the core of the problem turned out to be MSI on certain
AMD northbridges (specifically RS480, RS400, and RS200).  By disabling
MSI entirely they were able to achieve stability.  The FreeBSD
equivalent would be to set the following in loader.conf and reboot:

hw.pci.enable_msix="0"
hw.pci.enable_msi="0"

The Linux quirk fix for this:

http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.21/pci-quirks-disable-msi-on-rs400-200-and-rs480.patch;hb=05ab505f2909acf3a614d3e6a32271c4c1f8a69d

Your board has an AMD 740G northbridge, but it might be worth trying the
MSI disable trick anyway.  If it doesn't fix the problem then definitely
re-enable MSI.  Isn't hardware fun?  ;-)

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: ZFS and sh(1) panic: spin lock [lock addr] (smp rendezvous) held by [sh(1) proc tid] too long

2010-02-20 Thread Attilio Rao
2010/1/27 Brandon Gooch :
> The machine, a Dell Optiplex 755, has been locking up recently. The
> situation usually occurs while using VirtualBox (running a 64-bit
> Windows 7 instance) and doing anything else in another xterm (such as
> rebuilding a port).  I've been unable to reliably reproduce it (I'm in
> an X session and the machine will not panic "properly").
>
> However, while rebuilding Xorg today at ttyv0 and running
> VBoxHeadless on ttyv1, I managed to trigger what I believe is the
> lockup.
>
> I've attached a textdump in hopes that someone may be able to take a
> look and provide clues or instruction on debugging this.

I think that jhb@ saw a similar problem while working on the nVidia
driver or the like.
Not sure if he made any progress debugging this.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: ntpd struggling to keep up - how to fix?

2010-02-20 Thread Torfinn Ingolfsen
On Sat, 20 Feb 2010 22:32:01 +0100
Torfinn Ingolfsen  wrote:


This output looks ... wrong ... somehow to my eyes:
r...@kg-f2# date
Sat Feb 20 22:51:24 CET 2010
r...@kg-f2# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*kg-omni1.kg4.no 129.240.64.3     3 u   62   64  377    0.244  597.314 360.123
r...@kg-f2# ntpdc -c loopi -c sysi
offset:   0.00 s
frequency:500.000 ppm
poll adjust:  4
watchdog timer:   549 s
system peer:  kg-omni1.kg4.no
system peer mode: client
leap indicator:   11
stratum:  16
precision:-18
root distance:0.0 s
root dispersion:  0.00822 s
reference ID: [10.1.10.1]
reference time:   .  Thu, Feb  7 2036  7:28:16.000
system flags: auth monitor ntp kernel stats 
jitter:   0.360107 s
stability:0.000 ppm
broadcastdelay:   0.003998 s
authdelay:0.00 s

Shouldn't ntpq and ntpdc be in agreement?
-- 
Torfinn



Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64

2010-02-20 Thread Torfinn Ingolfsen
On Sat, 20 Feb 2010 11:37:18 -0800
Jeremy Chadwick  wrote:

> Can you re-run smartctl -a instead of -H?  Some of the SMART attributes
> may help determine what's going on, or there may be related errors in
> the SMART error log.

smartctl -a output attached. Test sequence: ad4 - ad12, ada0.

> Otherwise I'd say what's happening is a SATA controller lock-up of some
> sort, since it happens on any of your channels.  Could be a quirk of
> some kind in the SATA->CAM stuff (unless it also happens when using pure
> ata(4)).

I am running a quite recent 8.0-stable:
r...@kg-f2# uname -a
FreeBSD kg-f2.kg4.no 8.0-STABLE FreeBSD 8.0-STABLE #2: Sun Jan 31 18:39:17 CET 
2010 r...@kg-f2.kg4.no:/usr/obj/usr/src/sys/GENERIC  amd64

Perhaps I should upgrade.

> What controller are these disks hooked to again?

Six  of the disks (ad4, ad6, ad8, ad10, ad12) are connected to the SATA ports 
on the motherboard:
r...@kg-f2# pciconf -lv | grep ata -A 4
atap...@pci0:0:17:0:class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 
hdr=0x00
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'SB700 SATA Controller [AHCI mode]'
class  = mass storage
subclass   = SATA
--
atap...@pci0:0:20:1:class=0x01018a card=0x50021458 chip=0x439c1002 rev=0x00 
hdr=0x00
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'PATA 133 Controller (SB7xx)'
class  = mass storage
subclass   = ATA
(There is nothing connected to the PATA ports).

The last disk (ada0) is connected to a PCI card:
o...@kg-f2# pciconf -lv | grep siis -A 3
si...@pci0:2:0:0:   class=0x018000 card=0x35311095 chip=0x35311095 rev=0x01 
hdr=0x00
vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
device = 'SiI 3531 SATA Controller'
class  = mass storage


Hardware info about the machine here:
http://sites.google.com/site/tingox/ga-ma74gm-s2h

HTH
-- 
Torfinn

smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F1 DT series
Device Model: SAMSUNG HD252HJ
Serial Number:S17HJ9BSA04283
Firmware Version: 1AC01118
User Capacity:250,059,350,016 bytes
Device is:In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:Sat Feb 20 22:44:11 2010 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status:  (   0) The previous self-test routine completed
without error or no self-test has ever 
been run.
Total time to complete Offline 
data collection: (3651) seconds.
Offline data collection
capabilities:(0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off 
support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:(0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:(0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine 
recommended polling time:(   2) minutes.
Extended self-test routine
recommended polling time:(  62) minutes.
Conveyance self-test routine
recommended polling time:(   8) minutes.
SCT capabilities:  (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  UPDATED  
WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f   100   100   051Pre-fail  Always   
-   0
  3 Spin_Up_Time0x0007   092   092   011Pre-fail  Always   
-   3260
  4 Start_Stop_Count0x0032   100   100   000O

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-20 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:

> Normally you should not have any FCS errors, it could be related
> with signal quality and these errors might not be correctly
> counted.

I can't check cable and switch counters on bge1 before Feb 24.

> > 3. Packets are not lost with the sources from Aug'09
> 
> Since I don't have BCM5704 hardware it's hard to find which
> revision may affect to this issue. Could you narrow down which
> revision number started showing the issue?

I didn't update the sources between Aug'09 and Feb 16.

4. Packets are not lost immediately after reboot.

PS: I got kernel panic.

===
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff802eb3b7
stack pointer           = 0x28:0xffffff80001c66e0
frame pointer           = 0x28:0xffffff80001c6740
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 724 (named)
[thread pid 724 tid 100051 ]
Stopped at  m_copym+0x37:   movl    0x18(%r12),%eax
db> panic
panic: from debugger
cpuid = 0
Uptime: 1d5h55m33s
Physical memory: 2039 MB
Dumping 1448 MB: 1433 1417 1401


Re: ntpd struggling to keep up - how to fix?

2010-02-20 Thread Torfinn Ingolfsen
On Sat, 20 Feb 2010 12:53:51 +1100
Peter Jeremy  wrote:

> Looks reasonable.  Let us know the results.  I'd be interested in
> the output from "ntpdc -c loopi -c sysi".

Ok, here we go (the server panic'ed again last night):
r...@kg-f2# uptime
10:28PM  up  2:26, 3 users, load averages: 0.00, 0.00, 0.00
r...@kg-f2# sysctl machdep.acpi_timer_freq
machdep.acpi_timer_freq: 3577045
r...@kg-f2# tvlm
Feb 20 20:06:41 kg-f2 ntpd[942]: kernel time sync status change 2001
Feb 20 20:21:49 kg-f2 ntpd[942]: time reset +1.118880 s
Feb 20 20:37:53 kg-f2 ntpd[942]: time reset +1.188538 s
Feb 20 20:53:03 kg-f2 ntpd[942]: time reset +1.121903 s
Feb 20 21:09:00 kg-f2 ntpd[942]: time reset +1.179924 s
Feb 20 21:24:57 kg-f2 ntpd[942]: time reset +1.178490 s
Feb 20 21:39:58 kg-f2 ntpd[942]: time reset +1.110647 s
Feb 20 21:55:53 kg-f2 ntpd[942]: time reset +1.177292 s
Feb 20 22:11:44 kg-f2 ntpd[942]: time reset +1.172358 s
Feb 20 22:26:48 kg-f2 ntpd[942]: time reset +1.114350 s
r...@kg-f2# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 kg-omni1.kg4.no 129.240.64.3     3 u    8   64    7    0.176  133.306  77.731
r...@kg-f2# ntpdc -c loopi -c sysi
offset:               0.00 s
frequency:            500.000 ppm
poll adjust:          4
watchdog timer:       194 s
system peer:          0.0.0.0
system peer mode:     unspec
leap indicator:       11
stratum:              16
precision:            -18
root distance:        0.0 s
root dispersion:      0.00290 s
reference ID:         [83.84.69.80]
reference time:       .  Thu, Feb  7 2036  7:28:16.000
system flags:         auth monitor ntp kernel stats
jitter:               0.358109 s
stability:            0.000 ppm
broadcastdelay:       0.003998 s
authdelay:            0.00 s

Not synced at all. Not good. :-/
Perhaps I should give it more time?
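
A rough way to judge whether more time would help is to turn the "time reset"
lines above into a drift rate. The sketch below divides each step by the time
since the previous step. It assumes the step roughly equals the error
accumulated over that interval (ntpd may also have been slewing in between, so
treat the result as a lower bound), and it assumes the year 2010 because
syslog timestamps carry none. parse_resets and drift_ppm are illustrative
names.

```python
import re
from datetime import datetime

LOG = """\
Feb 20 20:21:49 kg-f2 ntpd[942]: time reset +1.118880 s
Feb 20 20:37:53 kg-f2 ntpd[942]: time reset +1.188538 s
Feb 20 20:53:03 kg-f2 ntpd[942]: time reset +1.121903 s
Feb 20 21:09:00 kg-f2 ntpd[942]: time reset +1.179924 s
Feb 20 21:24:57 kg-f2 ntpd[942]: time reset +1.178490 s
Feb 20 21:39:58 kg-f2 ntpd[942]: time reset +1.110647 s
Feb 20 21:55:53 kg-f2 ntpd[942]: time reset +1.177292 s
Feb 20 22:11:44 kg-f2 ntpd[942]: time reset +1.172358 s
Feb 20 22:26:48 kg-f2 ntpd[942]: time reset +1.114350 s
"""

RESET_LINE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ ntpd\[\d+\]: time reset ([+-]\d+\.\d+) s$"
)


def parse_resets(text, year=2010):
    """Return (timestamp, step_seconds) pairs for every 'time reset' line."""
    resets = []
    for line in text.splitlines():
        match = RESET_LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            resets.append((stamp, float(match.group(2))))
    return resets


def drift_ppm(resets):
    """Step size divided by the time since the previous step, in parts per million."""
    rates = []
    for (previous, _), (current, step) in zip(resets, resets[1:]):
        elapsed = (current - previous).total_seconds()
        rates.append(step / elapsed * 1e6)
    return rates


rates = drift_ppm(parse_resets(LOG))
for rate in rates:
    print(f"{rate:8.1f} ppm")
print(f"mean: {sum(rates) / len(rates):.1f} ppm")
```

On the nine resets above every interval works out to roughly 1230 ppm. ntpd
can only correct about 500 ppm by adjusting the clock frequency, and the
"frequency: 500.000 ppm" line shows it is already at that limit, so waiting
longer is unlikely to be enough on its own.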
-- 
Torfinn



Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64

2010-02-20 Thread Jeremy Chadwick
On Sat, Feb 20, 2010 at 08:21:08PM +0100, Torfinn Ingolfsen wrote:
> Another day, another crash. 
> From /var/log/messages:
> Feb 20 08:52:26 kg-f2 ntpd[58609]: time reset +1.169751 s
> Feb 20 08:54:57 kg-f2 kernel: ata5: port is not ready (timeout 1ms) tfd = 007f
> Feb 20 08:54:57 kg-f2 kernel: ata5: hardware reset timeout
> Feb 20 19:18:51 kg-f2 syslogd: kernel boot file is /boot/kernel/kernel
> 
> The drives are as follows:
> r...@kg-f2# atacontrol list;camcontrol devlist
> ATA channel 0:
> Master:  no device present
> Slave:   no device present
> ATA channel 2:
> Master:  ad4  SATA revision 2.x
> Slave:   no device present
> ATA channel 3:
> Master:  ad6  SATA revision 2.x
> Slave:   no device present
> ATA channel 4:
> Master:  ad8  SATA revision 2.x
> Slave:   no device present
> ATA channel 5:
> Master: ad10  SATA revision 2.x
> Slave:   no device present
> ATA channel 6:
> Master: ad12  SATA revision 2.x
> Slave:   no device present
> ATA channel 7:
> Master: ad14  SATA revision 2.x
> Slave:   no device present
>  at scbus0 target 0 lun 0 (pass0,ada0)
> 
> Smartctl is happy, too:
> r...@kg-f2# smartctl -H /dev/ad4
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> r...@kg-f2# smartctl -H /dev/ad6
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> r...@kg-f2# smartctl -H /dev/ad8
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> r...@kg-f2# smartctl -H /dev/ad10
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> r...@kg-f2# smartctl -H /dev/ad12
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> r...@kg-f2# smartctl -H /dev/ada0
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> Maybe the hardware is just plain broken.

Can you re-run smartctl -a instead of -H?  Some of the SMART attributes
may help determine what's going on, or there may be related errors in
the SMART error log.
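
To collect the full reports in one go, something like the sketch below could
be used. It only assumes that smartctl is in PATH and reuses the device names
from the -H runs quoted above. smartctl_command and collect are illustrative
names, and the "run" parameter exists only so the loop can be exercised
without touching real disks.

```python
import subprocess

# Device names taken from the smartctl -H runs quoted above.
DEVICES = ["ad4", "ad6", "ad8", "ad10", "ad12", "ada0"]


def smartctl_command(device):
    """Build the argv for a full SMART report (-a) on one device."""
    return ["smartctl", "-a", f"/dev/{device}"]


def collect(devices, run=subprocess.run):
    """Run smartctl -a on each device and return {device: report text}."""
    reports = {}
    for device in devices:
        result = run(smartctl_command(device), capture_output=True, text=True)
        reports[device] = result.stdout
    return reports
```

Calling collect(DEVICES) as root and printing each entry should give the
attribute table and the SMART error log for every drive.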

Otherwise I'd say what's happening is a SATA controller lock-up of some
sort, since it happens on any of your channels.  Could be a quirk of
some kind in the SATA->CAM stuff (unless it also happens when using pure
ata(4)).

What controller are these disks hooked to again?

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64

2010-02-20 Thread Torfinn Ingolfsen
Another day, another crash. 
From /var/log/messages:
Feb 20 08:52:26 kg-f2 ntpd[58609]: time reset +1.169751 s
Feb 20 08:54:57 kg-f2 kernel: ata5: port is not ready (timeout 1ms) tfd = 007f
Feb 20 08:54:57 kg-f2 kernel: ata5: hardware reset timeout
Feb 20 19:18:51 kg-f2 syslogd: kernel boot file is /boot/kernel/kernel

The drives are as follows:
r...@kg-f2# atacontrol list;camcontrol devlist
ATA channel 0:
Master:  no device present
Slave:   no device present
ATA channel 2:
Master:  ad4  SATA revision 2.x
Slave:   no device present
ATA channel 3:
Master:  ad6  SATA revision 2.x
Slave:   no device present
ATA channel 4:
Master:  ad8  SATA revision 2.x
Slave:   no device present
ATA channel 5:
Master: ad10  SATA revision 2.x
Slave:   no device present
ATA channel 6:
Master: ad12  SATA revision 2.x
Slave:   no device present
ATA channel 7:
Master: ad14  SATA revision 2.x
Slave:   no device present
 at scbus0 target 0 lun 0 (pass0,ada0)

Smartctl is happy, too:
r...@kg-f2# smartctl -H /dev/ad4
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad6
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad8
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad10
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ad12
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

r...@kg-f2# smartctl -H /dev/ada0
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Maybe the hardware is just plain broken.
-- 
Torfinn
