Re: error building kernel: nfs_clvfsops.o: In function `nfs_mount':, nfs_clvfsops.c:(.text+0x1638): undefined reference to `nfs_diskless_valid'

2011-04-26 Thread Rick Macklem
 Since today's source update (FreeBSD 9.0-CURRENT/amd64, Revision 221060)
 I get the following error while building the kernel
 (options NFSD/options NFSCL instead of options NFSSERVER/options
 NFSCLIENT):
 
 cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -march=native
 -std=c99 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
 -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
 -Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/usr/src/sys
 -I/usr/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS
 -include opt_global.h -fno-common -finline-limit=8000 --param
 inline-unit-growth=100 --param large-function-growth=1000
 -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone -mfpmath=387
 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float
 -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector
 -Werror
 vers.c
 linking kernel
 nfs_clvfsops.o: In function `nfs_mount':
 nfs_clvfsops.c:(.text+0x1638): undefined reference to `nfs_diskless_valid'
 nfs_clvfsops.c:(.text+0x1652): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1658): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1689): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x16d1): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1712): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x171b): more undefined references to `nfsv3_diskless' follow
 nfs_clvfsops.o: In function `nfs_mount':
 nfs_clvfsops.c:(.text+0x1e19): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e2a): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1e31): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e3d): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e44): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e4a): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e50): undefined reference to `nfs_diskless'
 nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x1e57): more undefined references to `nfs_diskless' follow
 nfs_clvfsops.o: In function `nfs_mount':
 nfs_clvfsops.c:(.text+0x1e65): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1e6b): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1e73): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1e79): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1e80): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e87): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e8e): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e94): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1e9a): undefined reference to `nfs_diskless'
 nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x1ea0): more undefined references to `nfs_diskless' follow
 nfs_clvfsops.o: In function `nfs_mount':
 nfs_clvfsops.c:(.text+0x1eb3): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1ebd): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1ec4): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1ecb): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1ed2): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x1ed9): more undefined references to `nfsv3_diskless' follow
 nfs_clvfsops.o: In function `nfs_mount':
 nfs_clvfsops.c:(.text+0x1f18): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1f1e): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1f33): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1f3a): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1f4b): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1f52): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1f5e): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1f6a): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1f71): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1f78): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1f83): undefined reference to `nfs_diskless_valid'
 nfs_clvfsops.c:(.text+0x1fcc): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x1fd3): undefined reference to `nfs_diskless'
 nfs_clvfsops.c:(.text+0x1fd9): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.c:(.text+0x20ae): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.o:(.data+0x1f8): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.o:(.data+0x258): undefined reference to `nfsv3_diskless'
 nfs_clvfsops.o:(.data+0x2b8): undefined reference to `nfs_diskless_valid'
 *** Error code 1
 
Oops, you'll have to add options NFS_ROOT to your kernel config until
I commit a fix.

Thanks for spotting it, rick
ps: And do a fresh config KERNEL followed by a build. I suspect you already
did that.
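For anyone hitting the same link failure, here is a sketch of the relevant kernel config lines. The option names come from this thread; the comment about which symbols NFS_ROOT provides is an inference, not a quote from the build logs:

```
# Custom kernel config excerpt (sketch)
options NFSCL     # new NFS client, replacing "options NFSCLIENT"
options NFSD      # new NFS server, replacing "options NFSSERVER"
options NFS_ROOT  # workaround: provides the nfs_diskless/nfsv3_diskless symbols
```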


Re: nfs error: No route to host when starting apache ...

2011-04-03 Thread Rick Macklem
 On Fri, 1 Apr 2011, Rick Macklem wrote:
 
  Since rpc.lockd and rpc.statd expect to be able to do IP broadcast
  (same goes for rpcbind), I suspect that might be a problem w.r.t.
  jails, although I know nothing about how jails work?
 
  Oh, and you can use the nolock mount option to avoid use of
  rpc.lockd and rpc.statd.
 
 based on the mount_nfs man page, as well as trying it just in case, this
 option no longer appears to be available in the 7.x nfs code ... :(
 
Oops, sorry. The option is called nolockd.

rick
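To make that concrete, here is a hedged sketch of where the nolockd option goes. The server address and paths are hypothetical, loosely modeled on the setup described in the thread:

```
# /etc/fstab entry using the nolockd option (hypothetical mount):
192.168.1.8:/vol/data  /mnt/data  nfs  rw,tcp,nolockd  0  0
```

The same option can be passed one-off on the command line, e.g. `mount -t nfs -o nolockd 192.168.1.8:/vol/data /mnt/data`.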
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: nfs error: No route to host when starting apache ...

2011-04-01 Thread Rick Macklem
 I just setup an nfs mount between two servers ...
 
 ServerA, nfsd on 192.168.1.8
 ServerB, nfs client on 192.168.1.7
 
 I have a jail, ServerC, running on 192.168.1.7 ... most operations
 appear
 to work, but it looks like 'special files' of a sort aren't working,
 for
 when I try and startup Apache, I get:
 
 [Fri Apr 01 19:42:02 2011] [emerg] (65)No route to host: couldn't grab
 the
 accept mutex
 
 When I try and do a 'newaliases', I get:
 
 # newaliases
 postalias: fatal: lock /etc/aliases.db: No route to host
 
 Yet, for instance, both MySQL and PostgreSQL are running without any
 issues ...
 
 So, the mount is there, it is readable, it is working ... I can ssh
 into
 the jail, I can create files, etc ...
 
 I do have rpc.lockd and rpc.statd running on both client / server
 sides
 ...
 
Since rpc.lockd and rpc.statd expect to be able to do IP broadcast
(same goes for rpcbind), I suspect that might be a problem w.r.t.
jails, although I know nothing about how jails work?

 I'm not seeing anything in either the man page for mount_nfs *or* nfsd
 that might account / correct for something like this, but since I'm not
 sure what this is exactly, not sure exactly what I should be looking for
 :(
 
 Note that this behaviour happens at the *physical* server level as
 well,
 having tested with using postalias to generate the same 'lock' issue
 above
 ...
 
 Now, I do have mountd/nfsd started with the -h to bind them to
 192.168.1.8 ... *but*, the servers themselves, although on the same
 switch, do have different default gateways ... I'm not seeing anything
 within the man page for, say, rpc.statd/rpc.lockd that allows me to bind
 it to the 192.168.1.0/24 IP, so is it binding to my public IP instead of
 my private? So nfsd / mount_nfs can talk fine, as they go through
 192.168.1.0/24 as desired, but rpc.statd/rpc.lockd are on the public IPs
 and not able to talk to each other?
 
 Thx ...
 ___
 freebsd-...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-net
 To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org


Re: nfs error: No route to host when starting apache ...

2011-04-01 Thread Rick Macklem
  I just setup an nfs mount between two servers ...
 
  ServerA, nfsd on 192.168.1.8
  ServerB, nfs client on 192.168.1.7
 
  I have a jail, ServerC, running on 192.168.1.7 ... most operations
  appear
  to work, but it looks like 'special files' of a sort aren't working,
  for
  when I try and startup Apache, I get:
 
  [Fri Apr 01 19:42:02 2011] [emerg] (65)No route to host: couldn't
  grab
  the
  accept mutex
 
  When I try and do a 'newaliases', I get:
 
  # newaliases
  postalias: fatal: lock /etc/aliases.db: No route to host
 
  Yet, for instance, both MySQL and PostgreSQL are running without any
  issues ...
 
  So, the mount is there, it is readable, it is working ... I can ssh
  into
  the jail, I can create files, etc ...
 
  I do have rpc.lockd and rpc.statd running on both client / server
  sides
  ...
 
 Since rpc.lockd and rpc.statd expect to be able to do IP broadcast
 (same goes for rpcbind), I suspect that might be a problem w.r.t.
 jails, although I know nothing about how jails work?
 
Oh, and you can use the nolock mount option to avoid use of
rpc.lockd and rpc.statd.

  I'm not seeing anything in either the man page for mount_nfs *or*
  nfsd that might account / correct for something like this, but since
  I'm not sure what this is exactly, not sure exactly what I should be
  looking for :(
 
  Note that this behaviour happens at the *physical* server level as
  well,
  having tested with using postalias to generate the same 'lock' issue
  above
  ...
 
  Now, I do have mountd/nfsd started with the -h to bind them to
  192.168.1.8 ... *but*, the servers themselves, although on the same
  switch, do have different default gateways ... I'm not seeing anything
  within the man page for, say, rpc.statd/rpc.lockd that allows me to
  bind it to the 192.168.1.0/24 IP, so is it binding to my public IP
  instead of my private? So nfsd / mount_nfs can talk fine, as they go
  through 192.168.1.0/24 as desired, but rpc.statd/rpc.lockd are on the
  public IPs and not able to talk to each other?
 
  Thx ...


Re: possible NFS lockups

2010-08-01 Thread Rick Macklem
 From: Sam Fourman
 On Tue, Jul 27, 2010 at 10:29 AM, krad kra...@googlemail.com wrote:
  I have a production mail system with an nfs backend. Every now and
  again we
  see the nfs die on a particular head end. However it doesn't die
  across all
  the nodes. This suggests to me there isn't an issue with the filer
  itself, and the stats from the filer concur with that.
 
  The symptoms are lines like this appearing in dmesg
 
  nfs server 10.44.17.138:/vol/vol1/mail: not responding
  nfs server 10.44.17.138:/vol/vol1/mail: is alive again
 
  trussing df it seems to hang on getfsstat, this is presumably when
  it tries
  the nfs mounts
 
 
 I also have this problem, where nfs locks up on a FreeBSD 9 server
 and a FreeBSD RELENG_8 client
 
If by RELENG_8, you mean 8.0 (or pre-8.1), there are a number
of patches for the client side krpc. They can be found at:
http://people.freebsd.org/~rmacklem/freebsd8.0-patches

(These are all in FreeBSD8.1, so ignore this if your client is
already running FreeBSD8.1.)

rick
ps: lock up can mean many things. The more specific you can
be w.r.t. the behaviour, the more likely it can be resolved.
For example:
- No more access to the subtree under the mount point is
  possible until the client is rebooted. When ps axlH is run,
  one process that was accessing a file in the mount point
  is shown with WCHAN rpclock and STAT DL.
vs
- All access to the mount point stops for about 1 minute
  and then recovers.

Also, showing what mount options are being used by the
client, and whether or not rpc.lockd and rpc.statd are
running, can be useful.
And if you can look at the net traffic with wireshark
when it is locked up, seeing whether any NFS traffic is
happening can also be useful.



Re: possible NFS lockups

2010-07-31 Thread Rick Macklem
 From: krad kra...@googlemail.com
 To: freebsd-hack...@freebsd.org, FreeBSD Questions 
 freebsd-questions@freebsd.org
 Sent: Tuesday, July 27, 2010 11:29:20 AM
 Subject: possible NFS lockups
 I have a production mail system with an nfs backend. Every now and
 again we
 see the nfs die on a particular head end. However it doesn't die
 across all
 the nodes. This suggests to me there isn't an issue with the filer
 itself, and the stats from the filer concur with that.
 
 The symptoms are lines like this appearing in dmesg
 
 nfs server 10.44.17.138:/vol/vol1/mail: not responding
 nfs server 10.44.17.138:/vol/vol1/mail: is alive again
 
 trussing df it seems to hang on getfsstat, this is presumably when it
 tries
 the nfs mounts
 
 eg
 
 __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
 mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
 1746583552 (0x681ac000)
 mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0)
 =
 1747632128 (0x682ac000)
 munmap(0x681ac000,344064) = 0 (0x0)
 getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)
 
 
 I have played with mount options a fair bit but they don't make much
 difference. This is what they are set to at present:
 
 10.44.17.138:/vol/vol1/mail /mail/0 nfs
 rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0
 
 When this locking is occurring I find that if I do a showmount, or
 mount 10.44.17.138:/vol/vol1/mail again under another mount point, I can
 access it fine.
 
 One thing I have just noticed is that lockd and statd always seem to
 have
 died when this happens. Restarting does not help
 
 
lockd and statd implement separate protocols (NLM and NSM) that do
file locking. The protocols were poorly designed and fundamentally
broken imho. (That refers to the protocols and not the implementation.)

I am not familiar with the lockd and statd implementations, but if you
don't need file locking to work for the same file when accessed
from multiple clients (heads) concurrently, you can use
the nolockd mount option to avoid using them. (I have no idea whether
the mail system you are using will work without lockd. It
should be ok to use nolockd if file locking is only done on a
given file from one client node.)

I suspect that some interaction between your server and the
lockd/statd client causes them to crash and then the client is
stuck trying to talk to them, but I don't really know? Looking
at where all the processes and threads are sleeping via ps axlH
may tell you what is stuck and where.

As others noted, intermittent server not responding...server ok
messages just indicate slow response from the server and don't
mean much. However, if a given process is hung and doesn't
recover, knowing what it is sleeping on can help w.r.t. diagnosis.

rick



Re: FreeBSD NFS client goes into infinite retry loop

2010-03-23 Thread Rick Macklem



On Tue, 23 Mar 2010, John Baldwin wrote:



Ah, I had read that patch as being a temporary testing hack.  If you think
that would be a good approach in general that would be ok with me.


Well, it kinda was. I wasn't betting on it fixing the problem, but since
it does...

I think just mapping VFS_FHTOVP() errors to ESTALE is ok. Do you think
I should ask pjd@ about it or just go ahead with a commit?

Thanks for the help, rick



Re: FreeBSD NFS client goes into infinite retry loop

2010-03-22 Thread Rick Macklem



On Mon, 22 Mar 2010, John Baldwin wrote:


It looks like it also returns ESTALE when the inode is invalid (<
ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at
a later time report an invalid inode?



I'm no ufs guy, but the only way I can think of is if the file system
on the server was newfs'd with fewer i-nodes? (Unlikely, but...)
(Basically, it is safe to return ESTALE for anything that is not
 a transient failure that could recover on a retry.)


But back to your point, zfs_zget() seems to be failing and returning the
EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen.
I'm trying to get some more details through the use of gratuitous
dprintf()'s, but they don't seem to be making it to any logs or the
console even with vfs.zfs.debug=1 set.  Any pointers on how to get these
dprintf() calls working?


I know diddly (as in absolutely nothing about zfs).


That I have no idea on.  Maybe Rick can chime in?  I'm actually not sure why
we would want to treat a FHTOVP failure as anything but an ESTALE error in the
NFS server to be honest.

As far as I know, only if the underlying file system somehow has a 
situation where the file handle can't be translated at that point in time, 
but could be later. I have no idea if any file system is like that, 
and I don't think such a file system would be an appropriate choice for an 
NFS server, even if such a beast exists. (Even then, although FreeBSD's 
client assumes EIO might recover on a retry, that isn't specified in any 
RFC, as far as I know.)


That's why I proposed a patch that simply translates all VFS_FHTOVP()
errors to ESTALE in the NFS server. (It seems simpler than chasing down 
cases in all the underlying file systems?)


rick, chiming in:-)



Re: FreeBSD NFS client goes into infinite retry loop

2010-03-19 Thread Rick Macklem



On Fri, 19 Mar 2010, John Baldwin wrote:


On Friday 19 March 2010 7:34:23 am Steve Polyack wrote:

Hi, we use a FreeBSD 8-STABLE (from shortly after release) system as an
NFS server to provide user home directories which get mounted across a
few machines (all 6.3-RELEASE).  For the past few weeks we have been
running into problems where one particular client will go into an
infinite loop where it is repeatedly trying to write data which causes
the NFS server to return reply ok 40 write ERROR: Input/output error
PRE: POST:.  This retry loop can cause between 20mbps and 500mbps of


I'm afraid I don't quite understand what you mean by "causes the NFS
server to return reply ok 40 write ERROR...". Is this something
logged by syslog (I can't find a printf like this in the kernel
sources), or is this something that tcpdump is giving you, or ???

Why I ask is that it seems to say that the server is returning EIO
(or maybe 40 == EMSGSIZE).

The server should return ESTALE (NFSERR_STALE) after a file has
been deleted. If it is returning EIO, then that will cause the
client to keep trying to write the dirty block to the server.
(EIO is interpreted by the client as a transient error.)

[good stuff snipped]


I have a feeling that using NFS in such a matter may simply be prone to
such problems, but what confuses me is why the NFS client system is
infinitely retrying the write operation and causing itself so much grief.


Yes, your feeling is correct.  This sort of race is inherent to NFS if you do
not use some sort of locking protocol to resolve the race.  The infinite
retries sound like a client-side issue.  Have you been able to try a newer OS
version on a client to see if it still causes the same behavior?


As John notes, having one client delete a file while another is trying
to write it, is not a good thing.

However, the server should return ESTALE after the file is deleted and
that tells the client that the write can never succeed, so it marks the
buffer cache block invalid and returns the error to the app. (The app.
may not see it, if it doesn't check for error returns upon close as well
as write, but that's another story...)

If you could look at a packet trace via wireshark when the problem
occurs, it would be nice to see what the server is returning. (If it
isn't ESTALE and the file no longer exists on the server, then thats
a server problem.) If it is returning ESTALE, then the client is busted.
(At a glance, the client code looks like it would handle ESTALE as a
fatal error for the buffer cache, but that doesn't mean it isn't broken,
just that it doesn't appear wrong. Also, it looks like mmap'd writes
won't recognize a fatal write error and will just keep trying to write
the dirty page back to the server. Take this with a big grain of salt,
since I just took a quick look at the sources. FreeBSD 6-8 appear to
be pretty much the same as far as this goes, in the client.)

Please let us know if you can see the server's error reply code.

Good luck with it, rick
ps: If the server isn't returning ESTALE, you could try switching to
the experimental nfs server and see if it exhibits the same behaviour?
(-e option on both mountd and nfsd, assuming the server is
 FreeBSD8.)
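A hedged sketch of how that might look in /etc/rc.conf (the flag placement here is an assumption; check mountd(8) and nfsd(8) on your release before relying on it):

```
# /etc/rc.conf excerpt: run the experimental NFS server (sketch)
nfs_server_enable="YES"
nfs_server_flags="-e -u -t -n 4"
mountd_enable="YES"
mountd_flags="-e -r"
```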


Re: FreeBSD NFS client goes into infinite retry loop

2010-03-19 Thread Rick Macklem



On Fri, 19 Mar 2010, Steve Polyack wrote:



To anyone who is interested: I did some poking around with DTrace, which led 
me to the nfsiod client code.

In src/sys/nfsclient/nfs_nfsiod.c:
   } else {
       if (bp->b_iocmd == BIO_READ)
           (void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
       else
           (void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
   }



If you look t nfs_doio(), it decides whether or not to mark the buffer
invalid, based on the return value it gets. Some (EINTR, ETIMEDOUT, EIO)
are not considered fatal, but the others are. (When the async I/O
daemons call nfs_doio(), they are threads that couldn't care less if
the underlying I/O op succeeded. The outcome of the I/O operation
determines what nfs_doio() does with the buffer cache block.)
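The transient-vs-fatal distinction described above can be sketched as a small classifier. This is a sketch of the policy only, not the actual sys/nfsclient code; the transient set (EINTR, ETIMEDOUT, EIO) is taken from this thread, and the function name is made up for illustration:

```c
#include <errno.h>

/*
 * Sketch of the policy discussed above: the client keeps a dirty buffer
 * and retries the write for errors it considers transient, and marks
 * the buffer invalid for fatal ones.  Hypothetical helper, not the
 * kernel's nfs_doio().
 */
static int
nfs_write_error_is_transient(int error)
{
	switch (error) {
	case EINTR:	/* syscall interrupted; worth retrying */
	case ETIMEDOUT:	/* server slow or partitioned; worth retrying */
	case EIO:	/* treated as transient -- the source of the loop here */
		return (1);
	default:	/* e.g. ESTALE: the write can never succeed */
		return (0);
	}
}
```

Under this policy, a server that keeps answering EIO after the file is gone makes the client retry forever, while ESTALE would stop it immediately.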



The result is that my problematic repeatable circumstance begins logging 
nfssvc_iod: iod 0 nfs_doio returned errno: 5 (corresponding to 
NFSERR_INVAL?) for each repetition of the failed write.  The only things 
triggering this are my failed writes.  I can also see the nfsiod0 process 
waking up each iteration.




Nope, errno 5 is EIO and that's where the problem is. I don't know why
the server is returning EIO after the file has been deleted on the
server (I assume you did that when running your little shell script?).


Do we need some kind of retry x times then abort logic within nfsiod_iod(), 
or does this belong in the subsequent functions, such as nfs_doio()?  I think 
it's best to avoid these sorts of infinite loops which have the potential to 
take out the system or overload the network due to dumb decisions made by 
unprivileged users.



Nope, people don't like data not getting written back to a server when
it is slow or temporarily network partitioned. The only thing that should
stop a client from retrying a write back to the server is a fatal error
from the server that says this won't ever succeed.

I think we need to figure out if the EIO (NFS3ERR_IO in wireshark) or
if the server is sending NFS3ERR_STALE and the client is somehow munging
that into EIO, causing the confusion.

rick



Re: FreeBSD NFS client goes into infinite retry loop

2010-03-19 Thread Rick Macklem



On Fri, 19 Mar 2010, Steve Polyack wrote:

[good stuff snipped]


This makes sense.  According to wireshark, the server is indeed transmitting 
Status: NFS3ERR_IO (5).  Perhaps this should be STALE instead; it sounds 
more correct than marking it a general IO error.  Also, the NFS server is 
serving its share off of a ZFS filesystem, if it makes any difference.  I 
suppose ZFS could be talking to the NFS server threads with some mismatched 
language, but I doubt it.



Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
ESTALE when the file no longer exists, the NFS server returns whatever
error it has returned.

So, either VFS_FHTOVP() succeeds after the file has been deleted, which
would be a problem that needs to be fixed within ZFS
OR
ZFS returns an error other than ESTALE when it doesn't exist.

Try the following patch on the server (which just makes any error
returned by VFS_FHTOVP() into ESTALE) and see if that helps.

--- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.0 -0400
+++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.0 -0400
@@ -1127,6 +1127,8 @@
}
}
error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
+   if (error != 0)
+   error = ESTALE;
vfs_unbusy(mp);
if (error)
goto out;

Please let me know if the patch helps, rick



Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?

2010-02-09 Thread Rick Macklem



On Tue, 9 Feb 2010, O. Hartmann wrote:

Well, I guess I haven't understood everything of NFSv4. The 'concept' of the 
'root' is new to me; maybe there is some deeper explanation of the purpose? 
Are there supposed to be more than one 'root' entries or only one?




Only to specify different security flavours for different client host
IP#s. There is only one root location in the file system tree. This
was done for NFSv4 to avoid any need for the mount protocol. See below.

At this very moment mounting seems to work, but I always get a 'permission 
denied' error on every ZFS exported filesystem. Doing the same with UFS2 
filesystems, everything works as expected.




In NFSv4, mount does very little, since it does not use the mount
protocol. It basically passes a pathname from the NFSv4 root into
the kernel for later use. (Since UFS doesn't actually check exports, the
experimental server checks them itself, but cheats and allows a minimal set
of NFSv4 Operations on non-exported volumes, so that this pathname can
be traversed to the exported volume.)

At this time ZFS checks exports. As such, everything in the tree from the
root specified by the V4: line must be exported for ZFS to work. I
believe others have gotten a ZFS export to work, but I have no experience
with it at this time.
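As a sketch of what that means, assume a hypothetical layout where the
NFSv4 root is / and the ZFS file system to be served lives at
/tank/backup (the names are made up); every file system on the path from
the root down to it would need a line in /etc/exports:

```
V4: /
/               -sec=sys
/tank           -sec=sys
/tank/backup    -sec=sys
```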


Is there a way to inspect the exports and mounts for the NFS protocol in use?


Not that I am aware of. (Excluding ZFS, which I don't know anything about, 
the /etc/exports file specifies the exports.)


When issuing 'mount', the 'backup' mount is reported to be 'newnfs'; I assume 
this reflects NFSv4 being used. Now I need to figure out what's going wrong 
with the ZFS export. NFS export of the ZFS filesystem is enabled, but as far 
as I know this feature is not used in FreeBSD, since ZFS in FreeBSD lacks 
the capability of autonomously exporting its filesystems via NFS - well, 
I'm not an expert in this matter.



I'm definitely not a ZFS expert either:-) I think the mount command is
showing you that the mount point was created (newnfs refers to the
experimental client), but as noted above, that doesn't indicate that
it is accessible. (If you haven't tried moving the V4: line to
V4: /backup ..., which moves the NFSv4 root to /backup, you should do
that and see how it goes.)

Good luck with it, rick



Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?

2010-02-08 Thread Rick Macklem



On Mon, 8 Feb 2010, O. Hartmann wrote:



Mounting the filesystem via

mount_newnfs host:/path /path


Oh, and you should set:
sysctl vfs.newnfs.locallocks_enable=0
in the server, since I haven't fixed the local locking yet. (This implies
that apps/daemons running locally on the server won't see byte range
locks performed by NFSv4 clients.) However, byte range locking between
NFSv4 clients should work ok.

rick


Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?

2010-02-08 Thread Rick Macklem



On Mon, 8 Feb 2010, O. Hartmann wrote:



Mounting the filesystem via

mount_newnfs host:/path /path

works fine, but not

mount -t nfs4 host:/path /path.



The mount command can be either:
mount -t nfs -o nfsv4 host:/path /path
or
mount -t newnfs -o nfsv4 host:/path /path
(The above was what the old, now-removed nfs4 code used.)

Have fun with it, rick


Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?

2010-02-08 Thread Rick Macklem



On Mon, 8 Feb 2010, O. Hartmann wrote:



Oh, and you should set:
sysctl vfs.newnfs.locallocks_enable=0
in the server, since I haven't fixed the local locking yet. (This implies
that apps/daemons running locally on the server won't see byte range
locks performed by NFSv4 clients.) However, byte range locking between
NFSv4 clients should work ok.



Interesting - I see a lot of vfs.newnfs stuff on the server side, but not this 
specific OID. Am I missing something here?




Oops, make that vfs.newnfs.enable_locallocks=0

rick


Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?

2010-02-08 Thread Rick Macklem



On Mon, 8 Feb 2010, O. Hartmann wrote:



So I guess the above one is the more 'transparent' one with respect to the 
future, when NFSv4 matures and makes its way into the kernel?




Yea, I'd only use mount -t newnfs if for some reason you want to 
test/use the experimental client for nfsv2,3 instead of the regular one.


I tried the above and it works. But it seems that only UFS2 filesystems can 
be mounted by the client. When trying to mount a filesystem residing on ZFS, 
it fails. Mounting works, but when I try to access it or do a simple 'ls', I 
get


ls: /backup: Permission denied


On server side, /etc/exports looks like

--
V4: /   -sec=sys:krb5   #IPv4#

/backup  #IPv4#
--

Is there still an issue with ZFS?


For ZFS, everything from the root specified by the V4: line
must be exported at this time. So, if / isn't exported, the
above won't work for ZFS. You can either export / or move the
NFSv4 root down to backup. For example, you could try:

V4: /backup -sec=sys:krb5
/backup

(assuming /backup is the ZFS volume)

and then a mount like:
mount -t nfs -o nfsv4 server:/ /mnt
will mount /backup on /mnt

rick
ps: ZFS also has its own export stuff, but it is my understanding that
putting a line in /etc/exports is sufficient. I've never used ZFS,
so others will know more than I.



VFS KPI was Re: [OpenAFS-devel] Re: AFS ... or equivalent ...

2008-01-17 Thread Rick Macklem



On Wed, 16 Jan 2008, Robert Watson wrote:

[good stuff snipped]


Right now we maintain a relatively stable VM/VFS KPI within a major release 
(i.e., FreeBSD 6.0 -> 6.1 -> 6.2 -> 6.3), but see fairly significant changes 
between major releases (5.x -> 6.x -> 7.x, etc.).  I expect to see further 
changes in VFS for 8.x (and some of the locking-related ones have already 
started going in).



This is loosely related to both the OpenAFS thread and the Mac OS X ZFS
port thread, so I thought I'd ask...

Has anyone considered trying to bring the FreeBSD VFS KPI (and others, for
that matter) closer to the Darwin/Mac OS X ones? The Apple folks made
quite dramatic changes to their VFS when going from Panther (very FreeBSD
like) to Tiger, but seemed to have stabilized, at least for Leopard. It
just seems that using the Mac OS X KPIs might leverage some work being
done on both sides? (I don't know if there is an OpenAFS port to Mac OS X
or interest in one, but I would think there would be a use for one, if it
existed?)

Although I'm far from an expert on the Mac OS X VFS (when I ported to it,
I just cribbed the code and it worked:-), it seems that they pretty well
got rid of the concept of a vnode-lock. If the underlying file system 
isn't SMP safe, it can put a lock on the subsystem at the VFS call.

(I think it optionally does a global lock or uses an SMP lock in the
vnode, but don't quote me on this. My code currently runs with the
thread-safe flag false in the vfs_conf structure entry, which enables
the automagic locking.)

Just a thought, rick
