Re: vm_map.c lock up (Was: Re: NFS Locking Issue)

2006-07-15 Thread Anish Mistry
On Saturday 15 July 2006 00:08, User Freebsd wrote:
> On Sat, 15 Jul 2006, Kostik Belousov wrote:
> > On Sat, Jul 15, 2006 at 12:10:29AM -0300, User Freebsd wrote:
> >> On Wed, 5 Jul 2006, Robert Watson wrote:
> >>> If you can get into DDB when the hang has occurred, output via
> >>> serial console for the following commands would be very
> >>> helpful:
> >>>
> >>> show pcpu
> >>> show allpcpu
> >>> ps
> >>> trace
> >>> traceall
> >>> show locks
> >>> show alllocks
> >>> show uma
> >>> show malloc
> >>> show lockedvnods
> >>
> >> 'k, after 16 days uptime, the server that I got all the
> >> debugging turned on for finally hung up solid ... I was able to
> >> break into DDB over the serial link, and have run all of the
> >> above on it ... and the output is attached ...
> >>
> >> One thing to note is that the ps listing is not complete ...
> >> there are >6k processes running at the time, and I don't know
> >> how to get rid of the '--more--' prompt :(  After 1k processes,
> >> I just hit 'q' and went onto the other commands ...
> >
> > set lines=0
> >
> >> Also, traceall gave me a 'No such command' error ... now that I
> >> think about it, my luck, it was supposed to be 'trace all'?
> >
> > It is alltrace.
> >
> >> If this doesn't provide enough information, please let me know
> >> what else I should do the next time through, besides the above
> >> commands ...
> >
> > Missing alltrace output seems to be critical. If this is not
> > feasible, please provide at least the output of bt <pid> for
> > each pid shown in "show lockedvnods" and "show alllocks". In
> > your case, bt 64880 was the most interesting. It is a pity that
> > you had reset the machine.
>
> Was down for too long as it was ... it, of course, happened while I
> was out with the family :(
>
> Will keep all of this in mind next time I get a chance to run
> through things ...
>
> Any idea why 'panic' doesn't produce core like it used to?
call doadump
Should force a core dump.
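
Pulling together the corrections made in this thread, the command sequence for the next hang might look like the sketch below. It assumes a kernel built with DDB (and WITNESS for the lock commands) and a dump device configured before the hang. The pid passed to bt is the one named for this particular incident and will differ next time. Command names are as given by the posters, not re-verified against a specific release.

```
set lines=0
show pcpu
show allpcpu
ps
trace
alltrace
show locks
show alllocks
show uma
show malloc
show lockedvnods
bt 64880
call doadump
```

Running set lines=0 first avoids the '--more--' prompt on the long ps listing, and call doadump goes last because the machine is normally reset afterwards.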

-- 
Anish Mistry




Re: vm_map.c lock up (Was: Re: NFS Locking Issue)

2006-07-14 Thread Antony Mawer

On 14/07/2006 6:08 PM, User Freebsd wrote:
>> Just in case, do you use mlocked mappings? Also, why do so many
>> crons exist in the system? They are all forking now. It may be (can
>> not say definitely without further investigation) just a fork bomb.
>
> re: crons ... this, I'm not sure of, but my suspicion was that the crons
> weren't able to complete, since the file system was locked up, but the
> next one was being attempted to run ... *shrug*


This seems consistent with behaviour I've seen on several 6.0-RELEASE 
machines.. from the limited information I've been able to get from the 
machines, there appear to have been multiple tasks from cron all piled up 
upon one another. In particular, the daily periodic tasks that run the 
various 'find' commands were one of the things I noticed (although we run 
numerous tasks out of cron)...


If something is blocking the filesystem and causing find (and possibly 
other processes) to become stuck, these would just keep mounting up 
until it all falls over (with numerous maxproc exceeded etc errors).


These are on machines without NFS, but the symptoms are very very 
similar.. NWFS and SMBFS are commonly used on a number of the machines 
I've seen the problem on, which may be relevant -- perhaps it affects 
more than just NFS?


I may experiment with building up a test server locally and trying to 
reproduce similar loads to see if I can trigger the problem in-house.. 
at least that way I can hook up a serial console and get some more 
detailed information...


Regards
Antony

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: vm_map.c lock up (Was: Re: NFS Locking Issue)

2006-07-14 Thread User Freebsd

On Sat, 15 Jul 2006, Kostik Belousov wrote:

> On Sat, Jul 15, 2006 at 12:10:29AM -0300, User Freebsd wrote:
>> On Wed, 5 Jul 2006, Robert Watson wrote:
>>> If you can get into DDB when the hang has occurred, output via serial
>>> console for the following commands would be very helpful:
>>>
>>> show pcpu
>>> show allpcpu
>>> ps
>>> trace
>>> traceall
>>> show locks
>>> show alllocks
>>> show uma
>>> show malloc
>>> show lockedvnods
>>
>> 'k, after 16 days uptime, the server that I got all the debugging turned
>> on for finally hung up solid ... I was able to break into DDB over the
>> serial link, and have run all of the above on it ... and the output is
>> attached ...
>>
>> One thing to note is that the ps listing is not complete ... there are >6k
>> processes running at the time, and I don't know how to get rid of the
>> '--more--' prompt :(  After 1k processes, I just hit 'q' and went onto the
>> other commands ...
>
> set lines=0
>
>> Also, traceall gave me a 'No such command' error ... now that I think
>> about it, my luck, it was supposed to be 'trace all'?
>
> It is alltrace.
>
>> If this doesn't provide enough information, please let me know what else I
>> should do the next time through, besides the above commands ...
>
> Missing alltrace output seems to be critical. If this is not feasible,
> please provide at least the output of bt <pid> for each pid
> shown in "show lockedvnods" and "show alllocks". In your case,
> bt 64880 was the most interesting. It is a pity that you had reset the
> machine.

Was down for too long as it was ... it, of course, happened while I was 
out with the family :(

Will keep all of this in mind next time I get a chance to run through 
things ...

Any idea why 'panic' doesn't produce core like it used to?

> Just in case, do you use mlocked mappings? Also, why do so many
> crons exist in the system? They are all forking now. It may be (can not
> say definitely without further investigation) just a fork bomb.

mlocked mappings?  What are they? :)

re: crons ... this, I'm not sure of, but my suspicion was that the crons 
weren't able to complete, since the file system was locked up, but the 
next one was being attempted to run ... *shrug*



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: vm_map.c lock up (Was: Re: NFS Locking Issue)

2006-07-14 Thread Kostik Belousov
On Sat, Jul 15, 2006 at 12:10:29AM -0300, User Freebsd wrote:
> 
> 
> On Wed, 5 Jul 2006, Robert Watson wrote:
> 
> >If you can get into DDB when the hang has occurred, output via serial 
> >console for the following commands would be very helpful:
> >
> >show pcpu
> >show allpcpu
> >ps
> >trace
> >traceall
> >show locks
> >show alllocks
> >show uma
> >show malloc
> >show lockedvnods
> 
> 'k, after 16 days uptime, the server that I got all the debugging turned 
> on for finally hung up solid ... I was able to break into DDB over the 
> serial link, and have run all of the above on it ... and the output is 
> attached ...
> 
> One thing to note is that the ps listing is not complete ... there are >6k 
> processes running at the time, and I don't know how to get rid of the 
> '--more--' prompt :(  After 1k processes, I just hit 'q' and went onto the 
> other commands ...
set lines=0
> 
> Also, traceall gave me a 'No such command' error ... now that I think 
> about it, my luck, it was supposed to be 'trace all'?
It is alltrace.
> 
> If this doesn't provide enough information, please let me know what else I 
> should do the next time through, besides the above commands ...
Missing alltrace output seems to be critical. If this is not feasible,
please provide at least the output of bt <pid> for each pid
shown in "show lockedvnods" and "show alllocks". In your case,
bt 64880 was the most interesting. It is a pity that you had reset the
machine.

Just in case, do you use mlocked mappings? Also, why do so many
crons exist in the system? They are all forking now. It may be
(can not say definitely without further investigation) just a fork bomb.




Re: vm_map.c lock up (Was: Re: NFS Locking Issue)

2006-07-14 Thread User Freebsd

On Sat, 15 Jul 2006, User Freebsd wrote:

> On Wed, 5 Jul 2006, Robert Watson wrote:
>> If you can get into DDB when the hang has occurred, output via serial
>> console for the following commands would be very helpful:
>>
>> show pcpu
>> show allpcpu
>> ps
>> trace
>> traceall
>> show locks
>> show alllocks
>> show uma
>> show malloc
>> show lockedvnods
>
> 'k, after 16 days uptime, the server that I got all the debugging turned on
> for finally hung up solid ... I was able to break into DDB over the serial
> link, and have run all of the above on it ... and the output is attached ...
>
> One thing to note is that the ps listing is not complete ... there are >6k
> processes running at the time, and I don't know how to get rid of the
> '--more--' prompt :(  After 1k processes, I just hit 'q' and went onto the
> other commands ...
>
> Also, traceall gave me a 'No such command' error ... now that I think about
> it, my luck, it was supposed to be 'trace all'?
>
> If this doesn't provide enough information, please let me know what else I
> should do the next time through, besides the above commands ...
>
> Oh, and how do you get DDB to 'dump core' in 6.x?  Back in 4.x days, I'd just
> do 'panic' (maybe twice) at the DDB prompt, but that didn't work with 6.x ...
> it just gave me a stacktrace and then the DDB> prompt both times ...

Quick addendum ... the kernel on this server is from June 28th of this 
year ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


vm_map.c lock up (Was: Re: NFS Locking Issue)

2006-07-14 Thread User Freebsd



On Wed, 5 Jul 2006, Robert Watson wrote:

> If you can get into DDB when the hang has occurred, output via serial console
> for the following commands would be very helpful:
>
> show pcpu
> show allpcpu
> ps
> trace
> traceall
> show locks
> show alllocks
> show uma
> show malloc
> show lockedvnods


'k, after 16 days uptime, the server that I got all the debugging turned 
on for finally hung up solid ... I was able to break into DDB over the 
serial link, and have run all of the above on it ... and the output is 
attached ...


One thing to note is that the ps listing is not complete ... there are >6k 
processes running at the time, and I don't know how to get rid of the 
'--more--' prompt :(  After 1k processes, I just hit 'q' and went onto the 
other commands ...


Also, traceall gave me a 'No such command' error ... now that I think 
about it, my luck, it was supposed to be 'trace all'?


If this doesn't provide enough information, please let me know what else I 
should do the next time through, besides the above commands ...


Oh, and how do you get DDB to 'dump core' in 6.x?  Back in 4.x days, I'd 
just do 'panic' (maybe twice) at the DDB prompt, but that didn't work with 
6.x ... it just gave me a stacktrace and then the DDB> prompt both times 
...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

typescript.gz
Description: Binary data

Re: NFS Locking Issue

2006-07-06 Thread Ronald Klop

On Wed, 05 Jul 2006 02:49:26 +0200, Scott Long <[EMAIL PROTECTED]> wrote:


> Michel Talon wrote:
>>> BTW, I noticed yesterday that that IPv6 support commit to rpc.lockd
>>> was never backed out.  An immediate question for people experiencing
>>> new rpc.lockd problems with 6.x should be whether or not backing out
>>> that change helps.
>>
>> So it may be relevant to say that i have kernels without IPV6 support.
>> Recall that i have absolutely no problem with the client in FreeBSD-6.1.
>> Tomorrow i will test one of the 6.1 machines as a NFS server and the
>> other as a client, and will let you know if i see something.
>> As to the problems you mention about NFS on Linux, yes i have seen a lot
>> for years. But to my surprise FC5 seems to work well. By the way it is
>> kernel 2.6.16, so sufficiently recent for the problems to have been
>> ironed out, presumably.
>
> 2.6.16 should be OK.  I've heard of problems with cookie and handle
> sizes with it, but only under highly unusual circumstances.
>
> Scott


Just for the record.

I'm running a 6.1-STABLE client with a Debian 3.1 server with kernel  
2.6.12 and that works ok with nfs locking. Locking didn't work in the past  
(6.0-STABLE).


Ronald.

--
 Ronald Klop
 Amsterdam, The Netherlands


Re: NFS Locking Issue

2006-07-05 Thread User Freebsd

On Wed, 5 Jul 2006, Francisco Reyes wrote:


> Scott Long writes:
>
>> For what it's worth, I recently spent a lot of time putting FreeBSD 6.1
>> to the test as both an NFS client and server in a mixed OS environment.
>
> I have a few debugging settings/suggestions that have been sent my way and I
> plan to try them tonight, but this is just another report..
>
> FreeBSD only environment.
> Today after hours going crazy with horrible performance I brought down nfsd
> and brought it back up.. that simple process got vmstat 'b' column down and
> everything was back to normal.
>
> Again this will not help anyone troubleshoot, but just to mention that it
> happens even with a FreeBSD only environment.

'k, to those out there that know what is useful, and what isn't ...

If Francisco had DDB enabled, did a Ctrl-Alt-Esc when the above happens, 
and did a 'panic' to crash the server and dump a core ... can anything 
useful be gleaned from that core dump?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes

User Freebsd writes:

> What are others using for ethernet?


Of our two machines having the problem 1 has BGE and the other one has EM 
(Intel). Doesn't seem to make much of a difference.


Except for the network cards, these two machines are identical. Same 
motherboard, same RAID controller, same amount of RAM, same RAID 
configuration...


 


Re: NFS Locking Issue

2006-07-05 Thread Robert Watson


On Wed, 5 Jul 2006, Francisco Reyes wrote:

>> can you trigger it using work on just one client against a server, without
>> client<->client interactions?  This makes tracking and reproduction a lot
>> easier
>
> Personally I am experiencing two problems.
> 1- NFS clients freeze/hang if the server goes away.
> We have clients with several mounts so if one of the servers dies then the
> entire operation of the client is put in jeopardy.
>
> This I can reproduce every single time with a 6.X client.. with both a 5.X
> and a 6.X server.
>
> "umount -f" hangs too.

The problems you are experiencing are almost certainly not related to 
rpc.lockd but rather to bugs in the NFS client.

Let's just look at the normal use hang for now, and revisit umount -f after 
that.

>> as multi-client test cases are really tricky!
>
> The second case only happens under heavy load and restarting nfsd makes it
> go away. Basically the 'b' column in vmstat goes high and the performance of
> the machine falls to the floor.
>
> Going to try
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> And reading up on how to debug with DDB. Have another user who volunteered
> to give me some pointers.. so will try that.. so I am able to actually
> produce more helpful info.


If you can get into DDB when the hang has occurred, output via serial console 
for the following commands would be very helpful:


show pcpu
show allpcpu
ps
trace
traceall
show locks
show alllocks
show uma
show malloc
show lockedvnods

Note that the last two will only work if you compile WITNESS in -- WITNESS 
significantly changes kernel timing, so you may find it closes whatever race 
you're running into.  If you can reproduce the problem with WITNESS and 
INVARIANTS, that would be very useful.  The above output will hopefully tell 
us the basic state of the system with respect to processes, threads, locking, 
and so on, and may help us track things down.  For the above, you definitely 
want a serial console as it will be quite a bit of output.
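
A kernel configuration fragment for the options discussed above might look like the following sketch. It assumes a 6.x custom kernel config. The option names are the ones the Developers' Handbook deadlock section and NOTES list for lock debugging, but they should be checked against /usr/src/sys/conf/NOTES for the release actually in use.

```
# Sketch only: verify each option against NOTES before building.
makeoptions DEBUG=-g            # build the kernel with debug symbols
options     KDB                 # kernel debugger framework
options     DDB                 # interactive debugger backend
options     BREAK_TO_DEBUGGER   # a BREAK on the serial console enters the debugger
options     INVARIANTS          # extra run-time consistency checks
options     INVARIANT_SUPPORT   # support code required by INVARIANTS
options     WITNESS             # lock order verification; needed by the lock commands
options     DEBUG_LOCKS         # record where lockmgr locks were acquired
options     DEBUG_VFS_LOCKS     # vnode locking assertions
options     DIAGNOSTIC          # additional diagnostic checks
```

As noted above, WITNESS changes kernel timing, so a hang that disappears with these options enabled is itself a useful data point.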


Also, can you send the output of the 'mount' command from the un-hung state? 
I notice a lot of threads stuck in 'ufs'.


Finally, during the above, could you disable background file system 
checking by placing the following in /etc/rc.conf:

  background_fsck="NO"

Then boot to single user mode and do a full fsck -p before booting up, in 
order to make sure the file system is in a good state before beginning.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes

User Freebsd writes:

> I believe, in Francisco's case, they are willing to pay someone to fix the
> NFS issues they are having, which, i'd assume, means easy access to the
> problematic server(s) to do proper testing in a "real life scenario" ...

Correct. As long as the person is someone "trusted in the community" we 
could do that. And yes we are willing to come to some agreement for 
compensation for the help. Needless to say our introduction of new machines 
will go through more rigorous testing in the future.. especially when jumping 
to a new Release number in FreeBSD.

We lost 1 big customer and after today we likely will lose 2 or 3 more.. of 
the big ones.. when it's all said and done we are likely to lose several 
thousand dollars/month due to these 6.X incidents.

We are fairly new to NFS and that's why we were hoping to get someone to 
help us.. or at least point us in the right direction.

I plan to go over the link you sent me and try to prepare at least one 
machine.

As for paying someone, yes we have been actively looking for someone to help 
us since we are relatively new to NFS.. and much newer to 
troubleshooting this type of problem.



Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes

Robert Watson writes:

> It's not impossible.  It would be interesting to see if ps axl reports that
> rpc.lockd is in the kqread state

Found my post in another thread.

0   354     1   0  96  0  1412  1032 select Ss    ??    0:07.06 /usr/sbin/rpcbind


It was not in kqread state.. and that was from a point where the machine was 
totally locked up.. had to do a physical reset.. could not even kill nfsd 
that time.


I had also more output from several different ps. You need to do "view more" 
to see them all.


http://tinyurl.com/kpejr


Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes

Robert Watson writes:

> It's not impossible.  It would be interesting to see if ps axl reports that
> rpc.lockd is in the kqread state, which would suggest it was blocked in the
> resolver.


Just tried "ps axl | grep rpc" in the machine giving us the most grief.. 
Only got one line back:

root  367  0.0  0.0  1368   960  ??  Ss   25Jun06   0:05.52 /usr/sbin/rpcbin
 0 1   0   4  0 select

Is that one of the lines I should keep an eye on the next time the machine is 
locked up?
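
To check the wait channel of rpc.lockd specifically, rather than whatever "grep rpc" happens to match, a small filter over ps axl output could be used. The sketch below is an assumption-laden helper, not an existing tool: the function name lockd_wchan is made up, and it assumes the column order UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND with a non-empty MWCHAN field.

```shell
# Hypothetical helper: print "PID MWCHAN" for each rpc.lockd process found
# in `ps axl` output read from stdin.
# Assumes MWCHAN is the 9th column and COMMAND starts at the 13th.
lockd_wchan() {
    awk '$13 ~ /rpc\.lockd/ { print $2, $9 }'
}

# Usage on a live system:
#   ps axl | lockd_wchan
```

If this prints nothing, no rpc.lockd process appeared in the listing at all, which is worth noting separately from the state it would have been in.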



Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes

Robert Watson writes:

> can you trigger it using work on just one client against a server, without
> client<->client interactions?  This makes tracking and reproduction a lot
> easier

Personally I am experiencing two problems.
1- NFS clients freeze/hang if the server goes away.
We have clients with several mounts so if one of the servers dies then the 
entire operation of the client is put in jeopardy.

This I can reproduce every single time with a 6.X client.. with both a 5.X 
and a 6.X server.

"umount -f" hangs too.

> as multi-client test cases are really tricky!

The second case only happens under heavy load and restarting nfsd makes it 
go away. Basically the 'b' column in vmstat goes high and the performance of 
the machine falls to the floor.

Going to try 
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

And reading up on how to debug with DDB. Have another user who volunteered 
to give me some pointers.. so will try that.. so I am able to actually 
produce more helpful info.



Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes

Scott Long writes:

> For what it's worth, I recently spent a lot of time putting FreeBSD 6.1
> to the test as both an NFS client and server in a mixed OS environment.


I have a few debugging settings/suggestions that have been sent my way and I 
plan to try them tonight, but this is just another report..


FreeBSD only environment.
Today after hours going crazy with horrible performance I brought down nfsd 
and brought it back up.. that simple process got vmstat 'b' column down and 
everything was back to normal.


Again this will not help anyone troubleshoot, but just to mention that it 
happens even with a FreeBSD only environment.



Re: NFS Locking Issue

2006-07-05 Thread Michel Talon
> with the bge driver ... could we be possibly talking internet vs nfs 
> issues?

Pursuing investigations, i have discovered that for people having 
workstations whose home directories are on a NFS server, and who run 
Gnome or KDE, there is a program which has horrible NFS behavior:
it is gam_server from gamin, which detects alterations in your .kde
for example. On my machine, running nfsstat -c -w 1, i see 4000 requests/s
due to that. If i move it aside (*) and kill it, this drops to 80 requests/s
and KDE works exactly as well, including discovering new files.
I think it is not necessary to comment on the performance penalty if a number
of stations send 4000 requests/s to a server; it will soon be killed.
(*) it restarts itself automatically so it is necessary to move or rename
it before killing.

-- 

Michel TALON



Re: NFS Locking Issue

2006-07-05 Thread User Freebsd

On Wed, 5 Jul 2006, Michel Talon wrote:

>> So it may be relevant to say that i have kernels without IPV6 support.
>> Recall that i have absolutely no problem with the client in FreeBSD-6.1.
>> Tomorrow i will test one of the 6.1 machines as a NFS server and the other as
>> a client, and will let you know if i see something.
>
> Well, i have checked between 2 FreeBSD-6.1-RELEASE machines on the network,
> both have fxp ethernet driver running at 100 Mb/s, one is NFS server other NFS
> client. Both run lockd and statd. I have absolutely no problem exchanging
> files, for example if i begin to copy /usr/src through NFS from one machine to
> the other, which makes a lot of transactions of all sorts, i get:
> niobe# mount asmodee:/usr/src /mnt
> cp -R /mnt/src .
> ...
> after some time i interrupt the transfer
> niobe% du -sh .
> 131M    .
> and during this time i observe the following type of statistics
> asmodee% netstat -w 1 -I fxp0
>             input         (fxp0)           output
>    packets  errs      bytes    packets  errs      bytes colls
>        542     0      84116       1330     0    1219388     0
>        515     0      72806       1290     0    1196330     0
>        501     0      95722       1081     0     741048     0
>        539     0      90704       1090     0    1228052     0
>        645     0      67888        902     0    1451098     0
>        405     0      81264       1609     0     604278     0
>        503     0      74218        709     0     924422     0
>        500     0      98904        973     0     619350     0
>        550     0     100122        855     0     836328     0
>        615     0      79336       1081     0     862772     0
>        577     0      82862        901     0    1005024     0
>
> which looks decent to me.
>
> Doing the same with just one big file no problem either, and i get a transfer
> speed of 6.60 MB/s which is perhaps a little less than with linux, but nothing
> catastrophic. I get 8.20 MB/s for FreeBSD client interacting with the Linux
> server.
>
> Now netstat gives
>    packets  errs      bytes    packets  errs      bytes colls
>        785     0     123266       4716     0    6825600     0
>        759     0     139898       4530     0    7747276     0
>        852     0     124652       5106     0    6902566     0
>        863     0     128040       5170     0    7081738     0
>        811     0     123760       4862     0    6851498     0
>        789     0     123540       4720     0    6834310     0
>        840     0     115378       5024     0    6382114     0
>
> So up to what i can see NFS works OK for me on FreeBSD-6.1.
>
> So the main difference with other people cases may be that i have removed IPV6
> support from kernel.


What are others using for ethernet?  In your case, you say you are running 
between fxp cards ... I've heard some report, in another thread, problems 
with the bge driver ... could we be possibly talking internet vs nfs 
issues?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: NFS Locking Issue

2006-07-05 Thread Robert Watson


On Mon, 3 Jul 2006, Michael Collette wrote:

> Let's start with the simplest.  The scenario here involves 2 machines, mach01
> and mach02.  Both are running 6-STABLE, and both are running rpcbind,
> rpc.statd, and rpc.lockd.  mach01 has exported /documents and mach02 is
> mounting that export under /mnt.  Simple enough?
>
> The /documents directory has multiple subdirectories and files of various
> sizes.  The actual amount of data doesn't really matter to produce a failure.
> All you need to do at this point is to try to copy files from that mount
> point to somewhere else on the hard drive.
>
> cp -Rp /mnt/* /tmp/documents/
>
> You may, or not, see that a couple of subdirectories were created, but no
> files actually moved over.  The cp command is now locked up, and no traffic
> moves.  This usually takes a second or two to show up as a problem.  I can
> repeat this with multiple 6-STABLE boxes.
>
> Turn off rpc.lockd on either the server or client before the cp command, and
> things work.


I've tried several times to reproduce this, and have not succeeded in doing 
so.  In principle, cp should not be using advisory locks.  Could you try 
running cp under ktrace, and saving the ktrace file somewhere outside of NFS? 
Something like the following:


  ktrace -f /usr/tmp/localfile cp -Rp /mnt/* /tmp/documents/

If you are able to reproduce the problem with tracing turned on, a copy of the 
tracefile would be very helpful.
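
The capture-and-inspect cycle described above can be wrapped in two small shell helpers. This is a sketch: the function names trace_cmd and show_trace are hypothetical, and it assumes only the standard ktrace -f and kdump -f options, with the trace file kept on a local disk rather than on NFS.

```shell
# Hypothetical helpers around ktrace(1) and kdump(1).
# Usage: trace_cmd TRACEFILE COMMAND [ARGS...]
trace_cmd() {
    tracefile=$1
    shift
    # -f selects the trace file; the remaining arguments are the command to run
    ktrace -f "$tracefile" "$@"
}

# Usage: show_trace TRACEFILE
show_trace() {
    # kdump decodes the binary trace file into readable text
    kdump -f "$1"
}

# Example:
#   trace_cmd /usr/tmp/localfile cp -Rp /mnt/* /tmp/documents/
#   show_trace /usr/tmp/localfile | tail -50
```

The last few lines of the decoded trace usually show the system call cp was blocked in when it stopped making progress.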


Also, when it locks up, are you able to kill cp using Ctrl-C, and if you hit 
Ctrl-T while it appears locked, what output do you get?


Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: NFS Locking Issue

2006-07-05 Thread User Freebsd

On Wed, 5 Jul 2006, Robert Watson wrote:

> On Wed, 5 Jul 2006, Danny Braniss wrote:
>
>> In my case our main servers are NetApp, and the problems are more related
>> to am-utils running into some race condition (need more time to debug this
>> :-) the other problem is related to throughput, freebsd is slower than
>> linux, and while freebsd/nfs/tcp is faster on Freebsd than udp, on linux
>> it's the same. So it seems some tuning is needed.
>>
>> our main problem now is samba/rpc.lockd, we are stuck with a server running
>> FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd
>> doesn't work.
>>
>> So, if someone is willing to look into the lockd issue, we would like to
>> help.
>
> The most significant problem working with rpc.lockd is creating easy to
> reproduce test cases.  Not least because they can potentially involve
> multiple clients.  If you can help to produce simple test cases to reproduce
> the bugs you're seeing, that would be invaluable.
>
> I'm aware of two general classes of problems with rpc.lockd.  First,
> architectural issues, some derived from architectural problems in the NLM
> protocol: for example, assumptions that there can be a clean mapping of
> process lock owners to locks, which fall down as locks are properties of file
> descriptors that can be inherited.  Second, implementation bugs/misfeatures,
> such as the kernel not knowing how to cancel lock requests, so being unable
> to implement interruptible waits on locks in the distributed case.
>
> Reducing complex failure modes to easily reproduced test cases is tricky
> also, though.  It requires careful analysis, often with ktrace and
> tcpdump/ethereal to work out what's going on, and not a little luck to
> perform the reduction of a large trace down to a simple test scenario.  The
> first step is to try and figure out what, if any, specific workload results
> in a problem.  For example, can you trigger it using work on just one client
> against a server, without client<->client interactions?  This makes tracking
> and reproduction a lot easier, as multi-client test cases are really tricky!
> Once you've established whether it can be reproduced with a single client,
> you have to track down the behavior that triggers it -- normally, this is
> done by attempting to narrow down the specific program or sequence of events
> that causes the bug to trigger, removing things one at a time to see what
> causes the problem to disappear.  This is made more difficult as lock
> managers are sensitive to timing, so removing a high load item from the list,
> even if it isn't the source of the problem, might cause it to trigger less
> frequently.


I'm not sure if this is an option for anyone, either developer or user, 
but in the past, on particularly tricky bugs where I seemed to be the only 
one to be able to produce it, I've given access to a 'trusted developer' 
to the machine itself, to minimize the time lag that emails create ... 
but, also, to let the developer at a machine that has the load required to 
easily reproduce it ...


Not sure if there is anyone out there, on either side of the proverbial 
fence, that feels comfortable doing this, but figured I'd throw the idea 
out ...


I believe, in Francisco's case, they are willing to pay someone to fix the 
NFS issues they are having, which, i'd assume, means easy access to the 
problematic server(s) to do proper testing in a "real life scenario" ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: NFS Locking Issue

2006-07-05 Thread Michel Talon
> So it may be relevant to say that i have kernels without IPV6 support.
> Recall that i have absolutely no problem with the client in FreeBSD-6.1.
> Tomorrow i will test one of the 6.1 machines as a NFS server and the other as
> a client, and will let you know if i see something.

Well, i have checked between 2 FreeBSD-6.1-RELEASE machines on the network,
both have fxp ethernet driver running at 100 Mb/s, one is NFS server other NFS
client. Both run lockd and statd. I have absolutely no problem exchanging
files, for example if i begin to copy /usr/src through NFS from one machine to
the other, which makes a lot of transactions of all sorts, i get:
niobe# mount asmodee:/usr/src /mnt
cp -R /mnt/src .
...
after some time i interrupt the transfer 
niobe% du -sh .
131M    .
and during this time i observe the following type of statistics
asmodee% netstat -w 1 -I fxp0
            input         (fxp0)           output
   packets  errs      bytes    packets  errs      bytes colls
       542     0      84116       1330     0    1219388     0
       515     0      72806       1290     0    1196330     0
       501     0      95722       1081     0     741048     0
       539     0      90704       1090     0    1228052     0
       645     0      67888        902     0    1451098     0
       405     0      81264       1609     0     604278     0
       503     0      74218        709     0     924422     0
       500     0      98904        973     0     619350     0
       550     0     100122        855     0     836328     0
       615     0      79336       1081     0     862772     0
       577     0      82862        901     0    1005024     0
   
which looks decent to me.

Doing the same with just one big file no problem either, and i get a transfer
speed of 6.60 MB/s which is perhaps a little less than with linux, but nothing
catastrophic. I get 8.20 MB/s for FreeBSD client interacting with the Linux
server.

Now netstat gives
   packets  errs      bytes    packets  errs      bytes colls
       785     0     123266       4716     0    6825600     0
       759     0     139898       4530     0    7747276     0
       852     0     124652       5106     0    6902566     0
       863     0     128040       5170     0    7081738     0
       811     0     123760       4862     0    6851498     0
       789     0     123540       4720     0    6834310     0
       840     0     115378       5024     0    6382114     0
   
So, as far as i can see, NFS works OK for me on FreeBSD-6.1.
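
As a rough cross-check of the 6.60 MB/s figure quoted for the single big
file, the output-bytes column of the second netstat sample can be averaged.
This is only a sketch: it assumes each fused field such as "06825600" reads
as errs=0 followed by bytes=6825600, and the function name is made up for
the illustration.

```python
# Output-bytes column of the second netstat sample, one value per second.
# Assumption: a fused field like "06825600" is errs=0 then bytes=6825600.
OUTPUT_BYTES_PER_SEC = [
    6825600, 7747276, 6902566, 7081738, 6851498, 6834310, 6382114,
]


def mean_rate(samples):
    """Return the mean of per-second byte counts, in bytes per second."""
    return sum(samples) / len(samples)


rate = mean_rate(OUTPUT_BYTES_PER_SEC)
print("%.2f MiB/s" % (rate / 2**20))   # binary megabytes (2**20 bytes)
print("%.2f MB/s" % (rate / 10**6))    # decimal megabytes (10**6 bytes)
```

Read this way, the samples average a little over 6.6 MiB/s, which is in line
with the transfer speed reported above if that figure was in binary
megabytes.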

So the main difference from other people's cases may be that i have removed
IPv6 support from the kernel.

-- 

Michel TALON



Re: NFS Locking Issue

2006-07-05 Thread Kostik Belousov
On Wed, Jul 05, 2006 at 02:04:59PM +0100, Robert Watson wrote:
> 
> On Wed, 5 Jul 2006, Kostik Belousov wrote:
> 
> >>Also, the both lockd processes now put identification information in the 
> >>proctitle (srv and kern). SIGUSR1 shall be sent to srv process.
> >
> >Hmm, after looking at the dump there and some code reading, I have noted 
> >the following:
> >
> >1. NLM lock request contains the field caller_name. It is filled by (let 
> >call it) kernel rpc.lockd by the results of gethostname(3).
> >
> >2. This caller_name is used by server rpc.lockd to send request for host 
> >monitoring to rpc.statd (see send_granted). Request is made by clnt_call, 
> >that is blocking rpc call.
> >
> >3. rpc.statd does getaddrinfo on caller_name to determine address of the 
> >host to monitor.
> >
> >If the getaddrinfo in step 3 waits for resolver, then your client machine 
> >will get locking process in "lockd" state.
> >
> >Could people experiencing the rpc.lockd mystery at least report whether 
> >_server_ machine successfully resolves hostname of clients as reported by 
> >hostname? And, if yes, to what family of IP protocols?
> 
> It's not impossible.  It would be interesting to see if ps axl reports that 
> rpc.lockd is in the kqread state, which would suggest it was blocked in the 
  rpc.statd :).
> resolver.  We probably ought to review rpc.statd and make sure it's 
> generally sensible.  I've noticed that its notification process on start is 
> a bit poorly structured in terms of how it notifies hosts of its state 
> change -- if one host is down, it may take a very long time to notify other 
> hosts.




Re: NFS Locking Issue

2006-07-05 Thread Robert Watson


On Wed, 5 Jul 2006, Kostik Belousov wrote:

>> Also, the both lockd processes now put identification information in the 
>> proctitle (srv and kern). SIGUSR1 shall be sent to srv process.
>
> Hmm, after looking at the dump there and some code reading, I have noted the 
> following:
>
> 1. NLM lock request contains the field caller_name. It is filled by (let 
> call it) kernel rpc.lockd by the results of gethostname(3).
>
> 2. This caller_name is used by server rpc.lockd to send request for host 
> monitoring to rpc.statd (see send_granted). Request is made by clnt_call, 
> that is blocking rpc call.
>
> 3. rpc.statd does getaddrinfo on caller_name to determine address of the 
> host to monitor.
>
> If the getaddrinfo in step 3 waits for resolver, then your client machine 
> will get locking process in "lockd" state.
>
> Could people experiencing the rpc.lockd mystery at least report whether 
> _server_ machine successfully resolves hostname of clients as reported by 
> hostname? And, if yes, to what family of IP protocols?

It's not impossible.  It would be interesting to see if ps axl reports that 
rpc.lockd is in the kqread state, which would suggest it was blocked in the 
resolver.  We probably ought to review rpc.statd and make sure it's generally 
sensible.  I've noticed that its notification process on start is a bit poorly 
structured in terms of how it notifies hosts of its state change -- if one 
host is down, it may take a very long time to notify other hosts.


There are a number of other dubious things about the NLM protocol design (at 
least, from my reading last night). I've also noticed that our rpc.lockd is 
particularly sensitive, on the client side, to locks being released by a 
different process than the process that acquired the lock, which is triggered 
excessively by our new libpidfile in RELENG_6.
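
The "released by a different process" pattern can be illustrated with a
small sketch. This is not libpidfile's code, only an assumed minimal
analogue: a parent takes an flock() on a file, forks, and the child releases
the lock through the inherited descriptor. The function name and the path
argument are hypothetical.

```python
import fcntl
import os


def release_in_child(path):
    """Take an flock() here, release it from a forked child.

    Returns True if a freshly opened descriptor can take the lock afterwards.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)          # acquired by the parent
    pid = os.fork()
    if pid == 0:
        # Child: the inherited descriptor refers to the same open file,
        # so unlocking here drops the lock the parent acquired.
        try:
            fcntl.flock(fd, fcntl.LOCK_UN)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    fd2 = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
    finally:
        os.close(fd2)
        os.close(fd)
```

On a local filesystem this should simply return True. Pointing it at a file
on an NFS mount with rpc.lockd running on both ends might be one way to
exercise the client-side case described above, though that is an untested
suggestion.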


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: NFS Locking Issue

2006-07-05 Thread Kostik Belousov
On Wed, Jul 05, 2006 at 02:38:22PM +0300, Kostik Belousov wrote:
> On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> > The most significant problem working with rpc.lockd is creating easy to 
> > reproduce test cases.  Not least because they can potentially involve 
> > multiple clients.  If you can help to produce simple test cases to 
> > reproduce the bugs you're seeing, that would be invaluable.
> > 
> 
> > 
> > Reducing complex failure modes to easily reproduced test cases is tricky 
> > also, though.  It requires careful analysis, often with ktrace and 
> > tcpdump/ethereal to work out what's going on, and not a little luck to 
> > perform the reduction of a large trace down to a simple test scenario.  The 
> > first step is to try and figure out what, if any, specific workload results 
> > in a problem.  For example, can you trigger it using work on just one 
> > client against a server, without client<->client interactions?  This makes 
> > tracking and reproduction a lot easier, as multi-client test cases are 
> > really tricky!  Once you've established whether it can be reproduced with a 
> > single client, you have to track down the behavior that triggers it -- 
> > normally, this is done by attempting to narrow down the specific program or 
> > sequence of events that causes the bug to trigger, removing things one at a 
> > time to see what causes the problem to disappear.  This is made more 
> > difficult as lock managers are sensitive to timing, so removing a high load 
> > item from the list, even if it isn't the source of the problem, might cause 
> > it to trigger less frequently.
> 
> I made the patch for rpc.lockd that could somewhat ease obtaining
> debug information. Patch is available at
> http://people.freebsd.org/~kib/rpc.lockd-debug.patch
> 
> No functional changes. Patch only adds dumping of currently held locks
> (as perceived by lockd) on receiving of SIGUSR1. You need to specify
> debug level 2 or 3 to obtain the dump.
> 
> Also, the both lockd processes now put identification information
> in the proctitle (srv and kern). SIGUSR1 shall be sent to srv process.

Hmm, after looking at the dump there and some code reading, I have noted
the following:

1. NLM lock request contains the field caller_name. It is filled by
(let call it) kernel rpc.lockd by the results of gethostname(3).

2. This caller_name is used by server rpc.lockd to send request
for host monitoring to rpc.statd (see send_granted).
Request is made by clnt_call, that is blocking rpc call.

3. rpc.statd does getaddrinfo on caller_name to determine address of the
host to monitor.

If the getaddrinfo in step 3 waits for resolver, then your client machine
will get locking process in "lockd" state.

Could people experiencing the rpc.lockd mystery at least report whether
_server_ machine successfully resolves hostname of clients as reported
by hostname? And, if yes, to what family of IP protocols?
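
One way to answer the question above is sketched below. It only uses the
standard getaddrinfo lookup and times it; the function name is made up, and
it says nothing about what rpc.statd itself does internally.

```python
import socket
import time


def resolve_families(hostname):
    """Resolve hostname; return (sorted address-family names, seconds taken)."""
    start = time.time()
    infos = socket.getaddrinfo(hostname, None)
    elapsed = time.time() - start
    # Each result tuple starts with the address family (AF_INET, AF_INET6, ...).
    families = sorted({info[0].name for info in infos})
    return families, elapsed
```

Run on the _server_, with the name the client reports from hostname, for
example resolve_families("client.example.org") (a placeholder name). A
socket.gaierror, or a lookup that takes many seconds, would be worth
reporting along with the families returned.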




Re: NFS Locking Issue

2006-07-05 Thread Kostik Belousov
On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> The most significant problem working with rpc.lockd is creating easy to 
> reproduce test cases.  Not least because they can potentially involve 
> multiple clients.  If you can help to produce simple test cases to 
> reproduce the bugs you're seeing, that would be invaluable.
> 

> 
> Reducing complex failure modes to easily reproduced test cases is tricky 
> also, though.  It requires careful analysis, often with ktrace and 
> tcpdump/ethereal to work out what's going on, and not a little luck to 
> perform the reduction of a large trace down to a simple test scenario.  The 
> first step is to try and figure out what, if any, specific workload results 
> in a problem.  For example, can you trigger it using work on just one 
> client against a server, without client<->client interactions?  This makes 
> tracking and reproduction a lot easier, as multi-client test cases are 
> really tricky!  Once you've established whether it can be reproduced with a 
> single client, you have to track down the behavior that triggers it -- 
> normally, this is done by attempting to narrow down the specific program or 
> sequence of events that causes the bug to trigger, removing things one at a 
> time to see what causes the problem to disappear.  This is made more 
> difficult as lock managers are sensitive to timing, so removing a high load 
> item from the list, even if it isn't the source of the problem, might cause 
> it to trigger less frequently.

I made the patch for rpc.lockd that could somewhat ease obtaining
debug information. Patch is available at
http://people.freebsd.org/~kib/rpc.lockd-debug.patch

No functional changes. Patch only adds dumping of currently held locks
(as perceived by lockd) on receiving of SIGUSR1. You need to specify
debug level 2 or 3 to obtain the dump.

Also, the both lockd processes now put identification information
in the proctitle (srv and kern). SIGUSR1 shall be sent to srv process.




Re: NFS Locking Issue

2006-07-05 Thread Chris H.

Quoting Michel Talon <[EMAIL PROTECTED]>:

>> So it would appear that you cured the NFS problems inherent with FBSD-6
>> by replacing FBSD with Fedora Linux. Nice to know that NFSd works in Linux.
>> But won't help those on the FBSD list fix their FBSD-6 boxen. :/
>
> First NFS is designed to make machines of different OSs interact properly.

Yes, this is its purpose.

> If a FreeBSD server interacts properly with a FreeBSD client, but not other
> clients, you cannot say that the situation is fine.

Indeed.

> Second i am not the one to choose the NFS server, there are people working
> in social groups, in the real world.
>
> And third, the most important, the OP message seemed to imply that the
> FreeBSD-6 NFS client was at fault, i pointed out that in my experience my
> FreeBSD-6.1 client works OK, while the 6.0 doesn't, when interacting with a
> FC5 server. This is in itself a relevant piece of information for the problem
> at hand. It may be that the server side is at fault, or some complex
> interaction between client and server.

Of course. I quite agree. Horrible oversight on my part.

> Anyways some people claimed here that they had no problem with FreeBSD-5
> clients and servers. My experience is that i had constant problems
> between FreeBSD-5 clients and Fedora Core 3 servers. I cannot provide any
> other data point. I am not particularly sure of the quality of the FC3 or
> FC5 NFS server implementation, except that the ~ 100 workstations
> running the similar Fedora distribution work like a charm with their homes
> NFS mounted on the server. On the other hand a Debian client machine also has
> severe NFS problems. My only conclusion is that these NFS stories are very
> tricky. The only moment everything worked fine was when we were running
> Solaris on the server.

Useful knowledge, to be sure.
Sorry for my oversight. I should probably refrain from responding when I
have too many other things percolating in my mind while at work. This
has gotten me in trouble once before on this _same_ list. :)

Thank you for your thoughtful response.









--
panic: kernel trap (ignored)



FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006


Re: NFS Locking Issue

2006-07-05 Thread Robert Watson

On Wed, 5 Jul 2006, Danny Braniss wrote:

> In my case our main servers are NetApp, and the problems are more related to 
> am-utils running into some race condition (need more time to debug this :-). 
> The other problem is related to throughput: freebsd is slower than linux, 
> and while freebsd/nfs/tcp is faster on Freebsd than udp, on linux it's the 
> same. So it seems some tuning is needed.
>
> our main problem now is samba/rpc.lockd, we are stuck with a server running 
> FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd doesn't 
> work.
>
> So, if someone is willing to look into the lockd issue, we would like to 
> help.


The most significant problem working with rpc.lockd is creating easy to 
reproduce test cases.  Not least because they can potentially involve multiple 
clients.  If you can help to produce simple test cases to reproduce the bugs 
you're seeing, that would be invaluable.


I'm aware of two general classes of problems with rpc.lockd.  First, 
architectural issues, some derived from architectural problems in the NLM 
protocol: for example, assumptions that there can be a clean mapping of 
process lock owners to locks, which fall down as locks are properties of file 
descriptors that can be inherited.  Second, implementation bugs/misfeatures, 
such as the kernel not knowing how to cancel lock requests, so being unable to 
implement interruptible waits on locks in the distributed case.


Reducing complex failure modes to easily reproduced test cases is tricky also, 
though.  It requires careful analysis, often with ktrace and tcpdump/ethereal 
to work out what's going on, and not a little luck to perform the reduction of 
a large trace down to a simple test scenario.  The first step is to try and 
figure out what, if any, specific workload results in a problem.  For example, 
can you trigger it using work on just one client against a server, without 
client<->client interactions?  This makes tracking and reproduction a lot 
easier, as multi-client test cases are really tricky!  Once you've established 
whether it can be reproduced with a single client, you have to track down the 
behavior that triggers it -- normally, this is done by attempting to narrow 
down the specific program or sequence of events that causes the bug to 
trigger, removing things one at a time to see what causes the problem to 
disappear.  This is made more difficult as lock managers are sensitive to 
timing, so removing a high load item from the list, even if it isn't the 
source of the problem, might cause it to trigger less frequently.
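
A starting point for a single-client test case might look like the sketch
below. It is only an assumption about what a useful minimal workload could
be: one process repeatedly takes an exclusive POSIX lock on one file, writes
a line, and unlocks, recording the slowest round. The function name and
round count are invented for the illustration.

```python
import fcntl
import time


def lock_cycle(path, rounds=100):
    """Lock, append to, and unlock `path` repeatedly.

    Returns the duration of the slowest round, in seconds.
    """
    worst = 0.0
    with open(path, "a+") as f:
        for i in range(rounds):
            start = time.time()
            fcntl.lockf(f, fcntl.LOCK_EX)   # blocking exclusive lock, whole file
            f.write("round %d\n" % i)
            f.flush()
            fcntl.lockf(f, fcntl.LOCK_UN)   # release before the next round
            worst = max(worst, time.time() - start)
    return worst
```

Calling lock_cycle on a file under the NFS mount (for example a hypothetical
/mnt/locktest) from just one client, with ktrace and tcpdump running, keeps
client<->client interactions out of the picture. If it never hangs, that in
itself narrows things down.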


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: NFS Locking Issue

2006-07-05 Thread Oliver Brandmueller
Mornin'

On Tue, Jul 04, 2006 at 09:47:21PM +0100, Robert Watson wrote:
> BTW, I noticed yesterday that that IPv6 support commit to rpc.lockd was 
> never backed out.  An immediate question for people experiencing new 
> rpc.lockd problems with 6.x should be whether or not backing out that 
> change helps.

That could be a good pointer. I also started experiencing some problems 
at home (I did not investigate further though, but started using local 
locking and all was fine), while in our prod setup, where lots of 
machines are running, and many of them use 6-STABLE of not too long ago, 
I never experienced any problems with NFS. The main difference between 
both these networks is, that at home I have an IPv6 environment, while 
at work it's IPv4 only.

I will barely find time before the weekend to run tests, but if I don't 
see any postings by then saying whether this made a difference, I will 
start testing at home.

Thanx, Oliver

-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany   D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|   Ich bin das Internet. Sowahr ich Gott helfe.   |
| Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |




Re: NFS Locking Issue

2006-07-04 Thread Danny Braniss
> Michel Talon wrote:
> 
> >>Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as 
> >>clients on a 100Mb/s network.  The time trial used a dummy 100Meg file 
> >>transfered from the server to the client. 
> >>
> > 
> > 
> > I have similar experiences here. With FreeBSD-6.1 as client (using an Intel
> > etherexpress card at 100 Mb/s) and FC5 server i see full wire speed for file
> > transfers via NFS.
> > 
> > 
> >>After the 4th of July I intend to test Ubuntu as a client to a FreeBSD 
> >>6-STABLE server on a gigabit lan to run similar time trials.  I'm 
> >>looking to confirm what I can only suspect at this point, which is that 
> >>the NFS server on FreeBSD is mucked up, but the client is okay.
> > 
> > 
> > I have the same impression. The 6.1-RELEASE client seems to work well. 
> > Yesterday i have upgraded my 6.0 (*) box to 6.1 and i have not seen a single
> > NFS problem after that. Moreover i am using rpc.statd, and rpc.lockd
> > and they work OK and are really functional. 
> > I have the following sysctl which may have an effect on the problem:
> > vfs.nfs.access_cache_timeout=5
> > 
> > So it may well be that it is the FreeBSD NFS server code which has problems.
> > 
> > (*) 6.0-RELEASE client definitively does not work OK for me.
> > 
> > 
> 
> For what it's worth, I recently spent a lot of time putting FreeBSD 6.1
> to the test as both an NFS client and server in a mixed OS environment.
> By far and away, the biggest problems that I encountered with it were
> due to linux NFS bugs.  CentOS, FC, and SuSE all created huge problems
> under load, and it was impossible to get stable results until I started
> using 2.6.12 and higher kernels.
> 
> I have a variety of theories that I wish I had had time to test.  I've
> seen hints of problems with READDIRPLUS, with FreeBSD's habit of mapping
> GETATTR to ACCESS, and with handle sizes.  But in any case, it's been no
> secret that Linux has had very severe NFS problems in the past, and that
> the NetApp folks have worked very hard over the last year to fix them in
> the most recent Linux kernel releases.  The only real fault I give
> FreeBSD is rpc.lockd.  It's pretty much useless in all but trivial
> circumstances.  Beyond that, make sure you're using a linux kernel that
> is relatively recent.
> 
In my case our main servers are NetApp, and the problems are more related
to am-utils running into some race condition (need more time to debug this :-).
The other problem is related to throughput: freebsd is slower than linux,
and while freebsd/nfs/tcp is faster on Freebsd than udp, on linux
it's the same. So it seems some tuning is needed.

our main problem now is samba/rpc.lockd, we are stuck with a server
running FreeBSD  5.4 which crashes, and we can't upgrade to 6.1 because lockd
doesn't work.

So, if someone is willing to look into the lockd issue, we would like
to help.

danny

> Scott
> 




Re: NFS Locking Issue

2006-07-04 Thread Scott Long

Michel Talon wrote:

>> BTW, I noticed yesterday that that IPv6 support commit to rpc.lockd was never 
>> backed out.  An immediate question for people experiencing new rpc.lockd 
>> problems with 6.x should be whether or not backing out that change helps.
>
> So it may be relevant to say that i have kernels without IPV6 support.
> Recall that i have absolutely no problem with the client in FreeBSD-6.1.
> Tomorrow i will test one of the 6.1 machines as a NFS server and the other as
> a client, and will let you know if i see something.
>
> As to the problems you mention about NFS Linux, yes, i have seen a lot over
> the years. But to my surprise FC5 seems to work well. By the way it is kernel
> 2.6.16 so sufficiently recent for the problems to have been ironed out,
> presumably.

2.6.16 should be OK.  I've heard of problems with cookie and handle 
sizes with it, but only under highly unusual circumstances.


Scott



Re: NFS Locking Issue

2006-07-04 Thread Michel Talon
> BTW, I noticed yesterday that that IPv6 support commit to rpc.lockd was 
> never backed out.  An immediate question for people experiencing new 
> rpc.lockd problems with 6.x should be whether or not backing out that 
> change helps.

So it may be relevant to say that i have kernels without IPV6 support.
Recall that i have absolutely no problem with the client in FreeBSD-6.1.
Tomorrow i will test one of the 6.1 machines as a NFS server and the other as
a client, and will let you know if i see something.

As to the problems you mention about NFS Linux, yes, i have seen a lot over
the years. But to my surprise FC5 seems to work well. By the way it is kernel
2.6.16 so sufficiently recent for the problems to have been ironed out,
presumably.



-- 

Michel TALON



Re: NFS Locking Issue

2006-07-04 Thread Robert Watson


On Tue, 4 Jul 2006, Scott Long wrote:

> For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 to 
> the test as both an NFS client and server in a mixed OS environment. By far 
> and away, the biggest problems that I encountered with it were due to linux 
> NFS bugs.  CentOS, FC, and SuSE all created huge problems under load, and it 
> was impossible to get stable results until I started using 2.6.12 and higher 
> kernels.
>
> I have a variety of theories that I wish I had had time to test.  I've seen 
> hints of problems with READDIRPLUS, with FreeBSD's habit of mapping GETATTR 
> to ACCESS, and with handle sizes.  But in any case, it's been no secret that 
> Linux has had very severe NFS problems in the past, and that the NetApp 
> folks have worked very hard over the last year to fix them in the most 
> recent Linux kernel releases.  The only real fault I give FreeBSD is 
> rpc.lockd.  It's pretty much useless in all but trivial circumstances. 
> Beyond that, make sure you're using a linux kernel that is relatively 
> recent.


BTW, I noticed yesterday that that IPv6 support commit to rpc.lockd was never 
backed out.  An immediate question for people experiencing new rpc.lockd 
problems with 6.x should be whether or not backing out that change helps.


I set up a simple local testbed for rpc.lockd this morning and have started 
running some basic tests.  I wasn't able to trivially reproduce rpc.lockd 
problems reported for cp -r, although I did bump into another bug in the 
memory mapping of zero-length files following creation in the NFS client, 
which I've passed on to Mohan.


I think what's needed is a wire-level regression suite, though, in order to 
avoid mixing up our rpc.lockd client code with the tests for rpc.lockd's 
server.  This is something I may be able to start looking at this week, 
although it's the usual time trade-off: work on getting audit ready for MFC, 
network stack locking and protocol cleanup/bug fixing, or throw rpc.lockd into 
the mix as well?  If we can demonstrate that backing out the IPv6 change 
clearly helps, we need to figure out why it's causing the problem.  A casual 
read of the change doesn't suggest anything obvious, unfortunately, suggesting 
something non-obvious :-(.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: NFS Locking Issue

2006-07-04 Thread Scott Long

Michel Talon wrote:

>> Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as 
>> clients on a 100Mb/s network.  The time trial used a dummy 100Meg file 
>> transferred from the server to the client. 
>
> I have similar experiences here. With FreeBSD-6.1 as client (using an Intel
> etherexpress card at 100 Mb/s) and FC5 server i see full wire speed for file
> transfers via NFS.
>
>> After the 4th of July I intend to test Ubuntu as a client to a FreeBSD 
>> 6-STABLE server on a gigabit lan to run similar time trials.  I'm 
>> looking to confirm what I can only suspect at this point, which is that 
>> the NFS server on FreeBSD is mucked up, but the client is okay.
>
> I have the same impression. The 6.1-RELEASE client seems to work well. 
> Yesterday i upgraded my 6.0 (*) box to 6.1 and i have not seen a single
> NFS problem after that. Moreover i am using rpc.statd and rpc.lockd,
> and they work OK and are really functional. 
> I have the following sysctl which may have an effect on the problem:
> vfs.nfs.access_cache_timeout=5
>
> So it may well be that it is the FreeBSD NFS server code which has problems.
>
> (*) 6.0-RELEASE client definitely does not work OK for me.




For what it's worth, I recently spent a lot of time putting FreeBSD 6.1
to the test as both an NFS client and server in a mixed OS environment.
By far and away, the biggest problems that I encountered with it were
due to linux NFS bugs.  CentOS, FC, and SuSE all created huge problems
under load, and it was impossible to get stable results until I started
using 2.6.12 and higher kernels.

I have a variety of theories that I wish I had had time to test.  I've
seen hints of problems with READDIRPLUS, with FreeBSD's habit of mapping
GETATTR to ACCESS, and with handle sizes.  But in any case, it's been no
secret that Linux has had very severe NFS problems in the past, and that
the NetApp folks have worked very hard over the last year to fix them in
the most recent Linux kernel releases.  The only real fault I give
FreeBSD is rpc.lockd.  It's pretty much useless in all but trivial
circumstances.  Beyond that, make sure you're using a linux kernel that
is relatively recent.

Scott



Re: NFS Locking Issue

2006-07-04 Thread Kostik Belousov
On Mon, Jul 03, 2006 at 03:40:01PM -0700, Michael Collette wrote:
> User Freebsd wrote:
> >On Sat, 1 Jul 2006, Francisco Reyes wrote:
> >
> >>John Hay writes:
> >>
> >>>I only started to see the lockd problems when upgrading the server side
> >>>to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
> >>>and 7-current and the lockd problem only showed up when upgrading the
> >>>server from 5.x to 6.x.
> >>
> >>It confirms the same we are experiencing.. constant freezing/locking 
> >>issues.
> >>I guess no more 6.X for us.. for the foreseeable future..
> >
> >Since there are several of us experiencing what looks to be the same 
> >sort of deadlock issue, I beseech you not to give up
> 
> Honestly trying not to.  To tell ya the truth, I've been giving a real 
> hard look at Ubuntu for my serving needs.  This NFS thing has got me 
> seriously questioning FreeBSD right at the moment.
> 
> >... right now, all 
> >we've been able to get to the developers is virtually useless 
> >information (vmstat and such shows the problem, but it doesn't allow 
> >developers to identify the problem) ...
> >
> >Is this a problem that you can easily recreate, even on a non-production 
> >machine?
> 
> Oh yeah.  I've got a couple of ways I'm able to get this to fail.
> 
> Method #1:
> -
> Let's start with the simplest.  The scenario here involves 2 machines, 
> mach01 and mach02.  Both are running 6-STABLE, and both are running 
> rpcbind, rpc.statd, and rpc.lockd.  mach01 has exported /documents and 
> mach02 is mounting that export under /mnt.  Simple enough?
> 
> The /documents directory has multiple subdirectories and files of 
> various sizes.  The actual amount of data doesn't really matter to 
> produce a failure.  All you need to do at this point is to try to copy 
> files from that mount point to somewhere else on the hard drive.
> 
> cp -Rp /mnt/* /tmp/documents/
> 
> You may, or not, see that a couple of subdirectories were created, but 
> no files actually moved over.  The cp command is now locked up, and no 
> traffic moves.  This usually takes a second or two to show up as a 
> problem.  I can repeat this with multiple 6-STABLE boxes.
> 
> Turn off rpc.lockd on either the server or client before the cp command, 
> and things work.
Either way you specified is too vague to reproduce the problem.
As was said, you shall supply tcpdump of the failed nfs session.

Personally, I tried to do what you described as method 1, and got no
hangs, everything copied as it should be. I did it between
amd64 6.1-STABLE as of yesterday (client) and same STABLE i386 as
server. Monitoring lockd interaction by ethereal also did not reveal anything.

So, what you need to provide to help debug the issue:
1. as detailed information on the problem machines' configuration as
possible
2. exact version of the software you are using
3. tcpdump of nfs sessions (for me, it is preferable to get
raw tcpdump that could be loaded into ethereal)
4. log of rpc.lockd both on client and server (see the -d option in man
page).

Issue seems to be highly specific to some configuration details.
And, for instance, I am unable to reproduce it on my debug testbench.
Without help of the user experiencing trouble, it could take forever
to kill that bug.
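
For item 3 in the list above, a raw capture could be taken along the lines
of the sketch below. It only builds the tcpdump argument list; the helper
name is made up, and the interface, peer, and file names in the usage note
are examples borrowed from this thread, not a prescribed setup.

```python
def capture_command(interface, peer, outfile):
    """Build a tcpdump argv that saves full raw packets exchanged with `peer`."""
    return [
        "tcpdump",
        "-i", interface,   # interface to listen on
        "-s", "0",         # snaplen 0: keep whole packets, not just headers
        "-w", outfile,     # write raw packets to a file ethereal can load
        "host", peer,      # only traffic to or from the other machine
    ]
```

For example, capture_command("fxp0", "mach01", "nfs-hang.pcap") gives a
command line that can be run as root on the client (via subprocess or pasted
into a shell) before starting the cp that hangs. Filtering on the host
rather than on a port is a deliberate choice here, since lockd and statd
traffic does not go over the NFS port.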




Re: NFS Locking Issue

2006-07-04 Thread Robert Watson


On Mon, 3 Jul 2006, Michael Collette wrote:

> http://www.freebsd.org/cgi/query-pr.cgi?pr=80389


If you locally back out the referenced change lock_proc.c:1.18 in rpc.lockd on 
the server, do things improve?


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: NFS Locking Issue

2006-07-04 Thread Michel Talon
> Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as 
> clients on a 100Mb/s network.  The time trial used a dummy 100Meg file 
> transferred from the server to the client. 
> 

I have similar experiences here. With FreeBSD-6.1 as client (using an Intel
EtherExpress card at 100 Mb/s) and an FC5 server I see full wire speed for file
transfers via NFS.

> After the 4th of July I intend to test Ubuntu as a client to a FreeBSD 
> 6-STABLE server on a gigabit lan to run similar time trials.  I'm 
> looking to confirm what I can only suspect at this point, which is that 
> the NFS server on FreeBSD is mucked up, but the client is okay.

I have the same impression. The 6.1-RELEASE client seems to work well.
Yesterday I upgraded my 6.0 (*) box to 6.1 and I have not seen a single
NFS problem since. Moreover, I am using rpc.statd and rpc.lockd,
and they work OK and are really functional.
I have the following sysctl set, which may have an effect on the problem:
vfs.nfs.access_cache_timeout=5

So it may well be that it is the FreeBSD NFS server code which has problems.

(*) The 6.0-RELEASE client definitely does not work OK for me.


-- 

Michel TALON



Re: NFS Locking Issue

2006-07-03 Thread Michael Collette

User Freebsd wrote:
> On Sat, 1 Jul 2006, Francisco Reyes wrote:
>> John Hay writes:
>>> I only started to see the lockd problems when upgrading the server side
>>> to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
>>> and 7-current and the lockd problem only showed up when upgrading the
>>> server from 5.x to 6.x.
>>
>> It confirms the same we are experiencing.. constant freezing/locking
>> issues.
>>
>> I guess no more 6.X for us.. for the foreseeable future..
>
> Since there are several of us experiencing what looks to be the same
> sort of deadlock issue, I beseech you not to give up

Honestly trying not to.  To tell ya the truth, I've been giving a real 
hard look at Ubuntu for my serving needs.  This NFS thing has got me 
seriously questioning FreeBSD right at the moment.

> ... right now, all
> we've been able to get to the developers is virtually useless
> information (vmstat and such shows the problem, but it doesn't allow
> developers to identify the problem) ...
>
> Is this a problem that you can easily recreate, even on a non-production
> machine?

Oh yeah.  I've got a couple of ways I'm able to get this to fail.

Method #1:
----------
Let's start with the simplest.  The scenario here involves 2 machines, 
mach01 and mach02.  Both are running 6-STABLE, and both are running 
rpcbind, rpc.statd, and rpc.lockd.  mach01 has exported /documents and 
mach02 is mounting that export under /mnt.  Simple enough?

The /documents directory has multiple subdirectories and files of 
various sizes.  The actual amount of data doesn't really matter to 
produce a failure.  All you need to do at this point is to try to copy 
files from that mount point to somewhere else on the hard drive.

cp -Rp /mnt/* /tmp/documents/

You may or may not see that a couple of subdirectories were created, but 
no files actually moved over.  The cp command is now locked up, and no 
traffic moves.  This usually takes a second or two to show up as a 
problem.  I can repeat this with multiple 6-STABLE boxes.

Turn off rpc.lockd on either the server or client before the cp command, 
and things work.
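
Because the hang shows up within a second or two, the copy can be run under a small watchdog so that a stuck cp is reported instead of waited on. This is a minimal sketch that was not posted in the thread: run_with_timeout is a made-up helper, the 30-second limit in the usage comment is arbitrary, and the exit status 124 for a timeout is just a convention chosen here.

```shell
#!/bin/sh
# Sketch only: run_with_timeout is a hypothetical helper, not part of any tool.

# Run a command in the background and report if it is still running after
# the given number of seconds. A stuck process is left alone so that it can
# be inspected afterwards.
run_with_timeout() {
    limit=$1; shift
    "$@" &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "still running after ${limit}s (pid $pid): $*"
            ps -l -p "$pid"    # the WCHAN column shows what it is waiting on
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"                # pass the command's own exit status through
}

# Usage on the client from Method #1:
#   run_with_timeout 30 cp -Rp /mnt/* /tmp/documents/
```

Running the same command once with rpc.lockd up and once with it stopped gives a quick yes/no comparison that can accompany a report.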


Method #2:
----------
Booting to a diskless workstation.  The server (mach01) has exported 
/usr, /usr/local, /usr/X11R6 and enough other stuff to get a diskless 
workstation up and running.  Not going to get into all the details here 
other than to say that I have a fully functioning setup like this on 5.4 
boxes now.

I've knocked the boot up of the diskless client (mach02) down to console 
only.  Once at the console I run startx as a regular user, which takes me 
into twm.  From there I try to launch a KDE application, which in my test 
case is kwrite.  The same situation is true with launching a GTK app, 
such as Gimp.

X and twm start up.  I've got all the rest of the system reasonably 
functional.  When I try to run kwrite, none of the KDE subsystems start 
up.  kwrite just sits there in a lockd state.  Same is true of Gimp.

If I shut down rpc.lockd on either machine I'm able to bring up a full 
KDE desktop, with all applications able to run.


Other Testing:
--------------
At one point we had in our test network a 6.1 NFS server providing files 
to 5.4 diskless clients without any problems.  We first got to noticing 
the bulk of the glitches when I moved the diskless setup to use a 6.1 
kernel.


As I said, I've been looking at Linux alternatives.  Especially after 
reading about Michel Talon's experiences with Fedora.  I initially tried 
CentOS, but wasn't able to get NFS working properly on that thing.  I 
had an Ubuntu CD handy, so I installed it on a test box.  Wow, does that 
NFS server boogie!


Using Ubuntu as the server I connected a FreeBSD 5.4 and 6-stable box as 
clients on a 100Mb/s network.  The time trial used a dummy 100Meg file 
transferred from the server to the client.  We measured 90Mb/s transfer, 
which was FAR faster than I had ever been able to get 2 FreeBSD boxes to 
perform doing similar tests.


I then used Ubuntu to connect to a 5.4 server we have in production.  I 
don't recall the exact stats, but it was close to 10x slower.  No 
lockups here though.


After the 4th of July I intend to test Ubuntu as a client to a FreeBSD 
6-STABLE server on a gigabit lan to run similar time trials.  I'm 
looking to confirm what I can only suspect at this point, which is that 
the NFS server on FreeBSD is mucked up, but the client is okay.


As time allows I hope to run similar tests between two Ubuntu boxes, 
then run it all again with Fedora.  Seriously debating whether to move 
some or all of our infrastructure to Linux after all this.  A 3-4 month 
old known bug like this gives me a great deal of concern about FreeBSD. 
That, and Ubuntu's NFS server speed just about knocked me over!


> In my case, I have one machine fully configured for debugging,
> but, of course,

Re: NFS Locking Issue

2006-07-03 Thread Michael Collette

Garance A Drosihn wrote:
> At 9:13 PM -0400 7/1/06, Francisco Reyes wrote:
>> John Hay writes:
>>> I only started to see the lockd problems when upgrading
>>> the server side to FreeBSD 6.x and later. I had various
>>> FreeBSD clients, between 4.x and 7-current and the lockd
>>> problem only showed up when upgrading the server from
>>> 5.x to 6.x.
>>
>> It confirms the same we are experiencing.. constant
>> freezing/locking issues.  I guess no more 6.X for us.. for
>> the foreseeable future..
>
> I don't know if this will be of any help to anyone,
> but...
>
> I recently moved a network-based service from a 4.x machine
> to a 6.x machine.  Despite some testing in advance of the
> switch, many people had problems with the service.  I booted
> to a somewhat out-of-date snapshot of 5.x on the same box.
> I still had problems, but it didn't seem as bad, so I stuck
> with the 5.x system.  Some problems turned out to be bugs
> in the service itself, and were eventually found and fixed.
>
> However, one set of problems on that out-of-date snapshot
> of 5.x were solved by adding:
>
> net.inet.tcp.rfc1323=0
>
> to /etc/sysctl.conf.  The guy who suggested that said it
> avoided a bug which was fixed in later versions of either
> 5.x or 6.x, I forget which.  Of interest is that the bug
> was such that some people connecting to the service were
> never bothered by the bug, while other people could not use
> the service at all until I turned off tcp.rfc1323 .
>
> I have a test version of the same service running on a
> different FreeBSD/i386 box, and that box is now updated
> to freebsd-stable as of June 10th.  Lo and behold, someone
> connecting to that test box reported some problems.  So I
> typed in 'sysctl net.inet.tcp.rfc1323=0', and his problem
> immediately disappeared.  So, it might be that there is
> still some problem with the rfc1323 processing, or that the
> bug which had been fixed has somehow been re-introduced.
>
> In any case, people who are experiencing problems with NFS
> might want to try that, and see if it makes any difference.
> It does strike me as odd that some people are having a *lot*
> of trouble with NFS under 6.x, while others seem to be okay
> with it.  Perhaps the difference is the network topology
> between the NFS server and the NFS clients.
>
> Obviously, this is nothing but a guess on my part.  I am
> not a networking guru!



Thanks for the try Garance, but in my setup it didn't make any 
difference.  I'll get into a bit more detail about my setup in another post.


Later on,
--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]


Re: NFS Locking Issue

2006-07-03 Thread Garance A Drosihn

At 9:13 PM -0400 7/1/06, Francisco Reyes wrote:
> John Hay writes:
>> I only started to see the lockd problems when upgrading
>> the server side to FreeBSD 6.x and later. I had various
>> FreeBSD clients, between 4.x and 7-current and the lockd
>> problem only showed up when upgrading the server from
>> 5.x to 6.x.
>
> It confirms the same we are experiencing.. constant
> freezing/locking issues.  I guess no more 6.X for us.. for
> the foreseeable future..


I don't know if this will be of any help to anyone,
but...

I recently moved a network-based service from a 4.x machine
to a 6.x machine.  Despite some testing in advance of the
switch, many people had problems with the service.  I booted
to a somewhat out-of-date snapshot of 5.x on the same box.
I still had problems, but it didn't seem as bad, so I stuck
with the 5.x system.  Some problems turned out to be bugs
in the service itself, and were eventually found and fixed.

However, one set of problems on that out-of-date snapshot
of 5.x were solved by adding:

net.inet.tcp.rfc1323=0

to /etc/sysctl.conf.  The guy who suggested that said it
avoided a bug which was fixed in later versions of either
5.x or 6.x, I forget which.  Of interest is that the bug
was such that some people connecting to the service were
never bothered by the bug, while other people could not use
the service at all until I turned off tcp.rfc1323 .

I have a test version of the same service running on a
different FreeBSD/i386 box, and that box is now updated
to freebsd-stable as of June 10th.  Lo and behold, someone
connecting to that test box reported some problems.  So I
typed in 'sysctl net.inet.tcp.rfc1323=0', and his problem
immediately disappeared.  So, it might be that there is
still some problem with the rfc1323 processing, or that the
bug which had been fixed has somehow been re-introduced.

In any case, people who are experiencing problems with NFS
might want to try that, and see if it makes any difference.
It does strike me as odd that some people are having a *lot*
of trouble with NFS under 6.x, while others seem to be okay
with it.  Perhaps the difference is the network topology
between the NFS server and the NFS clients.

Obviously, this is nothing but a guess on my part.  I am
not a networking guru!

--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]


Re: NFS Locking Issue

2006-07-03 Thread Chuck Swiger

Michel Talon wrote:
[ ...a long email snipped... ]
> My only conclusion is that these NFS stories are very
> tricky. The only moment everything worked fine was when we were running
> Solaris on the server.

I can't speak to the earlier part about NFS with Linux, but at least I very 
much agree with your conclusion: Solaris makes one of the best NFS servers 
available, over a broad range of use cases.

However, I also wish to note that if you want to use NFS and you need remote 
locking to work, your best hope is when the software you use is willing to use 
explicit lockfiles rather than depending on rpc.lockd to provide remote 
flock()/lockf()-style locking.

There is plenty of software out there which includes locking tests (sendmail 
does, UWash IMAP does, Perl does, etc), and my observation has been that 
actually using NFS-based remote locking under anything beyond trivial load 
tends to make rpc.lockd terminate within seconds (maybe with a core dump, if 
you get lucky), or end up with processes getting stuck forever waiting on 
locks that don't ever return because they've been lost somewhere in limbo.
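
For reference, an explicit lockfile of the kind mentioned above is traditionally built on an atomic hard link rather than on flock()/lockf(), so rpc.lockd is never involved. This is a minimal sketch that was not posted in the thread: acquire_lock and release_lock are made-up helper names, and the stale-lock handling and retry loops that real software needs are left out.

```shell
#!/bin/sh
# Sketch only: acquire_lock and release_lock are hypothetical helpers.

# Try to take the lock by hard-linking a unique temporary file to the
# lock name. Creating the link is atomic, so only one caller can succeed.
acquire_lock() {
    lock=$1
    tmp="$lock.$(uname -n).$$"      # unique per host and process
    : > "$tmp" || return 1
    # If ln reports failure, the link may still have been made (for example
    # when a reply is lost), so also accept a link count of 2 on the temp file.
    if ln "$tmp" "$lock" 2>/dev/null ||
       [ -n "$(find "$tmp" -links 2 2>/dev/null)" ]; then
        rm -f "$tmp"
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Drop the lock by removing the lock name.
release_lock() {
    rm -f "$1"
}

# Usage (the path is a placeholder):
#   if acquire_lock /mnt/spool/app.lock; then
#       ...critical section...
#       release_lock /mnt/spool/app.lock
#   fi
```

The temporary file has to live on the same filesystem as the lock name, since hard links cannot cross filesystems.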


YMMV.  :-)

--
-Chuck


Re: NFS Locking Issue

2006-07-03 Thread User Freebsd

On Mon, 3 Jul 2006, Francisco Reyes wrote:

> Kostik Belousov writes:
>> I think that then 6.2 and 6.3 is not for you either. Problems
>> cannot be fixed until enough information is given.
>
> I am trying.. but so far only other users who are having the same problem are
> commenting on this and other similar threads.
>
> We just need some guidance..
>
> Mark gave me a URL to turn on debugging and volunteered to give me some
> pointers.. I will try, but I will likely try on my own time, on my own
> machines.. I can not tell the owner of the company I work for to let me
> "try".. or "play around" in production machines.. as we lose customers
> because of current problems with the 6.X line.
>
>> Since nobody except you experience that problems (at least, only you
>> notified about the problem existence)
>
> Did you miss the part of:
>
>>> User Freebsd writes:
>>> Since there are several of us experiencing what looks to be the same sort
>>> of deadlock issue, I beseech you not to give up
>
> I am not the only one reporting or having the issue.

Careful here, I think this is where things are getting confused ... the 
above is related to the deadlock (high vmstat blockd issue), not the NFS 
issue ... we're getting two different issues confused :)

>> improved handling of signals in nfs client. If you could test it, that
>> would be useful.
>
> Does it matter if the OS is i386 or amd64?
> Have an amd64 machine I can more easily play with... with no risk to
> production.

Does the amd64 machine exhibit the same problem?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org    ICQ . 7615664


Re: NFS Locking Issue

2006-07-03 Thread Michel Talon
> So it would appear that you cured the NFS problems inherent with FBSD-6
> by replacing FBSD with Fedora Linux. Nice to know that NFSd works in Linux.
> But won't help those on the FBSD list fix their FBSD-6 boxen. :/
>

First, NFS is designed to make machines running different OSs interact properly.
If a FreeBSD server interacts properly with a FreeBSD client, but not with other
clients, you cannot say that the situation is fine.
Second, I am not the one who chooses the NFS server; people work
in groups in the real world.

And third, most important, the OP's message seemed to imply that the
FreeBSD-6 NFS client was at fault. I pointed out that in my experience my
FreeBSD-6.1 client works OK, while the 6.0 one doesn't, when interacting with an
FC5 server. This is in itself a relevant piece of information for the problem
at hand. It may be that the server side is at fault, or some complex
interaction between client and server.

Anyway, some people claimed here that they had no problems with FreeBSD-5
clients and servers. My experience is that I had constant problems
between FreeBSD-5 clients and Fedora Core 3 servers. I cannot provide any
other data point. I am not particularly sure of the quality of the FC3 or
FC5 NFS server implementation, except that the ~100 workstations
running a similar Fedora distribution work like a charm with their home
directories NFS mounted on the server. On the other hand, a Debian client
machine also has severe NFS problems. My only conclusion is that these NFS
stories are very tricky. The only time everything worked fine was when we
were running Solaris on the server.

 
-- 

Michel TALON



Re: NFS Locking Issue

2006-07-03 Thread Andrew Reilly
On Mon, Jul 03, 2006 at 10:06:52AM +0100, Robert Watson wrote:
> It sounds like there is also an NFS client race condition or other bug of 
> some sort.

It may not be related, directly, but one thing that I noticed,
while trying to sort out my own recently commissioned NFS setup,
is that the -r1024 mount flag is *crucial* when the network is
100BaseT and the server is a new, fast amd64 box, and the client
is an old P3-500 with a RealTek ethernet card.  It works fine,
now, but tcpdump showed that it was retrying forever without.
Even NFS over TCP seemed to suffer a bunch of error-related
retries which amounted to stalls in the client.

Is there any way for this sort of thing to be adjusted
automatically?

Cheers,

-- 
Andrew


Re: NFS Locking Issue

2006-07-03 Thread Kostik Belousov
On Mon, Jul 03, 2006 at 10:06:52AM +0100, Robert Watson wrote:
> 
> On Mon, 3 Jul 2006, Kostik Belousov wrote:
> 
> >On Mon, Jul 03, 2006 at 12:50:11AM -0400, Francisco Reyes wrote:
> >>Kostik Belousov writes:
> >>>Since nobody except you experience that problems (at least, only you
> >>>notified
> >>>about the problem existence)
> >>
> >>Did you miss the part of:
> >>
> >>>User Freebsd writes:
> >>>>Since there are several of us experiencing what looks to be the same
> >>>>sort of deadlock issue, I beseech you not to give up
> >>
> >>I am not the only one reporting or having the issue.
> >I think you have different issues.
> 
> I agree.  It looks like we have several issues floating around.  There are 
> some known issues with rpc.lockd (and probably some unknown ones) that will 
> require a concerted effort to resolve.  There appear to be a number of 
> reports relating to this/these problems.
> 
> It sounds like there is also an NFS client race condition or other bug of 
> some sort.
> 
> I think it would be really useful to isolate the two during debugging. 
> Specifically, to make sure that the second client bug is reproducible 
> without rpc.lockd running on the client (and related mount flags).  Once we 
> have some more information, such as vnode locking information, client 
> thread stack traces, etc, we should probably get Mohan in the loop if 
> things seem sticky. I believe he was on vacation last week; he may be back 
> this week sometime. With the July 4 weekend afoot, a lot of .us developers 
> are offline.
I too noted some time ago that an unresponsive NFS server takes
the NFS client down. I looked at the issue then, and had the impression
that this is again a case of runningbufspace depletion. I got a lot
of processes in the wdrain and flswai states. After the NFS server was repaired,
the active write requests were executed, the number of dirty buffers decreased,
and the system returned to normal operation.

This seems to be an architectural issue. I tried to bring the discussion up
several months ago, but got no response.

Also, there is the small problem of SIGINT being ignored when mounted
with the intr flag. A patch to fix this is attached to my previous mail.





Re: NFS Locking Issue

2006-07-03 Thread Robert Watson


On Mon, 3 Jul 2006, Kostik Belousov wrote:

> On Mon, Jul 03, 2006 at 12:50:11AM -0400, Francisco Reyes wrote:
>> Kostik Belousov writes:
>>> Since nobody except you experience that problems (at least, only you
>>> notified
>>> about the problem existence)
>>
>> Did you miss the part of:
>>
>>>> User Freebsd writes:
>>>> Since there are several of us experiencing what looks to be the same sort
>>>> of deadlock issue, I beseech you not to give up
>>
>> I am not the only one reporting or having the issue.
>
> I think you have different issues.


I agree.  It looks like we have several issues floating around.  There are 
some known issues with rpc.lockd (and probably some unknown ones) that will 
require a concerted effort to resolve.  There appear to be a number of reports 
relating to this/these problems.


It sounds like there is also an NFS client race condition or other bug of some 
sort.


I think it would be really useful to isolate the two during debugging. 
Specifically, to make sure that the second client bug is reproducible without 
rpc.lockd running on the client (and related mount flags).  Once we have some 
more information, such as vnode locking information, client thread stack 
traces, etc, we should probably get Mohan in the loop if things seem sticky. 
I believe he was on vacation last week; he may be back this week sometime. 
With the July 4 weekend afoot, a lot of .us developers are offline.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: NFS Locking Issue

2006-07-02 Thread Chris H.

Quoting Michel Talon <[EMAIL PROTECTED]>:

>> I guess I'm still just a bit stunned that a bug this obvious not only
>> found its way into the STABLE branch, but is still there.  Maybe it's
>> not as obvious as I think, or not many folks are using it?  All I know
>> for sure here is that if I had upgraded to 6.1 my network would have
>> been crippled.
>
> Strange, since I upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5,
> my machine, the NFS client, is happy, and lockd works. It is the first time in
> years that I have no problem. It certainly did not work with FreeBSD-5 and I
> still have a machine with FreeBSD-6.0 which does not work properly (it
> frequently loses the NFS mount, but it gets remounted some time later by amd).
> Anyway, I have exactly 0 problems with the 6.1 machine. I could extend that to
> say that everything works very well on that machine, nothing is slow, including
> disk access. This has not always been the case. Stability wise, I have not
> seen any panic, hang or whatever since I compiled a kernel adapted to my
> hardware. I got a panic with the generic kernel soon after installation, but
> now the machine is totally stable.

So it would appear that you cured the NFS problems inherent with FBSD-6
by replacing FBSD with Fedora Linux. Nice to know that NFSd works in Linux.
But won't help those on the FBSD list fix their FBSD-6 boxen. :/

> --
> Michel TALON





--
panic: kernel trap (ignored)



-
FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006
/





Re: NFS Locking Issue

2006-07-02 Thread Kostik Belousov
On Mon, Jul 03, 2006 at 12:50:11AM -0400, Francisco Reyes wrote:
> Kostik Belousov writes:
> >Since nobody except you experience that problems (at least, only you 
> >notified
> >about the problem existence)
> 
> Did you miss the part of:
> 
> >User Freebsd writes:
> >>Since there are several of us experiencing what looks to be the same sort
> >>of deadlock issue, I beseech you not to give up
> 
> I am not the only one reporting or having the issue.
I think you have different issues.
> 
> >Is this for intr mounts?
> 
> "intr" ?
A mount option that allows NFS operations to be interrupted by a signal.
See mount_nfs(8). BTW, I had the impression that this feature not working
was one of your problems.
> 
> 
> >improved handling of signals in nfs client. If you could test it, that
> >would be useful.
> 
> Does it matter if the OS is i386 or amd64?
> Have an amd64 machine I can more easily play with... with no risk to 
> production. 
No, this should be applicable to any arch. Except that the patches are several
months old, and were developed against CURRENT. But I think that they are
applicable to STABLE.




Re: NFS Locking Issue

2006-07-02 Thread Francisco Reyes

Kostik Belousov writes:

> I think that then 6.2 and 6.3 is not for you either. Problems
> cannot be fixed until enough information is given.

I am trying.. but so far only other users who are having the same problem 
are commenting on this and other similar threads.

We just need some guidance..

Mark gave me a URL to turn on debugging and volunteered to give me some 
pointers.. I will try, but I will likely try on my own time, on my own 
machines.. I can not tell the owner of the company I work for to let me 
"try".. or "play around" in production machines.. as we lose customers 
because of current problems with the 6.X line. 

> Since nobody except you experience that problems (at least, only you notified
> about the problem existence)

Did you miss the part of:

> User Freebsd writes:
>> Since there are several of us experiencing what looks to be the same sort
>> of deadlock issue, I beseech you not to give up

I am not the only one reporting or having the issue.

> Is this for intr mounts?

"intr" ?

> improved handling of signals in nfs client. If you could test it, that
> would be useful.

Does it matter if the OS is i386 or amd64?
Have an amd64 machine I can more easily play with... with no risk to 
production. 


Re: NFS Locking Issue

2006-07-02 Thread Kostik Belousov
On Sun, Jul 02, 2006 at 05:49:44PM -0400, Francisco Reyes wrote:
> User Freebsd writes:
> 
> >Since there are several of us experiencing what looks to be the same sort 
> >of deadlock issue, I beseech you not to give up
> 
> I will try to setup the environment, but to be honest no more 6.X for us 
> until 6.2 or 6.3.. We have lost clients already.
I think that then 6.2 and 6.3 are not for you either. Problems
cannot be fixed until enough information is given. Since nobody
except you experiences these problems (at least, only you have reported
that the problem exists), no bug reports with sufficient
information have been filed.
> 
> >Is this a problem that you can easily recreate
> 
> There is one thing I can easily recreate that would very helpfull to solve.
> The 6.X NFS clients freeze if the NFS server goes away.
> 
> I have been able to reproduce that every single time.. both in test and 
> production.
Is this for intr mounts? Some time ago I posted patches that
improve the handling of signals in the NFS client. If you could test them,
that would be useful.


? sys/nfsclient/.arch-ids
Index: sys/nfsclient/nfs_socket.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/nfsclient/nfs_socket.c,v
retrieving revision 1.141
diff -u -r1.141 nfs_socket.c
--- sys/nfsclient/nfs_socket.c  23 May 2006 18:33:58 -  1.141
+++ sys/nfsclient/nfs_socket.c  3 Jul 2006 04:19:23 -
@@ -1701,11 +1701,13 @@
 	p = td->td_proc;
 	PROC_LOCK(p);
 	tmpset = p->p_siglist;
+	SIGSETOR(tmpset, td->td_siglist);
 	SIGSETNAND(tmpset, td->td_sigmask);
 	mtx_lock(&p->p_sigacts->ps_mtx);
 	SIGSETNAND(tmpset, p->p_sigacts->ps_sigignore);
 	mtx_unlock(&p->p_sigacts->ps_mtx);
-	if (SIGNOTEMPTY(p->p_siglist) && nfs_sig_pending(tmpset)) {
+	if ((SIGNOTEMPTY(p->p_siglist) || SIGNOTEMPTY(td->td_siglist))
+	    && nfs_sig_pending(tmpset)) {
 		PROC_UNLOCK(p);
 		return (EINTR);
 	}
Index: sys/nfsclient/nfs_vnops.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/nfsclient/nfs_vnops.c,v
retrieving revision 1.266
diff -u -r1.266 nfs_vnops.c
--- sys/nfsclient/nfs_vnops.c   19 May 2006 00:04:24 -  1.266
+++ sys/nfsclient/nfs_vnops.c   3 Jul 2006 04:19:24 -
@@ -2716,7 +2716,7 @@
 	 * otherwise just do it ourselves.
 	 */
 	if ((bp->b_flags & B_ASYNC) == 0 ||
-	    nfs_asyncio(VFSTONFS(ap->a_vp->v_mount), bp, NOCRED, td))
+	    nfs_asyncio(VFSTONFS(ap->a_vp->v_mount), bp, NOCRED, curthread))
 		(void)nfs_doio(ap->a_vp, bp, cr, td);
 	return (0);
 }




Re: NFS Locking Issue

2006-07-02 Thread Francisco Reyes

User Freebsd writes:

> Since there are several of us experiencing what looks to be the same sort
> of deadlock issue, I beseech you not to give up

I will try to set up the environment, but to be honest no more 6.X for us 
until 6.2 or 6.3.. We have lost clients already.

> Is this a problem that you can easily recreate

There is one thing I can easily recreate that would be very helpful to solve.
The 6.X NFS clients freeze if the NFS server goes away.

I have been able to reproduce that every single time.. both in test and 
production.

> machine?  In my case, I have one machine fully configured for debugging,

Although solving both, server and client, would be great for us, if we could 
at least solve the client.. it would be very helpful.. until our next 
server comes.. on which we are going to install 5.5

> information to the developers to debug this, the faster it will get fixed

Agree.. but with 4+ crashes in less than a week it has reached the point 
where we have moved workload away from the most problematic machine.. to try 
to alleviate the problem.. but that still was not enough.. to prevent at least 
one big customer of ours from leaving.. We don't keep tight track of the 
smaller ones. :-)

> different then your auto-mechanic ... try telling him there is a 'knocking
> under the hood, please tell me how to fix it, but you can't have my car',
> and he'll brush you off ... give him 30 minutes under the hood, and not
> only will he have identified it, but he'll probably fix it too ...

The problem is when you are a taxi driver... and it costs you money to have 
the car off the streets.. and you don't know when the 'knocking' will 
occur... :-)

Will set up my laptop with the debug settings and will then work on trying to 
debug the client problem... depending on how that goes will then possibly 
try the server that is giving us problems.



Re: NFS Locking Issue

2006-07-02 Thread User Freebsd

On Sat, 1 Jul 2006, Francisco Reyes wrote:

> John Hay writes:
>> I only started to see the lockd problems when upgrading the server side
>> to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
>> and 7-current and the lockd problem only showed up when upgrading the
>> server from 5.x to 6.x.
>
> It confirms the same we are experiencing.. constant freezing/locking issues.
> I guess no more 6.X for us.. for the foreseeable future..

Since there are several of us experiencing what looks to be the same sort 
of deadlock issue, I beseech you not to give up ... right now, all we've 
been able to get to the developers is virtually useless information 
(vmstat and such shows the problem, but it doesn't allow developers to 
identify the problem) ...

Is this a problem that you can easily recreate, even on a non-production 
machine?  In my case, I have one machine fully configured for debugging, 
but, of course, since re-configuring it, it hasn't exhibited the problem 
... if most of us get our machines configured properly to give useful 
information to the developers to debug this, the faster it will get fixed 
...

My experience with most of the developers is that if you can get into DDB 
and give them 'internal traces' of the code, bugs tend to get fixed very 
quickly ... vmstat/ps give "external views", more summaries than anything 
... it's the details "under the hood" that they need ... it's not much 
different than your auto-mechanic ... try telling him there is a 'knocking 
under the hood, please tell me how to fix it, but you can't have my car', 
and he'll brush you off ... give him 30 minutes under the hood, and not 
only will he have identified it, but he'll probably fix it too ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: NFS Locking Issue

2006-07-01 Thread Danny Braniss
> John Hay writes:
> 
> > I only started to see the lockd problems when upgrading the server side
> > to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
> > and 7-current and the lockd problem only showed up when upgrading the
> > server from 5.x to 6.x.
> 
> It confirms the same we are experiencing.. constant freezing/locking issues.
> I guess no more 6.X for us.. for the foreseeable future..

just to add some more 'ingredients' to the problems:
1- we are suffering from the lockd syndrome
2- am-utils sometimes fails - especially /net (type:=host)
  [there seems to be a race condition]

both problems are new since 6.1
and now, on a 'mostly idle' machine, after failing to compile openoffice-2.0
the lockd is 'spinning' with no real work, at least so it seems:

last pid: 69935;  load averages:  0.16,  0.10,  0.08    up 1+16:37:25  09:37:09
44 processes:  1 running, 43 sleeping
CPU states:  2.6% user,  0.0% nice,  0.4% system,  0.4% interrupt, 96.6% idle
Mem: 129M Active, 2796M Inact, 157M Wired, 106M Cache, 214M Buf, 132M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  513 root        1  96    0 48628K 45304K select 1  67:39  5.13% rpc.lockd
  498 root        1   4    0  2420K   868K -      1  23:38  0.83% nfsd
  419 root        1  96    0  5408K  2088K select 1  98:13  0.00% amd-6.1.5

danny




Re: NFS Locking Issue

2006-07-01 Thread Francisco Reyes

John Hay writes:

> I only started to see the lockd problems when upgrading the server side
> to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
> and 7-current and the lockd problem only showed up when upgrading the
> server from 5.x to 6.x.

It confirms the same we are experiencing.. constant freezing/locking issues.
I guess no more 6.X for us.. for the foreseeable future..


Re: NFS Locking Issue

2006-06-30 Thread Rong-en Fan

On 6/29/06, Michael Collette <[EMAIL PROTECTED]> wrote:

> This last week I had been working on a test network to test out 6.1
> prior to upgrading our production boxes from 5.4.  That's when I ran
> across the rpc.lockd issues that have been discussed earlier.
>
> Our production setup has diskless clients running KDE, which due to this
> bug is now dead on 6.1.  I also have my mail server delivering messages
> to a file server via NFS.  I even have servers booting diskless with NFS
> provided file systems... all of which are dead on 6.1.
>
> The last discussion or bug updates I've seen on this issue were about 3
> months ago.  This leaves me with a number of questions I hope can be
> answered here on this list.
>
> Is NFS a big deal for most other users, or am I out here on the fringe
> using it as much as I do?
>
> Is anyone working on a fix for this?  If so, is there any kind of time
> frame where this fix might be MFC'd to 6-STABLE?
>
> I guess I'm still just a bit stunned that a bug this obvious not only
> found its way into the STABLE branch, but is still there.  Maybe it's
> not as obvious as I think, or not many folks are using it?  All I know
> for sure here is that if I had upgraded to 6.1 my network would have
> been crippled.


Try 6.1-STABLE, especially make sure you have

$FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02
01:20:58 rodrigc Exp $

for usr.sbin/rpc.lockd/kern.c, and see if this helps.
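
To compare a local copy against the revision named above, the $FreeBSD$ ident string can be pulled out of the file (ident(1) prints these strings as well). What follows is only a rough sketch in Python, not part of the suggested fix; parse_ident is an illustrative name, and the regular expression assumes the CVS-style layout shown in the ident above.

```python
import re

# Matches a CVS-style ident such as:
#   $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $
# Only the path and the dotted revision are captured.
IDENT_RE = re.compile(r"\$FreeBSD:\s+(\S+),v\s+([\d.]+)")


def parse_ident(text):
    """Return (path, revision) for the first $FreeBSD$ ident in text, or None."""
    # Mail clients wrap the ident across lines, so collapse all whitespace first.
    flattened = " ".join(text.split())
    match = IDENT_RE.search(flattened)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    wrapped = (
        "$FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02\n"
        "01:20:58 rodrigc Exp $"
    )
    print(parse_ident(wrapped))
```

Reading the local kern.c and comparing the returned revision with 1.16.2.1 is then a plain string comparison.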

Regards,
Rong-En Fan


Re: NFS Locking Issue

2006-06-30 Thread Michel Talon
> Based on prior reading about this problem, I'd venture to guess that the 
> file locking between FC5 and FreeBSD simply isn't.  See, between just 2 
> machines sharing files without rpc.lockd running you won't see a 
> problem.  Both the client and the server must not only be running 
> rpc.lockd, but they must be able to actually talk to each other.
> 

I definitely disagree with that. I have written a little program
just to check locking on files on the NFS share, and i can assure you
it works. Before FC5 the same program did not work, in fact it hung.
You could not kill the program, without unmounting the NFS share.
After the upgrade FC3 -> FC5 the lockd works and if i try setting a second lock
on the same file it will fail. I am using this daily with mutt, no problem.
But it is not only lockd which now works, it is more generally NFS.
On a 6.0 machine i regularly get things like:
Jun 22 17:30:10 asmodee kernel: for server ada:/ada1
Jun 22 17:30:10 asmodee kernel: nfs send error 1 for server ada:/ada1
Jun 22 17:30:10 asmodee last message repeated 797 times
Jun 22 17:30:15 asmodee kernel: for server ada:/ada
Jun 22 17:30:15 asmodee kernel: nfs send error 1 for server ada:/ada
Jun 22 17:30:15 asmodee last message repeated 817 times
Jun 22 17:30:20 asmodee kernel: nfs send error 35 for server ada:/ada
and the home directories are inaccessible for a couple of minutes. I have
never seen that once on the 6.1 machine.
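
The test program mentioned above was not posted, so the following is only a minimal sketch of that kind of locking check, written in Python and not the original code. The names try_lock and second_lock_fails are illustrative. It assumes a POSIX system; because fcntl locks belong to a process, the second lock attempt is made from a forked child. On a working setup the second attempt is refused; with a broken lockd the call may simply hang, as described earlier in this thread.

```python
import fcntl
import os
import sys


def try_lock(path):
    """Try an exclusive, non-blocking fcntl lock; return the open file or None."""
    handle = open(path, "a+")
    try:
        fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held elsewhere (EAGAIN/EACCES) or locking is unavailable.
        handle.close()
        return None
    return handle


def second_lock_fails(path):
    """Hold a lock in this process, then try a second one from a child process."""
    holder = try_lock(path)
    if holder is None:
        raise RuntimeError("could not take the first lock")
    pid = os.fork()
    if pid == 0:
        # Child: exit status 0 means the second lock was refused, as it should be.
        got = try_lock(path)
        os._exit(1 if got is not None else 0)
    _, status = os.waitpid(pid, 0)
    holder.close()  # closing the file releases the first lock
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    # Pass a file on the NFS share as the only argument.
    print("second lock refused:", second_lock_fails(sys.argv[1]))
```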

-- 

Michel TALON



Re: NFS Locking Issue

2006-06-30 Thread Michel Talon
> I only started to see the lockd problems when upgrading the server side
> to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
> and 7-current and the lockd problem only showed up when upgrading the
> server from 5.x to 6.x.

As far as i remember FreeBSD-4 did not have a true lockd, only a fake one,
so it was always working no problem. I have used all versions of FreeBSD-5
up to 6.0 and 6.1 on my client with a Linux server, and i can say that 6.1
is the first one which works OK for me. I don't have any experience with
FreeBSD server, except the occasional nfs mounting after a make world.


-- 

Michel TALON



Re: NFS Locking Issue

2006-06-30 Thread Michel Talon

> the one thing that sticks out to me about this report is that they 
> upgraded the NFS server to FC5 ... what was the server running before?  if 
> FreeBSD, could the problem be an interaction problem between the NFS 
> server and client, vs just the client side?

Previously the server used Fedora Core 3. I think like you that it is an
interaction between client and server. For example we have a client machine
running Debian Unstable which had NFS problems interacting with the FC3
server and still has them with the FC5 server. But i don't have any more with
Fbsd-6.1. As to the problem of the machine freezing when the server freezes,
i have always seen that, both under Linux and FreeBSD, nothing new. The
freeze seems to me less severe now, that is i have been able to log in as
root with the server down. The load on the server is rather big, we are
talking around 100 machines having their home directories on the server.

-- 

Michel TALON



Re: NFS Locking Issue

2006-06-30 Thread Michael Collette

Michel Talon wrote:
> > I guess I'm still just a bit stunned that a bug this obvious not only 
> > found its way into the STABLE branch, but is still there.  Maybe it's 
> > not as obvious as I think, or not many folks are using it?  All I know 
> > for sure here is that if I had upgraded to 6.1 my network would have 
> > been crippled.
>
> Strange, since i upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5,
> my machine, NFS client is happy, and lockd works. It is first time since
> years i have no problem. It certainly did not work with FreeBSD-5 and i still
> have a machine with FreeBSD-6.0 which does not work properly (frequently loses
> the NFS mount, but it gets remounted some times later by amd). Anyways i have
> exactly 0 problem with the 6.1 machine. I could extend that to say that
> everything works very well on that machine, nothing is slow, including disk
> access. This has not always been the case. Stability wise, i have not seen any
> panic, hang or whatever since i have compiled a kernel adapted to my hardware.
> I got a panic with the generic kernel soon after installation, but now
> machine is totally stable.


Based on prior reading about this problem, I'd venture to guess that the 
file locking between FC5 and FreeBSD simply isn't.  See, between just 2 
machines sharing files without rpc.lockd running you won't see a 
problem.  Both the client and the server must not only be running 
rpc.lockd, but they must be able to actually talk to each other.


For a simple 2 machine setup, you don't really need much in the way of 
locking control, as you don't have to deal with multiple requests for 
the same resource.  This is why folks just running the "-L" flag on 
their mount command also aren't having any problems.


To actually see the problem isn't too hard to set up.  Just have 
rpc.lockd, rpc.statd, and rpcbind enabled on both the client and the 
server.  Then just start trying to transfer a stack of files from one 
to the other.  I found this to be true even trying to go from a 5.4 
server to my 6.1 laptop here.
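
Before trying to reproduce this, it helps to confirm that all three daemons really are enabled on both machines. The sketch below is a rough Python check of rc.conf text, not anything from this thread; the knob names (rpcbind_enable, rpc_statd_enable, rpc_lockd_enable) are the usual rc.conf ones and are assumed here, and enabled_services is an illustrative name.

```python
import shlex

# rc.conf knobs assumed to control the three daemons named above.
REQUIRED = ("rpcbind_enable", "rpc_statd_enable", "rpc_lockd_enable")


def enabled_services(rc_conf_text):
    """Map each required knob to True if rc.conf sets it to YES, else False."""
    values = {}
    for line in rc_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        # shlex removes the shell quoting and any trailing comment.
        tokens = shlex.split(raw, comments=True)
        values[name.strip()] = tokens[0] if tokens else ""
    return {knob: values.get(knob, "NO").upper() == "YES" for knob in REQUIRED}


if __name__ == "__main__":
    with open("/etc/rc.conf") as conf:
        for knob, is_on in enabled_services(conf.read()).items():
            print(knob, "enabled" if is_on else "NOT enabled")
```

Running it on both the client and the server shows whether the setup matches the one described above.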


There was quite a thread on this back in March of this year, along with 
a few PRs that are still open.  I'm personally just running headlong 
into all of this.


Later on,
--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]


Re: NFS Locking Issue

2006-06-29 Thread John Hay
On Fri, Jun 30, 2006 at 01:03:09AM +0200, Michel Talon wrote:
> > I guess I'm still just a bit stunned that a bug this obvious not only 
> > found its way into the STABLE branch, but is still there.  Maybe it's 
> > not as obvious as I think, or not many folks are using it?  All I know 
> > for sure here is that if I had upgraded to 6.1 my network would have 
> > been crippled.
> 
> Strange, since i upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5,
> my machine, NFS client is happy, and lockd works. It is first time since
> years i have no problem. It certainly did not work with FreeBSD-5 and i still
> have a machine with FreeBSD-6.0 which does not work properly (frequently loses
> the NFS mount, but it gets remounted some times later by amd). Anyways i have
> exactly 0 problem with the 6.1 machine. I could extend that to say that
> everything works very well on that machine, nothing is slow, including disk
> access. This has not always been the case. Stability wise, i have not seen any
> panic, hang or whatever since i have compiled a kernel adapted to my hardware.
> I got a panic with the generic kernel soon after installation, but now
> machine is totally stable.

I only started to see the lockd problems when upgrading the server side
to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
and 7-current and the lockd problem only showed up when upgrading the
server from 5.x to 6.x.

John
-- 
John Hay -- [EMAIL PROTECTED] / [EMAIL PROTECTED]


Re: NFS Locking Issue

2006-06-29 Thread Francisco Reyes

User Freebsd writes:

> the one thing that sticks out to me about this report is that they 
> upgraded the NFS server to FC5


I wonder if the FreeBSD 6.X client would freeze with a non FreeBSD NFS 
server. Would be interesting to have that info for comparison.




Re: NFS Locking Issue

2006-06-29 Thread User Freebsd

On Thu, 29 Jun 2006, Francisco Reyes wrote:

> Michel Talon writes:
>
> > Strange, since i upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5,
> > my machine, NFS client is happy, and lockd works.
>
> What volume are we talking about?
> My own problems and other reports I see are all under heavy load.

the one thing that sticks out to me about this report is that they 
upgraded the NFS server to FC5 ... what was the server running before?  if 
FreeBSD, could the problem be an interaction problem between the NFS 
server and client, vs just the client side?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: NFS Locking Issue

2006-06-29 Thread Francisco Reyes

Michel Talon writes:

> Strange, since i upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5,
> my machine, NFS client is happy, and lockd works.


What volume are we talking about?
My own problems and other reports I see are all under heavy load.



Re: NFS Locking Issue

2006-06-29 Thread Ronald Klop
On Thu, 29 Jun 2006 22:25:30 +0200, Michael Collette
<[EMAIL PROTECTED]> wrote:

> Rong-en Fan wrote:
> > On 6/29/06, Michael Collette <[EMAIL PROTECTED]> wrote:
> > > This last week I had been working on a test network to test out 6.1
> > > prior to upgrading our production boxes from 5.4.  That's when I ran
> > > across the rpc.lockd issues that have been discussed earlier.
> > >
> > > Our production setup has diskless clients running KDE, which due to this
> > > bug is now dead on 6.1.  I also have my mail server delivering messages
> > > to a file server via NFS.  I even have servers booting diskless with NFS
> > > provided file systems... all of which are dead on 6.1.
> > >
> > > The last discussion or bug updates I've seen on this issue were about 3
> > > months ago.  This leaves me with a number of questions I hope can be
> > > answered here on this list.
> > >
> > > Is NFS a big deal for most other users, or am I out here on the fringe
> > > using it as much as I do?
> > >
> > > Is anyone working on a fix for this?  If so, is there any kind of time
> > > frame where this fix might be MFC'd to 6-STABLE?
> > >
> > > I guess I'm still just a bit stunned that a bug this obvious not only
> > > found its way into the STABLE branch, but is still there.  Maybe it's
> > > not as obvious as I think, or not many folks are using it?  All I know
> > > for sure here is that if I had upgraded to 6.1 my network would have
> > > been crippled.
> >
> > Try 6.1-STABLE, especially make sure you have
> > $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02
> > 01:20:58 rodrigc Exp $
> > for usr.sbin/rpc.lockd/kern.c, and see if this helps.
>
> I am running STABLE on all my test boxes, and the problem is very much
> there.  It's not everything that locks up though.  I'm able to bring X
> up with twm, but unable to launch any Gnome or KDE applications without
> them being stranded in a lock state.
>
> I sure would have loved for your suggestion to be correct.  For what
> it's worth, all the boxes I'm working with are on STABLE no more than a
> week old.  I ran fresh build worlds on all of them before getting the
> rest of my configs going.
>
> Thanks,


Hello, I run my client with the -L mount option. This makes NFS locks  
local to the client, which is a workaround for me. If you depend on locks  
enforced on the server it will not work.


Ronald.

--
 Ronald Klop
 Amsterdam, The Netherlands


Re: NFS Locking Issue

2006-06-29 Thread Michael Collette

Rong-en Fan wrote:
> On 6/29/06, Michael Collette <[EMAIL PROTECTED]> wrote:
> > This last week I had been working on a test network to test out 6.1
> > prior to upgrading our production boxes from 5.4.  That's when I ran
> > across the rpc.lockd issues that have been discussed earlier.
> >
> > Our production setup has diskless clients running KDE, which due to this
> > bug is now dead on 6.1.  I also have my mail server delivering messages
> > to a file server via NFS.  I even have servers booting diskless with NFS
> > provided file systems... all of which are dead on 6.1.
> >
> > The last discussion or bug updates I've seen on this issue were about 3
> > months ago.  This leaves me with a number of questions I hope can be
> > answered here on this list.
> >
> > Is NFS a big deal for most other users, or am I out here on the fringe
> > using it as much as I do?
> >
> > Is anyone working on a fix for this?  If so, is there any kind of time
> > frame where this fix might be MFC'd to 6-STABLE?
> >
> > I guess I'm still just a bit stunned that a bug this obvious not only
> > found its way into the STABLE branch, but is still there.  Maybe it's
> > not as obvious as I think, or not many folks are using it?  All I know
> > for sure here is that if I had upgraded to 6.1 my network would have
> > been crippled.
>
> Try 6.1-STABLE, especially make sure you have
>
> $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02
> 01:20:58 rodrigc Exp $
>
> for usr.sbin/rpc.lockd/kern.c, and see if this helps.


I am running STABLE on all my test boxes, and the problem is very much 
there.  It's not everything that locks up though.  I'm able to bring X 
up with twm, but unable to launch any Gnome or KDE applications without 
them being stranded in a lock state.


I sure would have loved for your suggestion to be correct.  For what 
it's worth, all the boxes I'm working with are on STABLE no more than a 
week old.  I ran fresh build worlds on all of them before getting the 
rest of my configs going.


Thanks,
--
Michael Collette
IT Manager
TestEquity LLC
[EMAIL PROTECTED]



Re: NFS Locking Issue

2006-06-29 Thread Francisco Reyes

Michael Collette writes:

> This last week I had been working on a test network to test out 6.1 
> prior to upgrading our production boxes from 5.4.

I wish I had done that.. :-(

> That's when I ran 
> across the rpc.lockd issues that have been discussed earlier.

I am not familiar with that, but I can tell you from experience that the NFS 
client code in 6.X has issues.. In particular if the server goes down the 
client machine doesn't allow you to unmount the volume.. and if you have 
programs trying to access the downed mount, the whole machine may end up 
freezing.

> ... I also have my mail server delivering messages 
> to a file server via NFS.

We use NFS as our "storage" server for pop/imap, but use the MTA to deliver 
to the machine.

> Is NFS a big deal for most other users, or am I out here on the fringe 
> using it as much as I do?

It is for us..
I am even trying to see if we can pay someone to expedite getting NFS 
fixed in 6.

Unfortunately we decided to increase our NFS usage after I had installed 6.X 
in a number of new machines.

> Is anyone working on a fix for this?

If there is I have not read about it.

> I guess I'm still just a bit stunned that a bug this obvious not only 
> found its way into the STABLE branch, but is still there.

I am fairly new to NFS.. but I am getting the impression that FreeBSD's NFS 
is not as mature as other platforms. I also think it has a lot to do with 
usage patterns. I have seen mentions of people having hundreds of clients 
connected to a single NFS server... yet I see problems with just a handful 
of clients. Maybe the issue is only with the 6.X branch. Sadly part of the 
reason I moved some newer machines to 6.X was because of some comments I saw 
on how NFS had been improved in 6.X :-(



Re: NFS Locking Issue

2006-06-29 Thread Alexey Karagodov

Me too ...

2006/6/29, Michael Collette <[EMAIL PROTECTED]>:

> This last week I had been working on a test network to test out 6.1
> prior to upgrading our production boxes from 5.4.  That's when I ran
> across the rpc.lockd issues that have been discussed earlier.
>
> Our production setup has diskless clients running KDE, which due to this
> bug is now dead on 6.1.  I also have my mail server delivering messages
> to a file server via NFS.  I even have servers booting diskless with NFS
> provided file systems... all of which are dead on 6.1.
>
> The last discussion or bug updates I've seen on this issue were about 3
> months ago.  This leaves me with a number of questions I hope can be
> answered here on this list.
>
> Is NFS a big deal for most other users, or am I out here on the fringe
> using it as much as I do?
>
> Is anyone working on a fix for this?  If so, is there any kind of time
> frame where this fix might be MFC'd to 6-STABLE?
>
> I guess I'm still just a bit stunned that a bug this obvious not only
> found its way into the STABLE branch, but is still there.  Maybe it's
> not as obvious as I think, or not many folks are using it?  All I know
> for sure here is that if I had upgraded to 6.1 my network would have
> been crippled.
>
> Later on,
> --
> Michael Collette
> IT Manager
> TestEquity LLC
> [EMAIL PROTECTED]





--
Best regards, Alexey Karagodov. Design, construction,
administration and support of information systems.