Re: [OpenAFS] Re: Tuning the -daemons.

2011-07-25 Thread Jan Johansson
Jan Johansson j...@it.su.se wrote:
> I will try my best to post what we did in the end.

After another hang I was able to get a thread dump, and it matched
the dynamic vcache problem, so we added -disable-dynamic-vcaches
to the cache manager options. It has been trouble-free since.

Thank you for the invaluable help provided by this list.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Tuning the -daemons.

2011-04-29 Thread Jan Johansson
Andrew Deason adea...@sinenomine.net wrote:
> It suggests that it could be the problem, but technically
> really anything holding xvcache could cause that (or anything
> else causing the callback thread to hang). But certainly the
> issue in this thread is the most likely cause.
>
> If you want to really be sure that that's it, you could
> 'echo t > /proc/sysrq-trigger' and look in syslog. If you see a
> process inside afs_FlushVCBs and RXAFS_GiveUpCallBacks, that would
> pretty much prove that this is the specific issue.

Ok. Thank you. Now we have enough information to discuss
solutions.

I will try my best to post what we did in the end.



[OpenAFS] Re: Tuning the -daemons.

2011-04-28 Thread Andrew Deason
On Thu, 28 Apr 2011 10:46:25 +0200
Jan Johansson j...@it.su.se wrote:

> So when reading the thread more closely I found a command that I
> had missed.
>
> cmdebug client
>
> So this time around I tried it when the IMAP server broke and got no
> response (it timed out).
>
> Would it be correct to assume that this is evidence that I am seeing
> the mentioned problem?

It suggests that it could be the problem, but technically really
anything holding xvcache could cause that (or anything else causing the
callback thread to hang). But certainly the issue in this thread is the
most likely cause.

If you want to really be sure that that's it, you could
'echo t > /proc/sysrq-trigger' and look in syslog. If you see a process
inside afs_FlushVCBs and RXAFS_GiveUpCallBacks, that would pretty much
prove that this is the specific issue.
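
A rough sketch of that check in Python: once the task dump has landed in
syslog, split the log into call traces and keep the ones that mention both
functions. The "Call Trace:" marker and the helper name are assumptions about
a typical Linux task dump, not anything taken from OpenAFS, so adjust them to
whatever your kernel actually logs.

```python
# Function names from the message above; everything else is illustrative.
SYMBOLS = ("afs_FlushVCBs", "RXAFS_GiveUpCallBacks")


def traces_with_symbols(log_lines, symbols=SYMBOLS):
    """Return the call-trace blocks that mention every name in `symbols`."""
    blocks, current = [], []
    for line in log_lines:
        # Assumption: each task's stack starts with a "Call Trace:" line.
        if "Call Trace:" in line:
            if current:
                blocks.append(current)
            current = []
        current.append(line.rstrip("\n"))
    if current:
        blocks.append(current)
    # Keep a block only if every symbol shows up somewhere inside it.
    return [
        block for block in blocks
        if all(any(sym in entry for entry in block) for sym in symbols)
    ]
```

Pass it the lines of whichever file your distribution writes kernel messages
to. A non-empty result would mean some task had both functions on its stack
when the dump was taken.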

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: Tuning the -daemons.

2011-04-19 Thread Harald Barth
> We believe that this behaviour is fixed in 1.6.0pre4.

Do you have any idea when it was introduced?

Harald.


[OpenAFS] Re: Tuning the -daemons.

2011-04-19 Thread Andrew Deason
On Tue, 19 Apr 2011 14:54:38 +0200 (CEST)
Harald Barth h...@kth.se wrote:

>> We believe that this behaviour is fixed in 1.6.0pre4.
>
> Do you have any idea when it was introduced?

The underlying issue I think has always existed: xvcache must be
write-locked for vcache traversal, and we traverse vcaches looking for
something to flush, and flushing a vcache may hit a fileserver with a
GiveUpCallBacks call, since we flush VCBs whenever we run out of CBRs.
I think all of that has always been the case, from looking at git history.
(Always meaning back to OpenAFS 1.0.)

Maybe dynamic vcaches made this more likely to be hit, though (which
would be 1.4.10, Linux-only). Before/without those, I think you have to
run out of free vcache entries before you hit the relevant code path,
which I expect happens less often than we ShakeLooseVCaches these days.
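
As a toy illustration of that mechanism (this is not OpenAFS code): one
thread takes a lock and then makes a slow call while holding it, and a second
thread that needs the same lock cannot proceed until the call returns. The
lock, the sleep standing in for the RPC, and the function name are all
made up for the sketch.

```python
import threading
import time


def callback_blocked(rpc_seconds, wait_seconds):
    """Return True if the 'callback' side gives up waiting for the lock."""
    xvcache = threading.Lock()   # stand-in for the xvcache write lock
    holding = threading.Event()  # set once the flusher owns the lock

    def flusher():
        with xvcache:
            holding.set()
            time.sleep(rpc_seconds)  # stand-in for a slow fileserver RPC

    worker = threading.Thread(target=flusher)
    worker.start()
    holding.wait()  # make sure the flusher really holds the lock first

    # The callback handler needs the same lock to do its work.
    acquired = xvcache.acquire(timeout=wait_seconds)
    if acquired:
        xvcache.release()
    worker.join()
    return not acquired
```

With a slow "RPC" and a short wait, e.g. callback_blocked(0.3, 0.05), the
callback side times out; with a fast one it gets the lock. In the real client
there is no timeout on the callback side, which is why a stuck fileserver
call shows up as a hang.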

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: Tuning the -daemons.

2011-04-19 Thread Harald Barth
> Maybe dynamic vcaches made this more likely to be hit, though (which
> would be 1.4.10, Linux-only).

That makes sense, as I think we were running something that was 1.4.9-ish
for a long time without seeing any such issues.

Harald.


Re: [OpenAFS] Re: Tuning the -daemons.

2011-04-18 Thread Simon Wilkinson

On 18 Apr 2011, at 12:33, Jan Johansson wrote:
> Some time ago (in thread
> https://lists.openafs.org/pipermail/openafs-info/2011-February/035407.html)
> I asked about the client -daemons flag.

Reviewing your original post, it has occurred to me that your problem could be
a symptom of an issue a number of sites are seeing with callback breaks.
Essentially, it is possible for the thread in the client that handles incoming
network traffic to hang whilst handling a callback break. If this happens, it
appears to the fileserver as if the client is no longer handling data, and you
will see the errors that you have been seeing.

We believe that this behaviour is fixed in 1.6.0pre4. If you still have your 
test environment, it would be very interesting to know whether you still see 
these problems.

Cheers,

Simon.





[OpenAFS] Re: Tuning the -daemons.

2011-02-08 Thread Andrew Deason
On Mon, 7 Feb 2011 20:55:11 +0100
Jan Johansson j...@it.su.se wrote:

> We have had this kind of problem before.
>
> In the first round the client made the server crash. An upgrade
> of the client from Ubuntu Karmic to Ubuntu Lucid solved that.

If the client made the server crash, there was a bug in the server.
Clients should not be able to make the server crash, no matter what they
do. Upgrading the client may have worked around the problem, but it did
not solve it.

> This time around we are rebuilding the IMAP servers for mail
> clients and since we have a little time before the users arrive
> with the pitchforks I am trying to understand what the right
> settings should be.

Well, the right settings would arguably be "don't deliver mail into
AFS" ;) But we can try what we can...

> To the best of my knowledge there never was a problem running
> rxdebug client 7001. I know for a fact the rxdebug server
> 700X works without problem during the hangs.

To be clear, I mean 'rxdebug client 7001' executed from the server
that was emitting this message:

fileserver[1139]: BreakDelayedCallbacks FAILED for host
AAA.BBB.CCC.186:7001 which IS UP.  Connection from
AAA.BBB.CCC.186:7001.  Possible network or routing failure.

I would try executing that while the hang is happening, to make sure
that the server can initiate connections to the client. If it seems
okay, it may help to run 'cmdebug client', and see if you see any
messages like

Lock afs_xvcache status: stuff
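
A small sketch of how that check could be scripted in Python. It runs
'cmdebug client' with a time limit and picks out any lines that mention
afs_xvcache. The function names and the 30-second limit are assumptions for
illustration, and the exact wording of the lock lines may differ between
versions, so the match is deliberately loose.

```python
import subprocess


def xvcache_lock_lines(cmdebug_output):
    """Return the output lines that report on the afs_xvcache lock."""
    return [
        line.strip()
        for line in cmdebug_output.splitlines()
        if "afs_xvcache" in line
    ]


def probe_client(client, timeout_seconds=30):
    """Run 'cmdebug <client>'; return lock lines, or None if it times out."""
    try:
        result = subprocess.run(
            ["cmdebug", client],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # No answer at all within the limit is itself worth noting.
        return None
    return xvcache_lock_lines(result.stdout)
```

A result of None (no response) and a result listing afs_xvcache as locked are
both worth recording alongside the time of the hang.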

-- 
Andrew Deason
adea...@sinenomine.net



[OpenAFS] Re: Tuning the -daemons.

2011-02-07 Thread Andrew Deason
On Mon, 07 Feb 2011 18:02:23 +0100 (CET)
Harald Barth h...@kth.se wrote:

>> Long version:
>>
>> We have a pretty busy IMAP server with Maildirs in AFS (yeah, it's
>> probably crazy but we have been doing it for a number of years).
>
> Longer answer: You want to tune your servers to -daemons 128 which is

I think you mean -p 128. I believe Jan is asking about the client
background I/O daemons, not the number of server threads/processes.

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: Tuning the -daemons.

2011-02-07 Thread Jan Johansson
Thank you for your interest in helping out here.

So I will start with the easy questions and try to get into the
kernel later.

Based on the history I believe that the problem is the
client/cache manager.

We have had this kind of problem before.

In the first round the client made the server crash. An upgrade
of the client from Ubuntu Karmic to Ubuntu Lucid solved that.

Next the server got overloaded so we upgraded from Ubuntu Hardy
to Ubuntu Lucid. Threw out some more of the old FreeBSD and
stopped running virtual servers in ESX.

Some time passed and we got blocking fileservers; tuning the
fileservers solved some of the problems. We also threw some
random options at the client and redesigned the webmail to make
users stick to a single IMAP backend.

This time around we are rebuilding the IMAP servers for mail
clients and since we have a little time before the users arrive
with the pitchforks I am trying to understand what the right
settings should be.

In the earlier cases the server would stop serving any clients, so
unrelated services (like webservers) would stop and the users
would complain about not being able to save their files. This
time only the single client/cache manager is affected.

The server is running Ubuntu Lucid Lynx with the included OpenAFS
1.4.12+dfsg-3 package. Fileserver is started with
-L -abortthreshold 1024 -syslog

The client is running Ubuntu Lucid Lynx with the included OpenAFS
1.4.12+dfsg-3 package. The random options on the webmail
backends are -stat 15000 -dcache 6000 -daemons 6
-volumes 256 -rxpck 2000 -files 5 -afsdb -dynroot -fakestat;
the ones I am testing now are -daemons 6 -afsdb -dynroot
-fakestat.

To the best of my knowledge there never was a problem running
rxdebug client 7001. I know for a fact the rxdebug server
700X works without problem during the hangs.

Jan J
