Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Todd M. Lewis
Is there a way to tell the fileservers not to talk to clients below a 
certain rev, or only allow reads? That should encourage them to upgrade. 
Or leave. Not nice maybe, but if old clients can DoS your servers...


Jeffrey Altman wrote:

Matthew Cocker wrote:

I wish. I still have people using 1.3.64. They refuse to upgrade despite
my efforts to show them the benefits of the upgrade. A lot of people on
campus feel that the new clients are not as stable as the old ones.
Probably because they all have the same uuid.


1.3.64 clients will take down your file servers having nothing to do
with multiple uuids.  Clients older than 1.3.80 have a bug that generates
a new rx connection per authenticated request.  Clients older than
1.3.80 do not support UUIDs.

1.4.1 or later will prevent the cloning of UUIDs.

1.5.12 or later use a CIFS server implementation that passes Microsoft's
protocol tests.

1.5.21 is current.

If your users have stability problems, then they should file bug reports.
Otherwise, we won't know there are issues that need to be fixed.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Jim Rees
Todd M. Lewis wrote:

  Is there a way to tell the fileservers not to talk to clients below a 
  certain rev, or only allow reads? That should encourage them to upgrade. 
  Or leave. Not nice maybe, but if old clients can DoS your servers...

Not directly, I don't think, but you could write a script that would go
through the server log periodically and get the IP addresses of misbehaving
clients, then add them to a firewall rule.
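A minimal C sketch of that log-scanning idea, under stated assumptions: the marker string that identifies a misbehaving client's log line is hypothetical (it depends on what your FileLog actually records), only IPv4 dotted-quad addresses are recognised, and the rules are printed for review rather than applied. `scan_log`, `extract_ipv4` and `tally` are illustrative names, not OpenAFS code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 1024

struct host_count {
    char ip[16]; /* dotted-quad IPv4 address */
    int count;   /* matching log lines seen for this address */
};

/* Copy the first dotted-quad IPv4 address in `line` into `out`.
   Returns 1 if one was found, 0 otherwise. */
int extract_ipv4(const char *line, char out[16])
{
    for (const char *p = line; *p; p++) {
        unsigned a, b, c, d;
        /* only try to match where a number starts */
        if (!isdigit((unsigned char)*p) ||
            (p > line && isdigit((unsigned char)p[-1])))
            continue;
        if (sscanf(p, "%3u.%3u.%3u.%3u", &a, &b, &c, &d) == 4 &&
            a < 256 && b < 256 && c < 256 && d < 256) {
            snprintf(out, 16, "%u.%u.%u.%u", a, b, c, d);
            return 1;
        }
    }
    return 0;
}

/* Count one more hit for `ip`; returns 0 (and drops it) if the table is full. */
int tally(struct host_count *hosts, size_t *nhosts, const char *ip)
{
    for (size_t i = 0; i < *nhosts; i++) {
        if (strcmp(hosts[i].ip, ip) == 0) {
            hosts[i].count++;
            return 1;
        }
    }
    if (*nhosts == MAX_HOSTS)
        return 0;
    snprintf(hosts[*nhosts].ip, sizeof hosts[*nhosts].ip, "%s", ip);
    hosts[*nhosts].count = 1;
    (*nhosts)++;
    return 1;
}

/* Scan `log` for lines containing `marker`, count the address on each,
   and print (not run) a firewall rule for every address seen at least
   `threshold` times. Returns the number of rules printed. */
size_t scan_log(FILE *log, const char *marker, int threshold, FILE *out)
{
    static struct host_count hosts[MAX_HOSTS];
    size_t nhosts = 0, nrules = 0;
    char line[1024], ip[16];

    while (fgets(line, sizeof line, log) != NULL)
        if (strstr(line, marker) != NULL && extract_ipv4(line, ip))
            tally(hosts, &nhosts, ip);

    for (size_t i = 0; i < nhosts; i++) {
        if (hosts[i].count >= threshold) {
            fprintf(out, "iptables -A INPUT -s %s -j DROP\n", hosts[i].ip);
            nrules++;
        }
    }
    return nrules;
}
```

Run periodically (e.g. from cron) against the log, the threshold keeps a single stray line from getting a host blocked; the printed rules can be reviewed before anyone applies them.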


Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Jeffrey Altman
Todd M. Lewis wrote:
 Is there a way to tell the fileservers not to talk to clients below a
 certain rev, or only allow reads? That should encourage them to upgrade.
 Or leave. Not nice maybe, but if old clients can DoS your servers...

You could patch your file servers to call rx_GetServerVersion() for any
new hosts for which a matching UUID could not be found.  You could then
compare the returned string to a list of known version strings that you
wish to block and refuse to provide service.
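The comparison step might look like the C sketch below. It deliberately does not call rx_GetServerVersion() (its exact signature is not shown in this thread); it assumes the fileserver has already obtained the version string, or NULL if the probe failed, and checks it for an exact match against a site-supplied list. `version_is_blocked` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the version string reported by a client exactly matches one
   of the site-chosen strings in `blocked`, 0 otherwise. A NULL `reported`
   (probe failed) is not blocked in this sketch; a site might choose otherwise. */
int version_is_blocked(const char *reported,
                       const char *const blocked[], size_t nblocked)
{
    if (reported == NULL)
        return 0;
    for (size_t i = 0; i < nblocked; i++)
        if (strcmp(reported, blocked[i]) == 0)
            return 1;
    return 0;
}
```

Exact matching is used because site-built clients can report custom strings; a prefix or substring match could refuse service to more clients than intended.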

---

In addition, we recently committed DELTA viced-no-nulluuid-20070719.
This patch checks to see if the UUID is all zeros and if so treats the
client as if it doesn't support UUIDs at all.  This could easily be
expanded to treat an arbitrary list of known-to-be-cloned UUIDs as if
UUIDs are not supported.  Simply read the list of known UUIDs from a
file at startup and 

Jeffrey Altman


smime.p7s
Description: S/MIME Cryptographic Signature
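A sketch of the all-zeros check extended to a list of known-cloned UUIDs. The `struct uuid16` type is a stand-in invented for this example (the fileserver has its own UUID structure), and the parser simply ignores dashes, so it accepts both the usual layout and the one cmdebug prints (for example e3586602-1240-4f07-b1-d7-1b72dc8e715d in this thread). This is not the code from the viced-no-nulluuid delta.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte UUID representation used only in this sketch. */
struct uuid16 { unsigned char b[16]; };

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse exactly 32 hex digits, skipping '-' separators.
   Returns 1 on success, 0 on malformed input. */
int parse_uuid(const char *s, struct uuid16 *u)
{
    int n = 0;
    for (; *s != '\0' && *s != '\n'; s++) {
        if (*s == '-')
            continue;
        int v = hexval((unsigned char)*s);
        if (v < 0 || n >= 32)
            return 0;
        if (n % 2 == 0)
            u->b[n / 2] = (unsigned char)(v << 4);  /* high nibble */
        else
            u->b[n / 2] |= (unsigned char)v;        /* low nibble */
        n++;
    }
    return n == 32;
}

int is_null_uuid(const struct uuid16 *u)
{
    for (size_t i = 0; i < sizeof u->b; i++)
        if (u->b[i] != 0)
            return 0;
    return 1;
}

/* Treat the client as UUID-less if its UUID is all zeros or is on the
   list of known clones (loaded from a file at startup, for example). */
int treat_as_uuidless(const struct uuid16 *u,
                      const struct uuid16 *cloned, size_t ncloned)
{
    if (is_null_uuid(u))
        return 1;
    for (size_t i = 0; i < ncloned; i++)
        if (memcmp(u->b, cloned[i].b, sizeof u->b) == 0)
            return 1;
    return 0;
}
```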


Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Derrick Brashear
On 8/1/07, Todd M. Lewis [EMAIL PROTECTED] wrote:

 Is there a way to tell the fileservers not to talk to clients below a
 certain rev, or only allow reads? That should encourage them to upgrade.
 Or leave. Not nice maybe, but if old clients can DoS your servers...



The version probe is not guaranteed to be reliable, and can have custom
version strings for site-built software.


Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Jeffrey Altman
Derrick Brashear wrote:
 
 
 On 8/1/07, Todd M. Lewis [EMAIL PROTECTED] wrote:
 
 Is there a way to tell the fileservers not to talk to clients below a
 certain rev, or only allow reads? That should encourage them to upgrade.
 Or leave. Not nice maybe, but if old clients can DoS your servers...
 
 
 
 The version probe is not guaranteed to be reliable, and can have custom
 version strings for site-built software.

That is why it would be implemented as a list of version strings loaded
from a file.   Organizations could decide what versions they want to
block.  We wouldn't do it for them.  It would simply provide an
additional tool that could be used to assist in forcing upgrades.
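Loading such a list is straightforward. A C sketch, assuming one version string per line with blank lines and lines starting with '#' ignored; the file layout, `MAX_ENTRY` and `load_block_list` are illustrative choices, not an existing OpenAFS format.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRY 128  /* entries are assumed to be shorter than this */

/* Read one entry per line from `fp` into `list`, skipping blank lines and
   '#' comments. Returns the number of entries stored (at most `max`). */
size_t load_block_list(FILE *fp, char list[][MAX_ENTRY], size_t max)
{
    char line[MAX_ENTRY];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the line ending */
        if (line[0] == '\0' || line[0] == '#')
            continue;
        strcpy(list[n++], line);             /* fits: same buffer size */
    }
    return n;
}
```

Keeping the list in a plain file means each site decides what to block, and changing it only needs a restart (or a reload, if one were added).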





Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Matthew Cocker
Better than what I have now, which is that they break AFS for all the other users.

Cheers

Matt

On 8/2/07, Jeffrey Altman [EMAIL PROTECTED] wrote:

 Matthew Cocker wrote:
  We do a similar thing for our locally developed cost recovery solution.
  The clients pass the version over in the initial handshake. If the
  version is less than our required minimum, the server rejects with a "you
  need to upgrade" message. It's the only way we have found to force standards
  in our loosely coupled campus.
 
  Cheers
 
  Matt

 Except here there would be no upgrade message.  The client would just
 stop working.  When they contact the Help Desk they will be told they
 must upgrade.







Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Matthew Cocker
We do a similar thing for our locally developed cost recovery solution. The
clients pass the version over in the initial handshake. If the version is
less than our required minimum, the server rejects with a "you need to upgrade"
message. It's the only way we have found to force standards in our loosely
coupled campus.
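The minimum-version check described above comes down to comparing dotted version numbers one component at a time. A small C sketch, assuming plain numeric versions such as 1.3.64 or 1.4.1 (missing components count as 0, trailing non-numeric text is ignored); `compare_versions` and `meets_minimum` are illustrative names.

```c
#include <stdlib.h>

/* Compare dotted numeric versions: <0 if a is older, 0 if equal, >0 if newer. */
int compare_versions(const char *a, const char *b)
{
    for (;;) {
        char *ea, *eb;
        unsigned long x = strtoul(a, &ea, 10);  /* 0 once a is exhausted */
        unsigned long y = strtoul(b, &eb, 10);
        if (x != y)
            return x < y ? -1 : 1;
        if (*ea != '.' && *eb != '.')
            return 0;                           /* no more components */
        a = (*ea == '.') ? ea + 1 : ea;
        b = (*eb == '.') ? eb + 1 : eb;
    }
}

/* Accept a client only if its version is at least `minimum`. */
int meets_minimum(const char *version, const char *minimum)
{
    return compare_versions(version, minimum) >= 0;
}
```

Strings returned by a version probe may need normalising first: in this thread a 1.5.16 Windows client reports itself as OpenAFS1.5.1600, and even with the OpenAFS prefix stripped, 1.5.1600 would rank above 1.5.21 here.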

Cheers

Matt

On 8/2/07, Jeffrey Altman [EMAIL PROTECTED] wrote:

 Derrick Brashear wrote:
 
 
  On 8/1/07, Todd M. Lewis [EMAIL PROTECTED] wrote:
 
  Is there a way to tell the fileservers not to talk to clients below a
  certain rev, or only allow reads? That should encourage them to upgrade.
  Or leave. Not nice maybe, but if old clients can DoS your servers...
 
 
 
  The version probe is not guaranteed to be reliable, and can have custom
  version strings for site-built software.

 That is why it would be implemented as a list of version strings loaded
 from a file.   Organizations could decide what versions they want to
 block.  We wouldn't do it for them.  It would simply provide an
 additional tool that could be used to assist in forcing upgrades.





Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Jeffrey Altman
Matthew Cocker wrote:
 We do a similar thing for our locally developed cost recovery solution.
 The clients pass the version over in the initial handshake. If the
 version is less than our required minimum, the server rejects with a "you
 need to upgrade" message. It's the only way we have found to force standards
 in our loosely coupled campus.
 
 Cheers
 
 Matt

Except here there would be no upgrade message.  The client would just
stop working.  When they contact the Help Desk they will be told they
must upgrade.







Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-08-01 Thread Matthew Cocker
Jeff


I dumped the IP addresses and checked. All the offending machines are
running 1.5.16.

rxdebug 130.216.22.5 7001 -version
Trying 130.216.22.5 (port 7001):
AFS version: OpenAFS1.5.1600
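Pulling the version out of rxdebug output is easy to script across many addresses. A C sketch that assumes the output looks like the example above (a line beginning "AFS version: "); `rxdebug_version` is an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Copy the text after "AFS version: " up to the end of that line into buf.
   Returns 1 on success, 0 if the line is missing or buf is too small. */
int rxdebug_version(const char *output, char *buf, size_t buflen)
{
    static const char tag[] = "AFS version: ";
    const char *p = strstr(output, tag);

    if (p == NULL)
        return 0;
    p += sizeof tag - 1;                /* skip the label itself */
    size_t len = strcspn(p, "\r\n");    /* the version runs to end of line */
    if (len + 1 > buflen)
        return 0;
    memcpy(buf, p, len);
    buf[len] = '\0';
    return 1;
}
```

The extracted string could then be compared against a site's own list of versions to block.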


Cheers

Matt

On 8/1/07, Jeffrey Altman [EMAIL PROTECTED] wrote:

 Matthew Cocker wrote:
  UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
  UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b
 
  These correlated to a set of IP addresses that belonged to one
  department which had reimaged a lot of lab machines last week. They are
  fixing their machines using the file they got from
  http://help.unc.edu/5667#d20492e47

 They must be using an old version of the Windows client.  Get them to
 re-image with the latest build.





Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-31 Thread Matthew Cocker
Forgot reply-all again

On 8/1/07, Matthew Cocker [EMAIL PROTECTED] wrote:

 The -p 128 flag alone has not solved the problem. I have a FileLog (from
 one locked-up server) and rxdebug output from all servers at the time when
 lots of them locked up last night. They can be accessed via


 https://webdropoff.auckland.ac.nz/cgi-bin/pickup/6f25589961a37ad4d4a144bbd868058a/516290

 Interestingly, I started running cmdebug against all the clients that connect
 to our servers and I can see a lot of machines with the same UUID. I tried
 to apply the patch, but two of the hunks failed to apply against the 1.4.3
 source in the RPM. The reject file output was:

 patch -p0 < ../../../UUID.patch
 patching file audit.c
 Reversed (or previously applied) patch detected!  Assume -R? [n] y
 Hunk #1 succeeded at 66 (offset 10 lines).
 Hunk #3 FAILED at 88.
 Hunk #4 succeeded at 114 (offset 6 lines).
 Hunk #5 succeeded at 144 (offset 1 line).
 Hunk #6 succeeded at 181 (offset 10 lines).
 Hunk #7 FAILED at 223.
 Hunk #8 succeeded at 222 (offset -3 lines).
 2 out of 8 hunks FAILED -- saving rejects to file audit.c.rej
 less audit.c.rej
 ***************
 *** 88,98 ****
 bufferPtr += sizeof(vaLong);
 break;
 case AUD_LST:   /* Ptr to another list */
 -   vaLst = (va_list)va_arg(vaList, va_list);
 audmakebuf(audEvent, vaLst);
 break;
 case AUD_FID:   /* AFSFid - contains 3 entries */
 -   vaFid = (struct AFSFid *)va_arg(vaList, struct AFSFid *);
 if (vaFid) {
 ***************
 *** 223,233 ****
 fprintf(out, "LONG %d ", vaLong);
 break;
 case AUD_LST:   /* Ptr to another list */
 -   vaLst = va_arg(vaList, va_list);
 printbuf(out, 1, "VALST", 0, vaLst);
 break;
 case AUD_FID:   /* AFSFid - contains 3 entries */
 -   vaFid = va_arg(vaList, struct AFSFid *);
 if (vaFid)
 fprintf(out, "FID %u:%u:%u ", vaFid->Volume, vaFid->Vnode,
         vaFid->Unique);
 --- 223,233 ----
 fprintf(out, "LONG %d ", vaLong);
 break;
 case AUD_LST:   /* Ptr to another list */
 +   vaLst = (char *)va_arg(vaList, int);
 printbuf(out, 1, "VALST", 0, vaLst);
 break;
 case AUD_FID:   /* AFSFid - contains 3 entries */
 +   vaFid = (struct AFSFid *)va_arg(vaList, int);
 if (vaFid)
 fprintf(out, "FID %u:%u:%u ", vaFid->Volume, vaFid->Vnode,
         vaFid->Unique);



 On 7/31/07, Derrick Brashear [EMAIL PROTECTED] wrote:
 
 
  It's not really particularly useful.
 
  I do, however, have a suggestion.
 
  1) up the number of threads to 128 (fileserver -p 128)
  2) apply the diff in RT ticket #19461, which I added to the ticket on
  Mon Aug 08 14:49:12 2005
 
 



Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-31 Thread Jeffrey Altman
What is the UUID that you see repeated?





Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-31 Thread Matthew Cocker
OK

I rushed when I saw the diff attached. But the text cut from the web page
seems to have the wrong line numbers in the diff, etc., and fails to apply. I
will apply it manually if need be.

Thanks for the help

Cheers

Matt


Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-31 Thread Matthew Cocker
UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b

These correlated to a set of IP addresses that belonged to one department
which had reimaged a lot of lab machines last week. They are fixing their
machines using the file they got from
http://help.unc.edu/5667#d20492e47
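Finding the clones in collected data amounts to grouping by UUID and counting distinct addresses. A C sketch, assuming the (address, UUID) pairs have already been gathered, for example by running cmdebug against each client; `struct client_rec`, `ips_sharing_uuid` and `report_cloned_uuids` are hypothetical names. The quadratic scan is fine for a few thousand clients.

```c
#include <stdio.h>
#include <string.h>

struct client_rec {
    const char *ip;   /* client address */
    const char *uuid; /* UUID string as reported for that client */
};

/* Number of distinct IPs in recs[0..n) that report `uuid`. */
size_t ips_sharing_uuid(const struct client_rec *recs, size_t n, const char *uuid)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(recs[i].uuid, uuid) != 0)
            continue;
        int seen = 0;  /* has this IP already been counted for this UUID? */
        for (size_t j = 0; j < i && !seen; j++)
            seen = strcmp(recs[j].uuid, uuid) == 0 &&
                   strcmp(recs[j].ip, recs[i].ip) == 0;
        if (!seen)
            count++;
    }
    return count;
}

/* Print each UUID claimed by more than one IP, once; return how many. */
size_t report_cloned_uuids(const struct client_rec *recs, size_t n, FILE *out)
{
    size_t reported = 0;
    for (size_t i = 0; i < n; i++) {
        int earlier = 0;  /* skip UUIDs already examined */
        for (size_t j = 0; j < i && !earlier; j++)
            earlier = strcmp(recs[j].uuid, recs[i].uuid) == 0;
        if (earlier)
            continue;
        size_t k = ips_sharing_uuid(recs, n, recs[i].uuid);
        if (k > 1) {
            fprintf(out, "UUID %s shared by %zu hosts\n", recs[i].uuid, k);
            reported++;
        }
    }
    return reported;
}
```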

Cheers
Matt

On 8/1/07, Jeffrey Altman [EMAIL PROTECTED] wrote:

 What is the UUID that you see repeated?





Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-31 Thread Jeffrey Altman
Matthew Cocker wrote:
 UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
 UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b
 
 These correlated to a set of IP addresses that belonged to one
 department which had reimaged a lot of lab machines last week. They are
 fixing their machines using the file they got from
 http://help.unc.edu/5667#d20492e47

They must be using an old version of the Windows client.  Get them to
re-image with the latest build.




Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-31 Thread Matthew Cocker
I wish. I still have people using 1.3.64. They refuse to upgrade despite my
efforts to show them the benefits of the upgrade. A lot of people on campus
feel that the new clients are not as stable as the old ones. Probably
because they all have the same uuid.


Cheers

Matt

On 8/1/07, Jeffrey Altman [EMAIL PROTECTED] wrote:

 Matthew Cocker wrote:
  UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
  UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b
 
  These correlated to a set of IP addresses that belonged to one
  department which had reimaged a lot of lab machines last week. They are
  fixing their machines using the file they got from
  http://help.unc.edu/5667#d20492e47

 They must be using an old version of the Windows client.  Get them to
 re-image with the latest build.





Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-31 Thread Jeffrey Altman
Matthew Cocker wrote:
 I wish. I still have people using 1.3.64. They refuse to upgrade despite
 my efforts to show them the benefits of the upgrade. A lot of people on
 campus feel that the new clients are not as stable as the old ones.
 Probably because they all have the same uuid.

1.3.64 clients will take down your file servers having nothing to do
with multiple uuids.  Clients older than 1.3.80 have a bug that generates
a new rx connection per authenticated request.  Clients older than
1.3.80 do not support UUIDs.

1.4.1 or later will prevent the cloning of UUIDs.

1.5.12 or later use a CIFS server implementation that passes Microsoft's
protocol tests.

1.5.21 is current.

If your users have stability problems, then they should file bug reports.
Otherwise, we won't know there are issues that need to be fixed.






Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-30 Thread Matthew Cocker
Tonight 10 of our AFS fileservers locked up. I had upgraded some to
1.4.4 but they died as well. All run on Red Hat 3 Update 6. Only one process
shows in the ps listing, and gcores on this process seem to give nothing. A
pstack dump is below. Is it any good? This is now a real disaster and very
weird. I have other fileservers that are set up identically which are not
dying. The only difference is that these are on a different subnet and in a
different server room.

Thread 22 (Thread -1218524240 (LWP 27894)):
#0  0x0044cc84 in sigwait () from /lib/tls/libpthread.so.0
#1  0x08073a32 in ?? ()
#2  0xb75ec9f0 in ?? ()
#3  0xb75ec96c in ?? ()
#4  0x in ?? ()
Thread 21 (Thread -1229263952 (LWP 27895)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x080b3c0e in ?? ()
#2  0x08646828 in ?? ()
#3  0x086467dc in ?? ()
#4  0xb6bae328 in ?? ()
#5  0x080af859 in ?? ()
#6  0x0001 in ?? ()
#7  0x in ?? ()
Thread 20 (Thread -1240114256 (LWP 27896)):
#0  0x0044959b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0808e5c0 in ?? ()
#2  0x080f9540 in stderr ()
#3  0x080f94c0 in stderr ()
#4  0xb6155a58 in ?? ()
#5  0x01cfe9b8 in ?? ()
#6  0x in ?? ()
Thread 19 (Thread -1254962256 (LWP 27897)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0xb532c428 in ?? ()
#5  0x08085a21 in ?? ()
#6  0xab1da358 in ?? ()
#7  0x in ?? ()
Thread 18 (Thread -1265587280 (LWP 27898)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0xb490a428 in ?? ()
#5  0x08085a21 in ?? ()
#6  0xab1c93a0 in ?? ()
#7  0x in ?? ()
Thread 17 (Thread -1276077136 (LWP 27899)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0x0001 in ?? ()
#5  0xb3f093d8 in ?? ()
#6  0x00c650fd in malloc () from /lib/tls/libc.so.6
#7  0x0805ea56 in ?? ()
#8  0xb490bb5c in ?? ()
#9  0x0002 in ?? ()
#10 0x0001 in ?? ()
#11 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
#12 0x0805f182 in ?? ()
#13 0xb490bb08 in ?? ()
#14 0x00448b20 in pthread_mutex_unlock () from /lib/tls/libpthread.so.0
#15 0x080604de in ?? ()
#16 0x07ead882 in ?? ()
#17 0x591b in ?? ()
#18 0xb3f09474 in ?? ()
#19 0x591b in ?? ()
#20 0x0854a3f8 in ?? ()
#21 0x591b5f70 in ?? ()
#22 0x in ?? ()
Thread 16 (Thread -1286566992 (LWP 27900)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x080a3a82 in ?? ()
#2  0x087244cc in ?? ()
#3  0x080f9758 in stderr ()
#4  0xb3508a18 in ?? ()
#5  0x080a3e85 in ?? ()
#6  0x in ?? ()
Thread 15 (Thread -1297056848 (LWP 27901)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0xb2b07428 in ?? ()
#5  0x08085a21 in ?? ()
#6  0xaabb1b90 in ?? ()
#7  0x0002 in ?? ()
#8  0xb490bb08 in ?? ()
#9  0xb490bb08 in ?? ()
#10 0xb490bb60 in ?? ()
#11 0x03ead882 in ?? ()
#12 0xb2b073f8 in ?? ()
#13 0x0805ea56 in ?? ()
#14 0xb490bb5c in ?? ()
#15 0x0002 in ?? ()
#16 0xaabb1b98 in ?? ()
#17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
Thread 14 (Thread -1307546704 (LWP 27902)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0xb2106428 in ?? ()
#5  0x08085a21 in ?? ()
#6  0x08e01618 in ?? ()
#7  0x in ?? ()
Thread 13 (Thread -1318036560 (LWP 27903)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0xb1705428 in ?? ()
#5  0x08085a21 in ?? ()
#6  0xaac696f8 in ?? ()
#7  0x0002 in ?? ()
#8  0xb490bb08 in ?? ()
#9  0xb490bb08 in ?? ()
#10 0xb490bb60 in ?? ()
#11 0x06ead882 in ?? ()
#12 0xb17053f8 in ?? ()
#13 0x0805ea56 in ?? ()
#14 0xb490bb5c in ?? ()
#15 0x0002 in ?? ()
#16 0xaac69700 in ?? ()
#17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
Thread 12 (Thread -1328526416 (LWP 27904)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0xb0d04428 in ?? ()
#5  0x08085a21 in ?? ()
#6  0x087335e8 in ?? ()
#7  0x0002 in ?? ()
#8  0xb490bb08 in ?? ()
#9  0xb490bb08 in ?? ()
#10 0xb490bb60 in ?? ()
#11 0x2bebd882 in ?? ()
#12 0xb0d043f8 in ?? ()
#13 0x0805ea56 in ?? ()
#14 0xb490bb5c in ?? ()
#15 0x0002 in ?? ()
#16 0x087335f0 in ?? ()
#17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
Thread 11 (Thread -1339016272 (LWP 27905)):
#0  0x0044bf5e in recvmsg () from /lib/tls/libpthread.so.0
#1  0x080b1a8f in ?? ()
#2  0x0005 in ?? ()
#3  0xb03039d0 in ?? ()
#4  0x in ?? ()
Thread 10 (Thread -1349506128 (LWP 27906)):
#0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0806fe71 in ?? ()
#2  0xb490bba8 in ?? ()
#3  0xb490bb60 in ?? ()
#4  0xaf902428 in ?? ()

Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-30 Thread Jeffrey Altman
Matthew Cocker wrote:
 Tonight 10 of our AFS fileservers locked up. I had upgraded some to
 1.4.4 but they died as well. All run on Red Hat 3 Update 6. Only one process
 shows in the ps listing, and gcores on this process seem to give nothing. A
 pstack dump is below. Is it any good? This is now a real disaster and
 very weird. I have other fileservers that are set up identically which are
 not dying. The only difference is that these are on a different subnet
 and in a different server room.
 
 Thread 22 (Thread -1218524240 (LWP 27894)):
 #0  0x0044cc84 in sigwait () from /lib/tls/libpthread.so.0
 #1  0x08073a32 in ?? ()
 #2  0xb75ec9f0 in ?? ()
 #3  0xb75ec96c in ?? ()
 #4  0x in ?? ()

You will need to rebuild the server binaries without stripping the debug
info in order for the stack data to be truly useful.

The fact that your servers on another subnet are having no issues makes
me wonder if there are networking issues involved.  Perhaps a
mis-configured router or firewall.  Of course with so little data made
available to us it would be hard to point you in the right direction.

We don't know the states of the servers.  We don't know the states of
the clients.  We haven't seen any of the log data.





Fwd: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-30 Thread Matthew Cocker
Sorry, I realised I hit reply instead of reply-all.

-- Forwarded message --
From: Matthew Cocker [EMAIL PROTECTED]
Date: Jul 31, 2007 7:49 AM
Subject: Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked
connections
To: [EMAIL PROTECTED]





 You will need to rebuild the server binaries without stripping the debug
 info in order for the stack data to be truly useful.



I thought that was what the openafs-debuginfo-1.4.2 package delivered.  Will
rebuild.


rpm -q --info openafs-debuginfo
Name        : openafs-debuginfo            Relocations: (not relocatable)
Version     : 1.4.2                             Vendor: (none)
Release     : 1.1                           Build Date: Sun 19 Nov 2006 14:43:08 NZDT
Install Date: Wed 17 Jan 2007 08:47:07 NZDT      Build Host: ecafs-test1.test.ec.auckland.ac.nz
Group       : Development/Debug             Source RPM: openafs-1.4.2-1.1.src.rpm
Size        : 7950566                          License: IBM Public License
Signature   : (none)
Packager    : Derek Atkins [EMAIL PROTECTED]
URL         : http://www.openafs.org
Summary     : Debug information for package openafs
Description :
This package provides debug information for package openafs.
Debug information is useful when developing applications that use this
package or when debugging this package.


 The fact that your servers on another subnet are having no issues makes
 me wonder if there are networking issues involved.  Perhaps a
 mis-configured router or firewall.  Of course with so little data made
 available to us it would be hard to point you in the right direction.

 We don't know the states of the servers.  We don't know the states of
 the clients.  We haven't seen any of the log data.


Apart from upping the debug level to 25 and getting the FileLog, what else
do you need? The clients cannot access a volume on these servers at this
time, but they recover as soon as the server is restarted. Disk I/O is almost
non-existent at the time of the problem, and load and CPU usage are low. If I
know what I need to collect, I should be able to get something more useful,
since the servers are dying almost every night.

Cheers

Matt


[OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-27 Thread Matthew Cocker
Hi

We are running about 20 Red Hat AS3-based OpenAFS 1.4.2 fileservers. For the
last three days, between 4pm and 10pm, 4-6 fileservers have stopped serving
files, with Nagios monitoring warning of more than 200 blocked connections. I
have turned on debug for the fileserver process and have a log file, but
nothing seemed bad to me (not that I would know). The servers are basically
idle during these disruptions, with CPU and disk showing very low usage, but
they have to be restarted to get access to files back.
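For monitoring, one possible check is sketched below in C. It assumes that the "blocked connections" figure corresponds to the "N calls waiting for a thread" line printed by rxdebug against the fileserver port (7000); that mapping, and the names `waiting_calls` and `check_waiting`, are assumptions rather than how this site's Nagios check actually works. Return values follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```c
#include <stdio.h>
#include <string.h>

/* Return N from a line of the form "N calls waiting for a thread",
   or -1 if no such line is present. */
long waiting_calls(const char *rxdebug_output)
{
    for (const char *p = rxdebug_output; p != NULL && *p != '\0'; ) {
        long n;
        int end = -1;  /* %n is stored only if the whole phrase matched */
        if (sscanf(p, "%ld calls waiting for a thread%n", &n, &end) == 1 && end > 0)
            return n;
        p = strchr(p, '\n');
        if (p != NULL)
            p++;       /* move to the start of the next line */
    }
    return -1;
}

/* Nagios-style status for the waiting-call count against `critical`. */
int check_waiting(const char *rxdebug_output, long critical)
{
    long n = waiting_calls(rxdebug_output);
    if (n < 0)
        return 3;      /* UNKNOWN: line not found */
    return n > critical ? 2 : 0;
}
```

The `%n` check matters: sscanf returns 1 as soon as the number converts, even if the words after it do not match, so without it a line such as "0 threads are idle" would be misread.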

We added the -L flag to the fileserver process today to see if this helps
but we are wondering if we can do anything else to find the cause and/or
prevent these disruptions.

We have checked and there are no admin scripts running at these times.


BTW, it would not be so bad if the clients would fail over to other read-only
volumes, but they do not seem to. The affected fileservers seem to have the
user root read-only volume on them, but when the servers go into this state
all clients that have this server highest in the priority list just lock up
and need to be restarted. Also, despite having 10 read-only volumes to pick
from, the clients tend to hit only a couple.


Cheers

Matt


Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

2007-07-27 Thread Derrick Brashear
Use gdb's generate-core-file, or gcore, or pstack if you have it, and get a
backtrace.

On 7/27/07, Matthew Cocker [EMAIL PROTECTED] wrote:

 Hi

 We are running about 20 Red Hat AS3-based OpenAFS 1.4.2 fileservers. For
 the last three days, between 4pm and 10pm, 4-6 fileservers have stopped
 serving files, with Nagios monitoring warning of more than 200 blocked
 connections. I have turned on debug for the fileserver process and have a
 log file, but nothing seemed bad to me (not that I would know). The servers
 are basically idle during these disruptions, with CPU and disk showing very
 low usage, but they have to be restarted to get access to files back.

 We added the -L flag to the fileserver process today to see if this helps
 but we are wondering if we can do anything else to find the cause and/or
 prevent these disruptions.

 We have checked and there are no admin scripts running at these times.


 BTW, it would not be so bad if the clients would fail over to other read-only
 volumes, but they do not seem to. The affected fileservers seem to have the
 user root read-only volume on them, but when the servers go into this state
 all clients that have this server highest in the priority list just
 lock up and need to be restarted. Also, despite having 10 read-only volumes to
 pick from, the clients tend to hit only a couple.


 Cheers

 Matt