Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Is there a way to tell the fileservers not to talk to clients below a certain rev, or only allow reads? That should encourage them to upgrade. Or leave. Not nice maybe, but if old clients can DoS your servers...

Jeffrey Altman wrote:
> Matthew Cocker wrote:
>> I wish. I still have people using 1.3.64. They refuse to upgrade despite my efforts to show them the benefits of the upgrade. A lot of people on campus feel that the new clients are not as stable as the old ones. Probably because they all have the same uuid.
>
> 1.3.64 clients will take down your file servers for reasons having nothing to do with multiple UUIDs. Clients older than 1.3.80 have a bug that generates a new rx connection per authenticated request. Clients older than 1.3.80 do not support UUIDs. 1.4.1 or later will prevent the cloning of UUIDs. 1.5.12 or later use a CIFS server implementation that passes Microsoft's protocol tests. 1.5.21 is current.
>
> If your users have stability problems, then they should file bug reports. Otherwise, we won't know there are issues that need to be fixed.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Todd M. Lewis wrote:
> Is there a way to tell the fileservers not to talk to clients below a certain rev, or only allow reads? That should encourage them to upgrade. Or leave. Not nice maybe, but if old clients can DoS your servers...

Not directly, I don't think, but you could write a script that periodically goes through the server log, gets the IP addresses of misbehaving clients, and adds them to a firewall rule.
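A minimal C sketch of the per-line step of such a script follows. It assumes that the FileLog lines you flag as misbehaving contain the client address as a dotted-quad IPv4 address. The thread does not show the FileLog format, and deciding which lines count as misbehaving is left to the site. find_ipv4() and format_block_rule() are hypothetical helpers, not OpenAFS code. The rule targets UDP port 7000, where the fileserver listens. The rule is only formatted, not executed, so an administrator can review it before applying it.

```c
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: scan one log line for the first run of digits and
 * dots that parses as an IPv4 address. On success, copy it into out and
 * return 1; otherwise return 0 and leave out untouched. */
static int find_ipv4(const char *line, char *out, size_t outlen)
{
    const char *p = line;

    while (*p != '\0') {
        if (!isdigit((unsigned char)*p)) {
            p++;
            continue;
        }
        size_t run = strspn(p, "0123456789.");
        size_t len = run;

        while (len > 0 && p[len - 1] == '.')    /* drop trailing dots */
            len--;
        if (len < INET_ADDRSTRLEN && len < outlen) {
            char buf[INET_ADDRSTRLEN];
            struct in_addr addr;

            memcpy(buf, p, len);
            buf[len] = '\0';
            if (inet_pton(AF_INET, buf, &addr) == 1) {
                memcpy(out, buf, len + 1);
                return 1;
            }
        }
        p += run;                               /* skip past this run */
    }
    return 0;
}

/* Hypothetical helper: format, but do not run, an iptables rule that drops
 * UDP packets from ip to the fileserver port (7000/udp).
 * Returns the snprintf() result. */
static int format_block_rule(const char *ip, char *cmd, size_t cmdlen)
{
    return snprintf(cmd, cmdlen,
                    "iptables -A INPUT -s %s -p udp --dport 7000 -j DROP", ip);
}
```

In a full script, each flagged line would go through find_ipv4(). The unique addresses would then be collected, and format_block_rule() would print one rule per address.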
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Todd M. Lewis wrote:
> Is there a way to tell the fileservers not to talk to clients below a certain rev, or only allow reads? That should encourage them to upgrade. Or leave. Not nice maybe, but if old clients can DoS your servers...

You could patch your file servers to call rx_GetServerVersion() for any new hosts for which a matching UUID could not be found. You could then compare the returned string to a list of known version strings that you wish to block, and refuse to provide service.

---

In addition, we recently committed DELTA viced-no-nulluuid-20070719. This patch checks whether the UUID is all zeros and, if so, treats the client as if it doesn't support UUIDs at all. This could easily be expanded to treat an arbitrary list of UUIDs known to be cloned as if UUIDs are not supported. Simply read the list of known UUIDs from a file at startup and

Jeffrey Altman
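The delta itself operates on the fileserver's internal UUID structure. The sketch below only illustrates the decision described above, applied to UUIDs in the textual form that cmdebug prints (the form quoted elsewhere in this thread). uuid_is_null() and uuid_is_trusted() are hypothetical names, not OpenAFS functions. The `cloned` list is assumed to have been read from a site-maintained file at startup. Comparison is an exact string match, so the file must use the same lowercase formatting as the UUIDs being checked.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if a textual UUID such as "00000000-0000-0000-00-00-000000000000"
 * contains only zero digits (dashes ignored), 0 otherwise. */
static int uuid_is_null(const char *uuid)
{
    int saw_digit = 0;

    for (const char *p = uuid; *p != '\0'; p++) {
        if (*p == '-')
            continue;
        if (*p != '0')
            return 0;
        saw_digit = 1;
    }
    return saw_digit;
}

/* Hypothetical policy check: return 0 if the fileserver should treat the
 * client as not supporting UUIDs (all-zero UUID, or one on the list of
 * known-cloned UUIDs), 1 if the UUID can be trusted. */
static int uuid_is_trusted(const char *uuid,
                           const char *const cloned[], size_t ncloned)
{
    if (uuid_is_null(uuid))
        return 0;
    for (size_t i = 0; i < ncloned; i++)
        if (strcmp(uuid, cloned[i]) == 0)
            return 0;
    return 1;
}
```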
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
On 8/1/07, Todd M. Lewis [EMAIL PROTECTED] wrote:
> Is there a way to tell the fileservers not to talk to clients below a certain rev, or only allow reads? That should encourage them to upgrade. Or leave. Not nice maybe, but if old clients can DoS your servers...

The version probe is not guaranteed to be reliable, and can have custom version strings for site-built software.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Derrick Brashear wrote:
> On 8/1/07, Todd M. Lewis [EMAIL PROTECTED] wrote:
>> Is there a way to tell the fileservers not to talk to clients below a certain rev, or only allow reads? That should encourage them to upgrade. Or leave. Not nice maybe, but if old clients can DoS your servers...
>
> The version probe is not guaranteed to be reliable, and can have custom version strings for site-built software.

That is why it would be implemented as a list of version strings loaded from a file. Organizations could decide what versions they want to block. We wouldn't do it for them. It would simply provide an additional tool that could be used to assist in forcing upgrades.
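Below is a minimal C sketch of a file-driven version list like the one described here. It assumes one version string per line, with blank lines and lines starting with '#' ignored. That file format and the names load_blocklist(), version_is_blocked() and struct version_blocklist are hypothetical. The string to check would be the one the fileserver gets back from rx_GetServerVersion(), presumably the same string that `rxdebug <host> 7001 -version` prints (for example OpenAFS1.5.1600). That call is not shown here. Matching is deliberately exact and literal. As noted above, site-built clients can report custom strings, so the site decides exactly which strings to refuse.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKED     64
#define MAX_VERSION_LEN 128

/* Hypothetical site-maintained list of exact version strings to refuse. */
struct version_blocklist {
    size_t count;
    char entries[MAX_BLOCKED][MAX_VERSION_LEN];
};

/* Read one version string per line, skipping blank lines and '#' comments.
 * Lines longer than the buffer would be split into several entries; version
 * strings are short, so this sketch does not handle that case.
 * Returns the number of entries loaded. */
static size_t load_blocklist(FILE *fp, struct version_blocklist *bl)
{
    char line[MAX_VERSION_LEN];

    bl->count = 0;
    while (bl->count < MAX_BLOCKED && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */
        if (line[0] == '\0' || line[0] == '#')
            continue;
        strcpy(bl->entries[bl->count++], line); /* fits: same buffer size */
    }
    return bl->count;
}

/* Return 1 if version exactly matches an entry in the list, 0 otherwise. */
static int version_is_blocked(const struct version_blocklist *bl,
                              const char *version)
{
    for (size_t i = 0; i < bl->count; i++)
        if (strcmp(bl->entries[i], version) == 0)
            return 1;
    return 0;
}
```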
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Better than what I have now, which is that they break AFS for all the other users.

Cheers
Matt

On 8/2/07, Jeffrey Altman [EMAIL PROTECTED] wrote:
> Matthew Cocker wrote:
>> We do a similar thing for our locally developed cost recovery solution. The clients pass the version over in the initial handshake. If the version is less than our required minimum, the server rejects the client with a "you need to upgrade" message. The only way we have found to force standards in our loosely coupled campus.
>>
>> Cheers
>> Matt
>
> Except here there would be no upgrade message. The client would just stop working. When they contact the Help Desk they will be told they must upgrade.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
We do a similar thing for our locally developed cost recovery solution. The clients pass the version over in the initial handshake. If the version is less than our required minimum, the server rejects the client with a "you need to upgrade" message. The only way we have found to force standards in our loosely coupled campus.

Cheers
Matt

On 8/2/07, Jeffrey Altman [EMAIL PROTECTED] wrote:
> Derrick Brashear wrote:
>> On 8/1/07, Todd M. Lewis [EMAIL PROTECTED] wrote:
>>> Is there a way to tell the fileservers not to talk to clients below a certain rev, or only allow reads? That should encourage them to upgrade. Or leave. Not nice maybe, but if old clients can DoS your servers...
>>
>> The version probe is not guaranteed to be reliable, and can have custom version strings for site-built software.
>
> That is why it would be implemented as a list of version strings loaded from a file. Organizations could decide what versions they want to block. We wouldn't do it for them. It would simply provide an additional tool that could be used to assist in forcing upgrades.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Matthew Cocker wrote:
> We do a similar thing for our locally developed cost recovery solution. The clients pass the version over in the initial handshake. If the version is less than our required minimum, the server rejects the client with a "you need to upgrade" message. The only way we have found to force standards in our loosely coupled campus.
>
> Cheers
> Matt

Except here there would be no upgrade message. The client would just stop working. When they contact the Help Desk they will be told they must upgrade.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Jeff

I dumped the IP addresses and checked. All the offending machines are running 1.5.16:

rxdebug 130.216.22.5 7001 -version
Trying 130.216.22.5 (port 7001):
AFS version: OpenAFS1.5.1600

Cheers
Matt

On 8/1/07, Jeffrey Altman [EMAIL PROTECTED] wrote:
> Matthew Cocker wrote:
>> UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
>> UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b
>>
>> These correlated to a set of IP addresses that belonged to one department, which had reimaged a lot of lab machines last week. They are fixing their machines using the file they got from http://help.unc.edu/5667#d20492e47
>
> They must be using an old version of the Windows client. Get them to re-image with the latest build.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Forgot reply-all again

On 8/1/07, Matthew Cocker [EMAIL PROTECTED] wrote:
> The -p 128 flag alone has not solved the problem. I have a FileLog (from one locked-up server) and rxdebug output (from all servers, at the time when lots of them locked up last night). They can be accessed via https://webdropoff.auckland.ac.nz/cgi-bin/pickup/6f25589961a37ad4d4a144bbd868058a/516290
>
> Interestingly, I started running cmdebug against all the clients that connect to our servers, and I can see a lot of machines with the same UUID.
>
> I tried to apply the patch, but it failed to apply two of the chunks against the 1.4.3 source in the RPM. The rejected file output was:
>
> patch -p0 < ../../../UUID.patch
> patching file audit.c
> Reversed (or previously applied) patch detected!  Assume -R? [n] y
> Hunk #1 succeeded at 66 (offset 10 lines).
> Hunk #3 FAILED at 88.
> Hunk #4 succeeded at 114 (offset 6 lines).
> Hunk #5 succeeded at 144 (offset 1 line).
> Hunk #6 succeeded at 181 (offset 10 lines).
> Hunk #7 FAILED at 223.
> Hunk #8 succeeded at 222 (offset -3 lines).
> 2 out of 8 hunks FAILED -- saving rejects to file audit.c.rej
>
> less audit.c.rej
> ***
> *** 88,98
>   bufferPtr += sizeof(vaLong);
>   break;
>   case AUD_LST: /* Ptr to another list */
> - vaLst = (va_list)va_arg(vaList, va_list);
>   audmakebuf(audEvent, vaLst);
>   break;
>   case AUD_FID: /* AFSFid - contains 3 entries */
> - vaFid = (struct AFSFid *)va_arg(vaList, struct AFSFid *);
>   if (vaFid) {
> ***
> *** 223,233
>   fprintf(out, "LONG %d ", vaLong);
>   break;
>   case AUD_LST: /* Ptr to another list */
> - vaLst = va_arg(vaList, va_list);
>   printbuf(out, 1, "VALST", 0, vaLst);
>   break;
>   case AUD_FID: /* AFSFid - contains 3 entries */
> - vaFid = va_arg(vaList, struct AFSFid *);
>   if (vaFid)
>   fprintf(out, "FID %u:%u:%u ", vaFid->Volume, vaFid->Vnode, vaFid->Unique);
> --- 223,233
>   fprintf(out, "LONG %d ", vaLong);
>   break;
>   case AUD_LST: /* Ptr to another list */
> + vaLst = (char *)va_arg(vaList, int);
>   printbuf(out, 1, "VALST", 0, vaLst);
>   break;
>   case AUD_FID: /* AFSFid - contains 3 entries */
> + vaFid = (struct AFSFid *)va_arg(vaList, int);
>   if (vaFid)
>   fprintf(out, "FID %u:%u:%u ", vaFid->Volume, vaFid->Vnode, vaFid->Unique);
>
> On 7/31/07, Derrick Brashear [EMAIL PROTECTED] wrote:
>> It's not really particularly useful. I do, however, have a suggestion.
>> 1) Up the number of threads to 128 (fileserver -p 128).
>> 2) Apply the diff in RT ticket #19461, which I added to the ticket on Mon Aug 08 14:49:12 2005.
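Once the cmdebug results have been collected, the duplicate-UUID check described above can be sketched in C. The sketch assumes one (address, UUID) observation per client, for example taken from the UUID: line of each cmdebug run. How those observations are gathered is not shown. struct client_obs, hosts_sharing_uuid() and count_cloned_uuids() are hypothetical names. A UUID reported from more than one distinct address is a likely clone left behind by reimaging.

```c
#include <stddef.h>
#include <string.h>

/* One observation: a client address and the UUID cmdebug reported for it. */
struct client_obs {
    const char *ip;
    const char *uuid;
};

/* Count the distinct addresses that reported the same UUID as obs[idx]. */
static size_t hosts_sharing_uuid(const struct client_obs *obs, size_t n,
                                 size_t idx)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(obs[i].uuid, obs[idx].uuid) != 0)
            continue;
        /* Count each address only once, at its first occurrence. */
        int seen = 0;
        for (size_t j = 0; j < i; j++)
            if (strcmp(obs[j].uuid, obs[idx].uuid) == 0 &&
                strcmp(obs[j].ip, obs[i].ip) == 0)
                seen = 1;
        if (!seen)
            count++;
    }
    return count;
}

/* Return how many distinct UUIDs were reported by more than one address. */
static size_t count_cloned_uuids(const struct client_obs *obs, size_t n)
{
    size_t clones = 0;

    for (size_t i = 0; i < n; i++) {
        int first = 1;
        for (size_t j = 0; j < i; j++)
            if (strcmp(obs[j].uuid, obs[i].uuid) == 0)
                first = 0;
        if (first && hosts_sharing_uuid(obs, n, i) > 1)
            clones++;
    }
    return clones;
}
```

The quadratic scans keep the sketch short. For a few thousand clients, sorting the observations by UUID first would be the more scalable choice.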
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
What is the UUID that you see repeated?
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
OK, I rushed when I saw the diff attached. But the text cut from the web page seems to have the wrong line numbers in the diff etc., and it fails to apply. I will add it manually if need be.

Thanks for the help

Cheers
Matt
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b

These correlated to a set of IP addresses that belonged to one department, which had reimaged a lot of lab machines last week. They are fixing their machines using the file they got from http://help.unc.edu/5667#d20492e47

Cheers
Matt

On 8/1/07, Jeffrey Altman [EMAIL PROTECTED] wrote:
> What is the UUID that you see repeated?
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Matthew Cocker wrote:
> UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
> UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b
>
> These correlated to a set of IP addresses that belonged to one department, which had reimaged a lot of lab machines last week. They are fixing their machines using the file they got from http://help.unc.edu/5667#d20492e47

They must be using an old version of the Windows client. Get them to re-image with the latest build.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
I wish. I still have people using 1.3.64. They refuse to upgrade despite my efforts to show them the benefits of the upgrade. A lot of people on campus feel that the new clients are not as stable as the old ones. Probably because they all have the same uuid.

Cheers
Matt

On 8/1/07, Jeffrey Altman [EMAIL PROTECTED] wrote:
> Matthew Cocker wrote:
>> UUID: e3586602-1240-4f07-b1-d7-1b72dc8e715d
>> UUID: ea457696-8eec-4e71-b3-74-a2164c09d96b
>>
>> These correlated to a set of IP addresses that belonged to one department, which had reimaged a lot of lab machines last week. They are fixing their machines using the file they got from http://help.unc.edu/5667#d20492e47
>
> They must be using an old version of the Windows client. Get them to re-image with the latest build.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Matthew Cocker wrote:
> I wish. I still have people using 1.3.64. They refuse to upgrade despite my efforts to show them the benefits of the upgrade. A lot of people on campus feel that the new clients are not as stable as the old ones. Probably because they all have the same uuid.

1.3.64 clients will take down your file servers for reasons having nothing to do with multiple UUIDs. Clients older than 1.3.80 have a bug that generates a new rx connection per authenticated request. Clients older than 1.3.80 do not support UUIDs. 1.4.1 or later will prevent the cloning of UUIDs. 1.5.12 or later use a CIFS server implementation that passes Microsoft's protocol tests. 1.5.21 is current.

If your users have stability problems, then they should file bug reports. Otherwise, we won't know there are issues that need to be fixed.
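The version thresholds listed above can be encoded as a small table. This is a sketch of the kind of minimum-version check discussed in this thread. version_cmp() compares dotted versions numerically, component by component. A plain strcmp() would not work here, because it would order 1.3.9 after 1.3.80. The table contains only the thresholds stated above, which refer to the Windows client releases under discussion. The names and the idea of printing a report are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Compare dotted version strings numerically, component by component.
 * Missing components count as 0, so "1.4" equals "1.4.0".
 * Returns a negative, zero or positive value, like strcmp(). */
static int version_cmp(const char *a, const char *b)
{
    while (*a != '\0' || *b != '\0') {
        unsigned long x = 0, y = 0;

        while (isdigit((unsigned char)*a))
            x = x * 10 + (unsigned long)(*a++ - '0');
        while (isdigit((unsigned char)*b))
            y = y * 10 + (unsigned long)(*b++ - '0');
        if (x != y)
            return x < y ? -1 : 1;
        /* Step over one separator (normally '.') on each side. */
        if (*a != '\0')
            a++;
        if (*b != '\0')
            b++;
    }
    return 0;
}

struct known_issue {
    const char *fixed_in;   /* first release without the issue */
    const char *issue;
};

static const struct known_issue known_issues[] = {
    {"1.3.80", "opens a new rx connection per authenticated request"},
    {"1.3.80", "does not support UUIDs"},
    {"1.4.1",  "does not prevent UUID cloning"},
    {"1.5.12", "predates the CIFS server that passes Microsoft's protocol tests"},
};

/* Print each listed issue that applies to a client version and
 * return how many applied. */
static size_t report_issues(const char *version, FILE *out)
{
    size_t hits = 0;

    for (size_t i = 0; i < sizeof known_issues / sizeof known_issues[0]; i++) {
        if (version_cmp(version, known_issues[i].fixed_in) < 0) {
            fprintf(out, "%s: %s\n", version, known_issues[i].issue);
            hits++;
        }
    }
    return hits;
}
```

With this table, report_issues("1.3.64", stdout) prints all four lines, while 1.5.16 and later print none.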
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Tonight we had 10 of our AFS fileservers lock up. I had upgraded some to 1.4.4, but they died as well. All run on Red Hat 3 Update 6. Only one process shows in the ps listing, and running gcore on this process seems to give nothing. A pstack dump is below. Is it any good? This is now a real disaster and very weird. I have other fileservers that are set up identically which are not dying. The only difference is that these are on a different subnet and in a different server room.

Thread 22 (Thread -1218524240 (LWP 27894)):
#0 0x0044cc84 in sigwait () from /lib/tls/libpthread.so.0
#1 0x08073a32 in ?? ()
#2 0xb75ec9f0 in ?? ()
#3 0xb75ec96c in ?? ()
#4 0x in ?? ()

Thread 21 (Thread -1229263952 (LWP 27895)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x080b3c0e in ?? ()
#2 0x08646828 in ?? ()
#3 0x086467dc in ?? ()
#4 0xb6bae328 in ?? ()
#5 0x080af859 in ?? ()
#6 0x0001 in ?? ()
#7 0x in ?? ()

Thread 20 (Thread -1240114256 (LWP 27896)):
#0 0x0044959b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 0x0808e5c0 in ?? ()
#2 0x080f9540 in stderr ()
#3 0x080f94c0 in stderr ()
#4 0xb6155a58 in ?? ()
#5 0x01cfe9b8 in ?? ()
#6 0x in ?? ()

Thread 19 (Thread -1254962256 (LWP 27897)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0xb532c428 in ?? ()
#5 0x08085a21 in ?? ()
#6 0xab1da358 in ?? ()
#7 0x in ?? ()

Thread 18 (Thread -1265587280 (LWP 27898)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0xb490a428 in ?? ()
#5 0x08085a21 in ?? ()
#6 0xab1c93a0 in ?? ()
#7 0x in ?? ()

Thread 17 (Thread -1276077136 (LWP 27899)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0x0001 in ?? ()
#5 0xb3f093d8 in ?? ()
#6 0x00c650fd in malloc () from /lib/tls/libc.so.6
#7 0x0805ea56 in ?? ()
#8 0xb490bb5c in ?? ()
#9 0x0002 in ?? ()
#10 0x0001 in ?? ()
#11 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
#12 0x0805f182 in ?? ()
#13 0xb490bb08 in ?? ()
#14 0x00448b20 in pthread_mutex_unlock () from /lib/tls/libpthread.so.0
#15 0x080604de in ?? ()
#16 0x07ead882 in ?? ()
#17 0x591b in ?? ()
#18 0xb3f09474 in ?? ()
#19 0x591b in ?? ()
#20 0x0854a3f8 in ?? ()
#21 0x591b5f70 in ?? ()
#22 0x in ?? ()

Thread 16 (Thread -1286566992 (LWP 27900)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x080a3a82 in ?? ()
#2 0x087244cc in ?? ()
#3 0x080f9758 in stderr ()
#4 0xb3508a18 in ?? ()
#5 0x080a3e85 in ?? ()
#6 0x in ?? ()

Thread 15 (Thread -1297056848 (LWP 27901)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0xb2b07428 in ?? ()
#5 0x08085a21 in ?? ()
#6 0xaabb1b90 in ?? ()
#7 0x0002 in ?? ()
#8 0xb490bb08 in ?? ()
#9 0xb490bb08 in ?? ()
#10 0xb490bb60 in ?? ()
#11 0x03ead882 in ?? ()
#12 0xb2b073f8 in ?? ()
#13 0x0805ea56 in ?? ()
#14 0xb490bb5c in ?? ()
#15 0x0002 in ?? ()
#16 0xaabb1b98 in ?? ()
#17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0

Thread 14 (Thread -1307546704 (LWP 27902)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0xb2106428 in ?? ()
#5 0x08085a21 in ?? ()
#6 0x08e01618 in ?? ()
#7 0x in ?? ()

Thread 13 (Thread -1318036560 (LWP 27903)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0xb1705428 in ?? ()
#5 0x08085a21 in ?? ()
#6 0xaac696f8 in ?? ()
#7 0x0002 in ?? ()
#8 0xb490bb08 in ?? ()
#9 0xb490bb08 in ?? ()
#10 0xb490bb60 in ?? ()
#11 0x06ead882 in ?? ()
#12 0xb17053f8 in ?? ()
#13 0x0805ea56 in ?? ()
#14 0xb490bb5c in ?? ()
#15 0x0002 in ?? ()
#16 0xaac69700 in ?? ()
#17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0

Thread 12 (Thread -1328526416 (LWP 27904)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0xb0d04428 in ?? ()
#5 0x08085a21 in ?? ()
#6 0x087335e8 in ?? ()
#7 0x0002 in ?? ()
#8 0xb490bb08 in ?? ()
#9 0xb490bb08 in ?? ()
#10 0xb490bb60 in ?? ()
#11 0x2bebd882 in ?? ()
#12 0xb0d043f8 in ?? ()
#13 0x0805ea56 in ?? ()
#14 0xb490bb5c in ?? ()
#15 0x0002 in ?? ()
#16 0x087335f0 in ?? ()
#17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0

Thread 11 (Thread -1339016272 (LWP 27905)):
#0 0x0044bf5e in recvmsg () from /lib/tls/libpthread.so.0
#1 0x080b1a8f in ?? ()
#2 0x0005 in ?? ()
#3 0xb03039d0 in ?? ()
#4 0x in ?? ()

Thread 10 (Thread -1349506128 (LWP 27906)):
#0 0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0806fe71 in ?? ()
#2 0xb490bba8 in ?? ()
#3 0xb490bb60 in ?? ()
#4 0xaf902428 in ?? ()
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Matthew Cocker wrote:
> Tonight we had 10 of our AFS fileservers lock up. I had upgraded some to 1.4.4, but they died as well. All run on Red Hat 3 Update 6. Only one process shows in the ps listing, and running gcore on this process seems to give nothing. A pstack dump is below. Is it any good? This is now a real disaster and very weird. I have other fileservers that are set up identically which are not dying. The only difference is that these are on a different subnet and in a different server room.
>
> Thread 22 (Thread -1218524240 (LWP 27894)):
> #0 0x0044cc84 in sigwait () from /lib/tls/libpthread.so.0
> #1 0x08073a32 in ?? ()
> #2 0xb75ec9f0 in ?? ()
> #3 0xb75ec96c in ?? ()
> #4 0x in ?? ()

You will need to rebuild the server binaries without stripping the debug info in order for the stack data to be truly useful.

The fact that your servers on another subnet are having no issues makes me wonder if there are networking issues involved. Perhaps a misconfigured router or firewall. Of course, with so little data made available to us, it would be hard to point you in the right direction. We don't know the states of the servers. We don't know the states of the clients. We haven't seen any of the log data.
Re: [OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections
Use gdb's generate-core-file, or gcore, or pstack if you have it, and get a backtrace.

On 7/27/07, Matthew Cocker [EMAIL PROTECTED] wrote:
> Hi
>
> We are running about 20 Red Hat AS3-based OpenAFS 1.4.2 fileservers. For the last three days, between 4pm and 10pm, we have had 4-6 fileservers stop serving files, with Nagios monitoring warning of 200 blocked connections. I have turned on debugging for the fileserver process and have a log file, but nothing seemed bad to me (not that I would know). The servers are basically idle during these disruptions, with very low CPU and disk usage, but they have to be restarted to get access to files back. We added the -L flag to the fileserver process today to see if this helps, but we are wondering if we can do anything else to find the cause and/or prevent these disruptions. We have checked, and there are no admin scripts running at these times.
>
> BTW, it would not be so bad if the clients would fail over to other read-only volumes, but they do not seem to. The affected fileservers seem to have the user root read-only volume on them, but when the servers go into this state, all clients that have this server highest in their priority list just lock up and need to be restarted. Also, despite having 10 read-only volumes to pick from, the clients tend to hit only a couple.
>
> Cheers
> Matt