Re: imapd's hang when maxchild count is reached
On Thursday 06 February 2003 12:10 pm, Henrique de Moraes Holschuh wrote:
> On Thu, 06 Feb 2003, Dave McMurtrie wrote:
> > Would this actually work anyway? If the parent were to pass a file
>
> You can send descriptors over sockets if your unix kernel supports it.
> Linux does, and apparently so does Solaris.
>
> Anyway, I dislike the idea of losing preforks heavily, it is bound to be a
> major pain when the system is overloaded.

The send descriptors idea would also allow a prefork. I would also agree that the one connection per child idea is out the door.

One other idea: could it be portable enough to, say, initialize a semaphore and set the resource count on it to the maximum number of connections? When a child accepts a connection, it first has to acquire a semaphore resource. If that fails (all the semaphore resources are taken), the child knows that the maximum number of connections has been exceeded and returns an error to the connecting party.

I did read an old Bugzilla entry that said fully threading imapd and friends was an idea... Don't know if that has any effect on any future direction.

I haven't worked with semaphores for a long while so I could be talking outta my arse :).

Jeremy
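The semaphore idea above can be sketched roughly as follows. This is only an illustration in Python, not anything from the Cyrus code: `serve_or_refuse` and `MAX_CONNECTIONS` are made-up names, and `multiprocessing.Semaphore` stands in for whatever shared semaphore the platform offers. The key point is the non-blocking acquire, so a child that finds no free slot can answer with an error right away instead of waiting.

```python
import multiprocessing


def serve_or_refuse(slots, serve):
    """Run serve() only if a connection slot is free; never wait for one.

    slots is a counting semaphore whose initial value is the maximum
    number of simultaneous connections (hypothetical setup).
    """
    if not slots.acquire(False):  # non-blocking attempt to take a slot
        return "refused"          # limit reached: caller sends an error
    try:
        serve()
        return "served"
    finally:
        slots.release()           # give the slot back when done


if __name__ == "__main__":
    MAX_CONNECTIONS = 2  # assumed limit, for illustration only
    slots = multiprocessing.Semaphore(MAX_CONNECTIONS)
    print(serve_or_refuse(slots, lambda: None))
```

A semaphore created before the children are forked would be shared by all of them, which is what makes the count global rather than per process.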
Re: imapd's hang when maxchild count is reached
On Thu, 6 Feb 2003, Igor Brezac wrote:
> Not that simple. ;) Check out http://www.kohala.com/start/apue.html.

Got it. I was just reading about stream pipes elsewhere and saw there was an ioctl() to handle passing file descriptors through them. Looks like there are other ways, as well. Thanks for the info.

Dave
--
Dave McMurtrie, Systems Programmer
University of Pittsburgh
Computing Services and Systems Development, Development Services -- UNIX and VMS Services
717P Cathedral of Learning
(412)-624-6413
Re: imapd's hang when maxchild count is reached
On Thu, 6 Feb 2003, Dave McMurtrie wrote:
> On Thu, 6 Feb 2003, Henrique de Moraes Holschuh wrote:
>
> > On Wed, 05 Feb 2003, Igor Brezac wrote:
> > > > descriptors down to a child via a unix domain socket using sendmsg() or
> > > > recvmsg(). In this case the master accepts the connection, passes the
> > > > descriptor to a child via sendmsg(), closes the socket (the child should now
> > > > be servicing it), and goes back to listening.
> > >
> > > This is not very portable. ;(
> >
> > Would it work on Linux and Solaris? If the answer is yes to both, then I
> > would vote for adding that support conditional to a configure.in check.
> >
> > I guess if it can be done on Linux, the BSDs can almost certainly do it as
> > well.
>
> Would this actually work anyway? If the parent were to pass a file
> descriptor (by putmsg or any other means) to the child, isn't he really
> just passing an integer value? In other words, the integer value 5 in
> process B is not the same thing as file descriptor 5 in process A.

Not that simple. ;) Check out http://www.kohala.com/start/apue.html.

> When the parent initially forks the child, (if close on exec isn't set) the
> child and parent will have the same file descriptors available to them.
> Wouldn't any fd's opened in the parent after the child exec()s be unique
> to the running instance of the parent process?
>
> Just asking -- I could be wrong & it wouldn't be the first time.

The master and the spawned daemons will have to have some kind of communication channel (unix domain sockets or named pipes) in order for this to work. Apache uses the file-descriptor-passing model.

-- Igor
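To make the "not just an integer" point concrete, here is a small Python sketch of descriptor passing over a Unix domain socket. It assumes Python 3.9 or later on a Unix system (for `socket.send_fds` and `socket.recv_fds`, which wrap sendmsg()/recvmsg() with SCM_RIGHTS); the helper names are invented for the example, and both ends live in one process only to keep it short. The receiver gets a new descriptor number that refers to the same open file.

```python
import os
import socket


def pass_descriptor(channel, fd):
    """Send one open descriptor over a Unix domain socket (SCM_RIGHTS)."""
    socket.send_fds(channel, [b"F"], [fd])  # one byte of ordinary data is required


def receive_descriptor(channel):
    """Receive one descriptor; the kernel installs it under a fresh number."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    return fds[0]


def demo():
    # Stand-ins for the master and child ends of the control channel.
    master_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()  # stands in for an accepted client socket

    pass_descriptor(master_end, write_fd)
    received = receive_descriptor(child_end)

    os.write(received, b"hello")   # write through the received descriptor
    data = os.read(read_fd, 5)     # ...and it arrives on the original pipe
    same_number = received == write_fd

    for fd in (read_fd, write_fd, received):
        os.close(fd)
    master_end.close()
    child_end.close()
    return data, same_number
```

In a real master/child setup the two socket ends would sit in different processes, and the master would close its copy of the client socket after sending it.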
Re: imapd's hang when maxchild count is reached
On Thu, 06 Feb 2003, Dave McMurtrie wrote:
> Would this actually work anyway? If the parent were to pass a file

You can send descriptors over sockets if your unix kernel supports it. Linux does, and apparently so does Solaris.

Anyway, I heavily dislike the idea of losing preforks; it is bound to be a major pain when the system is overloaded.

--
"One disk to rule them all, One disk to find them.
One disk to bring them all and in the darkness grind them.
In the Land of Redmond where the shadows lie."
-- The Silicon Valley Tarot
Henrique Holschuh
Re: imapd's hang when maxchild count is reached
On Thu, 6 Feb 2003, Henrique de Moraes Holschuh wrote:
> On Wed, 05 Feb 2003, Igor Brezac wrote:
> > > descriptors down to a child via a unix domain socket using sendmsg() or
> > > recvmsg(). In this case the master accepts the connection, passes the
> > > descriptor to a child via sendmsg(), closes the socket (the child should now
> > > be servicing it), and goes back to listening.
> >
> > This is not very portable. ;(
>
> Would it work on Linux and Solaris? If the answer is yes to both, then I
> would vote for adding that support conditional to a configure.in check.
>
> I guess if it can be done on Linux, the BSDs can almost certainly do it as
> well.

Would this actually work anyway? If the parent were to pass a file descriptor (by putmsg or any other means) to the child, isn't he really just passing an integer value? In other words, the integer value 5 in process B is not the same thing as file descriptor 5 in process A.

When the parent initially forks the child (if close-on-exec isn't set), the child and parent will have the same file descriptors available to them. Wouldn't any fd's opened in the parent after the child exec()s be unique to the running instance of the parent process?

Just asking -- I could be wrong & it wouldn't be the first time.

Dave
--
Dave McMurtrie, Systems Programmer
University of Pittsburgh
Computing Services and Systems Development, Development Services -- UNIX and VMS Services
717P Cathedral of Learning
(412)-624-6413
Re: imapd's hang when maxchild count is reached
On Thu, 6 Feb 2003, Henrique de Moraes Holschuh wrote:
> On Wed, 05 Feb 2003, Igor Brezac wrote:
> > > descriptors down to a child via a unix domain socket using sendmsg() or
> > > recvmsg(). In this case the master accepts the connection, passes the
> > > descriptor to a child via sendmsg(), closes the socket (the child should now
> > > be servicing it), and goes back to listening.
> >
> > This is not very portable. ;(
>
> Would it work on Linux and Solaris? If the answer is yes to both, then I
> would vote for adding that support conditional to a configure.in check.
>
> I guess if it can be done on Linux, the BSDs can almost certainly do it as
> well.

It'll work on all SVR4-based OSs (Solaris, DGUX, HPUX, etc.) and on Linux. I am not sure about BSD. I know it will not work on SCO, at least not on earlier versions.

-- Igor
Re: imapd's hang when maxchild count is reached
On Wed, 05 Feb 2003, Igor Brezac wrote:
> > descriptors down to a child via a unix domain socket using sendmsg() or
> > recvmsg(). In this case the master accepts the connection, passes the
> > descriptor to a child via sendmsg(), closes the socket (the child should now
> > be servicing it), and goes back to listening.
>
> This is not very portable. ;(

Would it work on Linux and Solaris? If the answer is yes to both, then I would vote for adding that support, conditional on a configure.in check.

I guess if it can be done on Linux, the BSDs can almost certainly do it as well.

--
"One disk to rule them all, One disk to find them.
One disk to bring them all and in the darkness grind them.
In the Land of Redmond where the shadows lie."
-- The Silicon Valley Tarot
Henrique Holschuh
Re: imapd's hang when maxchild count is reached
On Wed, 5 Feb 2003, Jeremy Rumpf wrote:
> > Hmmm... what does Sendmail do? It's got lots of children, but still
> > manages to refuse connections when it gets busy (RefuseLA)... I kinda
> > like that behavior. I definitely like it better than keeping more and
> > more sockets open.
> >
> > --
> > Stephen L. Ulmer [EMAIL PROTECTED]
> > Senior Systems Programmer http://www.ulmer.org/
> > Northeast Regional Data Center VOX: (352) 392-2061
> > University of Florida FAX: (352) 392-9440
>
> It may not prefork it's processes. The master process could accept the
> connection, fork, closes the socket (the child is now servicing it), and go
> back into a listen state. Therefore the master process can choose to reject
> connections without any coordination from the children. Children then have a
> service life of one connection and that's it.
>
> They could also use a technique where a master process can pass file (socket)
> descriptors down to a child via a unix domain socket using sendmsg() or
> recvmsg(). In this case the master accepts the connection, passes the
> descriptor to a child via sendmsg(), closes the socket (the child should now
> be servicing it), and goes back to listening.

This is not very portable. ;(

> Either way, in the above, the master process is the only process that actually
> accepts the connections.
>
> I'm not sure how sendmail actually does it though, the above is purely
> speculation...
>
> Jeremy

-- Igor
Re: imapd's hang when maxchild count is reached
> Hmmm... what does Sendmail do? It's got lots of children, but still
> manages to refuse connections when it gets busy (RefuseLA)... I kinda
> like that behavior. I definitely like it better than keeping more and
> more sockets open.
>
> --
> Stephen L. Ulmer [EMAIL PROTECTED]
> Senior Systems Programmer http://www.ulmer.org/
> Northeast Regional Data Center VOX: (352) 392-2061
> University of Florida FAX: (352) 392-9440

It may not prefork its processes. The master process could accept the connection, fork, close the socket (the child is now servicing it), and go back into a listen state. The master process can therefore choose to reject connections without any coordination from the children. Children then have a service life of one connection and that's it.

They could also use a technique where a master process passes file (socket) descriptors down to a child via a unix domain socket using sendmsg() and recvmsg(). In this case the master accepts the connection, passes the descriptor to a child via sendmsg(), closes the socket (the child should now be servicing it), and goes back to listening.

Either way, in the above, the master process is the only process that actually accepts the connections.

I'm not sure how sendmail actually does it though, the above is purely speculation...

Jeremy
Re: imapd's hang when maxchild count is reached
   From: "Stephen L. Ulmer" <[EMAIL PROTECTED]>
   Date: Wed, 05 Feb 2003 16:57:15 -0500
   [...]
   Hmmm... what does Sendmail do? It's got lots of children, but still
   manages to refuse connections when it gets busy (RefuseLA)... I kinda
   like that behavior. I definitely like it better than keeping more and
   more sockets open.

Sendmail only accepts new connections in the parent process. When the LA goes over RefuseLA, it closes the socket. To be able to reopen the socket, the parent keeps root privs at all times. (The cyrus master process changes to "cyrus" from "root" very early on.) So Sendmail's scheme doesn't work for us.

I took a look at a random version of Postfix, and I'm pretty sure it does what we do now (build up the system listen queue).

I also agree that it's more user friendly to quickly deny service. I contemplated having a special child to deny service (something like what Ohio has implemented) but it seems tricky. (Imagine the # of processes bouncing right next to the limit.) I definitely would like to avoid forking for every new connection once we're over the limit, since that will just strain the system after we're already heavily loaded.

Larry
Re: imapd's hang when maxchild count is reached
-- "Stephen L. Ulmer" <[EMAIL PROTECTED]> is rumored to have mumbled on Wednesday, February 5, 2003 16:57 -0500 regarding Re: imapd's hang when maxchild count is reached:

> Hmmm... what does Sendmail do? It's got lots of children, but still
> manages to refuse connections when it gets busy (RefuseLA)... I kinda
> like that behavior.

Yes, but there are potential problems with it. I'm currently in discussion with Claus Assmann and Red Hat support. Sendmail 8.12.7 under Advanced Server has apparently lost signals in this situation in several cases. As a consequence, sendmail permanently stopped accepting connections after delaying them ...

--
Sebastian Hagedorn M.A. - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
Re: imapd's hang when maxchild count is reached
"leg+" == Lawrence Greenfield <[EMAIL PROTECTED]> writes:

leg+> Yes, that would be desirable. The easiest way of doing this
leg+> would be to close the socket used to accept() new
leg+> connections. However, it's open in all of the children, so
leg+> closing it is infeasible.

leg+> The next option would be to have master accept() and then
leg+> immediately close() the connection. This raises interesting
leg+> blocking concerns and is thus somewhat harder to implement.

leg+> Thus the current solution seems acceptable.

Hmmm... what does Sendmail do? It's got lots of children, but still manages to refuse connections when it gets busy (RefuseLA)... I kinda like that behavior. I definitely like it better than keeping more and more sockets open.

--
Stephen L. Ulmer [EMAIL PROTECTED]
Senior Systems Programmer http://www.ulmer.org/
Northeast Regional Data Center VOX: (352) 392-2061
University of Florida FAX: (352) 392-9440
Re: imapd's hang when maxchild count is reached
On Wed, 05 Feb 2003 15:21:26 -0500 Scott Adkins <[EMAIL PROTECTED]> wrote:
> I solved this problem a long time ago by passing an environment variable
> from the master process to the child process when the child process is
> spawned indicating that the server is full. I used CYRUS_MAXCHILD, and
> the child process already checks for the CYRUS_VERBOSE variable when it
> starts in order to properly set the debugging level. If that variable
> was set, then the child would output an error message indicating that the
> server was full and to try again later. It would then close the client
> connection and then exit.
>
> A couple things to note:
>
> 1) I prefer the client to be notified when the server is full and not
>    simply get connection refused messages or have the email client just
>    sit there, appearing to hang, while the server waits for a connection
>    to become free.
>
> 2) My method worked, but it didn't take advantage of the process reuse
>    feature. Basically, the master process only gets one chance to pass
>    an environment variable off to the child process. So, once that
>    variable is set in the child, the child will always believe that the
>    max has been reached. That is why I had the child process go away,
>    as it is basically useless after handling that one connection.
>
> 3) If there was a good way for the master to notify the child process
>    on each connection pass (either when passing the connection to an
>    already available child, or when passing the connection to a newly
>    spawned child) what the current status of maxchild is, then it would
>    be quite efficient to send the server full messages, close the
>    connection and wait for the master process to hand it another. I
>    don't know enough about how the master and child process communicate
>    to know how to make this work.
>
> Scott

Now that you're discussing mechanisms to politely refuse new connections because of some condition, I would like to suggest another condition to check: system loadavg. Some programs (sendmail, for example) know how to reject connections if loadavg is >= some admin-defined value.

--
Jure Pecar
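As a rough illustration of the load-average check suggested here, the following Python sketch compares the 1-minute load average against an admin-defined threshold. The function and parameter names are invented for the example (`refuse_la` merely echoes Sendmail's RefuseLA option mentioned in the thread); nothing here reflects how Sendmail or Cyrus actually implement it.

```python
import os


def over_load_limit(refuse_la, loadavg=None):
    """Return True when new connections should be refused.

    refuse_la is the admin-defined threshold (hypothetical name).
    loadavg may be supplied for testing; by default it is read from
    the system as a (1 min, 5 min, 15 min) tuple via os.getloadavg().
    """
    if loadavg is None:
        loadavg = os.getloadavg()
    return loadavg[0] >= refuse_la  # compare the 1-minute average
```

A master process could call this before handing a connection to a child and answer with a "busy, try later" message when it returns True.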
Re: imapd's hang when maxchild count is reached
I solved this problem a long time ago by passing an environment variable from the master process to the child process when the child process is spawned, indicating that the server is full. I used CYRUS_MAXCHILD; the child process already checks for the CYRUS_VERBOSE variable when it starts in order to properly set the debugging level. If CYRUS_MAXCHILD was set, the child would output an error message indicating that the server was full and to try again later. It would then close the client connection and exit.

A couple things to note:

1) I prefer the client to be notified when the server is full and not simply get connection refused messages or have the email client just sit there, appearing to hang, while the server waits for a connection to become free.

2) My method worked, but it didn't take advantage of the process reuse feature. Basically, the master process only gets one chance to pass an environment variable off to the child process. So, once that variable is set in the child, the child will always believe that the max has been reached. That is why I had the child process go away, as it is basically useless after handling that one connection.

3) If there was a good way for the master to notify the child process on each connection pass (either when passing the connection to an already available child, or when passing the connection to a newly spawned child) what the current status of maxchild is, then it would be quite efficient to send the server full messages, close the connection and wait for the master process to hand it another. I don't know enough about how the master and child process communicate to know how to make this work.

Scott

--On Wednesday, February 05, 2003 12:57 PM -0500 Lawrence Greenfield <[EMAIL PROTECTED]> wrote:

> > Date: Wed, 05 Feb 2003 18:51:35 +0100
> > From: Sebastian Hagedorn <[EMAIL PROTECTED]>
> > [...]
> > Wouldn't it be possible (and better) to refuse further connections
> > instead of having to wait for them to time out? Maybe I haven't thought
> > this through properly, but it seems to me as if that were cleaner.
>
> Yes, that would be desirable. The easiest way of doing this would be to
> close the socket used to accept() new connections. However, it's open in
> all of the children, so closing it is infeasible.
>
> The next option would be to have master accept() and then immediately
> close() the connection. This raises interesting blocking concerns and is
> thus somewhat harder to implement.
>
> Thus the current solution seems acceptable.
>
> Larry

--
+---+
Scott W. Adkins         http://www.cns.ohiou.edu/~sadkins/
UNIX Systems Engineer   mailto:[EMAIL PROTECTED]
ICQ 7626282             Work (740)593-9478 Fax (740)593-1944
+---+
PGP Public Key available at http://www.cns.ohiou.edu/~sadkins/pgp/
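The environment-variable approach described in this message might look something like the sketch below from the child's side. It is written in Python purely for illustration; only the variable name CYRUS_MAXCHILD comes from the message, while the function name, the banner wording, and the return convention are assumptions made up for the example.

```python
import os

FULL_FLAG = "CYRUS_MAXCHILD"  # variable name taken from the message above


def greeting(environ=os.environ):
    """Pick the first line sent to the client and whether to keep serving.

    Returns (banner, keep_serving). The banner texts are hypothetical
    placeholders, not the wording any real server uses.
    """
    if FULL_FLAG in environ:
        # Master flagged the server as full at spawn time: tell the
        # client, then the caller closes the connection and exits.
        return b"* BYE Server is full, please try again later\r\n", False
    return b"* OK ready\r\n", True
```

Because the environment is fixed when the child is spawned, a child started with the flag set can never serve a normal connection afterwards, which is exactly the limitation noted in point 2.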
Re: imapd's hang when maxchild count is reached
   Date: Wed, 05 Feb 2003 18:51:35 +0100
   From: Sebastian Hagedorn <[EMAIL PROTECTED]>
   [...]
   Wouldn't it be possible (and better) to refuse further connections
   instead of having to wait for them to time out? Maybe I haven't thought
   this through properly, but it seems to me as if that were cleaner.

Yes, that would be desirable. The easiest way of doing this would be to close the socket used to accept() new connections. However, it's open in all of the children, so closing it is infeasible.

The next option would be to have master accept() and then immediately close() the connection. This raises interesting blocking concerns and is thus somewhat harder to implement.

Thus the current solution seems acceptable.

Larry
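For what it's worth, the accept()-then-close() option can be sketched without blocking the master, assuming the listening socket is put in non-blocking mode and polled with select(). The Python below is only an illustration of that idea; `refuse_pending` and its `wait` parameter are invented names, and it does not address the harder question of coordinating with children that are also calling accept() on the same socket.

```python
import select
import socket


def refuse_pending(listener, wait=0.0):
    """Accept and immediately close every queued connection.

    listener is a listening TCP socket; wait is how long (in seconds)
    to poll for further connections before giving up. Returns the
    number of connections that were refused this way.
    """
    listener.setblocking(False)  # accept() must never stall the master
    refused = 0
    while select.select([listener], [], [], wait)[0]:
        try:
            conn, _addr = listener.accept()
        except (BlockingIOError, ConnectionAbortedError):
            continue  # connection vanished or was taken by someone else
        conn.close()  # client sees an immediate end-of-file
        refused += 1
    return refused
```

From the client's point of view the connection is established and then closed at once, rather than sitting in the listen queue until it times out.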
Re: imapd's hang when maxchild count is reached
--On Wednesday, February 05, 2003 12:22:06 -0500 Lawrence Greenfield <[EMAIL PROTECTED]> wrote:

> > They seem to be the same for all of the processes ...
>
> This is a totally normal backtrace for "waiting for more input from the
> client". Are you sure that your perl script is working correctly?

You are absolutely right, the script is to blame. Sorry. However, there is one thing that's bothering me: when the max number of children has been reached, attempts to connect lead to this:

[root@lvr1 root]# telnet cyrus 143
Trying 134.95.19.46...
Connected to cyrus.
Escape character is '^]'.

Wouldn't it be possible (and better) to refuse further connections instead of having to wait for them to time out? Maybe I haven't thought this through properly, but it seems to me as if that were cleaner.

Thanks, Sebastian
--
Sebastian Hagedorn M.A. - RZKR-R1 (Gebäude 52), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
Re: imapd's hang when maxchild count is reached
   Date: Wed, 05 Feb 2003 14:04:38 +0100
   From: Sebastian Hagedorn <[EMAIL PROTECTED]>
   [...]
   0x402e3bee in __select () from /lib/i686/libc.so.6
   (gdb) bt
   #0 0x402e3bee in __select () from /lib/i686/libc.so.6
   #1 0x0811a994 in __DTOR_END__ ()
   #2 0x0808410c in getword ()
   #3 0x0804f37e in cmdloop ()
   #4 0x0804efb2 in service_main ()
   #5 0x0804d6a1 in main ()
   #6 0x40218687 in __libc_start_main (main=0x804cd60 <main>, argc=1, ubp_av=0xbffecdf4, init=0x804b9e4 <_init>, fini=0x8098120 <_fini>, rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffecdec) at ../sysdeps/generic/libc-start.c:129

   They seem to be the same for all of the processes ...

This is a totally normal backtrace for "waiting for more input from the client". Are you sure that your perl script is working correctly?

Larry
Re: imapd's hang when maxchild count is reached
--On Saturday, February 01, 2003 16:34:51 -0500 Lawrence Greenfield <[EMAIL PROTECTED]> wrote:

> Probably grabbing the "strace" and a gdb backtrace of a "hung" imapd
> process would help figure out what they're waiting for. Might as well do
> master, too.

OK, I just recreated the situation. The imapd's all hang in a select() call:

[root@lvr1 cyrus]# strace -p 19411
select(1, [0], NULL, NULL, {1729, 93}
[root@lvr1 cyrus]# strace -p 19417
select(1, [0], NULL, NULL, {1716, 43}

gdb backtraces look like this:

0x402e3bee in __select () from /lib/i686/libc.so.6
(gdb) bt
#0 0x402e3bee in __select () from /lib/i686/libc.so.6
#1 0x0811a994 in __DTOR_END__ ()
#2 0x0808410c in getword ()
#3 0x0804f37e in cmdloop ()
#4 0x0804efb2 in service_main ()
#5 0x0804d6a1 in main ()
#6 0x40218687 in __libc_start_main (main=0x804cd60 <main>, argc=1, ubp_av=0xbffecdf4, init=0x804b9e4 <_init>, fini=0x8098120 <_fini>, rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffecdec) at ../sysdeps/generic/libc-start.c:129

They seem to be the same for all of the processes ... This is still cyrus-imapd 2.1.11. master still responds to POP requests, so I don't think there's anything wrong with it. Does this help in any way?

--
Sebastian Hagedorn M.A. - RZKR-R1 (Gebäude 52), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
Re: imapd's hang when maxchild count is reached
-- Lawrence Greenfield <[EMAIL PROTECTED]> is rumored to have mumbled on Saturday, February 1, 2003 16:34 -0500 regarding Re: imapd's hang when maxchild count is reached:

> > Date: Fri, 31 Jan 2003 23:25:29 +0100
> > From: Sebastian Hagedorn <[EMAIL PROTECTED]>
> > [...]
> > When the number of imapd processes reaches 200, no more processes are
> > spawned, just as it should be. However, sometimes, not immediately, but
> > definitely after a while *all* imapd processes will hang if we try to
> > open more connections to port 143. This is 100% reproducible. If we
> > kill one of the scripts and the number of processes goes down, all the
> > imapd's get unstuck, but not until that happens.
>
> What do you mean by "hang"? Do they actually stop answering their current
> IMAP commands? Do they stop answering new connections when their current
> one goes away?

Sorry, I should've been more precise. I mean the former: *all* existing imapd processes stop functioning, i.e. they don't respond to commands anymore. I verified that at this point it is still possible to open other types of connections, e.g. POP. So master seems to be functional in principle.

> I don't see any particular reason for (either) phenomenon making a quick
> gaze at the code. Probably grabbing the "strace" and a gdb backtrace of a
> "hung" imapd process would help figure out what they're waiting for.
> Might as well do master, too.

We straced master, which stayed in a select() call. I'll do the other things you recommend next week.

Thanks, Sebastian
--
Sebastian Hagedorn M.A. - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
Re: imapd's hang when maxchild count is reached
   Date: Fri, 31 Jan 2003 23:25:29 +0100
   From: Sebastian Hagedorn <[EMAIL PROTECTED]>
   [...]
   When the number of imapd processes reaches 200, no more processes are
   spawned, just as it should be. However, sometimes, not immediately, but
   definitely after a while *all* imapd processes will hang if we try to
   open more connections to port 143. This is 100% reproducible. If we
   kill one of the scripts and the number of processes goes down, all the
   imapd's get unstuck, but not until that happens.

What do you mean by "hang"? Do they actually stop answering their current IMAP commands? Do they stop answering new connections when their current one goes away?

   I've checked the archives but haven't seen mention of this problem. Is
   this a known issue? Is there a workaround?

From a quick look at the code, I don't see any particular reason for either phenomenon. Grabbing an "strace" and a gdb backtrace of a "hung" imapd process would probably help figure out what they're waiting for. Might as well do master, too.

Larry