FW: ajp13 (Tomcat 3.2.x) corrupting requests??
We are using Apache 1.3.26 and Tomcat 3.2.2. We've seen some clients hang Apache on an SSL read. The cause of the hang seems to be a network/TCP/IP issue, and the read eventually times out. When it times out, the problem starts: Tomcat seems to grab a buffer that contains partial data from the previous request and calls a servlet with it. The servlet throws an exception because the data is bad.

Here are the details (you will need to pull up the code to see what I mean).

When the Apache read times out, mod_jk's jk_ajp13_worker.c::read_into_msg_buff() returns an error. mod_jk drops its connection to Tomcat and retries (up to JK_RETRIES times; see the service() function). In the retry, the HTTP header gets resent to Tomcat.

When Tomcat receives this header, it eventually calls Ajp13ConnectorRequest::decodeRequest(). This method attempts to read more data from mod_jk (the call to con.receive()). That call fails because mod_jk has once again dropped the connection (its read_into_msg_buff() call fails again in the second retry). Since con.receive() fails, the attribute blen is never set to the content-length of this request. Instead, blen still holds the content-length of the previous request.

Because the call to con.receive() fails, decodeRequest() returns -1. Its caller, Ajp13ConnectionHandler::processConnection(), doesn't check the error code and proceeds to call the servlet.

The servlet issues a read, which eventually calls Ajp13ConnectorRequest::doRead(). In doRead(), I dumped the values of some variables: pos is always 0 and blen is some number. The problem happens when blen is less than the length of the buffer (variable len). The method then copies data from bodyBuff, which is left over from the previous request.

Note: I checked the source code for Tomcat 3.2.4, and it doesn't seem to be fixed there yet. We may not have the luxury of upgrading to Tomcat 3.3 yet.
So I am attempting to fix this, but I have a question: why does Ajp13ConnectionHandler::processConnection() not check the error code from decodeRequest()? It seems to me that if decodeRequest() returns an error, it should not call the service() method, which calls the servlet.

The fix I'm proposing is to add these lines (see diff below). I've tested this minimally and it seemed to fix the problem I saw. I will need to do more extensive testing, but please let me know what you think. Do you see any problems with doing this?

thanks,
shinta

Index: Ajp13ConnectionHandler.java
===================================================================
RCS file: /home/cvspublic/jakarta-tomcat/src/share/org/apache/tomcat/service/connector/Attic/Ajp13ConnectionHandler.java,v
retrieving revision 1.4.2.1
diff -c -b -w -r1.4.2.1 Ajp13ConnectionHandler.java
*** Ajp13ConnectionHandler.java 12 Dec 2000 09:41:43 - 1.4.2.1
--- Ajp13ConnectionHandler.java 19 Jul 2002 15:58:47 -
***************
*** 157,162 ****
--- 157,166 ----
  case JK_AJP13_FORWARD_REQUEST:
      err = req.decodeRequest(msg);
+     if ( err < 0 ) {
+         moreRequests=false;
+         break;
+     }
      contextM.service(req, res);
      req.recycle();
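The failure mode described in this message (a failed decode leaves the previous request's length in place, and the caller carries on regardless) can be modelled in a few lines. The following is only a sketch in C rather than Tomcat's Java: `request_t`, `decode_request`, `do_read` and `handle_request` are hypothetical stand-ins for the request object, decodeRequest(), doRead() and processConnection(), and a failed con.receive() is simulated by passing NULL.

```c
#include <stddef.h>
#include <string.h>

#define BODY_MAX 64

/* Hypothetical stand-in for the request object; not Tomcat's real class. */
typedef struct {
    char   body[BODY_MAX]; /* plays the role of bodyBuff */
    size_t blen;           /* valid bytes in body for the current request */
    size_t pos;            /* read cursor */
} request_t;

/* Returns 0 on success, -1 when the body could not be received
 * (modelled here by passing NULL). On failure blen and body are left
 * untouched, which is the stale state described in the message. */
int decode_request(request_t *r, const char *incoming)
{
    size_t n;

    r->pos = 0;
    if (incoming == NULL)
        return -1;
    n = strlen(incoming);
    if (n > BODY_MAX)
        n = BODY_MAX;
    memcpy(r->body, incoming, n);
    r->blen = n;
    return 0;
}

/* Copies up to len unread body bytes into out; returns the count. */
size_t do_read(request_t *r, char *out, size_t len)
{
    size_t avail = r->blen - r->pos;

    if (len > avail)
        len = avail;
    memcpy(out, r->body + r->pos, len);
    r->pos += len;
    return len;
}

/* Returns the number of bytes the "servlet" read, or -1 if the request
 * was rejected. check_error selects the patched (1) or unpatched (0) path. */
int handle_request(request_t *r, const char *incoming, int check_error,
                   char *out, size_t len)
{
    int err = decode_request(r, incoming);

    if (check_error && err < 0)
        return -1;              /* patched: never reach the servlet */
    return (int)do_read(r, out, len);
}
```

With `check_error` set to 0, a failed decode after a request whose body was "secret=42" still hands those nine bytes to the reader; with `check_error` set to 1 the request is rejected before any read happens, which is the behaviour the proposed diff aims for.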
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
Henri, Dan,

There are some discussions below about the usage of msg/rmsg. What did you guys finally decide on that? Just reuse msg? I'm seeing the problem Dan mentioned on messages larger than the ajp13 buffer (8K).

thanks,
shinta

-----Original Message-----
From: GOMEZ Henri [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 13, 2001 4:49 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown

> 1) First off, I strongly believe that this work (which I think is an excellent feature, BTW), belongs in the 3.3 branch, and *not* in 3.2.

+1. There was a message about bringing the latest mod_jk features/corrections back to Tomcat 3.2.x, but the TC 3.2 Release Manager and others clearly stated that TC 3.2.x must be bug corrections only, without any new features. I asked about back-porting mod_jk/ajp13, since Apache 2.0 support and uploads now work fine with mod_jk/ajp from the TC 3.3 branch but are not in TC 3.2. The decision was not to provide corrections to TC 3.2.

> Although it feels like a bug that we need to restart Apache every time we restart TC, it's documented behavior, and I consider fixing it more of a feature enhancement. The scope of work you're doing here is considerable, and the code you're modifying is complex (and involves lots of different possible situations). I would not feel comfortable committing this to 3.2 unless it had seen a *lot* of testing in the 3.3 branch. (Am I incorrect in remembering that there has been discussion of putting this into 3.2?)

Right: a lot of work, some testing, but a back-port could introduce bugs in TC 3.2. The decision was not to port mod_jk back to TC 3.2. But people could still build mod_jk from Tomcat 3.3 and use it against their Tomcat 3.2.x. That's why I am pushing for a jakarta sub-project (web-connector). This kind of connector work must live outside the Tomcat trees. (Votes?)
> 2) Specific problems (line numbers by patch against the 3.3 code base)
>
> - msg/rmsg: Here is how I think you want the code to work: msg always holds the basic request information, so, in case of an error, we can resend the message. rmsg is used as the buffer for response information.

Dan, you're our resident hacker/expert on the mod_jk native part.

> This is a good idea, but there are a few problems -- first, and most importantly, if there is POST data, the msg buffer immediately gets reused to send that information to TC (ll. 596-600). In that case, on line 691, if you retry after a connection reset event, you'd be using a buffer filled with POST data instead of a buffer with header information.

I'm not too comfortable with msg/rmsg, that's why I asked for your code review (and it's excellent ;-)

> Possibly you could fix this by calling ajp13_marshal_into_msgb again when you retry -- I don't know how the state of the web_server_service object (s), changes over time, so I don't know for sure if this would work or not. But that's where I'd look next. Or you could pass rmsg into the sendrequest, and use that to send the POST data over. I dunno.
>
> Your use of msg/rmsg in getreply seems faulty to me. I don't think you need msg there at all, and I'm sure that you've got a problem with mixing the two of them. Specifically, on l. 636, you call
>
>     rc = ajp13_process_callback(rmsg, p, s, l);
>
> But, then a few lines later, you do:
>
>     else if(JK_AJP13_HAS_RESPONSE == rc) {
>         rc = connection_tcp_send_message(p, msg, l);

Right, we must drop the use of rmsg and do everything with msg.

> If you look at ajp13_process_callback, at the GET_BODY_CHUNK case, you'll see that it's reading request data into the buffer passed in as a param (which is what getreply is calling 'rmsg'). But then, when you call connection_tcp_send_message, you'll be sending over whatever was in msg, rather than what the web server has read from the browser.
> You can fix this by changing connection_tcp_send_message to use 'rmsg' (and then you don't need to pass msg into getreply at all). This problem will happen in case of a file upload, or whenever there is enough POST data to exceed a single ajp13 buffer (8K). Furthermore, in that case, I'm not sure if there's going to be any way to restart the connection intelligently. If the browser has already sent over 500 K of a 1M file upload when TC dies, I think we need to just kill that request, rather than trying to restart it. I don't think the request is recoverable. I'm not sure how to detect this in your code, but I think it needs to be thought about.

Uploads are a real problem, since we simply can't replay the upload in case of a Tomcat failure. A solution could be to ask the browser to resend, but HOW? (HTTP expert needed here)

> - The for loop on l. 689 makes me nervous. Could you rewrite that:
>
>     int i;
>     for(i = 0 ; i < JK_RETRIES ; i++)
>
> That way
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
> 1) First off, I strongly believe that this work (which I think is an excellent feature, BTW), belongs in the 3.3 branch, and *not* in 3.2. Although it feels like a bug that we need to restart Apache every time we restart TC, it's documented behavior, and I consider fixing it more of a feature enhancement. The scope of work you're doing here is considerable, and the code you're modifying is complex (and involves lots of different possible situations). I would not feel comfortable committing this to 3.2 unless it had seen a *lot* of testing in the 3.3 branch. (Am I incorrect in remembering that there has been discussion of putting this into 3.2?)

Dan, sorry for not clarifying. I'm back-porting the changes to my own company's CVS, not the Apache Tomcat CVS. We have decided that we need these changes on our 3.2.1 Tomcat release but can't afford to release Tomcat 3.3 beta right now. So you can definitely check in the changes to the 3.3 branch, and not 3.2.1. I think we've discussed that, and that was agreed.

thanks,
shinta
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
Hi, Dan,

I may not understand all of the issues here. But I really don't think we should close all connections when we detect one ECONNRESET. In my opinion, it is not necessary to close all connections, since the next ECONNRESET will close the proper dead socket anyway. There's no need to add all of that complexity.

In the meantime, I have taken Henri's changes and back-ported them to 3.2.1 (because I need them on 3.2.1). Everything seems to work well. I've tested it in the normal scenario (one Apache, one Tomcat) and in the load-balanced scenario. In the load-balanced scenario, when I restart TC worker 1, the code properly closes the dead sockets and re-establishes new ones to the same worker (TC worker 1). The good connections to TC worker 2 are untouched. They stay connected.

I did notice something weird, but it is unrelated to the code edits. It happens with or without Henri's changes. When I restart TC worker 1 but shut down TC worker 2, requests that are supposed to go to TC worker 2 (because they belong to the same session, so the load balancer tries to forward them to the same TC worker 2) take some time to get forwarded to TC worker 1. This may be another one of those improvements that can be made to the load balancer worker.

Anyway, I'm pretty happy with Henri's changes. (Thanks Henri!) Henri, are you going to check in the changes? Let me know if I can do something else to help with this.

shinta

-----Original Message-----
From: Dan Milstein [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 09, 2001 3:08 PM
To: [EMAIL PROTECTED]
Subject: Re: FW: problem w/ ajp13 - if Tomcat is shutdown

In terms of invalidating the cache:

The jk_ajp13_worker objects hold onto a cache of endpoints -- ep_cache. It would be relatively simple to loop through this cache and close all the connections in case of ECONNRESET (you do have to call a macro to enter a critical section -- take a look at reuse_connection()).
However, this cache only holds onto endpoints which are *not* being used. When an endpoint is checked out of the cache (by get_endpoint), or if the open socket descriptor is transferred to another endpoint (in reuse_connection), that connection is replaced by NULL in the cache. So if we shut down all the connections in the cache, we won't shut down the other connections which are handling requests at that moment. My only fear then is that, when those connections get their own ECONNRESET errors, they, too, will try to shutdown all the connections in the cache. If TC hasn't come back up yet, this won't be a problem, because there won't be any connections in the cache. But it does make me a bit nervous.

Hope that's helpful...

-Dan

GOMEZ Henri wrote:
> To be conscious that you are ignorant is a great step to knowledge.
> -- Benjamin Disraeli
>
> -----Original Message-----
> From: Dan Milstein [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 09, 2001 6:34 AM
> To: [EMAIL PROTECTED]
> Subject: Re: FW: problem w/ ajp13 - if Tomcat is shutdown
>
> > Henri,
> >
> > You say that checking errno isn't safe in a multithreaded env (which would certainly make sense to me, since it looks like a global var). However, after searching online, and reading up in Programming Threads, by Kleiman, Shah and Smaalders, I find on p. 47:
> >
> > "Each thread has its own independent version of the errno variable. This allows different threads to make system calls that may change the value of errno without interfering with each other."
> >
> > They are describing Posix threads. errno is actually a macro, apparently, which accesses the correct, thread-specific errno variable.
>
> Right, I checked in the Linux errno.h for pthread.
>
> > Now, I am the first to admit that I don't understand all the weird intersections between threads and sockets in C, but this looks to me like checking errno against ECONNRESET may be fine.
> More generally, when you get a read error on a TCP/IP stream you can consider that the link to your server (Tomcat) is broken:
> - no more route to Tomcat (broken LAN or routers)
> - server not working (Tomcat was stopped or the server restarted)
>
> > Are there platforms where that's not true?
>
> I've no idea, but we might have problems with different interpretations across platforms.
>
> > The nice thing about getting that ECONNRESET error, is it lets us go ahead and close out that connection, and try another one.
>
> Done.
>
> > We could even close out a whole cache of connections, which would most likely be the right thing to do.
>
> Good idea, I'll find out how to do that.
>
> > If we loop/retry, then how do we know to close the connection?
> >
> > --
> > Dan Milstein //
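Dan's description of the endpoint cache (idle endpoints sit in the cache, checked-out ones leave a NULL behind) and his worry about repeated "close everything" passes can be sketched as below. This is a toy model under stated assumptions, not mod_jk's actual ep_cache code: `endpoint_t`, `ep_cache_t`, `cache_get`, `cache_put`, `cache_close_idle` and `EP_CACHE_SIZE` are hypothetical names, closing is modelled by setting the descriptor to -1, and the critical section Dan mentions is left out.

```c
#include <stddef.h>

#define EP_CACHE_SIZE 4   /* assumed size, for illustration only */

/* Hypothetical endpoint: sd is the socket descriptor, -1 when closed. */
typedef struct {
    int sd;
} endpoint_t;

/* Idle endpoints only; a NULL slot means "checked out" or "never filled". */
typedef struct {
    endpoint_t *slots[EP_CACHE_SIZE];
} ep_cache_t;

/* Check an idle endpoint out of the cache, leaving NULL in its slot. */
endpoint_t *cache_get(ep_cache_t *c)
{
    size_t i;

    for (i = 0; i < EP_CACHE_SIZE; i++) {
        if (c->slots[i] != NULL) {
            endpoint_t *e = c->slots[i];
            c->slots[i] = NULL;
            return e;
        }
    }
    return NULL;
}

/* Return an endpoint to the first free slot; 0 on success, -1 if full. */
int cache_put(ep_cache_t *c, endpoint_t *e)
{
    size_t i;

    for (i = 0; i < EP_CACHE_SIZE; i++) {
        if (c->slots[i] == NULL) {
            c->slots[i] = e;
            return 0;
        }
    }
    return -1;
}

/* Mark every idle endpoint as closed and return how many were still open.
 * Real code would call close() on sd and hold the cache lock while looping. */
int cache_close_idle(ep_cache_t *c)
{
    size_t i;
    int closed = 0;

    for (i = 0; i < EP_CACHE_SIZE; i++) {
        endpoint_t *e = c->slots[i];
        if (e != NULL && e->sd >= 0) {
            e->sd = -1;
            closed++;
        }
    }
    return closed;
}
```

In this model a second call to `cache_close_idle` finds nothing left to close, and an endpoint that is checked out at the time is never touched, which matches both halves of Dan's observation: in-flight connections must discover their own ECONNRESET, and a repeated sweep is harmless as long as closed endpoints are skipped.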
RE: [Bug 286] New - Tomcat mod_jk.so refuses to load with Apache 1.3.14 undefined symbol BugRat Report#532
I had the same problem earlier too. You just need to edit build-unix.sh and add -DSOLARIS to the APXS call. My apxs doesn't have that by default.

hope this helps
shinta

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 09, 2001 11:22 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: [Bug 286] New - Tomcat mod_jk.so refuses to load with Apache 1.3.14 undefined symbol BugRat Report#532

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=286

Tomcat mod_jk.so refuses to load with Apache 1.3.14 undefined symbol BugRa

Bug #:        286
Product:      Tomcat 3
Status:       UNCONFIRMED
Version:      3.2.1 Final
Resolution:
Platform:     All
Severity:     Normal
OS/Version:   All
Priority:     High
Component:    Connectors
Assigned To:  [EMAIL PROTECTED]
Reported By:  [EMAIL PROTECTED]
CC list:      Cc:
URL:

DESCRIPTION

I build Apache 1.3.14 from scratch on Solaris 2.8 using GCC 2.95.2. I also built the mod_jk.so from scratch on the same machine. Apache 1.3.14 refuses to load the mod_jk.so module because of an undefined symbol fdatasync. I found this in librt.so and libposix4.so on the machine.

The offending call is in jk_util.c:112
RE: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown
> select() is supported?

The reason I didn't do one for Win32 is that I hadn't been able to reproduce the problem on Win32, at least on Windows 2000. They must handle sockets differently. I hate to put in unnecessary code, especially when I can't reproduce the problem, can't test it and can't make sure I actually fix it. But I think select() is supported on Windows. I'll try to take a more thorough look (and do some testing).

Do you have any comments as to which solution is better suited?

Just to recap, my action items are:
1. Merge fixes to the 3.3 branch.
2. Test to make sure no data is lost during retry.
3. Test some more on Windows.
4. If there are any changes because of the above action items, I will repost the changes.

Thanks, y'all!
shinta

> Thanks again,
> -Dan
>
> Shinta Tjio wrote:
> > Attached are the unified diffs for the proposed changes. They are diffs against the 3.2.1 release code. I hope this is sufficient. I haven't got to use the Solaris patch tool yet.
> >
> > These are tested on Solaris 2.8. Change #1 is the one that's less platform specific, since I don't call any socket APIs. I will test these on Windows 2000 tomorrow. As for other UNIXes, we don't have those in house. So if someone can volunteer to test it on other UNIX flavors, that will be great!
> >
> > Unified diffs for the proposed changes #1: jk_ajp13_worker.c.1.diff mod_jk.c.1.diff
> > Unified diffs for the proposed changes #2: jk_ajp13_worker.c.2.diff jk_connect.c.2.diff
> >
> > thanks so much!
> > shinta
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, March 07, 2001 6:57 PM
> > To: '[EMAIL PROTECTED] '
> > Subject: Re: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown
> >
> > Hi Shinta,
> >
> > It sounds like a solution to a real problem, please send a patch, I'm sure someone will read it.
> > Dan and Henri are the best people to ask about this, I can also help a bit ( I've been using RPMs lately, it's too easy to get them and not worry about compile :-)
> >
> > My only suggestion/concern is that the code should work on both Windows and unix - or at least compile :-)
> >
> > Costin
> >
> > > I would like to propose some changes to eliminate the requirement to restart Apache when you restart Tomcat. I'm willing to give the code to anyone who needs it, when I'm done testing. But I need some help/suggestions so that I can put in the right code. If any of the proposed changes below should not exist at all, I'm open to other suggestions. This is my first time looking at mod_jk's ajp13 code, so any clue to make these better would be appreciated.
> > >
> > > Right now, if you use ajp13 and you restart Tomcat, you also have to restart Apache. See details in previous postings. For us, having to restart Apache is not a feasible solution in our customers' environment.
> > >
> > > After looking at the code, I have two possible solutions:
> > >
> > > 1. From mod_jk, I can detect that the socket has been closed by Tomcat. This is normally indicated by recv() failing with ECONNRESET. The recv() is called after the request has been sent to the socket. The send(), unfortunately, doesn't give you an error. The proposed fix is to check for errno ECONNRESET, then set the is_recoverable_error flag to TRUE in the service() function in jk_ajp13_worker.c. I also add code in mod_jk.c to check for this flag and call the service() method again if the flag is set to TRUE. The second time the service() method is called, it will reconnect to Tomcat as normal.
> > >
> > > 2. Another solution would be to put in a select() on the socket prior to send(), looking for the socket being read-ready. Under normal conditions, this select() should return nothing. But if Tomcat shuts down the socket, this select() should report the socket as read-ready. When this happens, I issue a read() of 1 byte.
> > > If the read() comes back with return code 0, this should be an indication that the socket was closed on the remote end. Then I will proceed to close the socket. The remaining logic already handles the reconnect, etc.
> > >
> > > I have both of these solutions prototyped and minimally tested. They both
> > >
> > > Anyone care to comment on which solution fits better with the overall code? Anyone volunteer to review the code?
> > >
> > > thanks,
> > > shinta
> > >
> > > -----Original Message-----
> > > From: Shinta Tjio
> > > To: [EMAIL PROTECTED]
> > > Cc: 'Dan Milstein'
> > > Sent: 3/6/01 7:01 PM
> > > Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown
> > >
> > > I am using Tomcat 3.2.1, Apache 1.3.14, running on Solaris 2.8, Sun machines.
> > >
> > > After various attempts at debugging this, I have more information.
> > >
> > > 1. Even though I'm setting the worker's property cache_size to default (1), I'm finding there are up to 6
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
> I reviewed your patch and I noticed:
>
> We could use select to determine if something happened to the connection; typically readfds will be set if there is something to read. With the ajp13 protocol, where everything is consumed after reading the previous Tomcat reply, select will set read when the connection is closed (in TCP, a closed TCP session is really a message sent from the closer).
>
> But:
>
> - Win32 platforms are not handled with the select;

This is because I haven't been able to reproduce this on Windows 2000. I admit I didn't try very hard last time. So I will try it again.

> - there is a check of errno, which may not be safe in a multi-threaded env like Apache 2.0

Any suggestions on how to do it better? The reason I check for ECONNRESET is that I want to do this only if the error was caused by a previously shut down TC. Any other recv() error should return Internal Server Error, because it may not be recoverable.

> - if you remove your ethernet cable you may never see anything in select readfds before the TCP/IP stack times out.

Well, I wasn't trying to handle that condition, though. The select() should still be harmless.

> - Why lose CPU cycles checking if the communication socket is available before each request is sent? I'd prefer to use an exception mechanism, so
>   - send request
>   - read reply
>   - if the reply read fails, retry sending the request; if that fails = ERR 500

Okay, so this means you would prefer my proposed solution #1? That was my inclination too.

> I'll send some modifications to the worker later
RE: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown
3) For option (1), I have a few questions

- Is there a way in which data could be lost? Specifically, as you state, the send() will return without error, and then it will only get the error on the following read(). Is all the data always preserved so that simply retrying will work correctly? I think most of that state is in the jk_ws_service_t object -- is it possible a read pointer will be advanced and data will be lost? This may be acceptable, but I'd like to understand it...

A send to a closed or (in our case half-closed) socket will 95% of the time return a positive number (#bytes sent). It's sad, but even if the IP stack knows the socket was closed, you'll know about it only at the next read/recv or select on read state! Checking send < 0 is not a great help here.

Yup. It's sad indeed. It has caused us some grief.

- You only retry once. If there are a number of connections open (from a single Apache process), isn't it possible that Tomcat has come back up, and that the next connection obtained (from the endpoint cache), will also be stale? Would it make sense in this case to trigger a shutdown of all the connections currently in the cache (and then retry once)?

That would make sense if there were no other ways to get an ECONNRESET error.

ECONNRESET MUSTN'T be checked like this; mod_jk code runs on Apache 1.3 AND 2.0, and the latter is multithreaded.

Is there a better way to do this? I want to handle only ECONNRESET, because that's recoverable. Other errors may not be recoverable and there's no point in retrying. But then again, we can just let it go to retry and fail there.

- Or, more generally, just so I (and everyone) can understand, how does this new code deal with the following stages:

1) TC and Apache both up and running

2) TC is shutdown. If mod_jk is in the middle of handling a request, what happens?
There was an infinite loop in the 3.2.1 code, but that's been fixed in 3.2.2 and 3.3.

Apache sends the data (no error) and then waits for the reply with recv. That is where Apache gets the error. We must resend the request at least one time. A little code to reorganize in ajp13_worker.

Are you working on re-organizing the changes? Will you post the changes when you're done?

3) TC is shutdown, Apache is still up. While TC is down, requests come in. How are they handled? Are there any loops Apache gets stuck in?

Apache will determine that TC is down and try another socket. But we must be careful here with load-balancing configs.

I'll try to test this with some load-balanced Tomcats.

4) TC starts back up. Now requests get handled smoothly again?

Yes, but only if the socket was closed by Apache before.

4) For option (2):

- If the user has Win32, you're just punting, correct? Why is that? I know nothing about Win32 socket programming, but I'm curious... You say you're testing on Win2k -- does Win2k support select(), but win32 doesn't? Does anyone know about how widely select() is supported?

select is just too time consuming to be used at EACH request. We must handle the potential error without losing too many CPU cycles when everything is fine. I'm -1 on using select and errno.

thanks,
shinta
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
Okay, I basically agree with you. I'll take out the check for errno and just have recv() == -1 be considered a recoverable error (i.e., retry it).

However, I disagree with making the retry a loop of RETRIES iterations. If one retry fails, this error condition may not be recoverable without human intervention. What is the point of retrying more than once? My goal is not to wait for TC to come back up or to wait for TC to be in a good state. My goal is that, if TC is already in a good state, we should not tell the caller that there was an error.

Opinions?

shinta

-----Original Message-----
From: GOMEZ Henri [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 08, 2001 5:12 PM
To: [EMAIL PROTECTED]
Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown

> Okay so this means, you would prefer my proposed solution #1? That was my inclination too.

Proposed solution #1 without the errno check.

My idea: put the service code in a loop

    for (i = 0; i < RETRIES; i++) {
        if (send_request() < 0)
            continue;
        if (read_reply() == 0)
            break;
    }
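Henri's loop can be exercised without real sockets. The sketch below is a simulation under stated assumptions, not mod_jk code: `fake_endpoint_t` and its fields are hypothetical, `send_request` and `read_reply` are stand-ins that mimic the behaviour reported in this thread (send succeeds on a stale socket, the failure only shows up on the read), and the value given to `JK_RETRIES` is assumed because the thread only names the constant.

```c
#define JK_RETRIES 3   /* assumed value; the thread only names the constant */

/* Hypothetical endpoint state used to simulate a Tomcat restart. */
typedef struct {
    int connected;   /* 1 if we hold a socket */
    int peer_alive;  /* 1 if the other end of that socket is still there */
    int tomcat_up;   /* 1 if a fresh connect would succeed */
    int attempts;    /* send/read rounds tried so far */
} fake_endpoint_t;

/* Connects on demand, then "sends". Returns -1 only if connecting fails. */
static int send_request(fake_endpoint_t *ep)
{
    if (!ep->connected) {
        if (!ep->tomcat_up)
            return -1;
        ep->connected = 1;
        ep->peer_alive = 1;
    }
    return 0;  /* as reported in the thread, send succeeds on a stale socket */
}

/* Returns 0 on success. On failure the dead socket is dropped so that the
 * next send_request() reconnects. */
static int read_reply(fake_endpoint_t *ep)
{
    if (!ep->peer_alive) {
        ep->connected = 0;
        return -1;
    }
    return 0;
}

/* The loop from the message above: 0 once a reply is read, -1 if every
 * attempt fails. */
int service_with_retry(fake_endpoint_t *ep)
{
    int i;

    for (i = 0; i < JK_RETRIES; i++) {
        ep->attempts++;
        if (send_request(ep) < 0)
            continue;
        if (read_reply(ep) == 0)
            return 0;
    }
    return -1;
}
```

In this model a stale socket with Tomcat back up is served on the second attempt, so a limit of two would already cover the restart case shinta cares about; higher limits only matter if a reconnect can itself land on another dead connection.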
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
> BTW we should redirect this to the tomcat-users list as I don't think it's a development issue.

Sorry about that.

> > I don't see anything wrong, and the only drastic difference from my previous setup was that I had used the default 8007 and 8009. Now
>
> This is purely a guess, but I seem to remember reading that ports 1 were special somehow and required root access for some functions. What I would suggest is put those to the default, see if they work, then (I'm assuming you had a reason for doing this) find an acceptable port between 8000-.

I think you have that backwards. Programs listening on ports <1024 require root access. Anything larger than that doesn't require special privileges.

> If that doesn't work then we have a couple more generic possibilities (such as running out of file handles, which is very common with Tomcat on Solaris). I wish I still had access to a Solaris box to try this..
>
> The only other difference was that I was using JDK 1.3 (Sun). Which I'd recommend for server side stuff unless you have a compelling reason not to. It's more stable, has fewer glitches and is faster. JDK 1.2.x was Sun's wowee and JDK 1.3.x is a performance and stability release (IMHO)..
>
> If you haven't tweaked your file handles at any point then there is a good chance that is in the way. Unfortunately I do not remember what the strings were to fix this, but I'm sure they're documented somewhere (and they're not intuitive)..

Thanks for the tips. We can't switch JDKs for the release we're doing now. But for the next release, we are hoping to use JDK 1.3.

shinta
Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown
Hi, all,

I would like to propose some changes to eliminate the requirement to restart Apache when you restart Tomcat. I'm willing to give the code to anyone who needs it, when I'm done testing. But I need some help/suggestions so that I can put in the right code. If any of the proposed changes below should not exist at all, I'm open to other suggestions. This is my first time looking at mod_jk's ajp13 code, so any clue to make these better would be appreciated.

Right now, if you use ajp13 and you restart Tomcat, you also have to restart Apache. See details in previous postings. For us, having to restart Apache is not a feasible solution in our customers' environment.

After looking at the code, I have two possible solutions:

1. From mod_jk, I can detect that the socket has been closed by Tomcat. This is normally indicated by recv() failing with ECONNRESET. The recv() is called after the request has been sent to the socket. The send(), unfortunately, doesn't give you an error. The proposed fix is to check for errno ECONNRESET, then set the is_recoverable_error flag to TRUE in the service() function in jk_ajp13_worker.c. I also add code in mod_jk.c to check for this flag and call the service() method again if the flag is set to TRUE. The second time the service() method is called, it will reconnect to Tomcat as normal.

2. Another solution would be to put in a select() on the socket prior to send(), looking for the socket being read-ready. Under normal conditions, this select() should return nothing. But if Tomcat shuts down the socket, this select() should report the socket as read-ready. When this happens, I issue a read() of 1 byte. If the read() comes back with return code 0, this should be an indication that the socket was closed on the remote end. Then I will proceed to close the socket. The remaining logic already handles the reconnect, etc.
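Option 2 above (poll the socket before sending, and treat a zero-byte read as "closed by the remote end") can be sketched with plain POSIX calls. This is not the code from the attached diffs, only an illustration: `peer_has_closed` and `demo_after_peer_close` are hypothetical names, the check uses recv() with MSG_PEEK instead of the 1-byte read() described above so that any real data stays in the socket, and the demo runs on a local socketpair() rather than a TCP connection to Tomcat.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer has closed the connection, 0 if the socket looks
 * usable, and -1 if select() or recv() itself failed. */
int peer_has_closed(int sd)
{
    fd_set readfds;
    struct timeval no_wait = {0, 0};   /* poll, do not block */
    char byte;
    ssize_t n;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(sd, &readfds);
    rc = select(sd + 1, &readfds, NULL, NULL, &no_wait);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                      /* nothing readable: normal idle socket */

    n = recv(sd, &byte, 1, MSG_PEEK);  /* look at one byte without consuming it */
    if (n == 0)
        return 1;                      /* orderly shutdown by the peer */
    if (n < 0)
        return -1;
    return 0;                          /* real data is waiting; leave it in place */
}

/* Usage sketch on a local socket pair: the check must report 0 while the
 * peer is open, and the function returns what it reports after the peer
 * has been closed. */
int demo_after_peer_close(void)
{
    int sv[2];
    int before, after;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    before = peer_has_closed(sv[0]);
    close(sv[1]);
    after = peer_has_closed(sv[0]);
    close(sv[0]);
    return before == 0 ? after : -1;
}
```

The cost Henri objects to elsewhere in the thread is visible here: every request pays for a select() and possibly a recv() even when the connection is healthy, whereas the retry approach only does extra work after a failure.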
I have both of these solutions prototyped and minimally tested. They both

Anyone care to comment on which solution fits better with the overall code? Anyone volunteer to review the code?

thanks,
shinta

-----Original Message-----
From: Shinta Tjio
To: [EMAIL PROTECTED]
Cc: 'Dan Milstein'
Sent: 3/6/01 7:01 PM
Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown

I am using Tomcat 3.2.1, Apache 1.3.14, running on Solaris 2.8, Sun machines.

After various attempts at debugging this, I have more information.

1. Even though I'm setting the worker's property cache_size to default (1), I'm finding there are up to 6 connections opened from Apache to Tomcat. I deduced this by looking at mod_jk.conf and by running netstat. I finally found out that this is because my Apache is set to spawn a minimum of 6 children, and each of those children makes a separate connection to Tomcat. This is very bad because I end up having to reload 6 times before Tomcat starts serving me the page again. Each time, the request uses a different Apache child that has a defunct socket. So the more Apache children I have, the longer it takes me to recover from this problem.

2. It seems that when Tomcat dies or restarts, the send() called by ajp13's jk_tcp_socket_sendfull() does not get an error. But the recv() does get an error, with errno ECONNRESET. After that, the socket is properly closed.

3. When I shut down Tomcat, the sockets that were open between Apache and Tomcat showed up in state CLOSE_WAIT and FIN_WAIT2. I think this is normally solved by calling the shutdown() API after closing the socket. However, this would have to be done from the Tomcat side in Ajp13ConnectionHandler.java. I can't find the corresponding method of Socket in Java.

So, based on all of these, the only fix I can think of is to make mod_jk retry the send() if recv() comes back with error ECONNRESET. The retry should happen after the old socket is properly closed.

Anyone want to comment?
shinta

-----Original Message-----
From: Dan Milstein [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2001 12:00 PM
To: [EMAIL PROTECTED]
Subject: Re: FW: problem w/ ajp13 - if Tomcat is shutdown

What version of TC are you using? What version of Apache?

I would look into the mod_jk docs -- I think this is the spec'd behavior (which, admittedly, is not great, but that makes it more of a feature request than a bug ;-). With ajp13, Apache opens up a persistent TCP/IP connection to TC -- if TC restarts, I think that connection may just hang up and then timeout (since Apache doesn't know that TC has restarted).

If anyone wants to work on this, you would have the undying thanks of the rest of the TC community -- having to restart Apache all the time bugs a *lot* of people.

-Dan
RE: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown
Attached are the unified diffs for the proposed changes. They are diffs against the 3.2.1 release code. I hope this is sufficient. I haven't gotten to use the Solaris patch tool yet.

These were tested on Solaris 2.8. Change #1 is the less platform-specific one, since it doesn't call any socket APIs. I will test these on Windows 2000 tomorrow. As for other UNIXes, we don't have those in house. So if someone can volunteer to test it on other UNIX flavors, that would be great!

Unified diffs for proposed change #1:
jk_ajp13_worker.c.1.diff
mod_jk.c.1.diff

Unified diffs for proposed change #2:
jk_ajp13_worker.c.2.diff
jk_connect.c.2.diff

thanks so much!
shinta

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, March 07, 2001 6:57 PM
To: '[EMAIL PROTECTED]'
Subject: Re: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown

Hi Shinta,

It sounds like a solution to a real problem, please send a patch, I'm sure someone will read it. Dan and Henri are the best people to ask about this, I can also help a bit ( I've been using RPMs lately, it's too easy to get them and not worry about compile :-)

My only suggestion/concern is that the code should work on both Windows and unix - or at least compile :-)

Costin

I would like to propose some changes to eliminate the requirement to restart Apache when you restart Tomcat. I'm willing to give the code to anyone who needs it when I'm done testing. But I need some help/suggestions so that I can put in the right code. If any of the proposed changes below should not exist, I'm open to other suggestions. This is my first time looking at mod_jk's ajp13 code, so any clue to make these better would be appreciated.

Right now, if you use ajp13 and you restart Tomcat, you also have to restart Apache. See details in previous postings. For us, having to restart Apache is not a feasible solution in our customers' environment. After looking at the code, I have two possible solutions:

1. From mod_jk, I can detect that the socket has been closed by Tomcat. This is normally indicated by the recv() returning ECONNRESET. The recv() is called after the request has been sent to the socket. The send(), unfortunately, doesn't give you an error. The proposed fix is to check for errno ECONNRESET, then set the is_recoverable_error flag to TRUE in the service() function in jk_ajp13_worker.c. I also add code in mod_jk.c to check for this flag and call the service() method again if the flag is set to TRUE. The second time the service() method is called, it will reconnect to Tomcat like normal.

2. Another solution would be to put in a select() on the socket prior to send(), looking for the socket being read-ready. Under normal conditions, this select() should return nothing. But if Tomcat shuts down the socket, this select() should report the socket as read-ready. When this happens, I issue a read() of 1 byte. If the read() comes back with return code 0, this should be an indication that the socket was closed on the remote end. Then I will proceed to close the socket. The remaining logic already handles the reconnect, etc.

I have both of these solutions prototyped and minimally tested. They both

Anyone care to comment on which solution fits better with the overall code? Would anyone volunteer to review the code?

thanks,
shinta

-----Original Message-----
From: Shinta Tjio
To: [EMAIL PROTECTED]
Cc: 'Dan Milstein'
Sent: 3/6/01 7:01 PM
Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown

I am using Tomcat 3.2.1 and Apache 1.3.14, running on Solaris 2.8 on Sun machines. After various attempts at debugging this, I have more information.

1. Even though I'm setting the worker's cache_size property to the default (1), I'm finding there are up to 6 connections open from Apache to Tomcat. I deduced this by looking at the mod_jk.conf and by running netstat. I finally found out that this is because my Apache is set to spawn a minimum of 6 children, and each of those children makes a separate connection to Tomcat. This is very bad because I end up having to reload 6 times before Tomcat starts serving me the page again. Each time, the request goes to a different Apache child that has a defunct socket. So the more Apache children I have, the longer it takes me to recover from this problem.

2. It seems that when Tomcat dies or restarts, the send() called by ajp13's jk_tcp_socket_sendfull() does not get an error. But the recv() does get an error, with errno ECONNRESET, after which the socket is properly closed.

3. When I shut down Tomcat, those sockets
FW: problem w/ ajp13 - if Tomcat is shutdown
I'm having a problem with mod_jk when ajp13 is used. The problem is often reproduced when Tomcat is shut down without Apache being shut down. When a request is fired through Apache as soon as Tomcat starts, I often get an Internal Server Error. The mod_jk.log will have the following:

[jk_uri_worker_map.c (344)]: Into jk_uri_worker_map_t::map_uri_to_worker
[jk_uri_worker_map.c (406)]: jk_uri_worker_map_t::map_uri_to_worker, Found a match ajp13
[jk_worker.c (123)]: Into wc_get_worker_for_name ajp13
[jk_worker.c (127)]: wc_get_worker_for_name, done found a worker
[jk_ajp13_worker.c (651)]: Into jk_worker_t::get_endpoint
[jk_ajp13_worker.c (536)]: Into jk_endpoint_t::service
[jk_ajp13.c (346)]: Into ajp13_marshal_into_msgb
[jk_ajp13.c (480)]: ajp13_marshal_into_msgb - Done
[jk_ajp13_worker.c (203)]: connection_tcp_get_message: Error - jk_tcp_socket_recvfull failed
[jk_ajp13_worker.c (619)]: Error reading request
[jk_ajp13_worker.c (489)]: Into jk_endpoint_t::done

If I hit reload multiple times, eventually Tomcat will serve the servlet fine.

Has anyone seen this problem before? Is there any way around this?

shinta
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
I am using Tomcat 3.2.1 and Apache 1.3.14, running on Solaris 2.8 on Sun machines. After various attempts at debugging this, I have more information.

1. Even though I'm setting the worker's cache_size property to the default (1), I'm finding there are up to 6 connections open from Apache to Tomcat. I deduced this by looking at the mod_jk.conf and by running netstat. I finally found out that this is because my Apache is set to spawn a minimum of 6 children, and each of those children makes a separate connection to Tomcat. This is very bad because I end up having to reload 6 times before Tomcat starts serving me the page again. Each time, the request goes to a different Apache child that has a defunct socket. So the more Apache children I have, the longer it takes me to recover from this problem.

2. It seems that when Tomcat dies or restarts, the send() called by ajp13's jk_tcp_socket_sendfull() does not get an error. But the recv() does get an error, with errno ECONNRESET, after which the socket is properly closed.

3. When I shut down Tomcat, the sockets that were open between Apache and Tomcat showed up in the CLOSE_WAIT and FIN_WAIT2 states. I think this is normally solved by calling the shutdown() API after closing the socket. However, this would have to be done from the Tomcat side in Ajp13ConnectionHandler.java. I can't find the corresponding method of Socket in Java.

So, based on all of this, the only fix I can think of is to make mod_jk retry the send() if recv() comes back with the error ECONNRESET. The retry should happen after the old socket is properly closed.

Does anyone want to comment?

shinta

-----Original Message-----
From: Dan Milstein [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2001 12:00 PM
To: [EMAIL PROTECTED]
Subject: Re: FW: problem w/ ajp13 - if Tomcat is shutdown

What version of TC are you using? What version of Apache?

I would look into the mod_jk docs -- I think this is the spec'd behavior (which, admittedly, is not great, but that makes it more of a feature request than a bug ;-).

With ajp13, Apache opens up a persistent TCP/IP connection to TC -- if TC restarts, I think that connection may just hang up and then time out (since Apache doesn't know that TC has restarted).

If anyone wants to work on this, you would have the undying thanks of the rest of the TC community -- having to restart Apache all the time bugs a *lot* of people.

-Dan

Shinta Tjio wrote:

I'm having a problem with mod_jk when ajp13 is used. The problem is often reproduced when Tomcat is shut down without Apache being shut down. When a request is fired through Apache as soon as Tomcat starts, I often get an Internal Server Error. The mod_jk.log will have the following:

[jk_uri_worker_map.c (344)]: Into jk_uri_worker_map_t::map_uri_to_worker
[jk_uri_worker_map.c (406)]: jk_uri_worker_map_t::map_uri_to_worker, Found a match ajp13
[jk_worker.c (123)]: Into wc_get_worker_for_name ajp13
[jk_worker.c (127)]: wc_get_worker_for_name, done found a worker
[jk_ajp13_worker.c (651)]: Into jk_worker_t::get_endpoint
[jk_ajp13_worker.c (536)]: Into jk_endpoint_t::service
[jk_ajp13.c (346)]: Into ajp13_marshal_into_msgb
[jk_ajp13.c (480)]: ajp13_marshal_into_msgb - Done
[jk_ajp13_worker.c (203)]: connection_tcp_get_message: Error - jk_tcp_socket_recvfull failed
[jk_ajp13_worker.c (619)]: Error reading request
[jk_ajp13_worker.c (489)]: Into jk_endpoint_t::done

If I hit reload multiple times, eventually Tomcat will serve the servlet fine.

Has anyone seen this problem before? Is there any way around this?

shinta

-- Dan Milstein // [EMAIL PROTECTED]
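
The retry suggested in this message (close the stale socket when recv() fails with ECONNRESET, reconnect, and resend) could look roughly like the sketch below. It is a minimal illustration under assumptions, not mod_jk's actual code: endpoint_t, ep_connect(), ep_exchange(), ep_close() and service_with_retry() are hypothetical names, and the connect/exchange functions are stand-ins that only simulate a defunct cached socket so the control flow can be followed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical endpoint: one cached Tomcat connection per Apache child. */
typedef struct {
    int fd;        /* cached socket, -1 when not connected */
    int connects;  /* number of (re)connects performed, for illustration */
    int stale;     /* simulation only: 1 while fd points at a dead Tomcat */
} endpoint_t;

/* Stand-in for opening a fresh connection to Tomcat. */
static int ep_connect(endpoint_t *ep)
{
    ep->fd = 3;    /* placeholder descriptor value, not a real socket */
    ep->connects++;
    ep->stale = 0;
    return 0;
}

/* Stand-in for closing the socket and forgetting it. */
static void ep_close(endpoint_t *ep)
{
    ep->fd = -1;
}

/*
 * Stand-in for "send the request, then recv the reply". As described
 * above, send() on a stale socket succeeds and the failure only shows
 * up on recv(), with errno set to ECONNRESET.
 */
static int ep_exchange(endpoint_t *ep)
{
    if (ep->stale) {
        errno = ECONNRESET;
        return -1;
    }
    return 0;
}

/*
 * Try the request up to max_attempts times. Only ECONNRESET is treated
 * as recoverable; any other error is returned to the caller unchanged.
 * Returns 0 on success, -1 on failure with errno set.
 */
int service_with_retry(endpoint_t *ep, int max_attempts)
{
    int attempt;

    assert(ep != NULL && max_attempts > 0);

    for (attempt = 0; attempt < max_attempts; attempt++) {
        int err;

        if (ep->fd < 0 && ep_connect(ep) != 0)
            return -1;              /* could not reach Tomcat at all */
        if (ep_exchange(ep) == 0)
            return 0;               /* request served */

        err = errno;
        ep_close(ep);               /* close the old socket before retrying */
        if (err != ECONNRESET) {
            errno = err;
            return -1;              /* not a stale-connection error */
        }
        /* ECONNRESET: loop around, reconnect and resend */
    }
    errno = ECONNRESET;
    return -1;
}
```

With two attempts, a child holding a defunct socket recovers within the same request instead of returning an Internal Server Error. One caveat with this approach: resending is only safe if Tomcat did not start processing the first copy of the request.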
RE: FW: problem w/ ajp13 - if Tomcat is shutdown
Here's exactly what I did to reproduce the problem. Again, I'm running this on a Solaris 2.8 SunOS machine, using JDK1.2.2, Apache 1.3.14, Tomcat 3.2.1.

1. Start Apache.

2. Start Tomcat.

3. Start hitting Apache with multiple requests, such as /example/servlet/HelloWorldExample. Make sure there are some connections open from Apache to Tomcat. To make sure, you can run
netstat -a | grep <Tomcat port>
where <Tomcat port> is the ajp13 port. It should show some sockets in the ESTABLISHED state.

4. While the connections are in the ESTABLISHED state (this should be the state, because ajp13 reuses connections), shut down Tomcat. Now you will notice the same netstat will show some sockets in the FIN_WAIT2 and CLOSE_WAIT states.

5. Now restart Tomcat.

6. Repeat step 3. You will get an Internal Server Error, up to as many times as Apache has children. After that number, the page will be served.

Attached are the server.xml, workers.properties and httpd.conf. My test servlet is called /mytest/servlet/testServlet, but I think you can try it with any kind of servlet.

I have been trying to code up the retry I mentioned below. I think I got it working. I just need to clean up the code some more.

shinta

-----Original Message-----
From: oliver2, andy [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2001 6:33 PM
To: 'Shinta Tjio'; '[EMAIL PROTECTED]'
Cc: 'Dan Milstein'
Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown

I was running recently with that exact same configuration but did not experience that problem. Can you show some config files?

-Andy

-----Original Message-----
From: Shinta Tjio
To: [EMAIL PROTECTED]
Cc: 'Dan Milstein'
Sent: 3/6/01 7:01 PM
Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown

I am using Tomcat 3.2.1 and Apache 1.3.14, running on Solaris 2.8 on Sun machines. After various attempts at debugging this, I have more information.

1. Even though I'm setting the worker's cache_size property to the default (1), I'm finding there are up to 6 connections open from Apache to Tomcat. I deduced this by looking at the mod_jk.conf and by running netstat. I finally found out that this is because my Apache is set to spawn a minimum of 6 children, and each of those children makes a separate connection to Tomcat. This is very bad because I end up having to reload 6 times before Tomcat starts serving me the page again. Each time, the request goes to a different Apache child that has a defunct socket. So the more Apache children I have, the longer it takes me to recover from this problem.

2. It seems that when Tomcat dies or restarts, the send() called by ajp13's jk_tcp_socket_sendfull() does not get an error. But the recv() does get an error, with errno ECONNRESET, after which the socket is properly closed.

3. When I shut down Tomcat, the sockets that were open between Apache and Tomcat showed up in the CLOSE_WAIT and FIN_WAIT2 states. I think this is normally solved by calling the shutdown() API after closing the socket. However, this would have to be done from the Tomcat side in Ajp13ConnectionHandler.java. I can't find the corresponding method of Socket in Java.

So, based on all of this, the only fix I can think of is to make mod_jk retry the send() if recv() comes back with the error ECONNRESET. The retry should happen after the old socket is properly closed.

Does anyone want to comment?

shinta

-----Original Message-----
From: Dan Milstein [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2001 12:00 PM
To: [EMAIL PROTECTED]
Subject: Re: FW: problem w/ ajp13 - if Tomcat is shutdown

What version of TC are you using? What version of Apache?

I would look into the mod_jk docs -- I think this is the spec'd behavior (which, admittedly, is not great, but that makes it more of a feature request than a bug ;-).

With ajp13, Apache opens up a persistent TCP/IP connection to TC -- if TC restarts, I think that connection may just hang up and then time out (since Apache doesn't know that TC has restarted).

If anyone wants to work on this, you would have the undying thanks of the rest of the TC community -- having to restart Apache all the time bugs a *lot* of people.

-Dan

Shinta Tjio wrote:

I'm having a problem with mod_jk when ajp13 is used. The problem is often reproduced when Tomcat is shut down without Apache being shut down. When a request is fired through Apache as soon as Tomcat starts, I often get an Internal Server Error. The mod_jk.log will have the following:

[jk_uri_worker_map.c (344)]: Into jk_uri_worker_map_t::map_uri_to_worker
[jk_uri_worker_map.c (406)]: jk_uri_worker_map_t::map_uri_to_worker, Found a match ajp13
[jk_worker.c (123)]: Into wc_get_worker_for_name ajp13