Re: ICAP connections under heavy loads

2012-09-01 Thread Amos Jeffries

On 1/09/2012 5:05 a.m., Alex Rousskov wrote:

On 08/31/2012 09:01 AM, Alexander Komyagin wrote:

Alex, I figured it out, finally! The bug was in the comm_connect_addr()
function (I suppose it is kernel-dependent, though).

Consider the following call trace:
1) Xaction starts ConnOpener in order to create a new connection to the
ICAP service
2) ConnOpener calls comm_connect_addr()
3) comm_connect_addr() initiates a connection through connect() and
returns COMM_INPROGRESS
...since our ICAP service is too busy, the connection will eventually
time out (connectTimeout_), so...
4) Comm::ConnOpener::timeout() is called
5) Comm::ConnOpener::timeout calls connect()
6) connect() calls comm_connect_addr()
7) comm_connect_addr() calls getsockopt(SOL_SOCKET, SO_ERROR) to check
current connection status
8) getsockopt(..) returns errno 0 --- BUG

Actually, the bug is in step #5. If we timed out, we should not try to
connect again.

Step #8 seems correct -- from the OS point of view, an internal Squid
timeout is not an error. If you fix step #5, step #8 will not happen, though.



9) comm_connect_addr() returns COMM_OK, so ConnOpener thinks that the
connection was successfully established and passes it back to the
Xaction object; then we get noteCommRead and noteCommWrote
exceptions and so on...

According to `man connect`, when connect() returns errno EINPROGRESS:
EINPROGRESS
 The socket is nonblocking and the connection cannot be completed
 immediately. It is possible to select(2) or poll(2) for
 completion by selecting the socket for writing. After select(2)
 indicates writability, use getsockopt(2) to read the SO_ERROR
 option at level SOL_SOCKET to determine whether connect()
 completed successfully (SO_ERROR is zero) or unsuccessfully
 (SO_ERROR is one of the usual error codes listed here,
 explaining the reason for the failure).
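
For illustration, that procedure boils down to the following minimal
POSIX sketch (not Squid code; error handling trimmed, and no select()
timeout is used here):

 #include <sys/socket.h>
 #include <sys/select.h>
 #include <fcntl.h>
 #include <errno.h>

 // Connect a nonblocking socket and wait for the result.
 // Returns 0 on success, -1 with errno set on failure.
 static int
 connect_nonblocking(int sock, const struct sockaddr *addr, socklen_t alen)
 {
     fcntl(sock, F_SETFL, fcntl(sock, F_GETFL) | O_NONBLOCK);

     if (connect(sock, addr, alen) == 0)
         return 0;                  // connected immediately
     if (errno != EINPROGRESS)
         return -1;                 // immediate, real failure

     fd_set wfds;
     FD_ZERO(&wfds);
     FD_SET(sock, &wfds);
     if (select(sock + 1, NULL, &wfds, NULL, NULL) <= 0)
         return -1;                 // select() failed

     int err = 0;
     socklen_t elen = sizeof(err);
     if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &elen) < 0)
         return -1;
     if (err != 0) {
         errno = err;               // connect() failed asynchronously
         return -1;
     }
     return 0;                      // SO_ERROR == 0: connection established
 }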

Actually, ConnOpener uses SetSelect(..) to wait before calling
comm_connect_addr() (and hence connect()) again, but the timeout path
breaks that rule by calling getsockopt() right after connect().

The above man page excerpt says to call getsockopt() after a successful
select() notification. I think that is what we do now.

I do not think we are or should be calling connect() twice -- the OS
knows that we want to connect and does not need reminders. Note the
(!F->flags.called_connect) guard in comm.cc.
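
Roughly, that guard works like this (a simplified sketch of the comm.cc
idea, not the verbatim code):

 if (!F->flags.called_connect) {
     // first pass: issue the one and only connect(2)
     F->flags.called_connect = true;
     x = connect(sock, AI->ai_addr, AI->ai_addrlen);
 } else {
     // later passes: do not connect() again; just ask the kernel
     // how the pending connection attempt turned out
     int err = 0;
     socklen_t len = sizeof(err);
     x = getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &len);
     if (x == 0 && err != 0) {
         errno = err;
         x = -1;
     }
 }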


The connect() mentioned in steps #5 and #6 is
Comm::ConnOpener::connect(), which calls the comm.cc code.




If you are looking for connect-related bugs, I recommend fixing step #5
and also examining this hack:


 x = connect(sock, AI->ai_addr, AI->ai_addrlen);

 // XXX: ICAP code refuses callbacks during a pending comm_ call
 // Async calls development will fix this.
 if (x == 0) {
     x = -1;
     errno = EINPROGRESS;
 }

I hope the above XXX code is no longer needed (and removing it will save
a lot of CPU cycles for everybody, not just ICAP!), but this needs to be
double checked using bzr logs to restore the exact reason behind that
hack and by examining the current ICAP code.





In my environment we use uClibc and an rsbac kernel, but I suppose the
problem may exist in other environments too.




I've attached two small patches. One avoids connection loss in case the
client aborts the request before the connection to the ICAP service is
established (remember that BUG: Orphaned... stuff I asked you about before?).
The other actually fixes the connection timeout problem (not only for
ICAP, actually). Comments will be appreciated.


* For the conn-open-timeout patch:

Overall, I think this patch is fighting the side-effects of other bugs.
Most importantly, the timeout handler should abort the ConnOpener job on
the spot rather than go through one more select() try. I will comment on
specific changes below, just in case, but please keep this overall
problem in mind.


ConnOpener should be obeying the squid.conf connect_retries option for the
number of times it closes the socket, re-opens it, and re-connect()s. For
reference, the knob looks like the sketch below.
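
A squid.conf example (illustrative value, not a recommendation):

 # retry each failed connection attempt up to 3 additional times
 # (close the socket, re-open, re-connect()) before giving up
 connect_retries 3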



  case COMM_INPROGRESS:
  // check for timeout FIRST.
-if (squid_curtime - connectStart_ > connectTimeout_) {
+if (squid_curtime - connectStart_ >= connectTimeout_) {
  debugs(5, 5, HERE << conn_ << ": * - ERR took too long already.");

Please revert this change. In theory, we can be called back again before
(squid_curtime - connectStart_) changes. Not a big deal, of course (the
equality should be a very rare event), but the old code was more accurate
from a logic/math point of view.

Also, if the debug message were to print by how much we were late,
printing zero would look strange.

If there is some hidden side-effect or dependency here that I missed,
please let me know (but note that I think this code must not be called
from the timeout handler -- as discussed above, the timeout handler
should abort instead of double checking whether there is a timeout).
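
In other words, something along these lines (a rough sketch only;
doneConnecting() stands in for whatever cleanup/notification helper the
job uses, so treat the names as assumptions):

 void
 Comm::ConnOpener::timeout()
 {
     // Do not probe the socket again via connect()/comm_connect_addr().
     // The attempt has taken too long, so fail the job on the spot.
     debugs(5, 5, HERE << conn_ << ": timed out");
     doneConnecting(COMM_TIMEOUT, ETIMEDOUT); // close fd, notify the caller
 }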



  

Re: [PATCH] Do not reuse persistent connections for PUTs

2012-09-01 Thread Henrik Nordström
fre 2012-08-31 klockan 10:58 +0300 skrev Tsantilas Christos:

  1) To decide if it can reuse a healthy persistent connection. Inside
 FwdState::connectStart, we are getting a persistent connection and even
 if it is healthy, if we want to send e.g. a PUT request we close the
 persistent connection and open a new one.

How do we know it's healthy?

The issue here is that we cannot retry some kinds of requests, and HTTP
requires us to handle the case where the server closes the connection
while we are sending the request. It is a speed-of-light problem: we
cannot know whether the server is just about to close the connection or
whether it has already closed it but the FIN has not yet been seen on our
side.
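
Schematically, with time flowing downwards:

 client (Squid)                          origin server
 --------------                          -------------
 picks an idle persistent connection
 starts writing PUT headers + body --->
                                         decides the idle connection is stale
                                   <---  FIN (close)
 write fails mid-body; the partially
 sent request may not be retryable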

 From what I can understand there is no reason not to reuse a
 healthy persistent connection for PUT requests. Am I correct? If yes,
 then the problem is only in case (2).

There is, because we often cannot retry the PUT: we have not kept a copy
of the already partially sent PUT body, and it is expected that sending
requests on a persistent connection will sometimes fail due to the above
race condition.

The race condition between the HTTP server and the client (Squid) is
fairly big and easily fits many MB of traffic.

Regards
Henrik



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-09-01 Thread Henrik Nordström
fre 2012-08-31 klockan 21:44 +0300 skrev Tsantilas Christos:

 So looks that a good solution should be (similar to) the solution
 proposed by Henrik...

100 Continue avoids the race entirely on requests with bodies, leaving
only bodyless requests in the "we may not retry this on failure but we
need to" problem.

Keeping a copy of the sent body allows the request to be retried in
cases where 100 Continue were not used or could not be used.
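
For illustration, the handshake looks roughly like this on the wire
(hypothetical request, host, and sizes):

 PUT /upload HTTP/1.1
 Host: origin.example.com
 Content-Length: 1048576
 Expect: 100-continue
                                (client pauses; no body bytes sent yet)
 HTTP/1.1 100 Continue          (server commits to reading the request)

                                (client now streams the 1 MB body)
 HTTP/1.1 201 Created

If the server instead closes or rejects the request at the pause point,
none of the body has been sent yet, so the request can be safely retried
on another connection.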

Regards
Henrik



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-09-01 Thread Alex Rousskov
On 09/01/2012 07:11 AM, Henrik Nordström wrote:
 fre 2012-08-31 klockan 21:44 +0300 skrev Tsantilas Christos:
 
 So looks that a good solution should be (similar to) the solution
 proposed by Henrik...
 
 100 Continue avoids the race entirely on requests with bodies, leaving
 only bodyless requests in the "we may not retry this on failure but we
 need to" problem.

There are at least two problems with 100-continue:

Performance: Recall that we got into this mess because of the
performance complaint. 100-continue will not cause more connections to
be opened, but it will introduce a significant delay in some environments.

Compatibility: I am sure there will be cases where HTTP/1.1 servers will
not respond with 100 Continue. While Squid will not be at fault from a
compliance point of view, the "but it used to work fine" and "it works
without a proxy" arguments will force us to add exceptions. I do not
know whether the problems will be widespread enough to warrant default
exceptions.


 Keeping a copy of the sent body allows the request to be retried in
 cases where 100 Continue were not used or could not be used.

I suspect the ideal solution will involve a combination of 100-continue
(where possible and desired) and buffering (otherwise). Unfortunately,
this combination doubles the amount of rather complex coding required to
arrive at that ideal.


Cheers,

Alex.