Re: [PATCH] Do not reuse persistent connections for PUTs

2012-09-10 Thread Alex Rousskov
> ons 2012-08-29 klockan 12:42 -0600 skrev Alex Rousskov:
>> Hello,
>>
>> I saw bogus ERR_ZERO_SIZE_OBJECT responses while testing Squid v3.1,
>> but the same problem ought to be present in v3.2 as well.
>>
>> A compliant proxy may retry PUTs, but Squid lacks the [rather
>> complicated] code required to protect the PUT request body from being
>> nibbled during the first try, when pconn races are possible.
> 
> +1 on the patch from me.

Committed to trunk as r12319.


>> Today, we have a choice:
>>
>>   A. Fail some PUTs. Close fewer pconns (current code).
>>   B. Handle all PUTs. Open more connections (patched code).
>>
>> If this patch is accepted, performance bug 3398 will resurface. Henrik,
>> do you think committing the patch is the right decision here even though
>> it will reopen bug 3398?
> 
> Yes, but with a plan to fix it correctly.
...

>> The bug title is wrong. There is a long discussion in the bug report
>> about what the bug is really about. I think a better bug title would be:
>> "persistent server connection closed before PUT".
> 
> yes.
> 
> Fix the bug title, reopen it with comment above and restore the check.

Done. Quality patches preserving PUT bodies (via 100-continue and/or
buffering) to optimize Squid pconn handling are welcomed.


Thank you,

Alex.



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-09-01 Thread Alex Rousskov
On 09/01/2012 07:11 AM, Henrik Nordström wrote:
> fre 2012-08-31 klockan 21:44 +0300 skrev Tsantilas Christos:
> 
>> So looks that a good solution should be (similar to) the solution
>> proposed by Henrik...
> 
> 100 Continue aviods the race entirely on requests with bodies, leaving
> only bodyless requests in the "we may not retry this on failure but we
> need to.." problem.

There are at least two problems with 100-continue:

Performance: Recall that we got into this mess because of the
performance complaint. 100-continue will not cause more connections to
be opened, but it will introduce a significant delay in some environments.

Compatibility: I am sure there will be cases where HTTP/1.1 servers will
not respond with 100 Continue. While Squid will not be at fault from
compliance point of view, the "but it used to work fine" or "it works
without a proxy" arguments will force us to add exceptions. I do not
know whether the problems will be widespread enough to warrant default
exceptions.


> Keeping a copy of the sent body allows the request to be retried in
> cases where 100 Continue were not used or could not be used.

I suspect the ideal solution will involve a combination of 100-continue
(where possible and desired) and buffering (otherwise). Unfortunately,
this combination doubles the amount of rather complex coding required to
arrive at that ideal.


Cheers,

Alex.



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-09-01 Thread Henrik Nordström
fre 2012-08-31 klockan 21:44 +0300 skrev Tsantilas Christos:

> So looks that a good solution should be (similar to) the solution
> proposed by Henrik...

100 Continue aviods the race entirely on requests with bodies, leaving
only bodyless requests in the "we may not retry this on failure but we
need to.." problem.

Keeping a copy of the sent body allows the request to be retried in
cases where 100 Continue were not used or could not be used.

Regards
Henrik



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-09-01 Thread Henrik Nordström
fre 2012-08-31 klockan 10:58 +0300 skrev Tsantilas Christos:

>  1) To decide if it can reuse a healthy persistent connection. Inside
> FwdState::connectStart, we are getting a persistant connection and even
> if it is healthy, if we want to send eg a PUT request we are closing the
> persistent connection and we are opening a new one.

How do we know it's healthy?

The issue here is that we cannot retry some kinds of requests, and HTTP
requires us to handle when the server closes the connection while we are
sending the request. It is a speed of light problem, we cannot know if
the server is just about to close the connection or if it have already
closed the connection but the FIN have not yet been seen on our side.

> From what I can understand there is not any reason to not reuse a
> healthy persistent connection, for PUT requests.  Am I correct? If yes
> then the problem is only in the case (2).

There is, because we often cannot retry the PUT because we have not kep
a copy of the already partially sent PUT body, and it's expected to
sometimes fail tp send requestso n a persistent connection due to above
race condition.

THe race condition between HTTP server and client (squid) is fairly big
and easily fits many MB of traffic.

Regards
Henrik



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-08-31 Thread Tsantilas Christos
On 08/31/2012 05:31 PM, Alex Rousskov wrote:
> On 08/31/2012 01:58 AM, Tsantilas Christos wrote:
>> On 08/30/2012 06:11 AM, Henrik Nordström wrote:
>>> ons 2012-08-29 klockan 15:32 -0600 skrev Alex Rousskov:
>>>
 Today, we have a choice:

   A. Fail some PUTs. Close fewer pconns (current code).
   B. Handle all PUTs. Open more connections (patched code).
>>
>> The FwdState::checkRetriable() method is used in the following two cases:
>>
>>  1) To decide if it can reuse a healthy persistent connection. Inside
>> FwdState::connectStart, we are getting a persistant connection and even
>> if it is healthy, if we want to send eg a PUT request we are closing the
>> persistent connection and we are opening a new one.
>>
>>  2) To decide if a connection can be retried (inside FwdState::checkRetry())
>>
>> From what I can understand there is not any reason to not reuse a
>> healthy persistent connection, for PUT requests.  Am I correct?
> 
> I am afraid you are not correct. There is a reason. Today, the code may
> destroy PUT content before we can detect a pconn race. If we reuse a
> pconn for PUT, we may later discover a pconn race but would be unable to
> retry the failed PUT because the PUT content would have been already
> nibbled by then.

I see...

> 
> In other words (and simplifying a bit), the current "not nibbled"
> condition in checkRetriable() should be replaced with "not nibbled and
> will not be nibbled" condition. Since we cannot guarantee the "will not"
> part, we have to exclude all requests carrying body from being sent on
> reused pconns.
> 
> I can reproduce this bug in the lab so it is not just a theory:
> HttpStateData consumes PUT request body and only then gets a zero-length
> read from the server, resulting in ERR_ZERO_SIZE_OBJECT returned to the
> innocent client.
> 
> 
> 
>> In the case (2) we are using the "if (request->bodyNibbled())" to
>> examine if any body data consumed and probably sent to the server. So we
>> should not have any problem at all.
>> However in the case the HttpRequest::bodyNibbled() is not enough we can
>> add the check "if (request->body_pipe != NULL)" inside
>> FwdState::checkRetry() method after calling checkRetriable.
> 
> The current bodyNibbled() check is enough to avoid sending a nibbled
> request to the server. Unfortunately, that means we will not retry some
> failed PUTs, and we must retry them.
> 
> In other words, after (the pconn race happened and we nibbled) it is too
> late to fix the situation by retrying a PUT. Thus, we have to prevent
> either pconn races or nibbling. We can easily prevent races (the
> proposed patch does that). Eventually, we will prevent nibbling (to get
> a performance boost).
> 
> 
> Does this clarify?

Yep.
So looks that a good solution should be (similar to) the solution
proposed by Henrik...


> 
> Alex.
> 
> 



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-08-31 Thread Alex Rousskov
On 08/31/2012 01:58 AM, Tsantilas Christos wrote:
> On 08/30/2012 06:11 AM, Henrik Nordström wrote:
>> ons 2012-08-29 klockan 15:32 -0600 skrev Alex Rousskov:
>>
>>> Today, we have a choice:
>>>
>>>   A. Fail some PUTs. Close fewer pconns (current code).
>>>   B. Handle all PUTs. Open more connections (patched code).
> 
> The FwdState::checkRetriable() method is used in the following two cases:
> 
>  1) To decide if it can reuse a healthy persistent connection. Inside
> FwdState::connectStart, we are getting a persistant connection and even
> if it is healthy, if we want to send eg a PUT request we are closing the
> persistent connection and we are opening a new one.
> 
>  2) To decide if a connection can be retried (inside FwdState::checkRetry())
> 
> From what I can understand there is not any reason to not reuse a
> healthy persistent connection, for PUT requests.  Am I correct?

I am afraid you are not correct. There is a reason. Today, the code may
destroy PUT content before we can detect a pconn race. If we reuse a
pconn for PUT, we may later discover a pconn race but would be unable to
retry the failed PUT because the PUT content would have been already
nibbled by then.

In other words (and simplifying a bit), the current "not nibbled"
condition in checkRetriable() should be replaced with "not nibbled and
will not be nibbled" condition. Since we cannot guarantee the "will not"
part, we have to exclude all requests carrying body from being sent on
reused pconns.

I can reproduce this bug in the lab so it is not just a theory:
HttpStateData consumes PUT request body and only then gets a zero-length
read from the server, resulting in ERR_ZERO_SIZE_OBJECT returned to the
innocent client.



> In the case (2) we are using the "if (request->bodyNibbled())" to
> examine if any body data consumed and probably sent to the server. So we
> should not have any problem at all.
> However in the case the HttpRequest::bodyNibbled() is not enough we can
> add the check "if (request->body_pipe != NULL)" inside
> FwdState::checkRetry() method after calling checkRetriable.

The current bodyNibbled() check is enough to avoid sending a nibbled
request to the server. Unfortunately, that means we will not retry some
failed PUTs, and we must retry them.

In other words, after (the pconn race happened and we nibbled) it is too
late to fix the situation by retrying a PUT. Thus, we have to prevent
either pconn races or nibbling. We can easily prevent races (the
proposed patch does that). Eventually, we will prevent nibbling (to get
a performance boost).


Does this clarify?

Alex.



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-08-31 Thread Tsantilas Christos
On 08/30/2012 06:11 AM, Henrik Nordström wrote:
> ons 2012-08-29 klockan 15:32 -0600 skrev Alex Rousskov:
> 
>> Today, we have a choice:
>>
>>   A. Fail some PUTs. Close fewer pconns (current code).
>>   B. Handle all PUTs. Open more connections (patched code).

The FwdState::checkRetriable() method is used in the following two cases:

 1) To decide if it can reuse a healthy persistent connection. Inside
FwdState::connectStart, we are getting a persistant connection and even
if it is healthy, if we want to send eg a PUT request we are closing the
persistent connection and we are opening a new one.

 2) To decide if a connection can be retried (inside FwdState::checkRetry())

>From what I can understand there is not any reason to not reuse a
healthy persistent connection, for PUT requests.  Am I correct? If yes
then the problem is only in the case (2).

In the case (2) we are using the "if (request->bodyNibbled())" to
examine if any body data consumed and probably sent to the server. So we
should not have any problem at all.
However in the case the HttpRequest::bodyNibbled() is not enough we can
add the check "if (request->body_pipe != NULL)" inside
FwdState::checkRetry() method after calling checkRetriable.

I think this is easier, at least for a short term fix, than delaying
sending body...

>>
>> If this patch is accepted, performance bug 3398 will resurface. Henrik,
>> do you think committing the patch is the right decision here even though
>> it will reopen bug 3398?
> 
> Yes, but with a plan to fix it correctly.
> 
> A suggested correct fix for 3398 is to make use of Expect: 100-continue.
> This delays sending the body a bit, allowing detection of broken
> connections before starting to send the body.
> 
> An optimization from that is to add some buffering of request bodies to
> allow skipping the 100-continue to speed up forwarding.
> 
> Regards
> Henrik
> 
>> The bug title is wrong. There is a long discussion in the bug report
>> about what the bug is really about. I think a better bug title would be:
>> "persistent server connection closed before PUT".
> 
> yes.
> 
> Fix the bug title, reopen it with comment above and restore the check.
> 
> Regatds
> Henrik
> 
> 



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-08-29 Thread Henrik Nordström
ons 2012-08-29 klockan 15:32 -0600 skrev Alex Rousskov:

> Today, we have a choice:
> 
>   A. Fail some PUTs. Close fewer pconns (current code).
>   B. Handle all PUTs. Open more connections (patched code).
> 
> If this patch is accepted, performance bug 3398 will resurface. Henrik,
> do you think committing the patch is the right decision here even though
> it will reopen bug 3398?

Yes, but with a plan to fix it correctly.

A suggested correct fix for 3398 is to make use of Expect: 100-continue.
This delays sending the body a bit, allowing detection of broken
connections before starting to send the body.

An optimization from that is to add some buffering of request bodies to
allow skipping the 100-continue to speed up forwarding.

Regards
Henrik

> The bug title is wrong. There is a long discussion in the bug report
> about what the bug is really about. I think a better bug title would be:
> "persistent server connection closed before PUT".

yes.

Fix the bug title, reopen it with comment above and restore the check.

Regatds
Henrik



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-08-29 Thread Alex Rousskov
On 08/29/2012 02:32 PM, Henrik Nordström wrote:
> ons 2012-08-29 klockan 22:20 +0200 skrev Henrik Nordström:
>> ons 2012-08-29 klockan 12:42 -0600 skrev Alex Rousskov:
>>> I saw bogus ERR_ZERO_SIZE_OBJECT responses while testing Squid v3.1,
>>> but the same problem ought to be present in v3.2 as well.
>>>
>>> A compliant proxy may retry PUTs, but Squid lacks the [rather
>>> complicated] code required to protect the PUT request body from being
>>> nibbled during the first try, when pconn races are possible.
>>
>> +1 on the patch from me.
> 
> It's a reincarnation of Bug #569, kind of. The check for a request body
> have "always" been there (well since that bug was fixed), but was
> deleted in revision 11589 for some reason.

Indeed! I think Christos deleted the check to address the complaint that
Squid was closing a previously established pconn before sending a new
POST or PUT.

What nobody (including me) realized is that Squid is not yet ready to
retry some PUTs that fail due to a pconn race. There is code to check
that we do not retry a PUT when we cannot retry it, but there is no code
to preserve PUT so that it can be retried if needed. That missing code
would be quite tricky to write correctly.

Today, we have a choice:

  A. Fail some PUTs. Close fewer pconns (current code).
  B. Handle all PUTs. Open more connections (patched code).

If this patch is accepted, performance bug 3398 will resurface. Henrik,
do you think committing the patch is the right decision here even though
it will reopen bug 3398?


Regardless of the A/B decision today, it would be nice to add code to
protect [small] PUT contents if necessary, so that we can do (B) without
sacrificing performance in common cases. That requires a lot more work
as I already mentioned above.


> Log message on the delete says
> 
>   Bug 3398: persistent server connection closed after PUT/DELETE


The bug title is wrong. There is a long discussion in the bug report
about what the bug is really about. I think a better bug title would be:
"persistent server connection closed before PUT".


Thank you,

Alex.



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-08-29 Thread Henrik Nordström
ons 2012-08-29 klockan 22:20 +0200 skrev Henrik Nordström:
> ons 2012-08-29 klockan 12:42 -0600 skrev Alex Rousskov:
> > Hello,
> > 
> > I saw bogus ERR_ZERO_SIZE_OBJECT responses while testing Squid v3.1,
> > but the same problem ought to be present in v3.2 as well.
> > 
> > A compliant proxy may retry PUTs, but Squid lacks the [rather
> > complicated] code required to protect the PUT request body from being
> > nibbled during the first try, when pconn races are possible.
> 
> +1 on the patch from me.

It's a reincarnation of Bug #569, kind of. The check for a request body
have "always" been there (well since that bug was fixed), but was
deleted in revision 11589 for some reason.

Log message on the delete says

  Bug 3398: persistent server connection closed after PUT/DELETE

Regards
Henrik



Re: [PATCH] Do not reuse persistent connections for PUTs

2012-08-29 Thread Henrik Nordström
ons 2012-08-29 klockan 12:42 -0600 skrev Alex Rousskov:
> Hello,
> 
> I saw bogus ERR_ZERO_SIZE_OBJECT responses while testing Squid v3.1,
> but the same problem ought to be present in v3.2 as well.
> 
> A compliant proxy may retry PUTs, but Squid lacks the [rather
> complicated] code required to protect the PUT request body from being
> nibbled during the first try, when pconn races are possible.

+1 on the patch from me.

Regards
Henrik



[PATCH] Do not reuse persistent connections for PUTs

2012-08-29 Thread Alex Rousskov
Hello,

I saw bogus ERR_ZERO_SIZE_OBJECT responses while testing Squid v3.1,
but the same problem ought to be present in v3.2 as well.

A compliant proxy may retry PUTs, but Squid lacks the [rather
complicated] code required to protect the PUT request body from being
nibbled during the first try, when pconn races are possible. Thus, Squid
cannot safely retry some PUTs today, and FwdState::checkRetriable() must
return false for all PUTs, to avoid bogus ERR_ZERO_SIZE_OBJECT errors
(especially for clients that did not reuse a pconn and, hence, may not
be ready to handle/retry an error response).

In theory, requests with safe or idempotent methods other than PUT might
have bodies so the patch applies the same logic to them as well.


Thank you,

Alex.
Do not reuse persistent connections for PUTs to avoid ERR_ZERO_SIZE_OBJECT.

A compliant proxy may retry PUTs, but Squid lacks the [rather complicated]
code required to protect the PUT request body from being nibbled during the
first try. Thus, Squid cannot safely retry some PUTs today, and
FwdState::checkRetriable() must return false for all PUTs, to avoid
bogus ERR_ZERO_SIZE_OBJECT errors (especially for clients that did not
reuse a pconn and, hence, may not be ready to handle/retry an error response).

In theory, requests with safe or idempotent methods other than PUT might have
bodies so we apply the same logic to them as well.

=== modified file 'src/forward.cc'
--- src/forward.cc	2012-08-23 00:01:00 +
+++ src/forward.cc	2012-08-29 18:10:42 +
@@ -459,60 +459,66 @@
  * Return TRUE if this is the kind of request that can be retried
  * after a failure.  If the request is not retriable then we don't
  * want to risk sending it on a persistent connection.  Instead we'll
  * force it to go on a new HTTP connection.
  */
 bool
 FwdState::checkRetriable()
 {
 /* RFC2616 9.1 Safe and Idempotent Methods */
 switch (request->method.id()) {
 /* 9.1.1 Safe Methods */
 
 case METHOD_GET:
 
 case METHOD_HEAD:
 /* 9.1.2 Idempotent Methods */
 
 case METHOD_PUT:
 
 case METHOD_DELETE:
 
 case METHOD_OPTIONS:
 
 case METHOD_TRACE:
 break;
 
 default:
 return false;
 }
 
+// Optimize: A compliant proxy may retry PUTs, but Squid lacks the [rather
+// complicated] code required to protect the PUT request body from being
+// nibbled during the first try. Thus, Squid cannot retry some PUTs today.
+if (request->body_pipe != NULL)
+return false;
+
 return true;
 }
 
 void
 FwdState::serverClosed(int fd)
 {
 debugs(17, 2, "fwdServerClosed: FD " << fd << " " << entry->url());
 assert(server_fd == fd);
 server_fd = -1;
 
 retryOrBail();
 }
 
 void
 FwdState::retryOrBail()
 {
 if (checkRetry()) {
 int originserver = (servers->_peer == NULL);
 debugs(17, 3, "fwdServerClosed: re-forwarding (" << n_tries << " tries, " << (squid_curtime - start_t) << " secs)");
 
 if (servers->next) {
 /* use next, or cycle if origin server isn't last */
 FwdServer *fs = servers;
 FwdServer **T, *T2 = NULL;
 servers = fs->next;
 
 for (T = &servers; *T; T2 = *T, T = &(*T)->next);
 if (T2 && T2->_peer) {
 /* cycle */
 *T = fs;