Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-07-15 Thread Alex Rousskov
On 06/26/2015 06:44 AM, Amos Jeffries wrote:
 +// invalid character somewhere in the line.
 +// As long as we can find the LF, accept the characters
 +// which we know are invalid in any URI, but actively used
 +LfDelim.add('\0'); // Java
 +LfDelim.add(' ');  // IIS
 +LfDelim.add('\"'); // Bing
 +LfDelim.add('\\'); // MSIE, Firefox
 +LfDelim.add('|');  // Amazon


On 06/26/2015 09:40 AM, Alex Rousskov wrote:

 In your patch, please add support for all URI characters that we can
 support (or at least all the unwise ones from RFC 2396), not just the
 characters that recent deployments have already confirmed as necessary
 to accommodate. We do not want to come back to this every time some app
 starts sending slightly malformed URIs.


Just got another bug report from the real world. This time it is about
the ^ character used in URLs on a Microsoft news site (probably coming
from some affiliated advertisement services).

I am posting this not just to emphasize that the list of added
characters is too limited, but to re-emphasize that the whole "allow
what we know is actively used" approach is unfortunate. In your
long-term patch, please add support for all URI characters that we can
support (or at least all the unwise ones from RFC 2396).
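For reference, RFC 2396 defines the "unwise" set as { } | \ ^ [ ] `. A
minimal standalone C++ sketch of that set (illustrative only, not Squid
code; the helper name is made up):

#include <string>

// true if c is in the RFC 2396 "unwise" set: { } | \ ^ [ ] `
static bool isUnwiseUriChar(const char c)
{
    static const std::string unwise = "{}|\\^[]`";
    return unwise.find(c) != std::string::npos;
}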


Thank you,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-26 Thread Alex Rousskov
On 06/26/2015 06:44 AM, Amos Jeffries wrote:
 On 26/06/2015 7:55 a.m., Alex Rousskov wrote:
 Tokenizer cannot handle URIs with whitespaces directly, like your patch
 attempts to do: Tokenizer alone cannot handle ambiguous grammars. To
 handle such URIs well, you have two options IMO:

 A. The old "do it by hand" way:

   0. Trim request line to remove trailing whitespace and CR*LF.
   1. Find the *last* whitespace on the trimmed request line.
   2. To get the request method and URI, apply the tokenizer to the
  request line prefix before that last whitespace.
   3. To get the protocol parts, apply the tokenizer to the
  request line suffix after that last whitespace.
   4. Make the code even messier to handle HTTP/0.9 cases if needed.


 B. The new optimized and simplified optimistic Tokenizer way:

 Here is a sketch:

 SBuf uri; // accumulates request URI characters
 if (!tok.prefix(uri, StrictUriChars))
 return false; // URI does not start with a valid character

 // in the order of correctness (and popularity!):
 const bool parsedSuffix =
 // canonical HTTP/1 format
 parseCanonicalReqLineTail(uri, tok) ||
 // HTTP/1 format but with bad characters in URI
 parseBadUriReqLineTail(uri, tok) ||
 // HTTP/1 format but with bad characters and spaces in URI
 parseSpacedReqLineTail(uri, tok) ||
 // HTTP/0 format (without the protocol part at all)
 parseHttpZeroReqLineTail(uri, tok);

 Assert(parsedSuffix); // parseHttpZeroReqLineTail() cannot fail
 ...


 C. locate LF character and parse in reverse. HTTP-Version label, WSP
 then remainder is URI.

Yes, this is a better variant of option A, using Tokenizer::suffix(); I
missed that method. It is still manual labor to accommodate rare special
cases while complicating and slowing down parsing of the common case.
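For illustration, a self-contained sketch of that trim-and-split-on-last-whitespace
approach, using plain std::string rather than Squid's SBuf/Tokenizer (the names
and the HTTP/0.9 fallback check are made up for the example):

#include <string>

struct ReqLineParts { std::string methodAndUri; std::string protocol; };

// Trim the request line, split on the *last* whitespace, and treat the
// suffix as the protocol part (or fall back to HTTP/0.9 when it is not one).
static bool splitRequestLine(std::string line, ReqLineParts &out)
{
    const auto end = line.find_last_not_of(" \t\r\n");
    if (end == std::string::npos)
        return false; // empty or whitespace-only line
    line.erase(end + 1); // step 0: trim trailing whitespace and CR*LF

    const auto lastWs = line.find_last_of(" \t"); // step 1: last whitespace
    if (lastWs != std::string::npos &&
        line.compare(lastWs + 1, 5, "HTTP/") == 0) {
        out.methodAndUri = line.substr(0, lastWs); // step 2: method and URI
        out.protocol = line.substr(lastWs + 1);    // step 3: protocol part
        return true;
    }

    // step 4 (sketch): no HTTP-Version suffix, so treat as HTTP/0.9
    out.methodAndUri = line;
    out.protocol.clear();
    return true;
}

The same split can be done from the end with Tokenizer::suffix(): consume the
HTTP-Version and the separating whitespace from the tail, and whatever remains
is the method plus URI.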



 (B) is an interesting alternative to what we have now (C). Although you are
 missing the relaxed vs strict characters which the RFC mandates (for
 both URI and whitespace separately permutating). That complicates things
 a bit.

I do not see where design B is missing something. Obviously(?), if there
are more special cases than the sketch shows, they should be added. I
was just illustrating the approach. The actual implementation should be
different.


 If I can find some time I'll give this (B) style a bit of testing and
 see how it matches up against the current code. Could prove faster on
 very long URIs.

Given correct implementations of A and B, B should be a tiny bit faster
for an average URI because it does less work in common cases.


In your patch, please add support for all URI characters that we can
support (or at least all the unwise ones from RFC 2396), not just the
characters that recent deployments have already confirmed as necessary
to accommodate. We do not want to come back to this every time some app
starts sending slightly malformed URIs.

I would also recommend accommodating whitespace after HTTP-version.

Ideally, parseRequestFirstLine(), or at least the new code, should be
split into several methods. It is too long, convoluted, and is getting
worse.


I remain opposed to downgrading HTTP/1 requests with slightly malformed
URIs to HTTP/0.


HTH,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-26 Thread Amos Jeffries
On 26/06/2015 7:55 a.m., Alex Rousskov wrote:
 On 06/25/2015 08:13 AM, Amos Jeffries wrote:
 Which is why I want to go the route of HTTP/0.9 handling. It's
 clear when products encounter it and cause themselves problems.
 
 Sigh. You are repeating essentially the same argument as before. Any
 "let's create problems for something that appears to work without Squid"
 approach often results in *us* wasting time on workarounds for those
 problems.
 
 You may be convinced that creating short-term problems for others will
 make the world a better place long-term, but there is no way to prove or
 disprove that assumption, and there is no objective way to even compare
 the possible long-term gain with the likely short-term pain (especially
 when it is somebody else who is hurting).
 
 The history is full of examples where the "future happiness justifies
 short-term suffering" approach either works great or fails miserably. I
 doubt we can come up with a formula that predicts the outcome.
 
 
 Was the '|' character
 non-compliance bug reported against the product?
 
 Unknown to me, but I suspect folks suffering from this do not report
 bugs to companies they have no relationship with (and may not even know
 what the product is!).
 
 
 Anyhow. I've thrown the URI you mentioned through my test copy of trunk
 and it's not hanging looking for the end of mime headers like you said.
 It's waiting for the end of the URI:
 
 parseHttpRequest: Incomplete request, waiting for end of request line
 
 Please forgive me for misinterpreting the Squid log line that lies about
 an "Incomplete request" and "waiting for end of request line".

No lie there. It hasn't got near starting to search for the URL yet. When
the relaxed parser is enabled (i.e. by default) it has to find the end of
the request-line first and work backwards. It found characters that are
invalid anywhere in the HTTP first-line / request-line before it found
the LF. It just happens that they are in the URI portion of the
request-line.
It would report the same thing on receiving "GET / HTTP/1.0|\r\n".
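Roughly, and purely as an illustration of that behaviour (this is not the
actual parser code; the accepted set and return convention are invented):

#include <string>

// Search for the end of the request-line. Returns the position just past the
// LF, or npos when no LF was reached - either because more data is needed or
// because an octet outside the accepted set stopped the search early. To the
// caller both cases look the same: "waiting for end of request line".
static std::string::size_type findRequestLineEnd(const std::string &buf,
                                                 const std::string &accepted)
{
    for (std::string::size_type i = 0; i < buf.size(); ++i) {
        if (buf[i] == '\n')
            return i + 1;
        if (accepted.find(buf[i]) == std::string::npos)
            return std::string::npos; // invalid-anywhere octet before the LF
    }
    return std::string::npos; // no LF yet
}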

 
 If you prefer the "waiting for the end of the URI" description of the
 broken state (where the complete URI, the complete request line, and the
 complete request are all available), I am not going to argue that it is
 also inaccurate.
 
 
 Attached is a patch that replaces the hang behaviour with an
 invalid-request error page on standards compliant builds. But, when
 --enable-http-violations is used it accepts the invalid characters we
 know are in use and passes the request through as HTTP/0.9.
 
 
 ... which does not work because the request does not actually use the
 HTTP/0.9 format. You end up with:
 
 
 Forwarding client request ... 
 url=http://localhost:8080/path|with|unwise|charactersHTTP/1.1
 
 GET /path|with|unwise|charactersHTTP/1.1 HTTP/1.1
 Via: 0.9 protofroot (squid/4.0.0-BZR)
 
 
 The trailing " HTTP/1.1" part of the request line is not a part of the
 URI but your changes make it so.

Sorry. Attached is a patch which fixes that and resolves accel/intercept
issues with the mime block as well.



 
 Tokenizer cannot handle URIs with whitespaces directly, like your patch
 attempts to do: Tokenizer alone cannot handle ambiguous grammars. To
 handle such URIs well, you have two options IMO:
 
 A. The old "do it by hand" way:
 
   0. Trim request line to remove trailing whitespace and CR*LF.
   1. Find the *last* whitespace on the trimmed request line.
   2. To get the request method and URI, apply the tokenizer to the
  request line prefix before that last whitespace.
   3. To get the protocol parts, apply the tokenizer to the
  request line suffix after that last whitespace.
   4. Make the code even messier to handle HTTP/0.9 cases if needed.
 
 
 B. The new optimized and simplified optimistic Tokenizer way:
 
 Here is a sketch:
 
 SBuf uri; // accumulates request URI characters
 if (!tok.prefix(uri, StrictUriChars))
 return false; // URI does not start with a valid character
 
 // in the order of correctness (and popularity!):
 const bool parsedSuffix =
 // canonical HTTP/1 format
 parseCanonicalReqLineTail(uri, tok) ||
 // HTTP/1 format but with bad characters in URI
 parseBadUriReqLineTail(uri, tok) ||
 // HTTP/1 format but with bad characters and spaces in URI
 parseSpacedReqLineTail(uri, tok) ||
 // HTTP/0 format (without the protocol part at all)
 parseHttpZeroReqLineTail(uri, tok);
 
 Assert(parsedSuffix); // parseHttpZeroReqLineTail() cannot fail
 ...

C. locate LF character and parse in reverse. HTTP-Version label, WSP
then remainder is URI.


(B) is an interesting alternative to what we have now (C). Although you are
missing the relaxed vs strict characters which the RFC mandates (for
both URI and whitespace separately permutating). That complicates things
a bit.

If I can find some time I'll give this (B) style a bit of testing and
see how it matches up against the current code. Could prove faster on
very long URIs.

Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-25 Thread Alex Rousskov
On 06/25/2015 08:13 AM, Amos Jeffries wrote:
 Which is why I want to go the route of HTTP/0.9 handling. It's
 clear when products encounter it and cause themselves problems.

Sigh. You are repeating essentially the same argument as before. Any
"let's create problems for something that appears to work without Squid"
approach often results in *us* wasting time on workarounds for those
problems.

You may be convinced that creating short-term problems for others will
make the world a better place long-term, but there is no way to prove or
disprove that assumption, and there is no objective way to even compare
the possible long-term gain with the likely short-term pain (especially
when it is somebody else who is hurting).

The history is full of examples where the "future happiness justifies
short-term suffering" approach either works great or fails miserably. I
doubt we can come up with a formula that predicts the outcome.


 Was the '|' character
 non-compliance bug reported against the product?

Unknown to me, but I suspect folks suffering from this do not report
bugs to companies they have no relationship with (and may not even know
what the product is!).


 Anyhow. I've thrown the URI you mentioned through my test copy of trunk
 and it's not hanging looking for the end of mime headers like you said.
 It's waiting for the end of the URI:

 parseHttpRequest: Incomplete request, waiting for end of request line

Please forgive me for misinterpreting the Squid log line that lies about
an "Incomplete request" and "waiting for end of request line".

If you prefer the "waiting for the end of the URI" description of the
broken state (where the complete URI, the complete request line, and the
complete request are all available), I am not going to argue that it is
also inaccurate.


 Attached is a patch that replaces the hang behaviour with an
 invalid-request error page on standards compliant builds. But, when
 --enable-http-violations is used it accepts the invalid characters we
 know are in use and passes the request through as HTTP/0.9.


... which does not work because the request does not actually use the
HTTP/0.9 format. You end up with:


 Forwarding client request ... 
 url=http://localhost:8080/path|with|unwise|charactersHTTP/1.1

 GET /path|with|unwise|charactersHTTP/1.1 HTTP/1.1
 Via: 0.9 protofroot (squid/4.0.0-BZR)


The trailing " HTTP/1.1" part of the request line is not a part of the
URI but your changes make it so.

Tokenizer cannot handle URIs with whitespaces directly, like your patch
attempts to do: Tokenizer alone cannot handle ambiguous grammars. To
handle such URIs well, you have two options IMO:

A. The old "do it by hand" way:

  0. Trim request line to remove trailing whitespace and CR*LF.
  1. Find the *last* whitespace on the trimmed request line.
  2. To get the request method and URI, apply the tokenizer to the
 request line prefix before that last whitespace.
  3. To get the protocol parts, apply the tokenizer to the
 request line suffix after that last whitespace.
  4. Make the code even messier to handle HTTP/0.9 cases if needed.


B. The new optimized and simplified optimistic Tokenizer way:

Here is a sketch:

SBuf uri; // accumulates request URI characters
if (!tok.prefix(uri, StrictUriChars))
    return false; // URI does not start with a valid character

// in the order of correctness (and popularity!):
const bool parsedSuffix =
    // canonical HTTP/1 format
    parseCanonicalReqLineTail(uri, tok) ||
    // HTTP/1 format but with bad characters in URI
    parseBadUriReqLineTail(uri, tok) ||
    // HTTP/1 format but with bad characters and spaces in URI
    parseSpacedReqLineTail(uri, tok) ||
    // HTTP/0 format (without the protocol part at all)
    parseHttpZeroReqLineTail(uri, tok);

Assert(parsedSuffix); // parseHttpZeroReqLineTail() cannot fail
...

where, for example, parseCanonicalReqLineTail() expects whitespace
followed by protocol info while parseBadUriReqLineTail() expects a URI
suffix (using RelaxedUriChars which do not include whitespace) followed
by a call to parseCanonicalReqLineTail().

If any of the helper functions returns false, it should not change its
uri and tok parameters, of course.
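A hypothetical, self-contained illustration of that contract, using plain
std::string in place of SBuf/Tokenizer (all names and character rules below
are made up for the example):

#include <string>

// sketch: "relaxed" URI characters - printable, non-whitespace octets
static bool isRelaxedUriChar(const char c)
{
    return c > ' ' && c != 0x7f;
}

// canonical tail: whitespace followed by the protocol part; consumes the
// whole tail on success, leaves input untouched on failure
static bool parseCanonicalTail(std::string &input, std::string &proto)
{
    const auto ws = input.find_first_not_of(" \t");
    if (ws == 0 || ws == std::string::npos)
        return false; // no leading whitespace, or nothing after it
    if (input.compare(ws, 5, "HTTP/") != 0)
        return false;
    proto = input.substr(ws);
    input.clear();
    return true;
}

// bad-URI tail: extra (relaxed) URI characters followed by a canonical tail;
// leaves uri and input untouched on failure so the next helper can try
static bool parseBadUriTail(std::string &uri, std::string &input, std::string &proto)
{
    std::string::size_type i = 0;
    while (i < input.size() && isRelaxedUriChar(input[i]))
        ++i;
    std::string tail = input.substr(i);
    if (!parseCanonicalTail(tail, proto))
        return false;
    uri += input.substr(0, i); // accept the extra URI characters
    input = tail;              // empty: the whole tail was consumed
    return true;
}

The || chain in the sketch above then works precisely because a failed helper
has no side effects.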


HTH,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-25 Thread Amos Jeffries
On 25/06/2015 4:05 p.m., Alex Rousskov wrote:
 On 06/24/2015 02:24 PM, Kinkie wrote:
 My 2c: I vote for reality; possibly with a shaming announce message;
 
 I would recommend against shaming any particular company unless somebody
 can personally vouch that that specific company is the one to blame. I
 cannot (without a lot more triage work which I have no time for).
 
 The benefits of shaming in this case are tiny, but the danger of hurting
 an innocent person (or even company) is always there.
 

Indeed. Which is why I want to go the route of HTTP/0.9 handling. It's
clear when products encounter it and cause themselves problems. But we
don't need to name and shame any particular source.

PS. In my testing the URIs you presented as the reason for the patch were
accepted by Squid and handled as HTTP/0.9.


 
 I wouldn't even recommend logging the violation: there is nothing the
 average admin can do about it.
 
 
 An admin has reminded me privately that Squid already has a mechanism
 that can be used here if somebody wants to enforce strict parsing [and
 wants to see violations]:
 
 
 NAME: relaxed_header_parser
 In the default "on" setting Squid accepts certain forms
 of non-compliant HTTP messages where it is unambiguous
 what the sending application intended

 If set to "warn" then a warning will be emitted in cache.log
 each time such HTTP error is encountered.

 If set to "off" then such HTTP errors will cause the request
 or response to be rejected.
 
 The temporary fix can be enhanced to include all unwise URI characters
 if (and only if) relaxed_header_parser is "on", and also to warn about
 those characters when relaxed_header_parser is "warn".
 

That would be better than always doing it in the strict parsing pathway.
And it is a useful temporary workaround for people having this issue you
found, since the relaxed parser should be on by default IIRC.


 On Wed, Jun 24, 2015 at 10:12 PM, Alex Rousskov wrote:
 On 06/24/2015 05:26 AM, Amos Jeffries wrote:

 On 24/06/2015 5:55 p.m., Alex Rousskov wrote:
 This temporary trunk fix adds support for request URIs containing
 '|' characters. Such URIs are used by popular Amazon product (and
 probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

 Without this fix, all requests for affected URIs timeout while Squid
 waits for the end of request headers it has already received(*).


 This is not right. Squid should be identifying the message as
 non-HTTP/1.x (which it isn't due to the URI syntax violation) and
 treating it as such.

 I agree that Amazon violates URI syntax. On the other hand, the message
 can be interpreted as HTTP/1.x for all practical purposes AFAICT. If you
 want to implement a different fix, please do so. Meanwhile, folks
 suffering from this serious regression can try the temporary fix I posted.

SMTP messages can be interpreted as HTTP/1.1 for all intents and
purposes as well. That does not make it a good idea to do so.

I don't care who it is sending the broken messages (AIUI, it's not the
company anyway, it's the particular UA software you found). They need to
stop getting away with it, and the best incentive is bad behaviour
leading to bad performance. Losing the benefits of HTTP/1.1 does that
cleanly.


NP: We have a track record of several vendors who fixed their products
after this type of bug was reported to them.  Was the '|' character
non-compliance bug reported against the product? They are probably able
to roll out the proper fix faster than we can anyway.


Anyhow. I've thrown the URI you mentioned through my test copy of trunk
and it's not hanging looking for the end of mime headers like you said.
It's waiting for the end of the URI:

RequestParser.cc(390) parse: request-line: method: GET
RequestParser.cc(391) parse: request-line: url:
RequestParser.cc(392) parse: request-line: proto: NONE/0.0
RequestParser.cc(393) parse: Parser: bytes processed=4
SBuf.cc(149) assign: assigning SBuf1845 from SBuf1850
parseHttpRequest: Incomplete request, waiting for end of request line


Attached is a patch that replaces the hang behaviour with an
invalid-request error page on standards compliant builds. But, when
--enable-http-violations is used it accepts the invalid characters we
know are in use and passes the request through as HTTP/0.9.



 The proper long-term fix is to allow any character in URI as long as we
 can reliably parse the request line (and, later, URI components). There
 is no point in hurting users by rejecting requests while slowly
 accumulating the list of benign characters used by web sites but
 prohibited by some RFC.

 The *proper* long term fix is to obey the standards in regard to message
 syntax so applications stop using these invalid (when un-encoded)
 characters and claiming HTTP/1.1 support.

 We had "standards vs reality" and "policing traffic" discussions several
 times in the past, with no signs of convergence towards a single
 approach, so I am not going to revisit that discussion now. We continue
 to disagree [while Squid users continue to suffer].

Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-25 Thread Eliezer Croitoru

About the shaming idea:
I do not like writing bad things about anyone, at any time.
To be sure there is a bug, and to be sure where the bug comes from, I
would first try to understand the bug; maybe then someone would be able to
make this world a better place for programs and programmers.


I will not rush to triage this issue for now, but the last idea from Amos
seems pretty reasonable to me (reading the description and not the code).


Eliezer

On 24/06/2015 08:55, Alex Rousskov wrote:

Hello,

 This temporary trunk fix adds support for request URIs containing
'|' characters. Such URIs are used by popular Amazon product (and
probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

Without this fix, all requests for affected URIs timeout while Squid
waits for the end of request headers it has already received(*).


The proper long-term fix is to allow any character in URI as long as we
can reliably parse the request line (and, later, URI components). There
is no point in hurting users by rejecting requests while slowly
accumulating the list of benign characters used by web sites but
prohibited by some RFC.


HTH,

Alex.
P.S. (*) which is probably another parsing regression bug (not addressed
by this temporary patch or the long-term fix discussed above), but I
have not tested whether the very latest trunk still suffers from this
other bug.



___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-24 Thread Amos Jeffries
On 24/06/2015 5:55 p.m., Alex Rousskov wrote:
 Hello,
 
 This temporary trunk fix adds support for request URIs containing
 '|' characters. Such URIs are used by popular Amazon product (and
 probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js
 
 Without this fix, all requests for affected URIs timeout while Squid
 waits for the end of request headers it has already received(*).

This is not right. Squid should be identifying the message as
non-HTTP/1.x (which it isn't due to the URI syntax violation) and
treating it as such.

There is no reason for HTTP/0.9 traffic to hang (thus the bug
agreement). Squid should read in the first line, find a reply for the
URI and close the connection once that reply is delivered. Note how
there are no mime headers involved to be waited on. I agree there is a
bug there if it's hanging.

NP: it means broken syntax "does work" - but gets a slow response with
no HTTP feature benefits. That slow-vs-fast difference is the designed
incentive for such software's authors to fix things.

 
 The proper long-term fix is to allow any character in URI as long as we
 can reliably parse the request line (and, later, URI components). There
 is no point in hurting users by rejecting requests while slowly
 accumulating the list of benign characters used by web sites but
 prohibited by some RFC.

The *proper* long term fix is to obey the standards in regard to message
syntax so applications stop using these invalid (when un-encoded)
characters and claiming HTTP/1.1 support.

Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-24 Thread Kinkie
My 2c: I vote for reality; possibly with a shaming announce message; I
wouldn't even recommend logging the violation: there is nothing the
average admin can do about it. I would consider adding a shaming
comment in the release notes.

On Wed, Jun 24, 2015 at 10:12 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 06/24/2015 05:26 AM, Amos Jeffries wrote:

 On 24/06/2015 5:55 p.m., Alex Rousskov wrote:
 This temporary trunk fix adds support for request URIs containing
 '|' characters. Such URIs are used by popular Amazon product (and
 probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

 Without this fix, all requests for affected URIs timeout while Squid
 waits for the end of request headers it has already received(*).


 This is not right. Squid should be identifying the message as
 non-HTTP/1.x (which it isn't due to the URI syntax violation) and
 treating it as such.

 I agree that Amazon violates URI syntax. On the other hand, the message
 can be interpreted as HTTP/1.x for all practical purposes AFAICT. If you
 want to implement a different fix, please do so. Meanwhile, folks
 suffering from this serious regression can try the temporary fix I posted.


 The proper long-term fix is to allow any character in URI as long as we
 can reliably parse the request line (and, later, URI components). There
 is no point in hurting users by rejecting requests while slowly
 accumulating the list of benign characters used by web sites but
 prohibited by some RFC.

 The *proper* long term fix is to obey the standards in regard to message
 syntax so applications stop using these invalid (when un-encoded)
 characters and claiming HTTP/1.1 support.

 We had "standards vs reality" and "policing traffic" discussions several
 times in the past, with no signs of convergence towards a single
 approach, so I am not going to revisit that discussion now. We continue
 to disagree [while Squid users continue to suffer].


 Thank you,

 Alex.

-- 
Francesco
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-24 Thread Marcus Kool



On 06/24/2015 05:24 PM, Kinkie wrote:

My 2c: I vote for reality; possibly with a shaming announce message; I
wouldn't even recommend logging the violation: there is nothing the
average admin can do about it. I would consider adding a shaming
comment in the release notes.


more cents:

correct.

A standard can be considered a strong guideline but if important sites
violate the standard (i.e. users/admins complain) then Squid
should be able to cope with it or it risks getting abandoned because
Squid cannot cope with traffic of sites that otherwise work without Squid.

For an admin it is irrelevant if the problem is caused by Squid or by
a website.  And the admin who dares to tell their users "only visit
sites that comply with the standards" probably gets fired.



On Wed, Jun 24, 2015 at 10:12 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:

On 06/24/2015 05:26 AM, Amos Jeffries wrote:


On 24/06/2015 5:55 p.m., Alex Rousskov wrote:

 This temporary trunk fix adds support for request URIs containing
'|' characters. Such URIs are used by popular Amazon product (and
probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

Without this fix, all requests for affected URIs timeout while Squid
waits for the end of request headers it has already received(*).




This is not right. Squid should be identifying the message as
non-HTTP/1.x (which it isn't due to the URI syntax violation) and
treating it as such.


I agree that Amazon violates URI syntax. On the other hand, the message
can be interpreted as HTTP/1.x for all practical purposes AFAICT. If you
want to implement a different fix, please do so. Meanwhile, folks
suffering from this serious regression can try the temporary fix I posted.



The proper long-term fix is to allow any character in URI as long as we
can reliably parse the request line (and, later, URI components). There
is no point in hurting users by rejecting requests while slowly
accumulating the list of benign characters used by web sites but
prohibited by some RFC.



The *proper* long term fix is to obey the standards in regard to message
syntax so applications stop using these invalid (when un-encoded)
characters and claiming HTTP/1.1 support.


We had "standards vs reality" and "policing traffic" discussions several
times in the past, with no signs of convergence towards a single
approach, so I am not going to revisit that discussion now. We continue
to disagree [while Squid users continue to suffer].


Thank you,

Alex.


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-24 Thread Alex Rousskov
On 06/24/2015 02:24 PM, Kinkie wrote:
 My 2c: I vote for reality; possibly with a shaming announce message;

I would recommend against shaming any particular company unless somebody
can personally vouch that that specific company is the one to blame. I
cannot (without a lot more triage work which I have no time for).

The benefits of shaming in this case are tiny, but the danger of hurting
an innocent person (or even company) is always there.


 I wouldn't even recommend logging the violation: there is nothing the
 average admin can do about it.


An admin has reminded me privately that Squid already has a mechanism
that can be used here if somebody wants to enforce strict parsing [and
wants to see violations]:


 NAME: relaxed_header_parser
 In the default "on" setting Squid accepts certain forms
 of non-compliant HTTP messages where it is unambiguous
 what the sending application intended

 If set to "warn" then a warning will be emitted in cache.log
 each time such HTTP error is encountered.

 If set to "off" then such HTTP errors will cause the request
 or response to be rejected.

The temporary fix can be enhanced to include all unwise URI characters
if (and only if) relaxed_header_parser is "on", and also to warn about
those characters when relaxed_header_parser is "warn".
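For example, in squid.conf (directive and values as quoted above):

# accept unambiguous non-compliant messages (the default)
relaxed_header_parser on

# or: emit a cache.log warning each time such an HTTP error is encountered
#relaxed_header_parser warn

# or: reject the offending request or response outright
#relaxed_header_parser off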


HTH,

Alex.



 On Wed, Jun 24, 2015 at 10:12 PM, Alex Rousskov
 rouss...@measurement-factory.com wrote:
 On 06/24/2015 05:26 AM, Amos Jeffries wrote:

 On 24/06/2015 5:55 p.m., Alex Rousskov wrote:
 This temporary trunk fix adds support for request URIs containing
 '|' characters. Such URIs are used by popular Amazon product (and
 probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

 Without this fix, all requests for affected URIs timeout while Squid
 waits for the end of request headers it has already received(*).


 This is not right. Squid should be identifying the message as
 non-HTTP/1.x (which it isn't due to the URI syntax violation) and
 treating it as such.

 I agree that Amazon violates URI syntax. On the other hand, the message
 can be interpreted as HTTP/1.x for all practical purposes AFAICT. If you
 want to implement a different fix, please do so. Meanwhile, folks
 suffering from this serious regression can try the temporary fix I posted.


 The proper long-term fix is to allow any character in URI as long as we
 can reliably parse the request line (and, later, URI components). There
 is no point in hurting users by rejecting requests while slowly
 accumulating the list of benign characters used by web sites but
 prohibited by some RFC.

 The *proper* long term fix is to obey the standards in regard to message
 syntax so applications stop using these invalid (when un-encoded)
 characters and claiming HTTP/1.1 support.

 We had "standards vs reality" and "policing traffic" discussions several
 times in the past, with no signs of convergence towards a single
 approach, so I am not going to revisit that discussion now. We continue
 to disagree [while Squid users continue to suffer].


 Thank you,

 Alex.


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-24 Thread Alex Rousskov
On 06/24/2015 05:26 AM, Amos Jeffries wrote:

 On 24/06/2015 5:55 p.m., Alex Rousskov wrote:
 This temporary trunk fix adds support for request URIs containing
 '|' characters. Such URIs are used by popular Amazon product (and
 probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

 Without this fix, all requests for affected URIs timeout while Squid
 waits for the end of request headers it has already received(*).


 This is not right. Squid should be identifying the message as
 non-HTTP/1.x (which it isn't due to the URI syntax violation) and
 treating it as such.

I agree that Amazon violates URI syntax. On the other hand, the message
can be interpreted as HTTP/1.x for all practical purposes AFAICT. If you
want to implement a different fix, please do so. Meanwhile, folks
suffering from this serious regression can try the temporary fix I posted.


 The proper long-term fix is to allow any character in URI as long as we
 can reliably parse the request line (and, later, URI components). There
 is no point in hurting users by rejecting requests while slowly
 accumulating the list of benign characters used by web sites but
 prohibited by some RFC.

 The *proper* long term fix is to obey the standards in regard to message
 syntax so applications stop using these invalid (when un-encoded)
 characters and claiming HTTP/1.1 support.

We had "standards vs reality" and "policing traffic" discussions several
times in the past, with no signs of convergence towards a single
approach, so I am not going to revisit that discussion now. We continue
to disagree [while Squid users continue to suffer].


Thank you,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev