[squid-dev] [PATCH] Parser-NG conversion of ICAP pt2

2015-08-23 Thread Amos Jeffries
It turns out that ICAP implements has three distinct protocol parsers.

I begin the ICAP parser conversion to the Parser-NG model with
ModXact::parseHeaders() - which was conflating both ICAP and HTTP, and
the HTTP directional parsers.


* splits the exiting parse method into 3 distinct stages; ICAP-reply,
HTTP-request, HTTP-reply. Each stage is sequential and controlled by the
Encapsulated header contents.

I'm not sure yet if we need to be tolerant of out-of-order segments in
the payload. The spec is pretty clear that order is explicit and
specific. But the old parser actually ignored the Encapsulated header
byte offsets (!!).


* adds "ICAP" / "icap" to the registered protocol types and scheme
names, and associated Icap::ProtocolVersion() infrastructure.

* adds Adaptation::Icap::ResponseParser class extending
Http1::ResponseParser with ICAP related details and first-line parser.

* documented some security and performance improvements that can be made
as a followup in ModXact::parseHeaders()



There is some weird race behaviour I still want to verify if trunk has
too. But have gone with PATCH instead of PREVIEW since this seems like a
good place to pause. Leaving most polish and some major bug fixes to
followups. That includes the other two parsers in adaptation/icap/.

Amos

=== modified file 'src/adaptation/icap/Makefile.am'
--- src/adaptation/icap/Makefile.am 2015-01-13 07:25:36 +
+++ src/adaptation/icap/Makefile.am 2015-03-01 02:01:34 +
@@ -1,36 +1,39 @@
 ## Copyright (C) 1996-2015 The Squid Software Foundation and contributors
 ##
 ## Squid software is distributed under GPLv2+ license and includes
 ## contributions from numerous individuals and organizations.
 ## Please see the COPYING and CONTRIBUTORS files for details.
 ##
 
 include $(top_srcdir)/src/Common.am
 include $(top_srcdir)/src/TestHeaders.am
 
 noinst_LTLIBRARIES = libicap.la
 
 libicap_la_SOURCES = \
Client.cc \
Client.h \
-   InOut.h \
Config.cc \
Config.h \
Elements.cc \
Elements.h \
-   Options.cc \
-   Options.h \
-   ServiceRep.cc \
-   ServiceRep.h \
+   History.cc \
+   History.h \
+   icap_log.cc \
+   icap_log.h \
+   InOut.h \
Launcher.cc \
Launcher.h \
+   ModXact.cc \
+   ModXact.h \
+   Options.cc \
+   Options.h \
OptXact.cc \
OptXact.h \
+   ProtocolVersion.h \
+   ResponseParser.cc \
+   ResponseParser.h \
+   ServiceRep.cc \
+   ServiceRep.h \
Xaction.cc \
-   Xaction.h \
-   ModXact.cc \
-   ModXact.h \
-   icap_log.cc \
-   icap_log.h \
-   History.cc \
-   History.h
+   Xaction.h

=== modified file 'src/adaptation/icap/ModXact.cc'
--- src/adaptation/icap/ModXact.cc  2015-08-04 19:57:07 +
+++ src/adaptation/icap/ModXact.cc  2015-08-23 17:15:21 +
@@ -1,50 +1,52 @@
 /*
  * Copyright (C) 1996-2015 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 /* DEBUG: section 93ICAP (RFC 3507) Client */
 
 #include "squid.h"
 #include "AccessLogEntry.h"
 #include "adaptation/Answer.h"
 #include "adaptation/History.h"
 #include "adaptation/icap/Client.h"
 #include "adaptation/icap/Config.h"
 #include "adaptation/icap/History.h"
 #include "adaptation/icap/Launcher.h"
 #include "adaptation/icap/ModXact.h"
+#include "adaptation/icap/ProtocolVersion.h"
 #include "adaptation/icap/ServiceRep.h"
 #include "adaptation/Initiator.h"
 #include "auth/UserRequest.h"
 #include "base/TextException.h"
 #include "base64.h"
 #include "comm.h"
 #include "comm/Connection.h"
 #include "err_detail_type.h"
+#include "http/one/RequestParser.h"
+#include "http/one/ResponseParser.h"
 #include "http/one/TeChunkedParser.h"
 #include "HttpHeaderTools.h"
-#include "HttpMsg.h"
 #include "HttpReply.h"
 #include "HttpRequest.h"
 #include "SquidTime.h"
 #include "URL.h"
 
 // flow and terminology:
 // HTTP| --> receive --> encode --> write --> |network
 // end | <-- send<-- parse  <-- read  <-- |end
 
 // TODO: replace gotEncapsulated() with something faster; we call it often
 
 CBDATA_NAMESPACED_CLASS_INIT(Adaptation::Icap, ModXact);
 CBDATA_NAMESPACED_CLASS_INIT(Adaptation::Icap, ModXactLauncher);
 
 static const size_t TheBackupLimit = BodyPipe::MaxCapacity;
 
 Adaptation::Icap::ModXact::State::State()
 {
 memset(this, 0, sizeof(*this));
 }
@@ -57,44 +59,40 @@
 bodyParser(NULL),
 canStartBypass(false), // too early
 protectGroupBypass(true),
 replyHttpHeaderSize(-1),
 replyHttpBodySize(-1),
 adaptHistoryId(-1),
 alMaster(alp)
 {
 assert(virginHeader);
 
 virgin.setHeader(virginHeader); // sets virgin.body_pipe if needed
 virgin.setCause(virginCause); // may be NULL
 
 // ada

Re: [squid-dev] [PATCH] Parser-NG conversion of ICAP pt2

2015-09-21 Thread Amos Jeffries
On 24/08/2015 8:40 a.m., Amos Jeffries wrote:
> It turns out that ICAP implements has three distinct protocol parsers.
> 
> I begin the ICAP parser conversion to the Parser-NG model with
> ModXact::parseHeaders() - which was conflating both ICAP and HTTP, and
> the HTTP directional parsers.
> 
> 
> * splits the exiting parse method into 3 distinct stages; ICAP-reply,
> HTTP-request, HTTP-reply. Each stage is sequential and controlled by the
> Encapsulated header contents.
> 
> I'm not sure yet if we need to be tolerant of out-of-order segments in
> the payload. The spec is pretty clear that order is explicit and
> specific. But the old parser actually ignored the Encapsulated header
> byte offsets (!!).

FTR: that is 

> 
> 
> * adds "ICAP" / "icap" to the registered protocol types and scheme
> names, and associated Icap::ProtocolVersion() infrastructure.
> 
> * adds Adaptation::Icap::ResponseParser class extending
> Http1::ResponseParser with ICAP related details and first-line parser.
> 
> * documented some security and performance improvements that can be made
> as a followup in ModXact::parseHeaders()
> 
> 
> 
> There is some weird race behaviour I still want to verify if trunk has
> too. But have gone with PATCH instead of PREVIEW since this seems like a
> good place to pause. Leaving most polish and some major bug fixes to
> followups. That includes the other two parsers in adaptation/icap/.
> 

Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Parser-NG conversion of ICAP pt2

2015-09-21 Thread Alex Rousskov
On 09/21/2015 08:46 AM, Amos Jeffries wrote:
> On 24/08/2015 8:40 a.m., Amos Jeffries wrote:
>> It turns out that ICAP implements has three distinct protocol parsers.
>>
>> I begin the ICAP parser conversion to the Parser-NG model with
>> ModXact::parseHeaders() - which was conflating both ICAP and HTTP, and
>> the HTTP directional parsers.
>>
>>
>> * splits the exiting parse method into 3 distinct stages; ICAP-reply,
>> HTTP-request, HTTP-reply. Each stage is sequential and controlled by the
>> Encapsulated header contents.
>>
>> I'm not sure yet if we need to be tolerant of out-of-order segments in
>> the payload. The spec is pretty clear that order is explicit and
>> specific. But the old parser actually ignored the Encapsulated header
>> byte offsets (!!).
> 
> FTR: that is 


We should continue to ignore offsets [by default] IMO. Not doing so is
likely to result in many needless interoperability problems. There are
many poorly written ICAP services out there.

Skipping bogus parts (bug 2480) is fine as a minor interoperability
improvement if it is easy to do so. We do not need to parse offsets to
skip a part.

I would not add support for out-of-order parts unless it does not
complicate the code at all and is explicitly requested.


HTH,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Parser-NG conversion of ICAP pt2

2015-09-21 Thread Amos Jeffries
On 22/09/2015 3:50 a.m., Alex Rousskov wrote:
> On 09/21/2015 08:46 AM, Amos Jeffries wrote:
>> On 24/08/2015 8:40 a.m., Amos Jeffries wrote:
>>> It turns out that ICAP implements has three distinct protocol parsers.
>>>
>>> I begin the ICAP parser conversion to the Parser-NG model with
>>> ModXact::parseHeaders() - which was conflating both ICAP and HTTP, and
>>> the HTTP directional parsers.
>>>
>>>
>>> * splits the exiting parse method into 3 distinct stages; ICAP-reply,
>>> HTTP-request, HTTP-reply. Each stage is sequential and controlled by the
>>> Encapsulated header contents.
>>>
>>> I'm not sure yet if we need to be tolerant of out-of-order segments in
>>> the payload. The spec is pretty clear that order is explicit and
>>> specific. But the old parser actually ignored the Encapsulated header
>>> byte offsets (!!).
>>
>> FTR: that is 
> 
> 
> We should continue to ignore offsets [by default] IMO. Not doing so is
> likely to result in many needless interoperability problems. There are
> many poorly written ICAP services out there.
> 
> Skipping bogus parts (bug 2480) is fine as a minor interoperability
> improvement if it is easy to do so. We do not need to parse offsets to
> skip a part.
> 
> I would not add support for out-of-order parts unless it does not
> complicate the code at all and is explicitly requested.

Okay. Then the patch should be suitable for commit as-is AFAIK. I await
your audit, and/or any testing that can be thrown at it from Factory's
end :-)

Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Parser-NG conversion of ICAP pt2

2015-10-21 Thread Alex Rousskov
On 08/23/2015 02:40 PM, Amos Jeffries wrote:
> It turns out that ICAP implements has three distinct protocol parsers.
> 
> I begin the ICAP parser conversion to the Parser-NG model with
> ModXact::parseHeaders() - which was conflating both ICAP and HTTP, and
> the HTTP directional parsers.
> 
> 
> * splits the exiting parse method into 3 distinct stages; ICAP-reply,
> HTTP-request, HTTP-reply. Each stage is sequential and controlled by the
> Encapsulated header contents.
> 
> I'm not sure yet if we need to be tolerant of out-of-order segments in
> the payload. The spec is pretty clear that order is explicit and
> specific. 

AFAIK, we do not need to be tolerant to out-of-order segments -- there
are no known cases of those. We might want to be tolerant to unexpected
segments (bug 2480).


> But the old parser actually ignored the Encapsulated header
> byte offsets (!!).


The new parsers should continue to ignore them IMO. Those offsets are an
ICAP mis-feature -- a duplication of information that, as most
duplicates, leads to bugs. Ignoring them makes Squid ICAP code more
robust and probably decreases compatibility problems without [known] bad
side effects. We should parse the headers declared by Encapsulated.
Ignore numerical offset values.


> + * ICAP header parse does not share the HTTP segment code, and
> + * Encapsulated: tells us how many bytes and where each payload segment 
> is.
> + * We can pull N bytes into a child SBuf for parsing.
> + *
> + * 1) if there are not enough bytes we need more before even attempting 
> the parse
> + *
> + * 2) after parse we can verify that it consumed all of the child buf.
> + *if there are leftovers ... smuggling attack from the ICAP server?
> + */

This part of the TODO comment should be removed IMO. See my comments
above for the rationale.


> +/* TODO: we do not need to rely on readBuf anymore for the parser logic.

The comment seems to be misplaced which makes it unclear. The
surrounding code does not rely on readBuf parser logic, whatever that
is. Please reword it in a context-specific way or move it to a more
appropriate place.


> +/* Attempt to parse the ICAP message */

s/parse the ICAP message/isolate the ICAP response header/

(if that is what you are doing here; note that there is a "parse
headers" comment further below so you are not parsing headers here)


> +bool parsedOk = hp->parse(readBuf);

If you can make it constant, make it const.


> +// sync the buffers after parsing.
> +readBuf = hp->remaining();

Too early? The "unrecoverable parsing error" code below should show what
we failed to parse, not what remained after we failed to parse.


> +/* We know the whole response is in parser now */

s/response/response header/


> +// XXX: performance regression. SBuf::c_str() reallocates
> +SBuf tmpPhrase(hp->reasonPhrase());
> +icapReply->sline.set(hp->messageProtocol(), hp->messageStatus(), 
> tmpPhrase.c_str());

Do we need to introduce a regression here? How about using a few
hard-coded reasons for supported codes and a catch-all for all others?


Many of the above comments apply to the other parsing methods you have
created: parseHttpRequestHead and parseHttpRequestHead. Please check
those as well.


> +// const_cast is okay, the buffer area behind the c_str will not be 
> used again by this xaction
> +// and that will only change when urlParse() starts taking the 
> requestUri() SBuf directly

This violates, and I quote, "DO NOT EVER USE THE RETURNED POINTER FOR
WRITING" c_str() API. Explaining why that API violation may be safe
today is better than nothing, but still deserves a big fat XXX.

Even if the above assumption is accurate today, a code change in a
seemingly unrelated place may make it false, leading to hard-to-find
bugs. All it takes is for some code to copy the underlying SBuf
somewhere... If you want to improve this, please contact Christos: IIRC,
we have seen a similar problem elsewhere in the code and ended up adding
a new SBuf method to combat it. Christos should have the details.



> +state.parsing = State::psHttpRequestHeader; // 'reqhdr' segment maybe 
> first

To clarify, I would replace that comment with something like

// some buggy ICAP servers send HTTP request headers even when
// we expect only HTTP response headers (e.g., during RESPMOD)
state.parsing = State::psHttpRequestHeader


Alternatively, you can set state.parsing based on the Encapsulated
header. This will be more complex but a bit faster because you will
avoid calling parseHttpRequestHead() for 99.% of RESPMOD
responses. That method has a non-trivial initial overhead.


> +// parse headers
> +adapted.header->pstate = psReadyToParseHeaders;
> +
> Must(adapted.header->httpMsgParseStep(httpReqParser->mimeHeader().rawContent(),
>  httpReqParser->mimeHeader().length(), true) >= 0);
> +
> +setOutcome(xoModified);

If I