On Mon, Sep 12, 2016 at 3:06 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> On Mon, Sep 12, 2016 at 10:49 AM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
>
>> On Mon, Aug 29, 2016 at 1:04 PM, Ruediger Pluem <rpl...@apache.org>
>> wrote:
>>
>>>
>>> On 08/29/2016 06:25 PM, William A Rowe Jr wrote:
>>> > Thanks all for the feedback. Status and follow-up questions inline
>>> >
>>> > On Thu, Aug 25, 2016 at 10:02 PM, William A Rowe Jr
>>> > <wr...@rowe-clan.net> wrote:
>>> >
>>> >     4. Should the next 2.4/2.2 releases default to Strict[URI] at all?
>>> >
>>> >     Real world direct observation especially appreciated from actual
>>> >     deployments.
>>> >
>>> > Strict (and StrictURI) remain the default.
>>>
>>> StrictURI as a default only makes sense if we have our own house in
>>> order (see above), otherwise it should be opt in.
>>
>>
>> So it's not only our house [our %3B encoding in httpd isn't a showstopper
>> here]... but also whether widely used user-agent browsers and tooling
>> have their houses in order, so I started to study the current browser
>> behaviors. The applicable spec is
>> https://tools.ietf.org/html/rfc3986#section-3.3
>>
>> The character '|' is also invalid. However, Firefox fails to follow the
>> spec again here (although Chrome gets it right).
>>
>> With respect to these characters, recall this 18 year old document,
>> last paragraph describes the rationale;
>> https://tools.ietf.org/html/rfc2396.html#section-2.4.3
>>
>>    unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
>>
>>    Data corresponding to excluded characters must be escaped in order to
>>    be properly represented within a URI.
>>
>>
>> While it was labeled 'unsafe', 'unwise', and now disallowed-by-omission
>> from RFC3986, the 'must' designation couldn't have been any clearer.
>> We've had this right for 2 decades at httpd.
>>
>> Second paragraph of https://tools.ietf.org/html/rfc3986#appendix-D.1
>> goes into some detail about this change, and while it is hard to parse,
>> the paragraph is stating that '[' ']' were once invalid, now are reserved,
>> and remain disallowed in all other path segments and use cases.
>>
>> The upshot, right now StrictURI will accept '[' and ']', but this won't
>> survive a rewrite of the apr parser operating with a 'strict' toggle.
>> StrictURI does not accept '|'. The remaining question is what to do, if
>> anything, about carving a specific exception here due to modern Firefox
>> issues.
>>
>> Thoughts/Comments/Additional test data? TIA!
>>

It really seems that if a major client is not handling "|" correctly,
we need to carve out an exception, as well as disallow the "#" fragment
gen-delim which is not allowed to be presented. e.g.;

--- server/gen_test_char.c (revision 1760444)
+++ server/gen_test_char.c (working copy)
@@ -143,10 +143,11 @@
          * and unreserved (2.3) that are possible somewhere within a URI.
          * Spec requires all others to be %XX encoded, including obs-text.
          */
-        if (c && (strchr("%"                           /* pct-encode */
-                         ":/?#[]@"                     /* gen-delims */
-                         "!$&'()*+,;="                 /* sub-delims */
-                         "-._~", c) || apr_isalnum(c))) { /* unreserved */
+        if (c && (strchr("%"                           /* pct-encode */
+                         ":/?[]@"                      /* gen-delims - "#" */
+                         "!$&'()*+,;="                 /* sub-delims */
+                         "-._~"                        /* unreserved */
+                         "|", c) || apr_isalnum(c))) { /* permit firefox bug */
             flags |= T_URI_RFC3986;
         }

so my only remaining question is what of the others in the not-mentioned,
entirely invalid set?

   <"> | "<" | ">" | "\" | "^" | "`" | "{" | "}"

... so far the modern browsers reviewed handle these correctly, but if
anyone has old browsers still installed for testing/validation, double
checking the test queries would be a big help still, as well as confirming
on Safari, Dolphin, etc.

Are we OK with adding one exception for an invalid character ("|", for
Firefox) to StrictURI (and later two more, "[" and "]", when we code
segment-by-segment validation into apr) while still disallowing the rest
of this list?
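
For anyone who wants to try candidate request targets against the proposed set without rebuilding httpd, here is a rough standalone sketch of the same test. It is not the httpd code: the real check is a table lookup against the T_URI_RFC3986 flag generated by gen_test_char.c, and the names uri_char_allowed() and first_rejected() are made up for illustration. It assumes ASCII and avoids isalnum() so the result does not depend on locale.

```c
#include <string.h>

/* Hypothetical helper (not part of httpd or apr): returns 1 if byte c
 * would be accepted unencoded under the character set proposed in the
 * patch above, 0 otherwise. */
int uri_char_allowed(unsigned char c)
{
    static const char allowed[] =
        "%"             /* pct-encode */
        ":/?[]@"        /* gen-delims, with "#" removed */
        "!$&'()*+,;="   /* sub-delims */
        "-._~"          /* unreserved punctuation */
        "|";            /* the one exception, for Firefox */

    if (c == '\0')
        return 0;       /* strchr() would otherwise match the terminator */
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z'))
        return 1;       /* ASCII alphanumerics are always unreserved */
    return strchr(allowed, c) != NULL;
}

/* Hypothetical helper: returns a pointer to the first byte of uri that
 * would be rejected, or NULL if every byte is accepted. */
const char *first_rejected(const char *uri)
{
    for (; *uri != '\0'; ++uri) {
        if (!uri_char_allowed((unsigned char)*uri))
            return uri;
    }
    return NULL;
}
```

Calling first_rejected() on a raw request target gives NULL for something like "/a|b?x=1" and a pointer to the offending byte for anything containing the remaining invalid set, "#", space, or obs-text. This only checks characters one at a time; it does not do the segment-by-segment validation discussed for apr.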