[squid-dev] [PATCH] Better support for unknown URL schemes
Squid already contains AnyP::PROTO_UNKNOWN support for unknown protocols but currently does not preserve the actual string value received for them. This adds a textual representation ('image') to the UriScheme object to fill that gap and ensure that all URL representatinos (ie cache keys, logs and outgoing messages) are generated with the scheme string as it was received. Future work: * add ACL support for arbitrary scheme names * support for comparisons of unknown schemes Amos === modified file 'src/URL.h' --- src/URL.h 2016-02-23 08:51:22 + +++ src/URL.h 2016-03-15 11:03:44 + @@ -43,6 +43,7 @@ /// convert the URL scheme to that given void setScheme(const AnyP::ProtocolType &p) {scheme_=p; touch();} +void setSchemeImage(const char *str) {scheme_.setImage(str); touch();} void userInfo(const SBuf &s) {userInfo_=s; touch();} const SBuf &userInfo() const {return userInfo_;} === modified file 'src/anyp/UriScheme.cc' --- src/anyp/UriScheme.cc 2016-01-01 00:12:18 + +++ src/anyp/UriScheme.cc 2016-03-15 15:10:17 + @@ -14,19 +14,17 @@ char const * AnyP::UriScheme::c_str() const { +if (!schemeImage_.isEmpty()) +return schemeImage_.c_str(); + if (theScheme_ == AnyP::PROTO_UNKNOWN) -return "(unknown)"; - -static char out[BUFSIZ]; -int p = 0; +return "(unknown)"; // should have been set explicitly if (theScheme_ > AnyP::PROTO_NONE && theScheme_ < AnyP::PROTO_MAX) { -const char *in = AnyP::ProtocolType_str[theScheme_]; -for (; p < (BUFSIZ-1) && in[p] != '\0'; ++p) -out[p] = xtolower(in[p]); +schemeImage_ = AnyP::ProtocolType_str[theScheme_]; +schemeImage_.toLower(); } -out[p] = '\0'; -return out; +return schemeImage_.c_str(); } unsigned short === modified file 'src/anyp/UriScheme.h' --- src/anyp/UriScheme.h2016-01-01 00:12:18 + +++ src/anyp/UriScheme.h2016-03-15 11:19:04 + @@ -10,6 +10,7 @@ #define SQUID_ANYP_URISCHEME_H #include "anyp/ProtocolType.h" +#include "sbuf/SBuf.h" #include @@ -27,7 +28,7 @@ ~UriScheme() {} operator AnyP::ProtocolType() const { return theScheme_; } - +// XXX: does not account for comparison of unknown schemes (by image) bool operator != (AnyP::ProtocolType const & aProtocol) const { return theScheme_ != aProtocol; } /** Get a char string representation of the scheme. @@ -41,9 +42,16 @@ unsigned short defaultPort() const; +/// Sets the string representation of this scheme. +/// Only needed if the scheme type is PROTO_UNKNOWN. +void setImage(const char *str) {assert(theScheme_ == AnyP::PROTO_UNKNOWN); schemeImage_ = str;} + private: /// This is a typecode pointer into the enum/registry of protocols handled. AnyP::ProtocolType theScheme_; + +/// the string representation to use for theScheme_ +mutable SBuf schemeImage_; }; } // namespace AnyP === modified file 'src/url.cc' --- src/url.cc 2016-01-01 00:12:18 + +++ src/url.cc 2016-03-15 13:52:00 + @@ -18,6 +18,7 @@ static HttpRequest *urlParseFinish(const HttpRequestMethod& method, const AnyP::ProtocolType protocol, + const char *const protoStr, const char *const urlpath, const char *const host, const SBuf &login, @@ -157,6 +158,9 @@ if (strncasecmp(b, "whois", len) == 0) return AnyP::PROTO_WHOIS; +if (len > 0) +return AnyP::PROTO_UNKNOWN; + return AnyP::PROTO_NONE; } @@ -215,7 +219,7 @@ URL::Asterisk().cmp(url) == 0) { protocol = AnyP::PROTO_HTTP; port = AnyP::UriScheme(protocol).defaultPort(); -return urlParseFinish(method, protocol, url, host, SBuf(), port, request); +return urlParseFinish(method, protocol, proto, url, host, SBuf(), port, request); } else if (!strncmp(url, "urn:", 4)) { return urnParse(method, url, request); } else { @@ -420,7 +424,7 @@ } } -return urlParseFinish(method, protocol, urlpath, host, SBuf(login), port, request); +return urlParseFinish(method, protocol, proto, urlpath, host, SBuf(login), port, request); } /** @@ -431,6 +435,7 @@ static HttpRequest * urlParseFinish(const HttpRequestMethod& method, const AnyP::ProtocolType protocol, + const char *const protoStr, // for unknown protocols const char *const urlpath, const char *const host, const SBuf &login, @@ -443,6 +448,9 @@ request->initHTTP(method, protocol, urlpath); } +if (protocol == AnyP::PROTO_UNKNOWN) +request->url.setSchemeImage(protoStr); + request->url.host(host); request->url.userInfo(login); request->url.port(port); _
Re: [squid-dev] [PATCH] Better support for unknown URL schemes
On 03/15/2016 09:36 AM, Amos Jeffries wrote: > Squid already contains AnyP::PROTO_UNKNOWN support for unknown protocols > but currently does not preserve the actual string value received for them. > > This adds a textual representation ('image') to the UriScheme object to > fill that gap and ensure that all URL representatinos (ie cache keys, > logs and outgoing messages) are generated with the scheme string as it > was received. Preserving textual representation is the right thing to do, but the implementation itself needs a lot more work IMO. > /// convert the URL scheme to that given > void setScheme(const AnyP::ProtocolType &p) {scheme_=p; touch();} > +void setSchemeImage(const char *str) {scheme_.setImage(str); touch();} The parameter type should be changed instead. Yes, that will require more changes, but those changes are essential to properly support foreign protocol names. Without those changes, you are creating an API bug in addition to adding a useful feature. We (mostly you) have already done similar work for foreign HTTP request methods. This is similar -- UriScheme should be used nearly everywhere (instead of ProtocolType that does not carry enough information). It could be argued that the UriScheme class itself should be renamed to something like Protocol, but I do not want to argue about that. > +/// Sets the string representation of this scheme. > +/// Only needed if the scheme type is PROTO_UNKNOWN. > +void setImage(const char *str) {assert(theScheme_ == > AnyP::PROTO_UNKNOWN); schemeImage_ = str;} This method should be a constructor. > char const * > AnyP::UriScheme::c_str() const > { This method should be replaced with AnyP::UriScheme::image() returning SBuf. > +if (!schemeImage_.isEmpty()) > +return schemeImage_.c_str(); > + Why are we not lowering the case of a foreign protocol image if we are lowering the case of standard protocol images below? Either both should be lowered or there should be a source code comment explaining the discrepancy. If both should be lowered, the lowering should be done in the constructor to avoid lowering the same string multiple times. > if (theScheme_ > AnyP::PROTO_NONE && theScheme_ < AnyP::PROTO_MAX) { > -const char *in = AnyP::ProtocolType_str[theScheme_]; > -for (; p < (BUFSIZ-1) && in[p] != '\0'; ++p) > -out[p] = xtolower(in[p]); > +schemeImage_ = AnyP::ProtocolType_str[theScheme_]; > +schemeImage_.toLower(); > } This slow code may not be needed at all if the rest of the changes are implemented. If this code stays, please add an "XXX: Slow." or a similar comment after toLower(). There are many ways to fix/optimize this, but the best way probably depends on the rest of the changes so I am not going to suggest any specific TODO here. > @@ -157,6 +158,9 @@ > if (strncasecmp(b, "whois", len) == 0) > return AnyP::PROTO_WHOIS; > > +if (len > 0) > +return AnyP::PROTO_UNKNOWN; > + > return AnyP::PROTO_NONE; > } This function should return a UriScheme object instead. Ideally, using created-once constants for common protocols (so that we do not have to deal with to-lower-case conversion every time a scheme image is needed). > @@ -27,7 +28,7 @@ > ~UriScheme() {} ... > +/// the string representation to use for theScheme_ > +mutable SBuf schemeImage_; Please do not name Foo members FooX. One Foo is enough! > +/// the string representation to use for theScheme_ s/to use for theScheme_// or s/to use for theScheme_/of the scheme// If you can make such a guarantee after all other changes, please add something like "; always in lower case". > +mutable SBuf schemeImage_; Mutable may not be needed after all other changes. It is difficult for me to review 4-line no-function-names diff, so forgive me if I missed any context-dependent caveats. Thank you, Alex. ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] [PATCH] Better support for unknown URL schemes
On 16/03/2016 5:52 a.m., Alex Rousskov wrote: > On 03/15/2016 09:36 AM, Amos Jeffries wrote: >> Squid already contains AnyP::PROTO_UNKNOWN support for unknown protocols >> but currently does not preserve the actual string value received for them. >> >> This adds a textual representation ('image') to the UriScheme object to >> fill that gap and ensure that all URL representatinos (ie cache keys, >> logs and outgoing messages) are generated with the scheme string as it >> was received. > > Preserving textual representation is the right thing to do, but the > implementation itself needs a lot more work IMO. > > >> /// convert the URL scheme to that given >> void setScheme(const AnyP::ProtocolType &p) {scheme_=p; touch();} >> +void setSchemeImage(const char *str) {scheme_.setImage(str); touch();} > > > The parameter type should be changed instead. Yes, that will require > more changes, but those changes are essential to properly support > foreign protocol names. Without those changes, you are creating an API > bug in addition to adding a useful feature. > > We (mostly you) have already done similar work for foreign HTTP request > methods. This is similar -- UriScheme should be used nearly everywhere > (instead of ProtocolType that does not carry enough information). It > could be argued that the UriScheme class itself should be renamed to > something like Protocol, but I do not want to argue about that. > There are two distinct layers involved. I'm sure you know this, but I'll reiterate for the sake of clarity. - ProtocolType is the transfer or transport protocol type. - UriScheme is the URL/URI/URN scheme label. They have been conflated in the past and are still in the difficult process of de-tangling. The URI scheme implies a protocol, but is not one in and of itself. For example HTTP ProtocolType can be used to transfer ftp:// scheme URLs, etc. Untangling them is happening, but not part of the scope of this patch except to the point that the URL image() is being made to no longer generate directly from the ProtocolType value in most cases. > >> +/// Sets the string representation of this scheme. >> +/// Only needed if the scheme type is PROTO_UNKNOWN. >> +void setImage(const char *str) {assert(theScheme_ == >> AnyP::PROTO_UNKNOWN); schemeImage_ = str;} > > This method should be a constructor. Done. Note that this has side effects on HttpRequest construction and requires move semantics to be added to UriScheme. Not a problem exactly, but scope creep. > >> char const * >> AnyP::UriScheme::c_str() const >> { > > This method should be replaced with AnyP::UriScheme::image() returning SBuf. > Done. > >> +if (!schemeImage_.isEmpty()) >> +return schemeImage_.c_str(); >> + > > Why are we not lowering the case of a foreign protocol image if we are > lowering the case of standard protocol images below? Either both should > be lowered or there should be a source code comment explaining the > discrepancy. Transfer Protocol type and Scheme labels are both case-sensitive (and differently cased). So when using an externally supplied scheme label we must take what was given. But, since our registered ProtocolType and its ProtocolType_str is actually the message transfer protocol label (upper case) we need to lower the case of ProtocolType_str when using it as implicit URI scheme for one of the registered/known scheme names. > > If both should be lowered, the lowering should be done in the > constructor to avoid lowering the same string multiple times. > Trunk is what lowers the string multiple times (on each display of the URL, including debugs). These patches make that happen only once and stores the lowered result in the scheme image_ to re-use it for any followup calls after the initial down-case. Initial patch did it on first display. After the ctor changes it now happens one on construction regardless of whether display ever happens. > >> if (theScheme_ > AnyP::PROTO_NONE && theScheme_ < AnyP::PROTO_MAX) { >> -const char *in = AnyP::ProtocolType_str[theScheme_]; >> -for (; p < (BUFSIZ-1) && in[p] != '\0'; ++p) >> -out[p] = xtolower(in[p]); >> +schemeImage_ = AnyP::ProtocolType_str[theScheme_]; >> +schemeImage_.toLower(); >> } > > This slow code may not be needed at all if the rest of the changes are > implemented. If this code stays, please add an "XXX: Slow." or a similar > comment after toLower(). There are many ways to fix/optimize this, but > the best way probably depends on the rest of the changes so I am not > going to suggest any specific TODO here. AFAICT the best optimization we can do is just take the image string from the urlParse logic. Which is what this patch is doing now after the ctor changes. The slow logic path is now only done for URI which are generated internally by Squid from just a ProtocolType enum. > >> @@ -157,6 +158,9 @@ >> if (strncasecmp(b, "whois", le
Re: [squid-dev] [PATCH] Better support for unknown URL schemes
Applied as trunk rev.14802. Amos ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev