Dear all,

Since urlencode, with its hex escapes, is also used as the encoding technique for cookies, the definitions of RFC 3986 and of the HTML 4.01 recommendation (for the query part) are not sufficient. The encoding of cookies is surprisingly poorly defined (see [1]). All definitions require that the characters "," and ";" be pct-escaped, although these characters do not require encoding in RFC 3986. Therefore, in the new code I've added these characters to the query-part encoding. I've also documented this behavior, so that the design choices and the whereabouts of these characters in the encoding table can be understood.
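To illustrate this design choice, here is a minimal C sketch. It is not the actual table code in nsd/urlencode.c. The cookie encoding reuses the query-part rules but additionally escapes "," and ";". The set of extra characters that the query part leaves unescaped (QUERY_KEEP) and all function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: characters (besides RFC 3986 unreserved ones) that the
 * query-part encoding is assumed to leave unescaped. */
static const char QUERY_KEEP[] = "!$'()*,;:@/?";

/* unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" (RFC 3986) */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' ||
           c == '_' || c == '~';
}

/* Query part: unreserved plus the assumed QUERY_KEEP characters. */
static int query_keep(unsigned char c)
{
    return is_unreserved(c) || (c != '\0' && strchr(QUERY_KEEP, c) != NULL);
}

/* Cookie: like the query part, but "," and ";" are always escaped,
 * since cookie headers use them as separators. */
static int cookie_keep(unsigned char c)
{
    return query_keep(c) && c != ',' && c != ';';
}

/* Percent-encodes src into dst; dst must hold 3 * strlen(src) + 1 bytes. */
static void pct_encode(const char *src, char *dst, int (*keep)(unsigned char))
{
    static const char hex[] = "0123456789ABCDEF";

    assert(src != NULL && dst != NULL && keep != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (keep(c)) {
            *dst++ = (char)c;
        } else {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    *dst = '\0';
}
```

Under these assumptions, pct_encode("a,b;c", buf, query_keep) leaves the string unchanged. With cookie_keep, the same call yields "a%2Cb%3Bc".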

An alternative would have been to add a separate table for "cookie encodings" and to specify "cookie" as an additional kind of "-part", especially since the cookie encoding/decoding is performed internally.
-g

[1] http://stackoverflow.com/questions/1969232/allowed-characters-in-cookies

On 24.03.17 at 18:03, Gustaf Neumann wrote:
Dear all

I've committed some code concerning the discussed urlencode reform.
The code now conforms to RFC3986 (2005) and to the HTML 4.01
recommendation (for application/x-www-form-urlencoded). It emits
warnings when characters in the "Location" header field lack the
required encoding (in the obvious cases). Furthermore, it contains a
new flag "-uppercase" for OAuth ... and is able to decode uppercase
hex codes as well. The new code is already running on OpenACS.org,
to see whether I missed something obvious.

I've not yet updated the documentation and the regression test.
... and, not to forget, I am planning to raise exceptions
when invalid hex codes are to be decoded.
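The planned decoding behavior can be sketched in C as follows. Hex digits are accepted in either case, and invalid or truncated escapes are rejected instead of being passed through. This assumes the C helper returns an error code, which a Tcl command would turn into an exception. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of a hex digit in either case ("2f" and "2F" both work), or -1. */
static int hex_value(char c)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;

    if (c == '\0') {
        return -1;
    }
    p = strchr(digits, tolower((unsigned char)c));
    return (p != NULL) ? (int)(p - digits) : -1;
}

/* Decodes %XX escapes from src into dst (needs strlen(src) + 1 bytes).
 * Returns the number of decoded bytes, or -1 for an invalid escape such
 * as "%zz" or a truncated "%4"; a caller could turn -1 into an exception. */
static long pct_decode(const char *src, char *dst)
{
    const char *start = dst;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (*src == '%') {
            int hi = hex_value(src[1]);
            int lo = (hi < 0) ? -1 : hex_value(src[2]);

            if (lo < 0) {
                return -1;          /* invalid or truncated escape */
            }
            *dst++ = (char)((hi << 4) | lo);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (long)(dst - start);
}
```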

-gn

PS: Below are the more detailed commit messages:

- The new code conforms to RFC3986 (2005), which has a more precise
  and differently structured definition (e.g. no 'unwise' characters)
  of the characters encoded in URLs. The encoding of the query part is
  actually defined in the HTML 4.01 specification.

- The encoding of space (" ") in the path and query parts is still
  different, as in older versions of NaviServer (spaces are encoded as
  "+" in the query part and as "%20" in path segments). This
  distinction is not necessary according to the RFCs.

- The coding tables are documented in detail, containing design
  considerations.

- A new flag "-uppercase" was added to support encoding for OAuth
  (RFC 5849); note that the "path" segment encoding has to be used to
  avoid encoding space as "+".

- A warning is now produced when a URL that is not properly encoded is
  passed to the Location header field (e.g. via ns_returnredirect).
  Only characters that must always be encoded are flagged.

- One can obtain the previous encoding behavior when compiling with
  "RFC1738" defined.

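The space handling and the "-uppercase" flag can be sketched in C as follows. This is a hypothetical illustration, not NaviServer's code. The part enum, the function name, and the lowercase default are assumptions, and for brevity only unreserved characters are left unescaped.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { PART_PATH, PART_QUERY } url_part;   /* hypothetical */

/* unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" (RFC 3986) */
static bool is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' ||
           c == '_' || c == '~';
}

/* Space becomes "+" in the query part but "%20" in a path segment;
 * "uppercase" selects the case of the hex digits (OAuth needs uppercase).
 * dst must hold 3 * strlen(src) + 1 bytes. */
static void url_encode(const char *src, char *dst, url_part part, bool uppercase)
{
    const char *hex = uppercase ? "0123456789ABCDEF" : "0123456789abcdef";

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (is_unreserved(c)) {
            *dst++ = (char)c;
        } else if (c == ' ' && part == PART_QUERY) {
            *dst++ = '+';                 /* form encoding of space */
        } else {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    *dst = '\0';
}
```

Under these assumptions, encoding "a b/c" gives "a+b%2fc" for the query part. For a path segment with uppercase enabled, it gives "a%20b%2Fc", which is the form that OAuth signatures need.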

On 23.03.17 at 09:00, Michael Aram wrote:
Dear all,

I agree with Wolfgang that this should generally be taken care of by the app developer. However, there is probably much legacy code out there that passes the URL unencoded to ns_returnredirect (at least in our OpenACS installations there are many such places). I would therefore opt for a solution where this can be parameterized via a boolean parameter of the function (option b). Ideally, the default of this parameter could be changed via the NaviServer config file (setting the default behavior to e.g. "encode" instead of the current "leave untouched"). If this is considered too much effort, at least a warning would be very useful.

As a side note: if the ns_urlencode code is touched, I would suggest adding a parameter that allows choosing the case of the encoded characters. The background is that OAuth 1 strictly requires the percent-encoded characters to be uppercase, as these become part of the signature base string. Without this restriction, two signatures could differ only because of different encoding implementations at the endpoints. See https://tools.ietf.org/html/rfc5849 section 3.6.

All the best,
Michael



On Thu, Mar 23, 2017 at 7:49 AM <naviserver-devel-requ...@lists.sourceforge.net> wrote:



    Today's Topics:

       1. url-encoding and ns_returnredirect, RFC updates (Gustaf Neumann)
       2. Re: url-encoding and ns_returnredirect, RFC updates
          (Wolfgang Winkler)


    ----------------------------------------------------------------------

    Message: 1
    Date: Wed, 22 Mar 2017 11:50:49 +0100
    From: Gustaf Neumann <neum...@wu.ac.at>
    Subject: [naviserver-devel] url-encoding and ns_returnredirect, RFC updates
    To: Navidevel <naviserver-devel@lists.sourceforge.net>
    Message-ID: <888d5801-65ab-459b-e97c-6ed133498...@wu.ac.at>
    Content-Type: text/plain; charset="utf-8"

    Dear all,

    Apparently, Edge is pickier about the encoding of URLs in the
    Location: header field (see e.g. a recent entry in the OpenACS issue
    tracker [1]). RFC 7231 states [2] that

          Location = URI-reference

    but as well:

           Note: Some recipients attempt to recover from Location fields that
           are not valid URI references.  This specification does not mandate
           or define such processing, but does allow it for the sake of robustness.

    The ABNF in [3] makes clear that it has to be encoded (see the snippet
    for path segments):

           URI-reference = URI / relative-ref
           relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
           relative-part = "//" authority path-abempty
                         / path-absolute
                         / path-noscheme
                         / path-empty

           path-abempty  = *( "/" segment )
           path-absolute = "/" [ segment-nz *( "/" segment ) ]
           path-noscheme = segment-nz-nc *( "/" segment )
           path-rootless = segment-nz *( "/" segment )
           path-empty    = 0<pchar>


           segment       = *pchar
           segment-nz    = 1*pchar
           segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                         ; non-zero-length segment without any colon ":"
           pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"


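To make the grammar concrete, here is a small C validator for a single path segment. It follows the segment and pchar rules quoted above. This is an illustrative sketch only, and the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" */
static bool is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' ||
           c == '_' || c == '~';
}

/* sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" */
static bool is_sub_delim(unsigned char c)
{
    return c != '\0' && strchr("!$&'()*+,;=", c) != NULL;
}

static bool is_hex(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* segment = *pchar
 * pchar   = unreserved / pct-encoded / sub-delims / ":" / "@"
 * pct-encoded = "%" HEXDIG HEXDIG */
static bool is_valid_segment(const char *s)
{
    assert(s != NULL);
    while (*s != '\0') {
        unsigned char c = (unsigned char)*s;

        if (c == '%') {
            /* short-circuit: s[2] is only read if s[1] is a hex digit */
            if (!is_hex((unsigned char)s[1]) || !is_hex((unsigned char)s[2])) {
                return false;
            }
            s += 3;
        } else if (is_unreserved(c) || is_sub_delim(c) || c == ':' || c == '@') {
            s++;
        } else {
            return false;   /* e.g. space, "/", or a non-ASCII byte */
        }
    }
    return true;
}
```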
    NaviServer passes the URL as-is from, e.g., ns_returnredirect to the
    "Location:" field.

    So the question is, should ns

    a) take care of this encoding,
    b) take care of this encoding via an optional flag,
    c) do nothing and leave the responsibility to the application
       programmer (current situation), or
    d) provide a warning when an "obviously" unencoded URL is passed to
       ns_returnredirect?

    I think (a) is not useful, since ns can't decide from the string
    whether a "/" in the path is, e.g., a delimiter or part of a segment.
    Furthermore, it would break existing programs that already encode
    their URLs correctly, since these would then be encoded twice. (b)
    might be useful in simple cases.
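A small C sketch shows why (a) would break correctly encoded URLs. A hypothetical encoder that blindly escapes everything except unreserved characters and "/" turns an already encoded "%20" into "%2520". The function name is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "always encode" step for option (a): every byte except
 * RFC 3986 unreserved characters and "/" is escaped. "/" is kept blindly,
 * because the string alone does not tell whether it is a delimiter or
 * part of a segment. dst needs 3 * strlen(src) + 1 bytes. */
static void naive_encode(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || strchr("-._~/", c) != NULL) {
            *dst++ = (char)c;
        } else {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    *dst = '\0';
}
```

For example, naive_encode("/a%20b", buf) produces "/a%2520b". This decodes to the literal "/a%20b" instead of the intended "/a b".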

    I am inclined towards (d), although an exact check for every character
    that should have been escaped might be too costly for some characters
    (e.g., checking whether "%" was used only as an escape indicator).
    However, via (d) an application developer can get hints about where
    the URL encoding was probably lacking.
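Option (d) can be kept cheap by flagging only characters that may never appear unescaped in a URI. Here is a minimal C sketch. It assumes a hypothetical hook that is called before the URL is written to the Location header. The names and the warning text are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the offset of the first byte that may never appear unescaped
 * anywhere in a URI (RFC 3986), or -1 if there is none. Whether each "%"
 * starts a valid escape is deliberately not checked, to keep this cheap. */
static long first_unencoded(const char *url)
{
    assert(url != NULL);
    for (const char *p = url; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        /* controls, space, DEL, non-ASCII bytes, and " < > \ ^ ` { | } */
        if (c <= 0x20 || c >= 0x7F || strchr("\"<>\\^`{|}", c) != NULL) {
            return (long)(p - url);
        }
    }
    return -1;
}

/* Hypothetical hook: warn (instead of failing) before the redirect. */
static void warn_if_unencoded(const char *url)
{
    long pos = first_unencoded(url);

    if (pos >= 0) {
        fprintf(stderr, "Warning: URL \"%s\" is not properly encoded "
                "(offending byte at offset %ld)\n", url, pos);
    }
}
```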

    While looking at nsd/urlencode.c, I saw that the encoding is more
    conservative than its comment says ("All ASCII control characters
    (00-1f and 7f) and the URI 'delim' and 'unwise' characters are
    encoded"): it also encodes the characters from 0x80 to 0xff. Do I
    interpret this correctly, that this refers to the differences and
    confusions between RFC1738 (1994) and RFC1808 (1995) vs. RFC2396 [4]
    (1998), see [5]? The code says it conforms to RFC1738, so an update
    to at least RFC2396 seems appropriate.

    Comments?

    -g

    [1] http://openacs.org/bugtracker/openacs/bug?bug_number=3312
    [2] https://tools.ietf.org/html/rfc7231#page-68
    [3] https://tools.ietf.org/html/rfc3986#appendix-A
    [4] https://tools.ietf.org/html/rfc2396
    [5] https://tools.ietf.org/html/rfc2396#appendix-G.2


    ------------------------------

    Message: 2
    Date: Thu, 23 Mar 2017 07:22:54 +0100
    From: Wolfgang Winkler <wolfgang.wink...@digital-concepts.com>
    Subject: Re: [naviserver-devel] url-encoding and ns_returnredirect, RFC updates
    To: naviserver-devel@lists.sourceforge.net
    Message-ID: <1294b10d-8ebd-18e0-bfb4-5c3c685c5...@digital-concepts.com>
    Content-Type: text/plain; charset="windows-1252"

    Hi!

    I opt for option c (the programmer's responsibility), as URL encoding
    can be tricky if you don't know the context of the passed-in URL. A
    flag, or maybe an extra proc for checking a URL for problems, could
    be useful, but that's something that can easily be done with
    ns_urlencode and ns_urldecode.

    Updating to RFC2396 would be most welcome. As stated in the RFC:

        This document defines the generic syntax of URI, including both
        absolute and relative forms, and guidelines for their use; it
        revises and replaces the generic definitions in RFC 1738 and RFC 1808.

    and

        This document updates and merges "Uniform Resource Locators"
        [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in
        order to define a single, generic syntax for all URI.

    To me it seems that RFC1738 has been, at least in parts, deprecated.

    regards,

    Wolfgang



_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
