#36131: URLValidator not correctly validating URLs
-------------------------------+----------------------------------------
     Reporter:  Ludwig Kraatz  |                     Type:  Bug
       Status:  new            |                Component:  Core (Other)
      Version:  5.1            |                 Severity:  Normal
     Keywords:  URL Validator  |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  1              |                    UI/UX:  0
-------------------------------+----------------------------------------
 == Abstract

 A URL is a way of describing a resource.

 https://resource  -> is a valid URL.

 == Why I am raising this as an issue

 A URL is constructed as follows [RFC 3986, Section 3]:

 {{{
          foo://example.com:8042/over/there?name=ferret#nose
          \_/   \______________/\_________/ \_________/ \__/
           |           |            |            |        |
        scheme     authority       path        query   fragment
 }}}

 so: scheme, authority, rest...
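
 As a small illustration of that breakdown, the RFC's example can be split
 with Python's standard urllib.parse module. This is only a sketch of the
 component structure, not of how Django's validator parses URLs.

```python
from urllib.parse import urlsplit

# The example URI from RFC 3986, Section 3.
parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")

print(parts.scheme)    # scheme
print(parts.netloc)    # authority (host plus optional port)
print(parts.hostname)  # host subcomponent of the authority
print(parts.port)      # port subcomponent of the authority
print(parts.path)      # path
print(parts.query)     # query
print(parts.fragment)  # fragment
```

 The host is only one subcomponent of the authority, and it is the only part
 this ticket is concerned with.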

 The issue in Django's URLValidator that I want to address is an
 over-specification, and a 'selective circumvention of wrongful parsing',
 when it comes to the host component of the authority part.

 What Django's URLValidator currently does:
 host_re = "( FQDN-REGEX | localhost )"

 Basically, Django accepts only IP-or-FQDN-or-localhost URLs.
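
 To make the effect concrete, here is a minimal sketch of a host pattern
 with the same shape. It is a simplified, ASCII-only stand-in written for
 this ticket, not Django's actual regex; the names LABEL, FQDN, HOST_RE and
 host_passes are hypothetical.

```python
import re

# Simplified stand-in for "( FQDN-REGEX | localhost )".
# Assumption: ASCII only, no IP addresses, no internationalized names.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# One or more labels, followed by a mandatory dot and a TLD-like label.
FQDN = rf"{LABEL}(?:\.{LABEL})*\.[a-z]{{2,63}}\.?"
HOST_RE = re.compile(rf"(?:{FQDN}|localhost)", re.IGNORECASE)


def host_passes(host: str) -> bool:
    """Return True if the whole host string matches the sketch pattern."""
    return HOST_RE.fullmatch(host) is not None


for candidate in ("example.com", "localhost", "intranet", "fileserver"):
    print(candidate, host_passes(candidate))
```

 Under this sketch, "localhost" passes only because it is listed
 explicitly. Any other single-label host is rejected because the pattern
 requires at least one ".".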

 This is the 'selective circumvention of wrongful parsing' I mentioned
 earlier. With "| localhost" the URL field "feels" more okay, because the
 obvious localhost URLs now pass. But besides the FQDNs used for "(global)
 DNS URLs", there is much more than just "localhost".

 The RFC also acknowledges this. It recommends using a syntax for hosts
 that conforms to the DNS syntax.

 [https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2]
 {{{
 A host identified by a registered name is a sequence of characters
    usually intended for lookup within a locally defined host or service
    name registry, though the URI's scheme-specific semantics may require
    that a specific registry (or fixed name table) be used instead.  The
    most common name registry mechanism is the Domain Name System (DNS).
    A registered name intended for lookup in the DNS uses the syntax
    defined in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123].
    Such a name consists of a sequence of domain labels separated by ".",
    each domain label starting and ending with an alphanumeric character
    and possibly also containing "-" characters.  The rightmost domain
    label of a fully qualified domain name in DNS may be followed by a
    single "." and should be if it is necessary to distinguish between
    the complete domain name and some local domain.

       reg-name    = *( unreserved / pct-encoded / sub-delims )

    If the URI scheme defines a default for host, then that default
    applies when the host subcomponent is undefined or when the
    registered name is empty (zero length).  For example, the "file" URI
    scheme is defined so that no authority, an empty host, and
    "localhost" all mean the end-user's machine, whereas the "http"
    scheme considers a missing authority or empty host invalid.

    This specification does not mandate a particular registered name
    lookup technology and therefore does not restrict the syntax of reg-
    name beyond what is necessary for interoperability.  Instead, it
    delegates the issue of registered name syntax conformance to the
    operating system of each application performing URI resolution, and
    that operating system decides what it will allow for the purpose of
    host identification.  A URI resolution implementation might use DNS,
    host tables, yellow pages, NetInfo, WINS, or any other system for
    lookup of registered names.  However, a globally scoped naming
    system, such as DNS fully qualified domain names, is necessary for
    URIs intended to have global scope.  URI producers should use names
    that conform to the DNS syntax, even when use of DNS is not
    immediately apparent, and should limit these names to no more than
    255 characters in length.
 }}}
 This says, in several ways:
 - local host resolution is completely okay.
 - no "." is required: a sequence of labels (whose length is not further
 restricted) can consist of a single label, which has no "." separator.
 - host names that are compatible with the DNS syntax are valid.
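
 The points above can be sketched as a check that follows only the quoted
 passage: labels separated by ".", each starting and ending with an
 alphanumeric character and possibly containing "-", an optional single
 trailing ".", and at most 255 characters overall. The function name
 is_dns_syntax_host is hypothetical, and the sketch deliberately ignores
 rules that are not in the quoted text (such as per-label length limits).

```python
import re

# A domain label: starts and ends with an alphanumeric, may contain "-".
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_dns_syntax_host(host: str) -> bool:
    """Check a host against the DNS-style syntax quoted from RFC 3986."""
    if not host or len(host) > 255:
        return False
    # A single trailing "." may mark a fully qualified domain name.
    if host.endswith("."):
        host = host[:-1]
    # Every label must be well formed; an empty label fails the match.
    return all(LABEL_RE.fullmatch(label) for label in host.split("."))


for candidate in ("intranet", "printer.local.", "example.com", "-bad", "a..b"):
    print(candidate, is_dns_syntax_host(candidate))
```

 Under these assumptions a single-label host such as "intranet" is
 accepted, while malformed labels are still rejected.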


 [RFC 6762 Multicast DNS # Section 3]
 {{{
 It is unimportant whether a name ending with ".local." occurred
    because the user explicitly typed in a fully qualified domain name
    ending in ".local.", or because the user entered an unqualified
    domain name and the host software appended the suffix ".local."
    because that suffix appears in the user's search list.
 }}}
 This clearly states that a user can describe a resource with an
 unqualified name, with the implication that the host software may append
 the suffix ".local." when that suffix is in the user's search list. As
 such, the URL, which is what the user would be referencing, should be
 able to deal with more non-FQDN hosts than just "localhost". This is in
 the context of Multicast DNS, which seems close enough to be considered
 relevant when talking about URLs, as the URL RFC is described so closely
 around DNS.

 [RFC 3986 URI/URL # 1.1]
 {{{
  URIs that
    identify in relation to the end-user's local context should only be
    used when the context itself is a defining aspect of the resource,
    such as when an on-line help manual refers to a file on the end-
    user's file system (e.g., "file:///etc/hosts").
 }}}
 This clearly states that URIs are valid even if they only 'make sense'
 in an end-user's local context.
 As such, restricting Django URLs to only Fully Qualified Domain
 Names/IPs (except localhost, for no apparent reason other than
 inconsistency :-* ) is a restriction that contradicts that notion.

 == What I am proposing:

 Fully allowing URLs as per rfc3986#section-3.2.2, with a regex solution
 for localhost (and whatever else is possible) instead of a hardcoded
 < "magicnumber"-80%-"solution" >

 To be committed to the Django repository via a pull request. My earlier
 pull request is more of a starting point for discussion.


 == Why this is necessary & useful:

 Single-label URLs might be used
 - in intranet situations
 - for URLs that represent services / schemes that do not comply with
 FQDN naming conventions
 - for local testing (local DNS resolution that is not based on FQDNs)
 - mDNS [RFC 6762] solutions, operating under the .local TLD (which as of
 that RFC can be omitted in a local context)
 - the Django validator is named URLValidator, not
 FQDN_IP_LOCALHOST_URLValidator

 == Further notes:

 I already submitted a pull request, which probably isn't mature enough,
 given that I did not even check which tests would break.

 But there was one test that should not have broken:

 FAIL: test_urlfield_clean_invalid
 (forms_tests.field_tests.test_urlfield.URLFieldTest.test_urlfield_clean_invalid)
 [<object object at 0x000001C1038C1760>] (value='foo')

 The URL "foo" should not be valid, even with my small change of replacing
 'localhost' with hostname_re.

 It feels like there are some (- -)  missing, but I did not check; I
 focused on providing a more solid ticket first.
 So, if I am not mistaken, there is another issue besides what I propose.
 It seems that limiting hosts to FQDNs was what caused values with a
 missing URI scheme to be rejected by the validator, rather than a correct
 validation of the URI scheme itself.
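
 To illustrate the distinction, here is a minimal sketch that checks the
 scheme on its own before looking at the host, so that a bare "foo" is
 rejected regardless of which host names are allowed. It uses only
 urllib.parse; the function name has_scheme_and_host and the
 ALLOWED_SCHEMES set are assumptions for this example, not Django's
 implementation.

```python
from urllib.parse import urlsplit

# Assumed set of accepted schemes for this sketch.
ALLOWED_SCHEMES = {"http", "https", "ftp", "ftps"}


def has_scheme_and_host(value: str) -> bool:
    """Require an allowed scheme and a non-empty host, checked separately."""
    parts = urlsplit(value)
    scheme_ok = parts.scheme.lower() in ALLOWED_SCHEMES
    host_ok = bool(parts.hostname)
    return scheme_ok and host_ok


for candidate in ("foo", "http://foo", "https://resource", "http://"):
    print(candidate, has_scheme_and_host(candidate))
```

 With the scheme checked explicitly, "foo" fails because it has no scheme,
 while "http://foo" passes the scheme check and leaves the decision about
 single-label hosts to the host rule.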

 == PS

 It's kind of late; I might polish this ticket tomorrow. If you feel like
 I'm drunk or disorganized, it's just my brain that's screaming for relief.
 Sorry.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36131>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
