In message <alpine.lsu.2.00.1011261558520.4...@hermes-2.csi.cam.ac.uk>, Tony Finch writes:
> On Thu, 25 Nov 2010, Andrew Sullivan wrote:
> 
> > So what aside from [...] do you want?
> 
> Something like this:
> 
> 
> Abstract
> 
> This memo clarifies the syntax of top-level domain labels in the domain
> name system as specified in RFC 1123, and how this syntax relates to the
> allocation policy for TLDs. It describes the current
> 
> [...blah...]
> 
> Background
> 
> [RFC0952] defines a host name in the first paragraph under "ASSUMPTIONS",
> as follows:
> 
>       A "name" ... is a text string up to 24 characters drawn from the
>       alphabet (A-Z), digits (0-9), minus sign (-), and period (.).
>       Note that periods are only allowed when they serve to delimit
>       components of "domain style names".  (See RFC-921, "Domain Name
>       System Implementation Schedule", for background).  No blank or
>       space characters are permitted as part of a name.  No distinction
>       is made between upper and lower case.  The first character must be
>       an alpha character.  The last character must not be a minus sign
>       or period.
> 
> [RFC1123] section 2.1 reaffirms this definition, but makes one change
> to the syntax:
> 
>       The syntax of a legal Internet host name was specified in RFC-952
>       [DNS:4].  One aspect of host name syntax is hereby changed: the
>       restriction on the first character is relaxed to allow either a
>       letter or a digit.  Host software MUST support this more liberal
>       syntax.
> 
> In addition, the DISCUSSION in Section 2.1 says:
> 
>       'However, a valid host name can never have the dotted-decimal form
>       #.#.#.#, since at least the highest-level component label will be
>       alphabetic.'  [Section 2.1]
> 
> Some implementers may have understood the above phrase "will be
> alphabetic" to be a protocol restriction. This is incorrect. It is in fact
> a description of the TLD allocation policy at that time.
> 
> The TLD allocation policy has since had two significant syntactic changes.
> 
> On 16 November 2000 the first long TLD (.museum) was allocated, and it
> was added to the root zone in June 2001.
> 
> In October 2007, the first IDNA test TLDs were added to the root zone.
> These were the first TLDs with non-alphabetic characters. ICANN approved a
> policy for allocating IDNA ccTLDs in October 2009 and the first production
> IDNA TLDs were added to the root zone in January 2010.
>
> Deployed software that checks DNS top-level labels for conformance with
> past allocation policy is likely to reject domain names allocated after a
> policy change.
> 
> 
> Syntax of TLD labels - protocol level
> 
> All labels of a domain name have the same syntax. The syntax of TLDs is
> not specially restricted at the protocol level.
> 
>    domain  = *(label ".") label ["."]
> 
>    label   = let-dig [ldh-str]
> 
>    let-dig = ALPHA / DIGIT
> 
>    ldh-str = *( ALPHA / DIGIT / "-" ) let-dig
> 
> A label can be up to 63 characters long. A domain name can be up to 255
> characters long.
> 
> A domain name as a whole shall not match the dotted quad representation of
> an IPv4 address.
> 
>    IPv4    = 3(digits ".") digits
> 
>    digits  = 1*DIGIT

Which doesn't come close to matching all the possible component
forms supported by inet(3) on POSIX machines.

The octet 10 (decimal) can be expressed in all these forms on POSIX
machines.

         0x0a, 0xa, 012, 10

Additionally, 10.6666 *is* a valid IPv4 address, so one has to worry
about all values up to 4294967295 expressed in decimal, octal and
hexadecimal as a minimum to prevent accidental collisions.

        0 .. 037777777777
        0 .. 0xffffffff
        0 .. 4294967295
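
To make the collision concrete, here is a rough sketch of the classic
inet_aton(3) parsing rules as they are commonly documented (one to four
components, each in decimal, octal or hexadecimal, with the last
component filling the remaining bytes). It is a pure-Python
approximation, not a call into any C library, and the names
is_protocol_name, parse_component and loose_ipv4 are made up for this
illustration. It also checks names against the protocol-level label
ABNF quoted above.

```python
import re

# Protocol-level label syntax from the quoted ABNF:
#   label = let-dig [ldh-str], ldh-str = *( ALPHA / DIGIT / "-" ) let-dig
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_protocol_name(name):
    """True if every label fits the ABNF and the 63-character limit."""
    if name.endswith("."):
        name = name[:-1]  # a single trailing dot is allowed by the ABNF
    return all(len(label) <= 63 and LABEL_RE.fullmatch(label)
               for label in name.split("."))


def parse_component(text):
    """Parse one component as hex (0x prefix), octal (leading 0) or decimal."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        return int(text[2:], 16)
    if re.fullmatch(r"0[0-7]*", text):
        return int(text, 8)
    if re.fullmatch(r"[1-9][0-9]*", text):
        return int(text, 10)
    return None


def loose_ipv4(text):
    """Return the 32-bit address if text parses under these rules, else None."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = [parse_component(part) for part in parts]
    if any(value is None for value in values):
        return None
    *head, last = values
    # Leading components are single octets; the last one fills what is left.
    if any(value > 0xFF for value in head):
        return None
    if last >= 1 << (8 * (4 - len(head))):
        return None
    result = last
    for index, value in enumerate(head):
        result |= value << (8 * (3 - index))
    return result


for candidate in ("0x0a.0xa.012.10", "10.6666", "037777777777", "example.com"):
    print(candidate, is_protocol_name(candidate), loose_ipv4(candidate))
```

Under these assumptions the first three candidates are both
syntactically valid domain names and parseable addresses, which is the
overlap the 3(digits ".") rule does not exclude.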

> Syntax of TLD labels - allocation policy
> 
> The syntax of allocated TLDs is restricted in order to ensure that no
> domain name can match an IPv4 dotted quad, and for compatibility with past
> practice and deployed software. The policy is subject to change by ICANN.
> This section describes the syntax of domain names permitted by the current
> allocation policy.
> 
> IDNA encodes Unicode strings within the syntax permitted for domain name
> labels. The Unicode string used by applications is known as a U-Label;
> its corresponding encoding in the DNS is known as an A-Label. The terms
> A-Label and U-Label are used in this document as defined in [RFC5890].
> Valid A-Labels always contain non-alphabetic characters.
> 
> In order to accommodate the wish to express TLD names in scripts other
> than the ASCII subset of Latin, it is necessary to allow non-alphabetic
> characters in the corresponding TLD DNS-Labels.  Following past practice,
> the U-label form of a TLD name is restricted by applying rules analogous
> to those already imposed on ASCII TLD DNS-Labels.
> 
> ASCII TLDs have the following syntax:
> 
>    TLD = 1*63(ALPHA)
> 
> IDNA TLDs obey the following requirements:
> 
>    1.  the DNS-Label is a valid A-Label according to [RFC5890];
> 
>    2.  the derived property value of all code points, as defined by
>        [RFC5890], is PVALID;
> 
>    3.  the general category of all code points is one of { Ll, Lo, Lm, Mn }.
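
As a partial illustration of the rules quoted above, the sketch below
checks the ASCII rule TLD = 1*63(ALPHA) and requirement 3 (general
category) for a U-Label. Requirements 1 and 2 need the IDNA tables,
which are not in the Python standard library, so they are not checked
here. The function names are invented for this example, and the
categories come from whichever Unicode version the interpreter bundles.

```python
import unicodedata

# General categories permitted by requirement 3 in the quoted text.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn"}


def is_ascii_tld(label):
    """ASCII rule from the quoted text: TLD = 1*63(ALPHA)."""
    return 1 <= len(label) <= 63 and label.isascii() and label.isalpha()


def u_label_categories_ok(u_label):
    """Requirement 3 only: every code point is Ll, Lo, Lm or Mn."""
    return bool(u_label) and all(
        unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in u_label
    )


print(is_ascii_tld("museum"))                   # letters only
print(is_ascii_tld("xn--p1ai"))                 # hyphen and digit present
print(u_label_categories_ok("\u0440\u0444"))    # two Cyrillic lowercase letters
print(u_label_categories_ok("a1"))              # the digit is category Nd
```

Note that an A-Label such as xn--p1ai fails the ASCII rule by design,
which is why the IDNA case is specified in terms of the U-Label.
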
> 
> 
> [... etc etc ...]
> 
> Tony.
> -- 
> f.anthony.n.finch  <d...@dotat.at>  http://dotat.at/
> HUMBER THAMES DOVER WIGHT PORTLAND: NORTH BACKING WEST OR NORTHWEST, 5 TO 7,
> DECREASING 4 OR 5, OCCASIONALLY 6 LATER IN HUMBER AND THAMES. MODERATE OR
> ROUGH. RAIN THEN FAIR. GOOD.
> _______________________________________________
> DNSOP mailing list
> DNSOP@ietf.org
> https://www.ietf.org/mailman/listinfo/dnsop
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: ma...@isc.org
