Re: Are single character sub-domains allowed in the envelope?

Peter J. Holzer Sat, 25 Sep 2004 06:51:19 -0700

On 2004-09-25 11:43:45 +0200, Michael Holzt wrote:
> > I don't understand what problem you and Michael are trying to 
> > solve with the messages after this one.  What doesn't match?
> 
> The old code did not match for single character sub-domains.


Yes, but Robert already fixed that:

On 2004-09-14 08:49:57 -0700, Robert Spier wrote:
|     my $subdomain = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9]))';
| 
| I think what we want is this:
| 
|     my $subdomain = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';

(and as an indication that the error was simply a typo, I offer that the
second grouping is rather useless without the ? after it).
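
A minimal sketch of the difference between the two patterns quoted above. The helper name domain_ok and the sample domains are made up for illustration; the two pattern strings are copied from Robert's message.

```perl
use strict;
use warnings;

# Old pattern: the trailing group is mandatory, so every sub-domain
# needs at least two characters.
my $old = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9]))';

# Robert's fix: the trailing group is optional, so one character is enough.
my $new = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';

# Hypothetical helper: does the whole string consist of dot-separated
# sub-domains according to the given pattern?
sub domain_ok {
    my ($subdomain, $domain) = @_;
    return $domain =~ /^$subdomain(?:\.$subdomain)*$/ ? 1 : 0;
}

for my $d (qw(a.example.com x.org example.com)) {
    printf "%-15s old=%d new=%d\n",
        $d, domain_ok($old, $d), domain_ok($new, $d);
}
```

The old pattern rejects a.example.com and x.org because of the single-character first component; the fixed pattern accepts all three.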


> > All the examples you provide match fine, assuming we're talking 
> > about /^$subdomain(?:\.$subdomain)*$/. The added look-aheads 
> > and alternations and other complications don't add anything, as 
> > far as I can see.
> 
> Oh, they are there for a reason: Because without them we would
> match too much, e.g. illegal subdomains. A subdomain can not
> start or end with a hyphen, and to ensure this, complicated 
> regexps like posted are needed.

No, they aren't. Robert's fix is fine. Indeed, if you look at the
BNF notation

> #   sub-domain = Let-dig [Ldh-str]
> #   Let-dig = ALPHA / DIGIT
> #   Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig

you can see that it matches the BNF notation exactly:

     my $subdomain = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';
                                       <-----------><--------->
                                       *(A/D/"-")   Let-dig
                         <---------><--------------------------->
                         Let-dig    [Ldh-str]
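
The same correspondence can be sketched by building the pattern from one fragment per BNF rule. The variable names $let_dig and $ldh_str and the test labels are assumptions for illustration, not names from the actual code.

```perl
use strict;
use warnings;

# One fragment per BNF rule.
my $let_dig   = '[a-zA-Z0-9]';                # Let-dig    = ALPHA / DIGIT
my $ldh_str   = "[-a-zA-Z0-9]*$let_dig";      # Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
my $subdomain = "(?:$let_dig(?:$ldh_str)?)";  # sub-domain = Let-dig [Ldh-str]

# The assembled string is identical to the pattern discussed above.
print "$subdomain\n";

# A label may contain hyphens, but may not start or end with one.
for my $label ('a', 'a-b', 'a--b', '-a', 'a-') {
    printf "%-5s %s\n", $label,
        ($label =~ /^$subdomain\z/ ? 'match' : 'no match');
}
```

No look-aheads are needed: the leading Let-dig and the Let-dig that closes Ldh-str already rule out a hyphen at either end.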


> However it looks to me like the lookahead for the 
> digit or char after the hyphen is not needed. We can
> just match it normally. So i submit now the following
> pattern for evaluation:
> 
>   (?:[a-zA-z0-9]+(?:[a-zA-Z0-9]|-[a-zA-Z0-9])*)

Nope, this is wrong. It requires a letter or digit directly after every
hyphen, so it rejects consecutive hyphens. It doesn't match

xn--bcher-kva

which is a perfectly legitimate domain component (which also happens to
exist, see http://xn--bcher-kva.ch/).
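
A small sketch comparing the proposed pattern with Robert's fix on this label. The helper name label_ok and the extra test labels are made up for illustration; both pattern strings are copied verbatim from the messages above, including the A-z range in the proposed one.

```perl
use strict;
use warnings;

# Pattern proposed in the quoted message (copied verbatim).
my $proposed = '(?:[a-zA-z0-9]+(?:[a-zA-Z0-9]|-[a-zA-Z0-9])*)';

# Robert's fixed pattern.
my $fixed = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';

# Hypothetical helper: does the pattern match the whole label?
sub label_ok {
    my ($pattern, $label) = @_;
    return $label =~ /^$pattern\z/ ? 1 : 0;
}

# xn--bcher-kva has two hyphens in a row; a_b probes the A-z range,
# which in ASCII also covers [ \ ] ^ _ and the backtick.
for my $label (qw(xn--bcher-kva a-b a a_b)) {
    printf "%-14s proposed=%d fixed=%d\n",
        $label, label_ok($proposed, $label), label_ok($fixed, $label);
}
```

The proposed pattern fails on xn--bcher-kva because the second hyphen is not followed by a letter or digit. As a side effect of the A-z range it also accepts a_b, which the fixed pattern rejects.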

> Btw: Why don't we use case-insensitive matching?

No special reason, I think. When I wrote the regexps, I just literally
translated the BNF to regexps. I was sorely tempted to "optimize" the
regexps, but resisted (mostly) because I wanted to make it easy to
verify that the regexps do indeed implement the BNF.
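
For comparison, a case-insensitive variant might look like the sketch below. This is an assumption about how it could be written, not what the code does; the name $subdomain_i is invented here.

```perl
use strict;
use warnings;

# Same structure as the BNF-derived pattern, but with lower-case
# classes only and the /i flag compiled into the qr// object.
my $subdomain_i = qr/(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?)/i;

for my $label (qw(Example XN--BCHER-KVA a -a)) {
    printf "%-14s %s\n", $label,
        ($label =~ /^$subdomain_i\z/ ? 'match' : 'no match');
}
```

The character classes get shorter, at the cost of a slightly less literal correspondence with ALPHA in the BNF.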

        hp


-- 
   _  | Peter J. Holzer    | The further north you go, the less anyone
|_|_) | Sysadmin WSR       | speaks at all, and so no dialect either.
| |   | [EMAIL PROTECTED]         | Hallig Gröde is almost entirely dialect-free.
__/   | http://www.hjp.at/ |   -- Hannes Petersen in desd
