On 2004-09-25 11:43:45 +0200, Michael Holzt wrote:
> > I don't understand what problem you and Michael are trying to
> > solve with the messages after this one. What doesn't match?
>
> The old code did not match for single character sub-domains.

Yes, but Robert already fixed that:

On 2004-09-14 08:49:57 -0700, Robert Spier wrote:
|     my $subdomain = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9]))';
|
| I think what we want is this:
|
|     my $subdomain = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';

(and as an indication that the error was simply a typo, I offer that
the second grouping is rather useless without the ? after it).

> > All the examples you provide match fine, assuming we're talking
> > about /^$subdomain(?:\.$subdomain)*$/. The added look-aheads
> > and alternations and other complications don't add anything, as
> > far as I can see.
>
> Oh, they are there for a reason: Because without them we would
> match too much, e.g. illegal subdomains. A subdomain can not
> start or end with a hyphen, and to ensure this, complicated
> regexps like posted are needed.

No, they aren't. Robert's fix is fine. Indeed, if you look at the
BNF notation

> # sub-domain = Let-dig [Ldh-str]
> # Let-dig = ALPHA / DIGIT
> # Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig

you can see that it matches the BNF notation exactly:

my $subdomain = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';
                                  <-----------><--------->
                                   *(A/D/"-")    Let-dig
                    <---------><--------------------------->
                      Let-dig            [Ldh-str]

> However it looks to me like the lookahead for the
> digit or char after the hyphen is not needed. We can
> just match it normally. So i submit now the following
> pattern for evaluation:
>
> (?:[a-zA-z0-9]+(?:[a-zA-Z0-9]|-[a-zA-Z0-9])*)

Nope, this is wrong. It doesn't match

    xn--bcher-kva

because after "xn" the pattern only accepts a hyphen that is
immediately followed by a letter or digit, so it stops at the double
hyphen. Yet this is a perfectly legitimate domain component (which
also happens to exist, see http://xn--bcher-kva.ch/).

> Btw: Why don't we use case-insensitive matching?

No special reason, I think. When I wrote the regexps, I just literally
translated the BNF to regexps. I was sorely tempted to "optimize" the
regexps, but resisted (mostly) because I wanted to make it easy to
verify that the regexps do indeed implement the BNF.

        hp

-- 
   _  | Peter J. Holzer    | The farther north you go, the less people
|_|_) | Sysadmin WSR       | speak at all, and hence no dialect either.
| |   | [EMAIL PROTECTED]  | Hallig Gröde is almost entirely dialect-free.
__/   | http://www.hjp.at/ |    -- Hannes Petersen in desd
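
For anyone who wants to check the claims in this thread rather than
take them on faith, a small script along the following lines should
do. It is only a sketch: the three patterns are copied from the
messages quoted above, but the helper matches(), the variable names
$subdomain_typo and $subdomain_alt, and all test names other than
xn--bcher-kva.ch are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Robert's corrected pattern: Let-dig, optionally followed by Ldh-str.
my $subdomain = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?)';

# The pattern with the typo (missing "?"), which needs at least two
# characters per component. (Variable name is hypothetical.)
my $subdomain_typo = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9]))';

# Michael's proposed alternative, copied verbatim from his message.
# (Variable name is hypothetical.)
my $subdomain_alt = '(?:[a-zA-z0-9]+(?:[a-zA-Z0-9]|-[a-zA-Z0-9])*)';

# Hypothetical helper: returns 1 if the whole string is a dot-separated
# list of components that each match $pattern, else 0.
sub matches {
    my ($pattern, $domain) = @_;
    return $domain =~ /^$pattern(?:\.$pattern)*$/ ? 1 : 0;
}

# Print one line per test name with the verdict of each pattern.
for my $domain (qw(a a.b example.com xn--bcher-kva.ch -foo.com foo-.com a..b)) {
    printf "%-18s fixed=%d typo=%d alt=%d\n", $domain,
        matches($subdomain,      $domain),
        matches($subdomain_typo, $domain),
        matches($subdomain_alt,  $domain);
}
```

The corrected pattern should accept the first four names and reject
the last three. The version with the typo should additionally reject
"a" and "a.b", and the alternative should reject xn--bcher-kva.ch.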