On matching URLs, it might be nice to start with a *definition* of 
the proper syntax of a URL, no?  The actual spec is fairly complicated 
[see RFC 2396].  It is hard even to give a *summary* syntax for the 
darn things [especially since the different URL "schemes" can have 
their own syntactic rules: http: versus telnet: versus mailto:, 
etc].

Roughly:

    <scheme>://<authority><path>?<query>

The scheme can be almost anything [I don't think there's any a priori 
way to tell a 'defined' scheme from a random one, so any alpha 
sequence is at least legal].  Then the authority looks like:

    <userinfo>@<host>:<port>

where 'userinfo' is intended to be authentication info corresponding 
to the desired access to the given host. For the 'mailto' scheme, 
that's the desired target email-box, of course.  You omit the '@' if 
there is no userinfo.

Almost anything that can be resolved to an IP address *somehow* [DNS, 
dot-quads, whatever] is legal for the 'host' part.  Port is the 
decimal port number to use [use the default port associated with the 
given scheme if the ":<port>" is omitted].
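To make the rough shape above concrete, here is a minimal sketch in Python [my choice of language, nothing more].  The regex is the generic URI splitter from RFC 2396, Appendix B; the split_url helper and the DEFAULT_PORTS table are hypothetical names made up for illustration, and the port table is nowhere near complete.  It does no validation at all, it only pulls the pieces apart:

```python
import re

# Generic URI-splitting regex, as given in RFC 2396, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Hypothetical default-port table, for illustration only.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "telnet": 23}


def split_url(url):
    """Split a URL into the rough fields discussed above (no validation)."""
    m = URI_RE.match(url)  # every group is optional, so this always matches
    scheme, authority = m.group(2), m.group(4)
    userinfo = host = port = None
    if authority is not None:
        hostport = authority
        # The '@' is only present when there is userinfo.
        if "@" in authority:
            userinfo, _, hostport = authority.rpartition("@")
        host, _, port_text = hostport.partition(":")
        if port_text.isdigit():
            port = int(port_text)
        elif scheme is not None:
            # No usable ":<port>", so fall back to the scheme's default.
            port = DEFAULT_PORTS.get(scheme.lower())
    return {
        "scheme": scheme,
        "userinfo": userinfo,
        "host": host,
        "port": port,
        "path": m.group(5),
        "query": m.group(7),
    }
```

Calling split_url("telnet://example.com") should give host "example.com", no userinfo, and port 23 from the default table.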

And so on, for the spec on what's legal in a <path> and what is legal 
in a <query> [and about how the restricted-chars have to be %-encoded 
if you want to use them, etc].  This is clearly MUCH more complicated 
than one actually _usually_ runs into in practice, and is probably 
more complicated than the original inquirer had in mind.
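As a small illustration of the %-encoding business, here is a sketch using Python's standard urllib.parse module [again, the language is just my pick, and the sample string is made up].  Passing safe="" asks quote() to encode '/' as well, which it would otherwise leave alone:

```python
from urllib.parse import quote, unquote

# A made-up string containing several characters that are reserved in URLs.
raw = "fun with perl?/golf & more"

# Percent-encode everything except letters, digits, and "_.-~".
encoded = quote(raw, safe="")

# Decoding should give back the original text.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```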

But still: it'd be nice to settle on "what's a URL" for these 
purposes before we start playing golf on short routines to match them 
[e.g., mailto: URLs are *real* common, and they generally use ALL of 
the fields: something@ for the mailbox, the ?subject=.... to set up 
the default subject]
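For what it's worth, here is a hedged sketch of picking a mailto: URL apart with Python's standard urllib.parse [the address and subject are invented for the example].  One wrinkle: since a mailto: URL has no "//", a generic splitter like urlsplit() reports the mailbox in the *path* field rather than as an authority, so the '@' has to be split by hand:

```python
from urllib.parse import urlsplit, parse_qs

# Invented example address and subject.
url = "mailto:someone@example.com?subject=URL%20golf"

parts = urlsplit(url)

# With no "//" present, the whole address ends up in the path.
mailbox, _, domain = parts.path.partition("@")

# parse_qs decodes the %-escapes and collects values into lists.
headers = parse_qs(parts.query)

print(parts.scheme, mailbox, domain, headers)
```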

  /Bernie\
-- 
Bernie Cosell                     Fantasy Farm Fibers
mailto:[EMAIL PROTECTED]     Pearisburg, VA
    -->  Too many people, too few sheep  <--          
