On matching URLs, it might be nice to start with a *definition* of
the proper syntax of a URL, no? The actual spec is fairly complicated
[see RFC 2396]. It is hard even to give a *summary* syntax for the
darn things [especially since the different URL "schemes" can have
their own syntactic rules : http: versus telnet: versus mailto:,
etc].
Roughly:
<scheme>://<authority><path>?<query>
The scheme can be almost anything [I don't think there's any a priori
way to tell a 'defined' scheme from a random one, so any alphabetic
sequence is at least legal]. Then the authority looks like:
<userinfo>@<host>:<port>
Where 'userinfo' is intended to be authentication info corresponding
to the desired access to the given @host. For the 'mailto' scheme,
that's the desired target email-box, of course. You omit the '@' if
there is no userinfo.
Almost anything that can be resolved to an IP address *somehow* [DNS,
dot-quads, whatever] is legal for the 'host' part. Port is the
decimal port number to use [use the default port associated with the
given scheme if the ":<port>" is omitted].
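
To make the field breakdown above concrete, here is a minimal sketch using
Python's standard urllib.parse module [my choice of tool, not anything from
the thread]. The URL and the user name in it are made up purely so that
every field is present:

```python
from urllib.parse import urlsplit

# Hypothetical URL, invented only to exercise every field of
# <scheme>://<userinfo>@<host>:<port><path>?<query>
url = "http://bernie@www.example.com:8080/docs/index.html?topic=urls"

parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.netloc)    # bernie@www.example.com:8080  (the whole authority)
print(parts.username)  # bernie                       (userinfo)
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/index.html
print(parts.query)     # topic=urls
```

If the ":<port>" is left off, parts.port comes back as None and it is up to
the caller to fall back on the scheme's default port.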
And so on, for the spec on what's legal in a <path> and what is legal
in a <query> [and about how the restricted-chars have to be %-encoded
if you want to use them, etc]. This is clearly MUCH more complicated
than one actually _usually_ runs into in practice, and is probably
more complicated than the original inquirer had in mind.
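
As a small illustration of the %-encoding business, here is a sketch with
Python's urllib.parse.quote and unquote [again my choice of tool; the sample
string is arbitrary]. Passing safe="" asks quote to escape "/" as well, which
it would otherwise leave alone:

```python
from urllib.parse import quote, unquote

# Arbitrary sample text containing several reserved characters.
raw = "a b&c=d/e?"

# safe="" means no reserved character is exempt from escaping.
encoded = quote(raw, safe="")
print(encoded)           # a%20b%26c%3Dd%2Fe%3F

# Decoding restores the original text.
print(unquote(encoded))  # a b&c=d/e?
```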
But still: it'd be nice to settle on "what's a URL" for these
purposes before we start playing golf on short-routines to match them
[e.g., mailto: URLs are *real* common, and they generally use ALL of
the fields: something@ for the mailbox, the ?subject=.... to set up
the default subject].
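
For the mailto: case, a quick sketch of what one general-purpose parser
[Python's urllib.parse, my choice] makes of such a URL. The address and
subject are invented. Note that since there is no "//", this parser reports
the mailbox as the path and leaves the authority empty, which is one more
reason to pin down "what's a URL" before writing a matcher:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mailto: URL with a default subject.
url = "mailto:someone@example.com?subject=Hello%20there"

parts = urlsplit(url)
print(parts.scheme)  # mailto
print(parts.netloc)  # (empty string: no "//", so no authority)
print(parts.path)    # someone@example.com
print(parts.query)   # subject=Hello%20there

# parse_qs splits the query into fields and undoes the %-encoding.
fields = parse_qs(parts.query)
print(fields)        # {'subject': ['Hello there']}
```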
/Bernie\
--
Bernie Cosell Fantasy Farm Fibers
mailto:[EMAIL PROTECTED] Pearisburg, VA
--> Too many people, too few sheep <--