Re: URL Trimming s{} regex

Andy Bach Tue, 12 Feb 2008 14:02:33 -0800

David Moreno wrote:
> It's a nice, a bit complex one. I'd try it as:
>
> $url =~ m!\A(http://)?(.+?)(/.*)?\z!;
> print $1 if $1;
> print $2;
>
> TIMTOWTDI.
>


Be a little more flexible: use inner non-capturing parens, "(?: ... )", to
add https (and, if needed, "ftp" or "ldap" or ...), and "/i" for case
insensitivity.  Always test for a match rather than assuming one.  And if you
know your separator/marker (the slash), match 'not slash' rather than 'dot':
if ( $url =~ m!\A((?:http|https)://)?([^/]+)!i ) {
   print $1 if $1;
   print $2;
}  # else 'no URL'
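
As a rough sketch of that "more flexible" version, the scheme list can live in one place and the match can be wrapped in a sub that returns an empty list on failure.  The name split_url and the particular scheme list are illustrative assumptions, not anything from the original thread:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (scheme prefix, host) on a match,
# or an empty list when nothing matches.
# The scheme list is an assumption; add or drop entries as needed.
sub split_url {
    my ($url) = @_;
    return unless defined $url;
    if ( $url =~ m!\A((?:https?|ftp|ldap)://)?([^/]+)!i ) {
        # $1 is undef when the URL had no scheme prefix
        return ( defined $1 ? $1 : '', $2 );
    }
    return;    # no match
}

for my $url ( 'HTTPS://www.example.com/some/page.html', 'example.org/x', '/' ) {
    if ( my ( $scheme, $host ) = split_url($url) ) {
        print "$scheme$host\n";
    }
    else {
        print "no URL\n";
    }
}
```

The list assignment inside the "if" is false only when the sub returns an empty list, so a URL without a scheme prefix still counts as a match.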

You don't need to match (or not) the stuff after the end of the 'not slash'
part, as you don't care about it ... though you may need to 'chomp' $url
first (or deal w/ the "\n" if it's there - depends upon your loop).  If
you're serious, though, there are a number of modules for this URL
finding that'll do it right for nearly everything legit - it's harder
than you'd think.  J. Friedl's URL matching masterpiece ("Mastering Regular
Expressions", O'Reilly:
http://www.oreilly.com/catalog/regex3/index.html
http://regex.info/
) runs to 9 embedded REs and yikes, but here's a simpler one:

if ($url =~ m{^https?://([^/:]+)(:(\d+))?(/.*)?$}i)
{
  my $host = $1;
  my $port = $3 || 80;  # Use $3 if it exists; otherwise default to 80.
  my $path = $4 || "/"; # Use $4 if it exists; otherwise default to "/".
  print "Host: $host\n";
  print "Port: $port\n";
  print "Path: $path\n";
} else {
  print "Not an HTTP URL\n";
}
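
One thing to watch: the pattern above falls back to port 80 even for an https URL.  Here is a minimal sketch that captures the scheme, picks the default port from it, and chomps its input first.  parse_http_url and %default_port are hypothetical names, and the sketch makes no attempt to handle userinfo, IPv6 literals, or the rest of the full URL grammar:

```perl
use strict;
use warnings;

# Assumed default ports for the two schemes this sketch accepts.
my %default_port = ( http => 80, https => 443 );

# Hypothetical helper: returns a hash ref of URL parts,
# or undef (in scalar context) if $url is not an http/https URL.
sub parse_http_url {
    my ($url) = @_;
    return unless defined $url;
    chomp $url;    # drop a trailing newline, e.g. from a file read loop
    if ( $url =~ m{\A(https?)://([^/:]+)(?::(\d+))?(/.*)?\z}i ) {
        my ( $scheme, $host, $port, $path ) = ( lc $1, $2, $3, $4 );
        $port = $default_port{$scheme} unless defined $port;
        $path = '/'                    unless defined $path;
        return { scheme => $scheme, host => $host, port => $port, path => $path };
    }
    return;
}

for my $line ( "https://www.example.com/index.html\n", 'http://localhost:8080', 'ftp://example.com/' ) {
    if ( my $u = parse_http_url($line) ) {
        print "Host: $u->{host}\n";
        print "Port: $u->{port}\n";
        print "Path: $u->{path}\n";
    }
    else {
        print "Not an HTTP URL\n";
    }
}
```

Using "defined" rather than "||" for the defaults keeps an explicit port of 0 from being silently replaced.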


-- 
Andy Bach, Sys. Mangler
Internet: [EMAIL PROTECTED] 
VOICE: (608) 261-5738  FAX 264-5932

The only function of economic forecasting is 
to make astrology look respectable.
  - John Kenneth Galbraith


_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
