> I'm trying to throw out URLs with any invalid characters in them, like
> '@". According to http://www.ietf.org/rfc/rfc1738.txt :
>    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
>    reserved characters used for their reserved purposes may be used
>    unencoded within a URL.
> 
> I'd like to throw out a URL like
> 'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252'
> (even though this one works perfectly fine. Go figure.). I've tried:
>         if ($url =~ /^[^A-Za-z0-9$-_.+!*'(),]+$/) {
>            # if there are any invalid URL characters in the string
>            # Remember, special regex characters lose their meaning inside []
>            print "Invalid character in URL at line $.: $url\n";
>            next;
>         }
> 
> According to my Camel, special regex characters are supposed to lose
> their special functioning inside []. Yet, that obviously isn't true for
> '-' used to separate the start and end of a range. I thought the fourth
> '-' at '$-' was probably indicating a range, so I tried to escape it by
> preceding it with a backslash or '\Q', but both gave strange errors about
> uninitialized values in concatenation.
> 
> Any suggestions? Thanks for your help and thoughts.
> 

Did you mean to leave out the characters that the RFC says may be
reserved in some schemes?

"The characters ";", "/", "?", ":", "@", "=" and "&" are the characters
which may be reserved for special meaning within a scheme."

They should be in the class as well, since you are negating it, right?
I just want to understand completely so I don't throw you off with any
dumb remarks...
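
If the reserved characters are meant to be allowed, here is a minimal
sketch of how the check might look. It is only an illustration under a
few assumptions: the helper name first_invalid_char is made up for this
example, '%' is added to the allowed set on the assumption that
percent-encoded octets should pass, and the '$', '-' and '@' are
backslashed so they are taken literally rather than as a variable, a
range, or an array. Note also that the pattern is not anchored with ^
and $: with those anchors and a trailing +, the original test can only
match when every character in the URL is invalid.

```perl
use strict;
use warnings;

# Matches any single character that may NOT appear unencoded in a URL.
# Allowed: alphanumerics, the specials $-_.+!*'(), the reserved
# characters ;/?:@=& and (an assumption of this sketch) '%' for escapes.
my $invalid = qr{[^A-Za-z0-9\$\-_.+!*'(),;/?:\@=&%]};

# Hypothetical helper: returns the first offending character, or undef
# if every character in the URL is allowed.
sub first_invalid_char {
    my ($url) = @_;
    return $url =~ /($invalid)/ ? $1 : undef;
}

for my $url ('http://example.com/a;b/c?x=1&y=2', 'http://example.com/a"b') {
    my $bad = first_invalid_char($url);
    print defined $bad
        ? "Invalid character '$bad' in URL: $url\n"
        : "OK: $url\n";
}
```

With the reserved characters in the class, a URL containing ';' (like
the jnci one above) is accepted, which matches the fact that it works.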

http://danconia.org

