> I'm trying to throw out URLs with any invalid characters in them, like
> '@". According to http://www.ietf.org/rfc/rfc1738.txt :
>   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
>   reserved characters used for their reserved purposes may be used
>   unencoded within a URL.
>
> I'd like to throw out a URL like
> 'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252'
> (even though this one works perfectly fine. Go figure.). I've tried:
>
>   if ($url =~ /^[^A-Za-z0-9$-_.+!*'(),]+$/) {
>       # if there are any invalid URL characters in the string
>       # Remember, special regex characters lose their meaning inside []
>       print "Invalid character in URL at line $.: $url\n";
>       next;
>   }
>
> According to my Camel, special regex characters are supposed to lose
> their special functioning inside []. Yet, that obviously isn't true for
> '-' used to separate the start and end of a range. I thought the fourth
> '-' at '$-' was probably indicating a range, so I tried to escape it by
> preceding it with a backslash or '\Q' but both gave strange errors about
> uninitiated strings in concatenations.
>
> Any suggestions? Thanks for your help and thoughts.

Did you mean to leave out the characters the RFC lists as reserved for
some schemes?

  The characters ";", "/", "?", ":", "@", "=" and "&" are the
  characters which may be reserved for special meaning within a scheme.

Since you are negating the class, they should be in it as well, right?
Without them, the ":" and "/" in every "http://" URL would count as
invalid. Just trying to understand completely so I don't throw you off
with any dumb remarks...

http://danconia.org
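
For what it's worth, here is a rough sketch of how the check might look with the reserved characters included. Two things seem to be going wrong in the original pattern: as far as I can tell, `$-` is interpolated as a Perl variable inside the regex, and the `^...+$` anchors make the pattern match only when every character is invalid rather than any one of them. The names `$allowed`, `$invalid` and `first_invalid_char`, and the example.com URL, are made up for illustration. Allowing "%" (for already-encoded characters) is my assumption, not something from the original post.

```perl
use strict;
use warnings;

# Allowed unencoded characters per the RFC 1738 passage quoted above:
# alphanumerics, the specials $-_.+!*'(), and the reserved ;/?:@=&
# "%" is added on the assumption that percent-encoded URLs should pass.
# q{} keeps Perl from interpolating $ or @ here, and the hyphen is placed
# last so it cannot be read as a range operator.
my $allowed = q{A-Za-z0-9$_.+!*'(),;/?:@=&%-};

# No ^ or $ anchors: one disallowed character anywhere is enough.
my $invalid = qr/[^$allowed]/;

# Returns the first disallowed character, or undef if the URL is clean.
sub first_invalid_char {
    my ($url) = @_;
    return $url =~ /($invalid)/ ? $1 : undef;
}

for my $url (
    'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252',
    'http://example.com/a"b',    # hypothetical URL with a stray quote
) {
    my $bad = first_invalid_char($url);
    if (defined $bad) {
        print "Invalid character '$bad' in URL: $url\n";
    }
    else {
        print "OK: $url\n";
    }
}
```

With ";" in the class, the jnci URL from the original post passes, which fits the observation that it works fine in practice. If the goal really is to reject ";", it can simply be dropped from `$allowed`.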