On Thu, Mar 13, 2003 at 10:12:14PM +0100, Josip Rodin wrote:
> On Thu, Mar 13, 2003 at 09:27:28PM +0100, Frank Lichtenheld wrote:
> > Ok. Let's elaborate a little. Sorry if it's too long.
>
> Oh, I understood perfectly what you said, I just meant to say that I thought
> the original code preserved URL: within the <a> tag by mistake.

Ok. But it is not within the <a> tag; it only converts <URL:http://...>
to &lt;URL:http://...&gt;. The regex that converts http://... to
<a href="http://...">http://...</a> is this one, one line below:

  $long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;

After all the discussion, I would propose this as the patch to apply
(it contains elements of both versions):

  $long_desc =~ s,<((URL:)?\s*http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;

In the end it's your decision.

> Well, I think in principle it's much better to just match until the first
> closing bracket since IME such things are less prone to errors. Of course,
> if someone found URLs with <> in them, that idea goes down the drain...

Ok, let's wait for a package maintainer to do this. Then we can handle it ;)

> > But if you want to really allow this you have to write something like:
> > $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
> > Seems to work good but no warranty. Happy regexing ;)
>
> Not sure offhand why you both check the entity format and use a rather
> simple \w+ as an alternative... A sentence could end talking about
> Barnes&Noble; and then it could be followed by another sentence :)

Hmmm, I see the problem. The only solution seems to be to make a list of
allowed entities:

  $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|amp|gt|lt|quot)\;)/\&amp\;/go;

Greetings,
Frank

-- 
*** Frank Lichtenheld <[EMAIL PROTECTED]> ***
          *** http://www.djpig.de/ ***
see also: - http://www.usta.de/
          - http://fachschaft.physik.uni-karlsruhe.de/
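
PS: To make the Barnes&Noble; case concrete, here is a minimal sketch comparing the \w+ variant with the allow-list variant. The sub names and the sample string are made up for illustration. Only the two substitutions come from this thread, written here without the redundant backslashes and the /o flag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: treats any "&word;" as an existing entity and
# leaves it alone (the \w+ variant).
sub escape_amp_loose {
    my ($text) = @_;
    $text =~ s/&(?!(?:#x?[\da-fA-F]+|\w+);)/&amp;/g;
    return $text;
}

# Hypothetical helper: only numeric references and the listed names
# (amp, gt, lt, quot) count as entities; every other "&" is escaped.
sub escape_amp_strict {
    my ($text) = @_;
    $text =~ s/&(?!(?:#x?[\da-fA-F]+|amp|gt|lt|quot);)/&amp;/g;
    return $text;
}

# Made-up sample text, not taken from any real package description.
my $sample = 'Barnes&Noble; sells books. &lt;b&gt; &amp; &#169; R&D';

print escape_amp_loose($sample), "\n";
# Barnes&Noble; sells books. &lt;b&gt; &amp; &#169; R&amp;D
print escape_amp_strict($sample), "\n";
# Barnes&amp;Noble; sells books. &lt;b&gt; &amp; &#169; R&amp;D
```

The loose version lets "&Noble;" through as if it were an entity, while the allow-list version escapes it and still keeps the already-escaped &lt;, &gt;, &amp; and numeric references intact.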