Bug#181872: Patch

Frank Lichtenheld Fri, 14 Mar 2003 08:43:38 -0600

On Thu, Mar 13, 2003 at 10:12:14PM +0100, Josip Rodin wrote:
> On Thu, Mar 13, 2003 at 09:27:28PM +0100, Frank Lichtenheld wrote:
> > Ok. Let's elaborate a little. Sorry if it's too long.
> 
> Oh, I understood perfectly what you said, I just meant to say that I thought
> the original code preserved URL: within the <a> tag by mistake.


OK, but it is not within the <a> tag; it only converts
<URL:http://...> to &lt;URL:http://...&gt;. The regex that converts
http://... to <a href="http://...">http://...</a> is the one on the line
below:

$long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;

After all the discussions, I would propose the following as the patch to
apply (it combines elements of both versions):

$long_desc =~ s,<((URL:)?\s*http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;

In the end it's your decision.
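A minimal sketch of the proposed substitution on its own, again with a hypothetical wrapper sub and made-up input. Because `[^>]+` cannot cross a ">", each bracketed URL is matched only up to its own closing bracket, so two URLs on one line stay separate; a URL that itself contained ">" would be cut short.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the proposed patch.
sub escape_url_brackets {
    my ($long_desc) = @_;
    # Match "<", an optional "URL:" (plus optional spaces), everything from
    # http:// up to the first ">", and that ">"; re-emit the captured text
    # between &lt; and &gt; so the brackets survive as literal text.
    $long_desc =~ s,<((URL:)?\s*http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
    return $long_desc;
}

print escape_url_brackets("See <URL:http://a.example.org/> and <http://b.example.org/>."), "\n";
```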

> Well, I think in principle it's much better to just match until the first
> closing bracket since IME such things are less prone to errors. Of course,
> if someone found URLs with <> in them, that idea goes down the drain...

OK, let's wait until some package maintainer actually does that. Then we
can handle it ;)

> > But if you want to really allow this you have to write something like:
> >    $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
> > Seems to work good but no warranty. Happy regexing ;)
> 
> Not sure offhand why you both check the entity format and use a rather
> simple \w+ as an alternative... A sentence could end talking about
> Barnes&Noble; and then it could be followed by another sentence :)

Hmm, I see the problem. The only solution seems to be to make a list of
allowed entities:
$long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|amp|gt|lt|quot)\;)/\&amp\;/go;
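A minimal sketch comparing the earlier `\w+` lookahead with the whitelist version, using hypothetical wrapper subs and a made-up sentence. With `\w+`, the "&" in "Barnes&Noble;" is left alone because "Noble;" looks like an entity; with the explicit list it gets escaped, while "&lt;", "&#169;" and "&#x263A;" are still left untouched.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the earlier version, which accepts any "&word;".
sub escape_ampersands_loose {
    my ($long_desc) = @_;
    $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
    return $long_desc;
}

# Hypothetical wrapper: the whitelist version. Escape every "&" that does
# not start a numeric reference or one of amp, gt, lt, quot followed by ";".
sub escape_ampersands {
    my ($long_desc) = @_;
    $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|amp|gt|lt|quot)\;)/\&amp\;/go;
    return $long_desc;
}

my $text = "Barnes&Noble; sells &lt;books&gt; &#169; &#x263A;";
print escape_ampersands_loose($text), "\n";
print escape_ampersands($text), "\n";
```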

Greetings,
        Frank

-- 
*** Frank Lichtenheld <[EMAIL PROTECTED]> ***
          *** http://www.djpig.de/ ***
see also: - http://www.usta.de/
          - http://fachschaft.physik.uni-karlsruhe.de/
