On 09/30/2010 06:38 PM, Lukas Fleischer wrote:
On Thu, Sep 30, 2010 at 06:18:24PM +0200, PyroPeter wrote:
+  $url = str_replace('&','&', $url);
+  $url = str_replace('&', '&', $url);

What about the occurrences of "&(html-entity-code-here);" you
produced the line before?

Nothing? Any occurrence of an HTML entity code is correctly encoded as
"&". People shouldn't be able to manually insert HTML entities in
comments. The first line is actually even superfluous as I realized just
now since ampersands should already have been replaced by
htmlspecialchars() before at the time this line is executed (didn't
check that before, this part of code has been extracted from the
DokuWiki plugin).

Well, but you are encoding existing entities, that are not "&" as
"&foo;". See the example below.

+  $patterns[] = '(\b(?i)www?(?-i)\.[' . $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
+  $patterns[] = '(\b(?i)ftp?(?-i)\.['. $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';

I am not that experienced with PHP, but this looks like the $patterns
array got replaced instead of extended.

Nope, it doesn't. Check [1].

I see, "$var[] = foo" creates the array $var if necessary and appends
foo.

+  $comment = htmlspecialchars($comment);

Won't this render the next instruction useless if there are
html-characters in a link?

Nope. Links need to be escaped as well. Not sure what happens if a link
contains quotes or "<"/">". This shouldn't happen too often tho.

Imo, you should split the message at the link boundaries.
( "foo ", "http://foo.bar.tld";, " baz")
Then you should encode the html-entities in all elements, wrap the links
in <a>'s, and then join all that together.

I can not think of a way to connect <a>-tags with proper encoding of
the user input using just "normal" string functions like replace, or
regexes.

I would be happy if you could prove the opposite, and will help by
providing you with input that breaks your system.

== example 1 ==

input: "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz"

If I am not mistaken, $regex would be
"/http://foo.tld/iLikeToUseApersands/foo&bar.html/msS";
(are the "/" correctly escaped? I will assume they are.)

Then, $regex would be:
"/http:\/\/foo\.tld\/iLikeToUseApersands\/foo&bar\.html/msS"

$comment would be set by htmlspecialchars() to:
"foo http://foo.tld/iLikeToUseApersands/foo&amp;bar.html baz"

=> preg_replace_callback() would not match, as & got replaced.

Generally I would not make hostnames ("www.foo.tld") clickable.
If people are not able to provide proper URL's, they have a serious
problem. (there is also the technical argument that the hostname is not
a good indicator for the kind of service the host provides.)

Why not? What if you explicitly want to link to a project's home page?
It'll also just convert hostnames if they start with a "www" or "ftp"
subdomain, so comments refering to domains in other ways won't be
converted.

You can also link to a homepage using valid URL's. The additional
"feature" may be nice, but makes the code more complex. It also
trains users to omit the "http://"; and produces more work for devs,
as they all now have to parse this invalid hostname+path stuff.

Unrelated: You seem to accept only a-zA-Z in hostnames? Or does
PHP's \w include 0-9 and language-dependent letters? What about
underscores?

Why does the <a>'s content only include the Path of the URL?

Regards, PyroPeter
--
freenode/pyropeter                          "12:50 - Ich drücke Return."

Reply via email to