Cscott has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/179185

Change subject: Don't break autolinks by stripping the final semicolon from an entity.
......................................................................

Don't break autolinks by stripping the final semicolon from an entity.

Autolinking free external links is clever about making sure that trailing
punctuation isn't included in the link.  But if an HTML entity happens to
terminate the URL, the semicolon from the entity is stripped from the URL,
breaking it.

Fix this corner case.  This also unifies autolink parsing with Parsoid.
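
The check described above can be tried in isolation with a minimal sketch.
The helper name splitTrailingPunctuation() and the separator set ',;.:!?'
are assumptions made for illustration only; the real logic lives inline in
includes/parser/Parser.php, which also gives ')' special treatment that this
sketch leaves out.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): split a free URL into the
 * link target and the trailing punctuation that stays outside the link.
 *
 * @param string $url Candidate URL text as matched by the autolinker
 * @return string[] Two elements: [ link target, trailing punctuation ]
 */
function splitTrailingPunctuation( string $url ): array {
	// Assumed separator set, for illustration only.
	$sep = ',;.:!?';

	// Count the run of separator characters at the end of the URL.
	$numSepChars = strspn( strrev( $url ), $sep );

	// Don't break a trailing HTML entity: if the first character of the
	// run is ';' and the text before it ends in "&name", "&#xHEX" or
	// "&#DEC", that ';' belongs to the entity and stays in the URL.
	if ( $numSepChars && substr( $url, -$numSepChars, 1 ) === ';' ) {
		$chopped = substr( $url, 0, -$numSepChars );
		if ( preg_match( '/&([a-z]+|#x[\da-f]+|#\d+)$/i', $chopped ) ) {
			$numSepChars--;
		}
	}

	// Guard the zero case explicitly: substr( $url, 0, -0 ) would
	// return an empty string rather than the whole URL.
	if ( $numSepChars === 0 ) {
		return [ $url, '' ];
	}
	return [
		substr( $url, 0, -$numSepChars ),
		substr( $url, -$numSepChars ),
	];
}
```

With this sketch, 'http://example.com;' still loses its semicolon, while
'http://example.com/a&amp;.' keeps the entity intact and gives up only the
final period.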

See: I5ae8435322c78dd1df170d7a3543fff3642759b1
Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
---
M includes/parser/Parser.php
M tests/parser/parserTests.txt
2 files changed, 21 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/85/179185/1

diff --git a/includes/parser/Parser.php b/includes/parser/Parser.php
index 5c8253a..2d684ba 100644
--- a/includes/parser/Parser.php
+++ b/includes/parser/Parser.php
@@ -1485,6 +1485,13 @@
                }
 
                $numSepChars = strspn( strrev( $url ), $sep );
+               # Don't break a trailing HTML entity
+               if ( $numSepChars && substr( $url, -$numSepChars, 1 ) === ';' ) {
+                       $chopped = substr( $url, 0, -$numSepChars );
+                       if ( preg_match( '/&([a-z]+|#x[\da-f]+|#\d+)$/i', $chopped ) ) {
+                               $numSepChars--;
+                       }
+               }
                if ( $numSepChars ) {
                        $trail = substr( $url, -$numSepChars ) . $trail;
                        $url = substr( $url, 0, -$numSepChars );
diff --git a/tests/parser/parserTests.txt b/tests/parser/parserTests.txt
index 5f19e8b..0e62459 100644
--- a/tests/parser/parserTests.txt
+++ b/tests/parser/parserTests.txt
@@ -4171,6 +4171,13 @@
 http://example.com?
 http://example.com)
 http://example.com/url_with_(brackets)
+(http://example.com/url_without_brackets)
+http://example.com/url_with_entity 
+http://example.com/url_with_entity 
+http://example.com/url_with_entity 
+http://example.com/url_with_entity<
+http://example.com/url_with_entity<
+http://example.com/url_with_entity<
 !! html
 <p><a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>,
 <a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>;
@@ -4181,6 +4188,13 @@
 <a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>?
 <a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>)
 <a rel="nofollow" class="external free" href="http://example.com/url_with_(brackets)">http://example.com/url_with_(brackets)</a>
+(<a rel="nofollow" class="external free" href="http://example.com/url_without_brackets">http://example.com/url_without_brackets</a>)
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity ">http://example.com/url_with_entity </a>
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity ">http://example.com/url_with_entity </a>
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity ">http://example.com/url_with_entity </a>
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity">http://example.com/url_with_entity</a>&lt;
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity%3C">http://example.com/url_with_entity%3C</a>
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity%3C">http://example.com/url_with_entity%3C</a>
 </p>
 !! end
 

-- 
To view, visit https://gerrit.wikimedia.org/r/179185

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Cscott <canan...@wikimedia.org>
