Brian Wolff has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/391501 )

Change subject: Fix langauge converter parser test with self-close tags
......................................................................


Fix langauge converter parser test with self-close tags

This fixes an issue in f21f3942 where if there was an html
element with an alt or title attribute containing an <
entity, an ascii EOT control character (0x04) may become
inserted into the text if language converter was enabled.

Due to a really old bug in language converter, self-closed tags
got turned into non-self closed tags. However due a different
bug which was fixed in f21f3942 this code path was rarely taken
so nobody noticed until now.

Follow-up Idbc45cac12

Bug: T180552
Change-Id: I077d30c50fcb419837fef937d27caca307153d2d
---
M languages/LanguageConverter.php
M tests/parser/parserTests.txt
2 files changed, 17 insertions(+), 3 deletions(-)



diff --git a/languages/LanguageConverter.php b/languages/LanguageConverter.php
index 28c75e1..7b5abb2 100644
--- a/languages/LanguageConverter.php
+++ b/languages/LanguageConverter.php
@@ -354,7 +354,6 @@
                if ( $this->guessVariant( $text, $toVariant ) ) {
                        return $text;
                }
-
                /* we convert everything except:
                   1. HTML markups (anything between < and >)
                   2. HTML entities
@@ -389,6 +388,7 @@
 
                // Guard against delimiter nulls in the input
                $text = str_replace( "\000", '', $text );
+               $text = str_replace( "\004", '', $text );
 
                $markupMatches = null;
                $elementMatches = null;
@@ -403,6 +403,13 @@
                                        // We hit the end.
                                        $elementPos = strlen( $text );
                                        $element = '';
+                               } elseif ( substr( $element, -1 ) === "\004" ) {
+                                       // This can sometimes happen if we have
+                                       // unclosed html tags (For example
+                                       // when converting a title attribute
+                                       // during a recursive call that contains
+                                       // a &lt; e.g. <div title="&lt;">.
+                                       $element = substr( $element, 0, -1 );
                                }
                        } else {
                                // If we hit here, then Language Converter 
could be tricked
@@ -430,7 +437,14 @@
                        if ( $element !== ''
                                && preg_match( 
'/^(<[^>\s]*+)\s([^>]*+)(.*+)$/', $element, $elementMatches )
                        ) {
+                               // FIXME, this decodes entities, so if you have 
something
+                               // like <div title="foo&lt;bar"> the bar won't 
get
+                               // translated since after entity decoding it 
looks like
+                               // unclosed html and we call this method 
recursively
+                               // on attributes.
                                $attrs = Sanitizer::decodeTagAttributes( 
$elementMatches[2] );
+                               // Ensure self-closing tags stay self-closing.
+                               $close = substr( $elementMatches[2], -1 ) === 
'/' ? ' /' : '';
                                $changed = false;
                                foreach ( [ 'title', 'alt' ] as $attrName ) {
                                        if ( !isset( $attrs[$attrName] ) ) {
@@ -451,7 +465,7 @@
                                }
                                if ( $changed ) {
                                        $element = $elementMatches[1] . 
Html::expandAttributes( $attrs ) .
-                                               $elementMatches[3];
+                                               $close . $elementMatches[3];
                                }
                        }
                        $literalBlob .= $element . "\000";
diff --git a/tests/parser/parserTests.txt b/tests/parser/parserTests.txt
index b498105..5814985 100644
--- a/tests/parser/parserTests.txt
+++ b/tests/parser/parserTests.txt
@@ -16772,7 +16772,7 @@
 [[File:Foobar.jpg|alt=-{}-foAjrjvi-{}-]]
 !! html
 <p>
-</p><p><a 
href="/wiki/%D0%94%D0%B0%D1%82%D0%BE%D1%82%D0%B5%D0%BA%D0%B0:Foobar.jpg" 
class="image"><img alt="&quot; onload=&quot;alert(1)&quot; data-foo=&quot;" 
src="http://example.com/images/3/3a/Foobar.jpg"; width="1941" height="220"></a>
+</p><p><a 
href="/wiki/%D0%94%D0%B0%D1%82%D0%BE%D1%82%D0%B5%D0%BA%D0%B0:Foobar.jpg" 
class="image"><img alt="&quot; onload=&quot;alert(1)&quot; data-foo=&quot;" 
src="http://example.com/images/3/3a/Foobar.jpg"; width="1941" height="220" /></a>
 </p>
 !! end
 

-- 
To view, visit https://gerrit.wikimedia.org/r/391501
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I077d30c50fcb419837fef937d27caca307153d2d
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: REL1_27
Gerrit-Owner: Brian Wolff <bawolff...@gmail.com>
Gerrit-Reviewer: Brian Wolff <bawolff...@gmail.com>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to