[MediaWiki-commits] [Gerrit] mediawiki/core[REL1_26]: Remove support for $wgWellFormedXml=false
jenkins-bot has submitted this change and it was merged. Change subject: Remove support for $wgWellFormedXml=false .. Remove support for $wgWellFormedXml=false tl;dr: Having unnessary complexity in security critical code is bad. * When PHP is built with certain versions of libpcre, can lead to XSS * Extra options add extra complexity and maintenance burden ** Thus we should only have one html output mode. well formed = false was already vetoed in T52040, so lets go with WellFormed=true. * Options which are used by very few people tend to get tested less * Escaping is an area of code where we should be very conservative * Having escaping rules depend on making assumptions about which characters various browsers consider "whitespace" is scary * $wgWellFormedXml=false has had a negative security impact in the past (Usually not directly its fault, but has made other bugs more exploitable) * Saving a couple bytes (even less bytes after gzip taken into account) is really not worth it in this context (imho). Incidentally, this backports the removal of the space before the closing '/>' of a self-closed tag from dd2d7d0ffc. Bug: T57548 Change-Id: I5c922e0980d3f9eb39adb5bb5833e158afda42ed --- M includes/DefaultSettings.php M includes/Html.php M tests/parser/parserTest.inc M tests/parser/parserTests.txt M tests/phpunit/includes/HtmlTest.php M tests/phpunit/includes/LinkerTest.php M tests/phpunit/includes/OutputPageTest.php M tests/phpunit/includes/XmlSelectTest.php M tests/phpunit/includes/XmlTest.php M tests/phpunit/includes/content/JsonContentTest.php M tests/phpunit/includes/parser/NewParserTest.php 11 files changed, 147 insertions(+), 249 deletions(-) Approvals: Jforrester: Looks good to me, approved jenkins-bot: Verified diff --git a/includes/DefaultSettings.php b/includes/DefaultSettings.php index 7498a02..1a97dac 100644 --- a/includes/DefaultSettings.php +++ b/includes/DefaultSettings.php @@ -3091,24 +3091,6 @@ $wgAllowMicrodataAttributes = false; /** - * Should we try to make our HTML output well-formed XML? If set to false, - * output will be a few bytes shorter, and the HTML will arguably be more - * readable. If set to true, life will be much easier for the authors of - * screen-scraping bots, and the HTML will arguably be more readable. - * - * Setting this to false may omit quotation marks on some attributes, omit - * slashes from some self-closing tags, omit some ending tags, etc., where - * permitted by HTML5. Setting it to true will not guarantee that all pages - * will be well-formed, although non-well-formed pages should be rare and it's - * a bug if you find one. Conversely, setting it to false doesn't mean that - * all XML-y constructs will be omitted, just that they might be. - * - * Because of compatibility with screen-scraping bots, and because it's - * controversial, this is currently left to true by default. - */ -$wgWellFormedXml = true; - -/** * Permit other namespaces in addition to the w3.org default. * * Use the prefix for the key and the namespace for the value. diff --git a/includes/Html.php b/includes/Html.php index 62ae0b8..c49cca4 100644 --- a/includes/Html.php +++ b/includes/Html.php @@ -38,8 +38,6 @@ * * $wgMimeType: If this is set to an xml MIME type then output should be * valid XHTML5. - * $wgWellFormedXml: If this is set to true, then all output should be - * well-formed XML (quotes on attributes, self-closing tags, etc.). * * This class is meant to be confined to utility functions that are called from * trusted code paths. It does not do enforcement of policy like not allowing @@ -199,8 +197,7 @@ * This is quite similar to Xml::tags(), but it implements some useful * HTML-specific logic. For instance, there is no $allowShortTag * parameter: the closing tag is magically omitted if $element has an empty -* content model. If $wgWellFormedXml is false, then a few bytes will be -* shaved off the HTML output as well. +* content model. * * @param string $element The element's name, e.g., 'a' * @param array $attribs Associative array of attributes, e.g., array( @@ -211,14 +208,10 @@ * @return string Raw HTML */ public static function rawElement( $element, $attribs = array(), $contents = '' ) { - global $wgWellFormedXml; $start = self::openElement( $element, $attribs ); if ( in_array( $element, self::$voidElements ) ) { - if ( $wgWellFormedXml ) { - // Silly XML. - return substr( $start, 0, -1 ) . ' />'; - } - return $start; + // Silly XML. + return substr( $start, 0, -1 ) . '/>'; } else {
[MediaWiki-commits] [Gerrit] mediawiki/core[REL1_26]: Remove support for $wgWellFormedXml=false
Chad has uploaded a new change for review. https://gerrit.wikimedia.org/r/304154 Change subject: Remove support for $wgWellFormedXml=false .. Remove support for $wgWellFormedXml=false tl;dr: Having unnessary complexity in security critical code is bad. * When PHP is built with certain versions of libpcre, can lead to XSS * Extra options add extra complexity and maintenance burden ** Thus we should only have one html output mode. well formed = false was already vetoed in T52040, so lets go with WellFormed=true. * Options which are used by very few people tend to get tested less * Escaping is an area of code where we should be very conservative * Having escaping rules depend on making assumptions about which characters various browsers consider "whitespace" is scary * $wgWellFormedXml=false has had a negative security impact in the past (Usually not directly its fault, but has made other bugs more exploitable) * Saving a couple bytes (even less bytes after gzip taken into account) is really not worth it in this context (imho). Incidentally, this backports the removal of the space before the closing '/>' of a self-closed tag from dd2d7d0ffc. Bug: T57548 Change-Id: I5c922e0980d3f9eb39adb5bb5833e158afda42ed --- M includes/DefaultSettings.php M includes/Html.php M tests/parser/parserTest.inc M tests/parser/parserTests.txt M tests/phpunit/includes/HtmlTest.php M tests/phpunit/includes/LinkerTest.php M tests/phpunit/includes/OutputPageTest.php M tests/phpunit/includes/XmlSelectTest.php M tests/phpunit/includes/XmlTest.php M tests/phpunit/includes/content/JsonContentTest.php M tests/phpunit/includes/parser/NewParserTest.php 11 files changed, 147 insertions(+), 249 deletions(-) git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/54/304154/1 diff --git a/includes/DefaultSettings.php b/includes/DefaultSettings.php index 7498a02..1a97dac 100644 --- a/includes/DefaultSettings.php +++ b/includes/DefaultSettings.php @@ -3091,24 +3091,6 @@ $wgAllowMicrodataAttributes = false; /** - * Should we try to make our HTML output well-formed XML? If set to false, - * output will be a few bytes shorter, and the HTML will arguably be more - * readable. If set to true, life will be much easier for the authors of - * screen-scraping bots, and the HTML will arguably be more readable. - * - * Setting this to false may omit quotation marks on some attributes, omit - * slashes from some self-closing tags, omit some ending tags, etc., where - * permitted by HTML5. Setting it to true will not guarantee that all pages - * will be well-formed, although non-well-formed pages should be rare and it's - * a bug if you find one. Conversely, setting it to false doesn't mean that - * all XML-y constructs will be omitted, just that they might be. - * - * Because of compatibility with screen-scraping bots, and because it's - * controversial, this is currently left to true by default. - */ -$wgWellFormedXml = true; - -/** * Permit other namespaces in addition to the w3.org default. * * Use the prefix for the key and the namespace for the value. diff --git a/includes/Html.php b/includes/Html.php index 62ae0b8..c49cca4 100644 --- a/includes/Html.php +++ b/includes/Html.php @@ -38,8 +38,6 @@ * * $wgMimeType: If this is set to an xml MIME type then output should be * valid XHTML5. - * $wgWellFormedXml: If this is set to true, then all output should be - * well-formed XML (quotes on attributes, self-closing tags, etc.). * * This class is meant to be confined to utility functions that are called from * trusted code paths. It does not do enforcement of policy like not allowing @@ -199,8 +197,7 @@ * This is quite similar to Xml::tags(), but it implements some useful * HTML-specific logic. For instance, there is no $allowShortTag * parameter: the closing tag is magically omitted if $element has an empty -* content model. If $wgWellFormedXml is false, then a few bytes will be -* shaved off the HTML output as well. +* content model. * * @param string $element The element's name, e.g., 'a' * @param array $attribs Associative array of attributes, e.g., array( @@ -211,14 +208,10 @@ * @return string Raw HTML */ public static function rawElement( $element, $attribs = array(), $contents = '' ) { - global $wgWellFormedXml; $start = self::openElement( $element, $attribs ); if ( in_array( $element, self::$voidElements ) ) { - if ( $wgWellFormedXml ) { - // Silly XML. - return substr( $start, 0, -1 ) . ' />'; - } - return $start; + // Silly XML. + return substr( $start, 0, -1 ) . '/>';