[MediaWiki-commits] [Gerrit] mediawiki/core[REL1_26]: Remove support for $wgWellFormedXml=false

2016-08-11 Thread jenkins-bot (Code Review)
jenkins-bot has submitted this change and it was merged.

Change subject: Remove support for $wgWellFormedXml=false
..


Remove support for $wgWellFormedXml=false

tl;dr: Having unnessary complexity in security critical code is bad.

* When PHP is built with certain versions of libpcre, can lead to XSS
* Extra options add extra complexity and maintenance burden
** Thus we should only have one html output mode. well formed = false
 was already vetoed in T52040, so lets go with WellFormed=true.
* Options which are used by very few people tend to get tested less
* Escaping is an area of code where we should be very conservative
* Having escaping rules depend on making assumptions about which
characters various browsers consider "whitespace" is scary
* $wgWellFormedXml=false has had a negative security impact in the
past (Usually not directly its fault, but has made other bugs
more exploitable)
* Saving a couple bytes (even less bytes after gzip taken into
account) is really not worth it in this context (imho).

Incidentally, this backports the removal of the space before the
closing '/>' of a self-closed tag from dd2d7d0ffc.

Bug: T57548
Change-Id: I5c922e0980d3f9eb39adb5bb5833e158afda42ed
---
M includes/DefaultSettings.php
M includes/Html.php
M tests/parser/parserTest.inc
M tests/parser/parserTests.txt
M tests/phpunit/includes/HtmlTest.php
M tests/phpunit/includes/LinkerTest.php
M tests/phpunit/includes/OutputPageTest.php
M tests/phpunit/includes/XmlSelectTest.php
M tests/phpunit/includes/XmlTest.php
M tests/phpunit/includes/content/JsonContentTest.php
M tests/phpunit/includes/parser/NewParserTest.php
11 files changed, 147 insertions(+), 249 deletions(-)

Approvals:
  Jforrester: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/includes/DefaultSettings.php b/includes/DefaultSettings.php
index 7498a02..1a97dac 100644
--- a/includes/DefaultSettings.php
+++ b/includes/DefaultSettings.php
@@ -3091,24 +3091,6 @@
 $wgAllowMicrodataAttributes = false;
 
 /**
- * Should we try to make our HTML output well-formed XML?  If set to false,
- * output will be a few bytes shorter, and the HTML will arguably be more
- * readable.  If set to true, life will be much easier for the authors of
- * screen-scraping bots, and the HTML will arguably be more readable.
- *
- * Setting this to false may omit quotation marks on some attributes, omit
- * slashes from some self-closing tags, omit some ending tags, etc., where
- * permitted by HTML5.  Setting it to true will not guarantee that all pages
- * will be well-formed, although non-well-formed pages should be rare and it's
- * a bug if you find one.  Conversely, setting it to false doesn't mean that
- * all XML-y constructs will be omitted, just that they might be.
- *
- * Because of compatibility with screen-scraping bots, and because it's
- * controversial, this is currently left to true by default.
- */
-$wgWellFormedXml = true;
-
-/**
  * Permit other namespaces in addition to the w3.org default.
  *
  * Use the prefix for the key and the namespace for the value.
diff --git a/includes/Html.php b/includes/Html.php
index 62ae0b8..c49cca4 100644
--- a/includes/Html.php
+++ b/includes/Html.php
@@ -38,8 +38,6 @@
  *
  * $wgMimeType: If this is set to an xml MIME type then output should be
  * valid XHTML5.
- * $wgWellFormedXml: If this is set to true, then all output should be
- * well-formed XML (quotes on attributes, self-closing tags, etc.).
  *
  * This class is meant to be confined to utility functions that are called from
  * trusted code paths.  It does not do enforcement of policy like not allowing
@@ -199,8 +197,7 @@
 * This is quite similar to Xml::tags(), but it implements some useful
 * HTML-specific logic.  For instance, there is no $allowShortTag
 * parameter: the closing tag is magically omitted if $element has an 
empty
-* content model.  If $wgWellFormedXml is false, then a few bytes will 
be
-* shaved off the HTML output as well.
+* content model.
 *
 * @param string $element The element's name, e.g., 'a'
 * @param array $attribs Associative array of attributes, e.g., array(
@@ -211,14 +208,10 @@
 * @return string Raw HTML
 */
public static function rawElement( $element, $attribs = array(), 
$contents = '' ) {
-   global $wgWellFormedXml;
$start = self::openElement( $element, $attribs );
if ( in_array( $element, self::$voidElements ) ) {
-   if ( $wgWellFormedXml ) {
-   // Silly XML.
-   return substr( $start, 0, -1 ) . ' />';
-   }
-   return $start;
+   // Silly XML.
+   return substr( $start, 0, -1 ) . '/>';
} else {

[MediaWiki-commits] [Gerrit] mediawiki/core[REL1_26]: Remove support for $wgWellFormedXml=false

2016-08-10 Thread Chad (Code Review)
Chad has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/304154

Change subject: Remove support for $wgWellFormedXml=false
..

Remove support for $wgWellFormedXml=false

tl;dr: Having unnessary complexity in security critical code is bad.

* When PHP is built with certain versions of libpcre, can lead to XSS
* Extra options add extra complexity and maintenance burden
** Thus we should only have one html output mode. well formed = false
 was already vetoed in T52040, so lets go with WellFormed=true.
* Options which are used by very few people tend to get tested less
* Escaping is an area of code where we should be very conservative
* Having escaping rules depend on making assumptions about which
characters various browsers consider "whitespace" is scary
* $wgWellFormedXml=false has had a negative security impact in the
past (Usually not directly its fault, but has made other bugs
more exploitable)
* Saving a couple bytes (even less bytes after gzip taken into
account) is really not worth it in this context (imho).

Incidentally, this backports the removal of the space before the
closing '/>' of a self-closed tag from dd2d7d0ffc.

Bug: T57548
Change-Id: I5c922e0980d3f9eb39adb5bb5833e158afda42ed
---
M includes/DefaultSettings.php
M includes/Html.php
M tests/parser/parserTest.inc
M tests/parser/parserTests.txt
M tests/phpunit/includes/HtmlTest.php
M tests/phpunit/includes/LinkerTest.php
M tests/phpunit/includes/OutputPageTest.php
M tests/phpunit/includes/XmlSelectTest.php
M tests/phpunit/includes/XmlTest.php
M tests/phpunit/includes/content/JsonContentTest.php
M tests/phpunit/includes/parser/NewParserTest.php
11 files changed, 147 insertions(+), 249 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core 
refs/changes/54/304154/1

diff --git a/includes/DefaultSettings.php b/includes/DefaultSettings.php
index 7498a02..1a97dac 100644
--- a/includes/DefaultSettings.php
+++ b/includes/DefaultSettings.php
@@ -3091,24 +3091,6 @@
 $wgAllowMicrodataAttributes = false;
 
 /**
- * Should we try to make our HTML output well-formed XML?  If set to false,
- * output will be a few bytes shorter, and the HTML will arguably be more
- * readable.  If set to true, life will be much easier for the authors of
- * screen-scraping bots, and the HTML will arguably be more readable.
- *
- * Setting this to false may omit quotation marks on some attributes, omit
- * slashes from some self-closing tags, omit some ending tags, etc., where
- * permitted by HTML5.  Setting it to true will not guarantee that all pages
- * will be well-formed, although non-well-formed pages should be rare and it's
- * a bug if you find one.  Conversely, setting it to false doesn't mean that
- * all XML-y constructs will be omitted, just that they might be.
- *
- * Because of compatibility with screen-scraping bots, and because it's
- * controversial, this is currently left to true by default.
- */
-$wgWellFormedXml = true;
-
-/**
  * Permit other namespaces in addition to the w3.org default.
  *
  * Use the prefix for the key and the namespace for the value.
diff --git a/includes/Html.php b/includes/Html.php
index 62ae0b8..c49cca4 100644
--- a/includes/Html.php
+++ b/includes/Html.php
@@ -38,8 +38,6 @@
  *
  * $wgMimeType: If this is set to an xml MIME type then output should be
  * valid XHTML5.
- * $wgWellFormedXml: If this is set to true, then all output should be
- * well-formed XML (quotes on attributes, self-closing tags, etc.).
  *
  * This class is meant to be confined to utility functions that are called from
  * trusted code paths.  It does not do enforcement of policy like not allowing
@@ -199,8 +197,7 @@
 * This is quite similar to Xml::tags(), but it implements some useful
 * HTML-specific logic.  For instance, there is no $allowShortTag
 * parameter: the closing tag is magically omitted if $element has an 
empty
-* content model.  If $wgWellFormedXml is false, then a few bytes will 
be
-* shaved off the HTML output as well.
+* content model.
 *
 * @param string $element The element's name, e.g., 'a'
 * @param array $attribs Associative array of attributes, e.g., array(
@@ -211,14 +208,10 @@
 * @return string Raw HTML
 */
public static function rawElement( $element, $attribs = array(), 
$contents = '' ) {
-   global $wgWellFormedXml;
$start = self::openElement( $element, $attribs );
if ( in_array( $element, self::$voidElements ) ) {
-   if ( $wgWellFormedXml ) {
-   // Silly XML.
-   return substr( $start, 0, -1 ) . ' />';
-   }
-   return $start;
+   // Silly XML.
+   return substr( $start, 0, -1 ) . '/>';