[MediaWiki-commits] [Gerrit] WTS: Fixes to wikitext escaping for wt tag pairs (headings &... - change (mediawiki...Parsoid)

2013-04-24 Thread jenkins-bot (Code Review)
jenkins-bot has submitted this change and it was merged.

Change subject: WTS: Fixes to wikitext escaping for wt tag pairs (headings & 
links)
..


WTS: Fixes to wikitext escaping for wt tag pairs (headings & links)

* Previous code was accummulating all text that might get output
  as part of a single line and using that to determine if
  "=" and [" chars needing to be escaped.  However, that
  code felt a little convoluted, especially after the rewrite of
  WTS to be entirely DOM-based.

* Fixed the escaping to accummulate text on a line after it is output
  and use that to decide escaping strategies for "=", "[", "]" chars.

* This now changes serialized wikitext output.
  HTML: =foo''a''= ==> WT: ==foo''a''==

  Earlier, this used to be
  HTML: =foo''a''= ==> WT: ==foo''a''==

  Similarly, escaping of "]" and "[" chars has changed.

* Based on the change, parserTests.txt need updating, but updated
  our local copy of wt_escape.tests.txt that I often use when testing
  wikitext escaping. The fixes exposed some bugs in older code.

  1. = was being serialized to "===" earlier which is buggy.

  2. x [Foobar] x was being
 serialized to [[Foo| x [Foobar] x]] but the
 nowiki-escaping was unnecessary.

  3. ]]bar [[foo]] serialization led to "]]bar"
 being nowiki-escaped which was unnecessary.

  4. [[Foo|x [http://google.com] x]] is parsed incorrectly compared to
 the PHP parser that doesn't convert [http://google.com] to a link.
 Currently, the wikitext escaping code does not escape the link-like
 text in the middle, which can lead to failures since Parsoid
 converts the text in the middle to a link.  What is the right fix
 here?

* Fixed in a bug in 'wikilinkHandler' that was not escaping text in
  this (and similar) examples: ]]bar

  Updated wt_escaping tests with these new tests.

* Parser tests results:
  - 2 more wt2wt tests passing
  - 1 more html2html test passing
  - 20 more selser tests passing

  - 6 *fewer* html2wt tests passing, but this is because of the
change in expect serialized output as explained above. Once
parserTests.txt is fixed, this number will go back up.

* Blacklist file updated accordingly.

Change-Id: Ib784416f00c1fd23d41ce0c7c7d21b675d981e86
---
M js/lib/mediawiki.DOMUtils.js
M js/lib/mediawiki.WikitextSerializer.js
M js/tests/parserTests-blacklist.js
M js/tests/wt_escape.tests.txt
4 files changed, 150 insertions(+), 248 deletions(-)

Approvals:
  GWicke: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/js/lib/mediawiki.DOMUtils.js b/js/lib/mediawiki.DOMUtils.js
index d105fad..42011ea 100644
--- a/js/lib/mediawiki.DOMUtils.js
+++ b/js/lib/mediawiki.DOMUtils.js
@@ -13,6 +13,10 @@
return node.nodeType === Node.ELEMENT_NODE;
},
 
+   isText: function(node) {
+   return node.nodeType === Node.TEXT_NODE;
+   },
+
isBlockNode: function(node) {
return node && Util.isBlockTag(node.nodeName.toLowerCase());
},
@@ -463,9 +467,8 @@
 * Is a node representing inter-element ws?
 */
isIEW: function (node) {
-   return node.nodeType === node.TEXT_NODE &&
-   // ws-only
-   node.nodeValue.match(/^\s*$/);
+   // ws-only
+   return this.isText(node) && node.nodeValue.match(/^\s*$/);
},
 
isContentNode: function(node) {
diff --git a/js/lib/mediawiki.WikitextSerializer.js 
b/js/lib/mediawiki.WikitextSerializer.js
index 19044c2..8f54e78 100644
--- a/js/lib/mediawiki.WikitextSerializer.js
+++ b/js/lib/mediawiki.WikitextSerializer.js
@@ -139,31 +139,27 @@
 
 WEHP.urlParser = new PegTokenizer();
 
-WEHP.headingHandler = function(state, text) {
-   // replace heading-handler with the default handler
-   // only "=" at the extremities trigger escaping
-
-   if (state.currLine && !state.currLine.seen) {
-   var line = state.currLine.text,
-   len  = line ? line.length : 0;
-   return (line && len > 2 && (line[0] === '=') && (line[len-1] 
=== '='));
+WEHP.headingHandler = function(headingNode, state, text, opts) {
+   // Only "=" at the extremities trigger escaping
+   if (opts.isLastChild && DU.isText(headingNode.firstChild)) {
+   var line = state.currLine.text;
+   if (line.length === 0) {
+   line = text;
+   }
+   return line[0] === '=' &&
+   text && text.length > 0 && text[text.length-1] === '=';
} else {
return false;
}
 };
 
-WEHP.liHandler = function(state, text) {
-   // replace li-handler with the default handler
-   // only bullets at the beginning of the list trigger escaping
-   if (state.currLine && !state.currLine.seen) {
+WEHP.liHandler = function(liNode, state, text) {
+   // On

[MediaWiki-commits] [Gerrit] WTS: Fixes to wikitext escaping for wt tag pairs (headings &... - change (mediawiki...Parsoid)

2013-04-22 Thread Subramanya Sastry (Code Review)
Subramanya Sastry has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/60300


Change subject: WTS: Fixes to wikitext escaping for wt tag pairs (headings & 
links)
..

WTS: Fixes to wikitext escaping for wt tag pairs (headings & links)

* Previous code was accumulating all text that might get output
  as part of a single line and using that to determine if
  "=" and [" chars needing to be escaped.  However, that
  code felt a little convoluted, especially after the rewrite of
  WTS to be entirely DOM-based.

* Fixed the escaping to accumulate text on a line after it is output
  and use that to decide escaping strategies for "=", "[", "]" chars.

* This now changes serialized wikitext output.
  HTML: =foo''a''= ==> WT: ==foo''a''==

  Earlier, this used to be
  HTML: =foo''a''= ==> WT: ==foo''a''==

  Similarly, escaping of "]" and "[" chars has changed.

* Based on the change, parserTests.txt need updating, but updated
  our local copy of wt_escape.tests.txt that I often use when testing
  wikitext escaping. The fixes exposed some bugs in older code.

  1. = was being serialied to "===" earlier which is buggy.

  2. x [Foobar] x was being
 serialized to [[Foo| x [Foobar] x]] but the
 nowiki escaping was unnecessary.

  3. [[Foo|x [http://google.com] x]] is parsed incorrectly compared to
 the PHP parser that doesn't convert [http://google.com] to a link.
 Currently, the wikitext escaping code does not escape the link-like
 text in the middle, which can lead to failures since Parsoid
 converts the text in the middle to a link.  What is the right fix
 here?

* Parser tests results:
  - 2 more wt2wt tests passing
  - 1 more html2html test passing
  - 20 more selser tests passing

  - 6 *fewer* html2wt tests passing, but this is because of the
change in expect serialized output as explained above. Once
parserTests.txt is fixed, this number will go back up.

Change-Id: Ib784416f00c1fd23d41ce0c7c7d21b675d981e86
---
M js/lib/mediawiki.DOMUtils.js
M js/lib/mediawiki.WikitextSerializer.js
M js/tests/wt_escape.tests.txt
3 files changed, 137 insertions(+), 228 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/Parsoid 
refs/changes/00/60300/1

diff --git a/js/lib/mediawiki.DOMUtils.js b/js/lib/mediawiki.DOMUtils.js
index d105fad..42011ea 100644
--- a/js/lib/mediawiki.DOMUtils.js
+++ b/js/lib/mediawiki.DOMUtils.js
@@ -13,6 +13,10 @@
return node.nodeType === Node.ELEMENT_NODE;
},
 
+   isText: function(node) {
+   return node.nodeType === Node.TEXT_NODE;
+   },
+
isBlockNode: function(node) {
return node && Util.isBlockTag(node.nodeName.toLowerCase());
},
@@ -463,9 +467,8 @@
 * Is a node representing inter-element ws?
 */
isIEW: function (node) {
-   return node.nodeType === node.TEXT_NODE &&
-   // ws-only
-   node.nodeValue.match(/^\s*$/);
+   // ws-only
+   return this.isText(node) && node.nodeValue.match(/^\s*$/);
},
 
isContentNode: function(node) {
diff --git a/js/lib/mediawiki.WikitextSerializer.js 
b/js/lib/mediawiki.WikitextSerializer.js
index 19044c2..4346878 100644
--- a/js/lib/mediawiki.WikitextSerializer.js
+++ b/js/lib/mediawiki.WikitextSerializer.js
@@ -139,23 +139,23 @@
 
 WEHP.urlParser = new PegTokenizer();
 
-WEHP.headingHandler = function(state, text) {
-   // replace heading-handler with the default handler
-   // only "=" at the extremities trigger escaping
-
-   if (state.currLine && !state.currLine.seen) {
-   var line = state.currLine.text,
-   len  = line ? line.length : 0;
-   return (line && len > 2 && (line[0] === '=') && (line[len-1] 
=== '='));
+WEHP.headingHandler = function(headingNode, state, text, opts) {
+   // Only "=" at the extremities trigger escaping
+   if (opts.isLastChild && DU.isText(headingNode.firstChild)) {
+   var line = state.currLine.text;
+   if (line.length === 0) {
+   line = text;
+   }
+   return line[0] === '=' &&
+   text && text.length > 0 && text[text.length-1] === '=';
} else {
return false;
}
 };
 
-WEHP.liHandler = function(state, text) {
-   // replace li-handler with the default handler
-   // only bullets at the beginning of the list trigger escaping
-   if (state.currLine && !state.currLine.seen) {
+WEHP.liHandler = function(liNode, state, text) {
+   // Only bullets at the beginning of the list trigger escaping
+   if (DU.isText(liNode.firstChild) && state.currLine.text === '') {
return text.match(/^[#\*:;]/);
} else {
return false;
@@ -184,14 +184,8 @@