Tim Starling has submitted this change and it was merged.

Change subject: New Grizzly version
......................................................................


New Grizzly version

* Migrate from Ant to Maven. Rearrange files per Maven standard. Add
  dependencies.
* Migrate from servlet to Grizzly. The servlet class is still there but
  doesn't work at the moment.
* Rename HTML5Tidy to HTML5Depurate to reduce the chance of a name
  collision.
* Add CC0 license.
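As a rough illustration of the new Grizzly interface, the sketch below POSTs a document to the daemon using only the JDK. It assumes the defaults in DepurateDaemon.loadConfig() (host localhost, port 4339), the /depurate path registered in start(), and that the request.getParameter("text") branch of DepurateHandler accepts a URL-encoded POST body, as its "URL input" log message suggests. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the html5depurate daemon; not part of the change.
class DepurateClientSketch {
    // Assumed endpoint when no config file overrides host or port.
    static final String DEFAULT_URL = "http://localhost:4339/depurate";

    // Build an application/x-www-form-urlencoded body carrying "text".
    static String formBody(String html) {
        try {
            return "text=" + URLEncoder.encode(html, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // POST the HTML and return the depurated output as a string.
    static String depurate(String endpoint, String html) throws IOException {
        byte[] body = formBody(html).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            // getInputStream() throws IOException on 4xx/5xx, e.g. the 400
            // sent when the text parameter is missing.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream result = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    result.write(chunk, 0, n);
                }
                // The handler replies with text/html;charset=UTF-8.
                return new String(result.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With a daemon running locally, DepurateClientSketch.depurate(DepurateClientSketch.DEFAULT_URL, html) would return the reserialized document.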

Change-Id: Ifb1e01191b9975a97897eb8a84d61a5b60b3dc26
---
M .gitreview
A LICENSE.TXT
R build-tomcat.xml
A pom.xml
A src/main/java/org/wikimedia/html5depurate/Config.java
A src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java
A src/main/java/org/wikimedia/html5depurate/DepurateErrorPageGenerator.java
A src/main/java/org/wikimedia/html5depurate/DepurateHandler.java
A src/main/java/org/wikimedia/html5depurate/Depurator.java
A src/main/java/org/wikimedia/html5depurate/MultipartBuffer.java
R src/webapp/org/wikimedia/html5depurate/DepurateServlet.java
11 files changed, 569 insertions(+), 22 deletions(-)

Approvals:
  Tim Starling: Verified; Looks good to me, approved



diff --git a/.gitreview b/.gitreview
index c2fce0c..da1e076 100644
--- a/.gitreview
+++ b/.gitreview
@@ -1,6 +1,6 @@
 [gerrit]
 host=gerrit.wikimedia.org
 port=29418
-project=mediawiki/services/html5tidy.git
+project=mediawiki/services/html5depurate.git
 defaultbranch=master
 defaultrebase=0
diff --git a/LICENSE.TXT b/LICENSE.TXT
new file mode 100644
index 0000000..1373246
--- /dev/null
+++ b/LICENSE.TXT
@@ -0,0 +1,119 @@
+CC0 1.0 Universal
+
+    CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
+    LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
+    ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
+    INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
+    REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
+    PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
+    THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
+    HEREUNDER.
+
+Statement of Purpose
+
+The laws of most jurisdictions throughout the world automatically confer
+exclusive Copyright and Related Rights (defined below) upon the creator
+and subsequent owner(s) (each and all, an "owner") of an original work of
+authorship and/or a database (each, a "Work").
+
+Certain owners wish to permanently relinquish those rights to a Work for
+the purpose of contributing to a commons of creative, cultural and
+scientific works ("Commons") that the public can reliably and without fear
+of later claims of infringement build upon, modify, incorporate in other
+works, reuse and redistribute as freely as possible in any form whatsoever
+and for any purposes, including without limitation commercial purposes.
+These owners may contribute to the Commons to promote the ideal of a free
+culture and the further production of creative, cultural and scientific
+works, or to gain reputation or greater distribution for their Work in
+part through the use and efforts of others.
+
+For these and/or other purposes and motivations, and without any
+expectation of additional consideration or compensation, the person
+associating CC0 with a Work (the "Affirmer"), to the extent that he or she
+is an owner of Copyright and Related Rights in the Work, voluntarily
+elects to apply CC0 to the Work and publicly distribute the Work under its
+terms, with knowledge of his or her Copyright and Related Rights in the
+Work and the meaning and intended legal effect of CC0 on those rights.
+
+1. Copyright and Related Rights. A Work made available under CC0 may be
+protected by copyright and related or neighboring rights ("Copyright and
+Related Rights"). Copyright and Related Rights include, but are not
+limited to, the following:
+
+  i. the right to reproduce, adapt, distribute, perform, display,
+     communicate, and translate a Work;
+ ii. moral rights retained by the original author(s) and/or performer(s);
+iii. publicity and privacy rights pertaining to a person's image or
+     likeness depicted in a Work;
+ iv. rights protecting against unfair competition in regards to a Work,
+     subject to the limitations in paragraph 4(a), below;
+  v. rights protecting the extraction, dissemination, use and reuse of data
+     in a Work;
+ vi. database rights (such as those arising under Directive 96/9/EC of the
+     European Parliament and of the Council of 11 March 1996 on the legal
+     protection of databases, and under any national implementation
+     thereof, including any amended or successor version of such
+     directive); and
+vii. other similar, equivalent or corresponding rights throughout the
+     world based on applicable law or treaty, and any national
+     implementations thereof.
+
+2. Waiver. To the greatest extent permitted by, but not in contravention
+of, applicable law, Affirmer hereby overtly, fully, permanently,
+irrevocably and unconditionally waives, abandons, and surrenders all of
+Affirmer's Copyright and Related Rights and associated claims and causes
+of action, whether now known or unknown (including existing as well as
+future claims and causes of action), in the Work (i) in all territories
+worldwide, (ii) for the maximum duration provided by applicable law or
+treaty (including future time extensions), (iii) in any current or future
+medium and for any number of copies, and (iv) for any purpose whatsoever,
+including without limitation commercial, advertising or promotional
+purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
+member of the public at large and to the detriment of Affirmer's heirs and
+successors, fully intending that such Waiver shall not be subject to
+revocation, rescission, cancellation, termination, or any other legal or
+equitable action to disrupt the quiet enjoyment of the Work by the public
+as contemplated by Affirmer's express Statement of Purpose.
+
+3. Public License Fallback. Should any part of the Waiver for any reason
+be judged legally invalid or ineffective under applicable law, then the
+Waiver shall be preserved to the maximum extent permitted taking into
+account Affirmer's express Statement of Purpose. In addition, to the
+extent the Waiver is so judged Affirmer hereby grants to each affected
+person a royalty-free, non transferable, non sublicensable, non exclusive,
+irrevocable and unconditional license to exercise Affirmer's Copyright and
+Related Rights in the Work (i) in all territories worldwide, (ii) for the
+maximum duration provided by applicable law or treaty (including future
+time extensions), (iii) in any current or future medium and for any number
+of copies, and (iv) for any purpose whatsoever, including without
+limitation commercial, advertising or promotional purposes (the
+"License"). The License shall be deemed effective as of the date CC0 was
+applied by Affirmer to the Work. Should any part of the License for any
+reason be judged legally invalid or ineffective under applicable law, such
+partial invalidity or ineffectiveness shall not invalidate the remainder
+of the License, and in such case Affirmer hereby affirms that he or she
+will not (i) exercise any of his or her remaining Copyright and Related
+Rights in the Work or (ii) assert any associated claims and causes of
+action with respect to the Work, in either case contrary to Affirmer's
+express Statement of Purpose.
+
+4. Limitations and Disclaimers.
+
+ a. No trademark or patent rights held by Affirmer are waived, abandoned,
+    surrendered, licensed or otherwise affected by this document.
+ b. Affirmer offers the Work as-is and makes no representations or
+    warranties of any kind concerning the Work, express, implied,
+    statutory or otherwise, including without limitation warranties of
+    title, merchantability, fitness for a particular purpose, non
+    infringement, or the absence of latent or other defects, accuracy, or
+    the present or absence of errors, whether or not discoverable, all to
+    the greatest extent permissible under applicable law.
+ c. Affirmer disclaims responsibility for clearing rights of other persons
+    that may apply to the Work or any use thereof, including without
+    limitation any person's Copyright and Related Rights in the Work.
+    Further, Affirmer disclaims responsibility for obtaining any necessary
+    consents, permissions or other rights required for any use of the
+    Work.
+ d. Affirmer understands and acknowledges that Creative Commons is not a
+    party to this document and has no duty or obligation with respect to
+    this CC0 or use of the Work.
diff --git a/build.xml b/build-tomcat.xml
similarity index 100%
rename from build.xml
rename to build-tomcat.xml
diff --git a/pom.xml b/pom.xml
new file mode 100644
index 0000000..e726baa
--- /dev/null
+++ b/pom.xml
@@ -0,0 +1,42 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <groupId>org.wikimedia</groupId>
+  <artifactId>html5depurate</artifactId>
+  <packaging>jar</packaging>
+  <version>1.0-SNAPSHOT</version>
+  <name>html5depurate</name>
+  <url>http://maven.apache.org</url>
+  <dependencies>
+    <dependency>
+      <groupId>nu.validator</groupId>
+      <artifactId>htmlparser</artifactId>
+      <version>1.4.1</version>
+    </dependency>
+    <dependency>
+      <groupId>commons-daemon</groupId>
+      <artifactId>commons-daemon</artifactId>
+      <version>1.0.15</version>
+    </dependency>
+    <dependency>
+      <groupId>commons-cli</groupId>
+      <artifactId>commons-cli</artifactId>
+      <version>1.3.1</version>
+    </dependency>
+    <dependency>
+      <groupId>org.glassfish.grizzly</groupId>
+      <artifactId>grizzly-framework</artifactId>
+      <version>2.3.22</version>
+    </dependency>
+    <dependency>
+      <groupId>org.glassfish.grizzly</groupId>
+      <artifactId>grizzly-http-server</artifactId>
+      <version>2.3.22</version>
+    </dependency>
+    <dependency>
+      <groupId>org.glassfish.grizzly</groupId>
+      <artifactId>grizzly-http-server-multipart</artifactId>
+      <version>2.3.22</version>
+    </dependency>
+  </dependencies>
+</project>
diff --git a/src/main/java/org/wikimedia/html5depurate/Config.java b/src/main/java/org/wikimedia/html5depurate/Config.java
new file mode 100644
index 0000000..bb9b34e
--- /dev/null
+++ b/src/main/java/org/wikimedia/html5depurate/Config.java
@@ -0,0 +1,9 @@
+package org.wikimedia.html5depurate;
+
+class Config {
+       public Config() {}
+
+       int maxPostSize;
+       String host;
+       int port;
+}
diff --git a/src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java b/src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java
new file mode 100644
index 0000000..a171c7b
--- /dev/null
+++ b/src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java
@@ -0,0 +1,114 @@
+package org.wikimedia.html5depurate;
+
+import org.wikimedia.html5depurate.Config;
+import org.wikimedia.html5depurate.DepurateErrorPageGenerator;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.DefaultParser;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.ParseException;
+import org.apache.commons.daemon.Daemon;
+import org.apache.commons.daemon.DaemonContext;
+import org.apache.commons.daemon.DaemonInitException;
+
+import org.glassfish.grizzly.http.server.HttpServer;
+import org.glassfish.grizzly.http.server.NetworkListener;
+import org.glassfish.grizzly.http.server.ServerConfiguration;
+
+import java.io.FileNotFoundException;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.util.Properties;
+import java.util.logging.Logger;
+import java.util.logging.Level;
+
+/**
+ * Daemon for execution via jsvc
+ */
+public class DepurateDaemon implements Daemon {
+       HttpServer m_server;
+       String[] m_args;
+       Logger m_logger = Logger.getLogger(this.getClass().getName());
+
+       public static void main(String[] args) throws Exception {
+               DepurateDaemon daemon = new DepurateDaemon();
+               daemon.m_args = args;
+               daemon.start();
+               Thread.currentThread().join();
+       }
+
+       public void init(DaemonContext context)
+                       throws DaemonInitException,Exception
+       {
+               m_args = context.getArguments();
+       }
+
+       protected CommandLine loadCommandLine()
+                       throws IOException, ParseException
+       {
+               Options options = new Options();
+               options.addOption("c", true, "The configuration file name");
+               DefaultParser parser = new DefaultParser();
+               return parser.parse(options, m_args);
+       }
+
+       protected Config loadConfig(String path) {
+               Config config = new Config();
+               Properties properties = new Properties();
+               try {
+                       properties.load(new FileInputStream(path));
+               } catch (FileNotFoundException e) {
+                       m_logger.warning("Config file not found: " + path);
+               } catch (IOException e) {
+                       m_logger.warning("Error loading config file: " + e.toString());
+               }
+
+               double maxPostSize = Double.parseDouble(
+                               properties.getProperty("maxPostSize", "100e6"));
+               if (maxPostSize > Integer.MAX_VALUE) {
+                       config.maxPostSize = Integer.MAX_VALUE;
+               } else if (maxPostSize > 0) {
+                       config.maxPostSize = (int)maxPostSize;
+               } else {
+                       config.maxPostSize = 100000000;
+               }
+               m_logger.info("Max post size: " + config.maxPostSize);
+
+               config.host = properties.getProperty("host", "localhost");
+               config.port = Integer.parseInt(properties.getProperty("port", "4339"));
+               m_logger.info("Binding to " + config.host + ":" + Integer.toString(config.port));
+
+               return config;
+       }
+
+       public void start() throws Exception {
+               m_logger.info("Starting");
+
+               CommandLine cl = loadCommandLine();
+               String configPath = "/etc/html5depurate/html5depurate.conf";
+               if (cl.hasOption("c")) {
+                       configPath = cl.getOptionValue("c");
+               }
+               Config config = loadConfig(configPath);
+
+               m_server = new HttpServer();
+               m_server.addListener(
+                               new NetworkListener("depurate", config.host, config.port));
+
+               ServerConfiguration serverConf = m_server.getServerConfiguration();
+               serverConf.addHttpHandler(new DepurateHandler(config), "/depurate");
+               serverConf.setDefaultErrorPageGenerator(new DepurateErrorPageGenerator());
+               serverConf.setName("depurate");
+               m_server.start();
+       }
+
+       public void stop() throws Exception {
+               m_logger.info("Stopping");
+               if (m_server != null) {
+                       m_server.shutdownNow();
+               }
+       }
+
+       public void destroy() {
+       }
+}
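The configuration handling in loadConfig() can be restated without the daemon scaffolding. The hypothetical ConfigSketch class below mirrors its defaults (maxPostSize 100e6, host localhost, port 4339) and its treatment of maxPostSize: the value is parsed as a double so that exponent notation works, capped at Integer.MAX_VALUE, and replaced by the default when it is not positive. It is a reading aid under those assumptions, not code from the change.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical restatement of DepurateDaemon.loadConfig(); not part of the change.
class ConfigSketch {
    static final int DEFAULT_MAX_POST_SIZE = 100000000;

    // Load properties from a string, as if read from html5depurate.conf.
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader should not fail
        }
        return properties;
    }

    // Parse as double so "100e6" is accepted, then clamp into the int range;
    // zero or negative values fall back to the default.
    static int maxPostSize(Properties p) {
        double value = Double.parseDouble(p.getProperty("maxPostSize", "100e6"));
        if (value > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        } else if (value > 0) {
            return (int) value;
        } else {
            return DEFAULT_MAX_POST_SIZE;
        }
    }

    static String host(Properties p) {
        return p.getProperty("host", "localhost");
    }

    static int port(Properties p) {
        return Integer.parseInt(p.getProperty("port", "4339"));
    }
}
```

For example, a config file containing maxPostSize=1e12 yields Integer.MAX_VALUE, while maxPostSize=-1 falls back to 100000000.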
diff --git a/src/main/java/org/wikimedia/html5depurate/DepurateErrorPageGenerator.java b/src/main/java/org/wikimedia/html5depurate/DepurateErrorPageGenerator.java
new file mode 100644
index 0000000..280cbf5
--- /dev/null
+++ b/src/main/java/org/wikimedia/html5depurate/DepurateErrorPageGenerator.java
@@ -0,0 +1,27 @@
+package org.wikimedia.html5depurate;
+
+import org.glassfish.grizzly.http.server.ErrorPageGenerator;
+import org.glassfish.grizzly.http.server.DefaultErrorPageGenerator;
+import org.glassfish.grizzly.http.server.Request;
+import org.glassfish.grizzly.http.util.HttpStatus;
+
+public class DepurateErrorPageGenerator implements ErrorPageGenerator {
+       DefaultErrorPageGenerator def = new DefaultErrorPageGenerator();
+
+       @Override
+       public String generate(final Request request,
+                       final int status, final String reasonPhrase,
+                       final String description, final Throwable exception)
+       {
+               String realReasonPhrase;
+               if (reasonPhrase.equals(description)) {
+                       realReasonPhrase = HttpStatus.getHttpStatus(status).getReasonPhrase();
+               } else {
+                       realReasonPhrase = reasonPhrase;
+               }
+               return def.generate(request, status,
+                               "\n" + realReasonPhrase + "\n",
+                               "\n" + description + "\n",
+                               exception);
+       }
+}
diff --git a/src/main/java/org/wikimedia/html5depurate/DepurateHandler.java b/src/main/java/org/wikimedia/html5depurate/DepurateHandler.java
new file mode 100644
index 0000000..ea5262b
--- /dev/null
+++ b/src/main/java/org/wikimedia/html5depurate/DepurateHandler.java
@@ -0,0 +1,116 @@
+package org.wikimedia.html5depurate;
+
+import org.wikimedia.html5depurate.Config;
+import org.wikimedia.html5depurate.MultipartBuffer;
+import org.wikimedia.html5depurate.Depurator;
+
+import org.glassfish.grizzly.http.multipart.MultipartScanner;
+import org.glassfish.grizzly.http.server.HttpHandler;
+import org.glassfish.grizzly.http.server.Request;
+import org.glassfish.grizzly.http.server.Response;
+import org.glassfish.grizzly.http.util.HttpStatus;
+import org.glassfish.grizzly.EmptyCompletionHandler;
+
+import java.io.InputStream;
+import java.io.IOException;
+import java.io.ByteArrayInputStream;
+import java.io.StringReader;
+import java.util.logging.Logger;
+import java.util.logging.Level;
+import org.xml.sax.InputSource;
+import org.xml.sax.SAXException;
+
+class DepurateHandler extends HttpHandler {
+       final private Config m_config;
+       Logger m_logger = Logger.getLogger(this.getClass().getName());
+
+       DepurateHandler(Config config) {
+               super("depurate");
+               m_config = config;
+       }
+
+       @Override
+       public void service(final Request request, final Response response)
+                       throws Exception
+       {
+               response.suspend();
+               request.setCharacterEncoding("UTF-8");
+               final MultipartBuffer buf = new MultipartBuffer(m_config.maxPostSize);
+
+               MultipartScanner.scan(request,
+                       buf,
+                       new EmptyCompletionHandler<Request>() {
+                               @Override
+                               public void completed(final Request request) {
+                                       depurate(request, response, buf);
+                                       response.resume();
+                               }
+
+                               @Override
+                               public void failed(Throwable throwable) {
+                                       depurate(request, response, buf);
+                                       response.resume();
+                               }
+                       }
+               );
+       }
+
+       private void depurate(
+                       final Request request, Response response, MultipartBuffer multi)
+       {
+               try {
+                       if (multi.isTooBig()) {
+                               sendError(response, 400, "The POST size was too large");
+                               return;
+                       }
+                       String text = request.getParameter("text");
+                       InputSource source = null;
+                       if (text != null) {
+                               StringReader sr = new StringReader(text);
+                               source = new InputSource(sr);
+                               m_logger.log(Level.INFO, "Depurating {0} chars of URL input",
+                                               text.length());
+                       } else {
+                               byte[] textBytes = multi.getParameter("text");
+                               if (textBytes != null) {
+                                       m_logger.log(Level.INFO,
+                                                       "Depurating {0} bytes of multipart input",
+                                                       textBytes.length);
+                                       InputStream stream = new ByteArrayInputStream(textBytes);
+                                       source = new InputSource(stream);
+                                       source.setEncoding("UTF-8");
+                               }
+                       }
+                       if (source == null) {
+                               sendError(response, 400, "The text parameter must be given");
+                               return;
+                       }
+
+                       byte[] outputBytes;
+                       try {
+                               outputBytes = Depurator.depurate(source);
+                       } catch (SAXException e) {
+                               m_logger.info("Error running depurator");
+                               sendError(response, 500, "Error parsing HTML: " + e.toString());
+                               return;
+                       }
+
+                       response.setContentType("text/html;charset=UTF-8");
+                       response.setContentLength(outputBytes.length);
+                       response.setBufferSize(outputBytes.length);
+                       response.getOutputStream().write(outputBytes);
+               } catch (IOException e) {
+                       m_logger.warning("Got IOException: " + e.toString());
+                       sendError(response, 500, "Got IOException: " + e.toString());
+               }
+       }
+
+       private void sendError(Response response, int code, String message) {
+               response.getResponse().setAllowCustomReasonPhrase(false);
+               try {
+                       response.sendError(code, message);
+               } catch (IOException e) {
+                       m_logger.warning("Got IOException while sending error: " + e.toString());
+               }
+       }
+}
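DepurateHandler also accepts the text parameter as a multipart/form-data part, which MultipartBuffer collects as raw bytes keyed by the part's "name" disposition parameter and the handler then parses as UTF-8. A hypothetical JDK-only helper for building such a body might look like this; the boundary is chosen by the caller and is assumed not to occur in the HTML.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the change: wraps HTML as the "text"
// field of a multipart/form-data request body.
class MultipartBodySketch {
    // Send the result with the header
    // Content-Type: multipart/form-data; boundary=<boundary>
    static byte[] textPart(String boundary, String html) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, "--" + boundary + "\r\n");
        // MultipartBuffer stores each part under its "name" parameter.
        write(out, "Content-Disposition: form-data; name=\"text\"\r\n\r\n");
        // The handler parses the part bytes as UTF-8, so encode them that way.
        write(out, html);
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```

Sending the document this way avoids URL-encoding overhead, but the whole body still counts against maxPostSize.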
diff --git a/src/main/java/org/wikimedia/html5depurate/Depurator.java b/src/main/java/org/wikimedia/html5depurate/Depurator.java
new file mode 100644
index 0000000..644d242
--- /dev/null
+++ b/src/main/java/org/wikimedia/html5depurate/Depurator.java
@@ -0,0 +1,27 @@
+package org.wikimedia.html5depurate;
+
+import nu.validator.htmlparser.common.XmlViolationPolicy;
+import nu.validator.htmlparser.sax.HtmlParser;
+import nu.validator.htmlparser.sax.HtmlSerializer;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import org.xml.sax.InputSource;
+import org.xml.sax.SAXException;
+import org.xml.sax.ContentHandler;
+
+
+class Depurator {
+       public static byte[] depurate(InputSource source)
+               throws SAXException, IOException
+       {
+               ByteArrayOutputStream sink = new ByteArrayOutputStream();
+               ContentHandler serializer = new HtmlSerializer(sink);
+               HtmlParser parser = new HtmlParser(XmlViolationPolicy.ALLOW);
+               parser.setContentHandler(serializer);
+               parser.setProperty("http://xml.org/sax/properties/lexical-handler",
+                               serializer);
+               parser.parse(source);
+               return sink.toByteArray();
+       }
+}
diff --git a/src/main/java/org/wikimedia/html5depurate/MultipartBuffer.java b/src/main/java/org/wikimedia/html5depurate/MultipartBuffer.java
new file mode 100644
index 0000000..4fadc53
--- /dev/null
+++ b/src/main/java/org/wikimedia/html5depurate/MultipartBuffer.java
@@ -0,0 +1,106 @@
+package org.wikimedia.html5depurate;
+
+import org.glassfish.grizzly.Buffer;
+import org.glassfish.grizzly.ReadHandler;
+import org.glassfish.grizzly.http.io.NIOInputStream;
+import org.glassfish.grizzly.http.multipart.MultipartEntryHandler;
+import org.glassfish.grizzly.http.multipart.MultipartEntry;
+import org.glassfish.grizzly.http.multipart.ContentDisposition;
+import org.glassfish.grizzly.http.multipart.MultipartEntry;
+
+import java.io.ByteArrayOutputStream;
+import java.util.HashMap;
+
+class MultipartBuffer implements MultipartEntryHandler {
+       private HashMap<String, byte[]> m_params;
+       private int m_size;
+       private int m_maxSize;
+       private NIOInputStream m_stream;
+       private boolean m_tooBig;
+
+       private class MultipartBufferReadHandler implements ReadHandler {
+               private String m_name;
+               private ByteArrayOutputStream m_largeBuffer = new ByteArrayOutputStream();
+               private byte[] m_smallBuffer = new byte[8192];
+
+               private MultipartBufferReadHandler(NIOInputStream stream, String name) {
+                       m_stream = stream;
+                       m_name = name;
+               }
+
+               @Override
+               public void onDataAvailable() throws Exception {
+                       readAndSaveAvail();
+                       m_stream.notifyAvailable(this);
+               }
+
+               @Override
+               public void onAllDataRead() throws Exception {
+                       readAndSaveAvail();
+                       finish();
+               }
+
+               @Override
+               public void onError(Throwable t) {
+                       finish();
+               }
+
+               private void readAndSaveAvail() throws Exception {
+                       while (m_stream.isReady()) {
+                               int bytesRead = m_stream.read(m_smallBuffer);
+                               if (incrementSize(bytesRead)) {
+                                       m_largeBuffer.write(m_smallBuffer, 0, bytesRead);
+                               }
+                       }
+               }
+
+               private void finish() {
+                       m_params.put(m_name, m_largeBuffer.toByteArray());
+               }
+       }
+
+       public MultipartBuffer(int maxSize) {
+               m_params = new HashMap<String, byte[]>();
+               m_size = 0;
+               m_maxSize = maxSize;
+       }
+
+       @Override
+       public void handle(MultipartEntry entry) throws Exception {
+               ContentDisposition disposition = entry.getContentDisposition();
+               String name = disposition.getDispositionParamUnquoted("name");
+               if (isTooBig()) {
+                       entry.skip();
+               } else if (name != null) {
+                       NIOInputStream stream = entry.getNIOInputStream();
+                       MultipartBufferReadHandler rh = new MultipartBufferReadHandler(
+                                       stream, name);
+                       stream.notifyAvailable(rh);
+               } else {
+                       entry.skip();
+               }
+       }
+
+       private boolean incrementSize(int size) throws Exception {
+               if (m_tooBig) {
+                       return false;
+               } else if (m_size >= m_maxSize - size) {
+                       m_tooBig = true;
+                       return false;
+               } else {
+                       m_size += size;
+                       return true;
+               }
+       }
+
+       public byte[] getParameter(String key) throws RuntimeException {
+               if (m_tooBig) {
+                       throw new RuntimeException("Maximum POST size exceeded");
+               }
+               return m_params.get(key);
+       }
+
+       public boolean isTooBig() {
+               return m_tooBig;
+       }
+}
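The size check in MultipartBuffer.incrementSize() latches: once a chunk would bring the running total to maxSize or beyond, that chunk and every later one are discarded and isTooBig() stays true, so the handler answers with a 400. The standalone restatement below (hypothetical names, no Grizzly types) makes that behaviour easy to exercise. Writing the comparison as size >= maxSize - chunk rather than size + chunk >= maxSize keeps the arithmetic from overflowing when maxSize has been clamped to Integer.MAX_VALUE.

```java
// Hypothetical restatement of MultipartBuffer.incrementSize(); the names are
// illustrative and not part of the change.
class SizeLimiterSketch {
    private final int maxSize;
    private int size;
    private boolean tooBig;

    SizeLimiterSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns true if the chunk fits and should be buffered. With chunk >= 0
    // and maxSize >= 1, maxSize - chunk cannot overflow int.
    boolean accept(int chunk) {
        if (tooBig) {
            return false;           // latched: discard everything after the limit
        } else if (size >= maxSize - chunk) {
            tooBig = true;          // total would reach or exceed maxSize
            return false;
        } else {
            size += chunk;
            return true;
        }
    }

    boolean isTooBig() {
        return tooBig;
    }

    int size() {
        return size;
    }
}
```

Note that a body whose size is exactly maxSize is rejected as well, so the effective limit is maxSize - 1 bytes.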
diff --git a/src/org/wikimedia/html5tidy/HTML5Tidy.java b/src/webapp/org/wikimedia/html5depurate/DepurateServlet.java
similarity index 66%
rename from src/org/wikimedia/html5tidy/HTML5Tidy.java
rename to src/webapp/org/wikimedia/html5depurate/DepurateServlet.java
index 93b8efe..92b3bb7 100644
--- a/src/org/wikimedia/html5tidy/HTML5Tidy.java
+++ b/src/webapp/org/wikimedia/html5depurate/DepurateServlet.java
@@ -1,8 +1,7 @@
 /*
  * License: http://creativecommons.org/publicdomain/zero/1.0/
  */
-
-package org.wikimedia.html5tidy;
+package org.wikimedia.html5depurate;
 
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
@@ -18,21 +17,15 @@
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 
-import java.util.Enumeration;
-
 import java.nio.charset.Charset;
 
-import org.xml.sax.ContentHandler;
 import org.xml.sax.InputSource;
 import org.xml.sax.SAXException;
 
 import nu.validator.encoding.Encoding;
-import nu.validator.htmlparser.common.XmlViolationPolicy;
-import nu.validator.htmlparser.sax.HtmlParser;
-import nu.validator.htmlparser.sax.HtmlSerializer;
 
 @MultipartConfig()
-public class HTML5Tidy extends HttpServlet {
+public class DepurateServlet extends HttpServlet {
        public void doPost(HttpServletRequest req, HttpServletResponse res)
                        throws ServletException, IOException
        {
@@ -63,24 +56,18 @@
                        return;
                }
 
-               // Set up the parser and run it
-               ByteArrayOutputStream sink = new ByteArrayOutputStream();
-               ContentHandler serializer = new HtmlSerializer(sink);
-               HtmlParser parser = new HtmlParser(XmlViolationPolicy.ALLOW);
-               parser.setContentHandler(serializer);
+               InputSource source = new InputSource(stream);
+               source.setEncoding("UTF-8");
+               byte[] outputBytes;
                try {
-                       parser.setProperty("http://xml.org/sax/properties/lexical-handler",
-                                       serializer);
-                       InputSource source = new InputSource(stream);
-                       source.setEncoding("UTF-8");
-                       parser.parse(source);
+                       outputBytes = Depurator.depurate(source);
                } catch (SAXException e) {
                        throw new ServletException("Error parsing HTML", e);
                }
 
                // HtmlSerializer writes UTF-8 by default
                res.setContentType("text/html;charset=UTF-8");
-
-               res.getOutputStream().write(sink.toByteArray());
+               res.setContentLength(outputBytes.length);
+               res.getOutputStream().write(outputBytes);
        }
 };

-- 
To view, visit https://gerrit.wikimedia.org/r/232444
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifb1e01191b9975a97897eb8a84d61a5b60b3dc26
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/services/html5depurate
Gerrit-Branch: master
Gerrit-Owner: Tim Starling <tstarl...@wikimedia.org>
Gerrit-Reviewer: Tim Starling <tstarl...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
