[cp-patches] FYI: Using the custom DTD for the Swing HTML parser.

Audrius Meskauskas Sun, 16 Jul 2006 07:56:32 -0700

While fixing PR 28392, I concluded that HTMLEditorKit is accumulating more and more unnecessarily complicated functionality, and that the suggestion of Roman and others (discussed in Brussels) to give our Swing its own custom DTD model is probably correct. This patch introduces HTML_401Swing.java, which is derived from HTML_401F.java and allows us to define additional rules exclusively for the parser of the HTMLDocument. It does not affect applications that use the parser directly by creating an instance of ParserDelegator.
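The patch plugs the Swing-specific parser in by changing HTMLEditorKit.getParser(). The standard API offers the same hook to applications: a subclass of HTMLEditorKit can override the protected getParser() method, while code that creates a ParserDelegator directly never sees the change. Below is a minimal sketch using only standard javax.swing classes. Because GnuParserDelegator and HTML_401Swing are GNU Classpath internals, it substitutes a ParserDelegator subclass that merely counts parse calls; CountingKit and parseCalls are hypothetical, illustrative names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CustomParserKitDemo {

  /** Hypothetical kit that installs its own parser; direct ParserDelegator users are unaffected. */
  static class CountingKit extends HTMLEditorKit {
    int parseCalls = 0;
    private HTMLEditorKit.Parser parser;

    @Override
    protected HTMLEditorKit.Parser getParser() {
      // Create the parser lazily and reuse it, like the patched getParser().
      if (parser == null) {
        parser = new ParserDelegator() {
          @Override
          public void parse(Reader r, HTMLEditorKit.ParserCallback cb,
                            boolean ignoreCharSet) throws IOException {
            parseCalls++;                      // record that the kit's own parser ran
            super.parse(r, cb, ignoreCharSet); // then parse with the standard DTD
          }
        };
      }
      return parser;
    }
  }

  public static void main(String[] args) throws IOException, BadLocationException {
    CountingKit kit = new CountingKit();
    Document doc = kit.createDefaultDocument();
    // HTMLEditorKit.read obtains its parser through getParser().
    kit.read(new StringReader("<html><body>Hello</body></html>"), doc, 0);
    System.out.println("parse calls: " + kit.parseCalls);
    System.out.println("text: " + doc.getText(0, doc.getLength()).trim());
  }
}
```

Keeping the override inside the kit is what confines the custom rules to HTMLDocument loading.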

The custom DTD model generates implied P tags for top-level document body text that is not inside a paragraph. It also generates P tags for top-level tags like I, B, U, A and so on, because, if they are not wrapped in a paragraph at the top body level, they cause the same problems. The tags are not generated when they are not necessary, and they are closed where the context implies their end. The DTD model can be extended to work around more of our HTML rendering problems.
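The wrapping rule described above can be illustrated with a small standalone model. This is a sketch under simplifying assumptions, not the parser's code: it sees only a flat list of top-level items (with #pcdata standing for a run of text), ignores the content nested inside inline elements, and uses a hypothetical, abbreviated set of allowed BODY elements in place of the full HTML_401Swing list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ImpliedParagraphSketch {

  // Hypothetical, abbreviated set of elements allowed directly in BODY.
  static final Set<String> BODY_ELEMENTS =
      new HashSet<>(Arrays.asList("hr", "p", "table", "ul", "h1", "div"));

  /**
   * Wraps every run of top-level items that BODY does not allow (text and
   * inline tags) into one implied paragraph, closing it before the next
   * allowed block element and at the end of the body.
   */
  static List<String> wrap(List<String> topLevelItems) {
    List<String> out = new ArrayList<>();
    boolean inImplied = false;
    for (String item : topLevelItems) {
      if (BODY_ELEMENTS.contains(item)) {
        if (inImplied) {              // a block element ends the implied paragraph
          out.add("</p-implied>");
          inImplied = false;
        }
        out.add(item);
      } else {
        if (!inImplied) {             // text or inline tag: open a paragraph first
          out.add("<p-implied>");
          inImplied = true;
        }
        out.add(item);
      }
    }
    if (inImplied) {                  // close a paragraph still open at the end
      out.add("</p-implied>");
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(wrap(Arrays.asList("#pcdata", "hr", "#pcdata", "b", "#pcdata", "p")));
  }
}
```

No paragraph is opened when the items are already allowed, matching the statement that the tags are only generated when necessary.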

The implied paragraph handling in HTMLDocument is no longer needed and is removed.


2006-07-16  Audrius Meskauskas  <[EMAIL PROTECTED]>

 PR 28392
   * examples/gnu/classpath/examples/swing/HtmlDemo.java:
   Removed the leading p tag from the parsing example.
   * gnu/javax/swing/text/html/parser/HTML_401F.java:
   (createHtmlContentModel): Documented the return value.
   (defineElements): Call getBodyElements to get the body
   elements. (getBodyElements): New method. (model):
   Made protected from private.
   * gnu/javax/swing/text/html/parser/htmlValidator.java
   (openTag): Handle the case where the current content model is null.
   (tagIsValidForContext): If the tag is PCDATA, and it is not
   valid for context, but the paragraph (P) is valid for context,
   suggest to insert the P tag here.
   * javax/swing/text/html/HTMLDocument.java (HTMLReader.addContent,
   HTMLReader.blockOpen, HTMLReader.blockClose): Do not handle
   implied P tags here.
   * javax/swing/text/html/HTMLEditorKit.java (getParser):
   Use the custom parser with the HTML_401Swing DTD.
   * javax/swing/text/html/parser/ParserDelegator.java:
   Removed the obsolete note that HTMLEditorKit does not exist.
   * gnu/javax/swing/text/html/parser/GnuParserDelegator.java,
   gnu/javax/swing/text/html/parser/HTML_401Swing.java: New files.
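The htmlValidator change (tagIsValidForContext) amounts to a three-way decision: the tag is allowed, or it is text that is not allowed but a paragraph is (so a P should be inserted), or it is rejected. The sketch below models this with a BitSet of allowed element indexes, like the inclusions check in the patch. The index table, the method name validForContext and returning the name "p" instead of a DTD Element are assumptions made to keep it self-contained.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class ParagraphSuggestionSketch {

  // Hypothetical element indexes; a real DTD assigns its own.
  static final Map<String, Integer> INDEX = new HashMap<>();
  static {
    String[] names = {"#pcdata", "p", "hr", "b"};
    for (int i = 0; i < names.length; i++) {
      INDEX.put(names[i], i);
    }
  }

  /**
   * Returns Boolean.TRUE if the tag is allowed here, the name "p" if the tag
   * is text that is not allowed while a paragraph is, and Boolean.FALSE otherwise.
   */
  static Object validForContext(String tag, BitSet inclusions) {
    int idx = INDEX.get(tag);
    if (inclusions.get(idx)) {
      return Boolean.TRUE;
    }
    // Text that cannot go here may still fit inside a paragraph,
    // so suggest opening an implied P instead of rejecting it.
    if (tag.equalsIgnoreCase("#pcdata") && inclusions.get(INDEX.get("p"))) {
      return "p";
    }
    return Boolean.FALSE;
  }

  public static void main(String[] args) {
    BitSet body = new BitSet();       // a BODY model that, like HTML_401Swing, excludes text
    body.set(INDEX.get("p"));
    body.set(INDEX.get("hr"));
    System.out.println(validForContext("hr", body));
    System.out.println(validForContext("#pcdata", body));
    System.out.println(validForContext("b", body));
  }
}
```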

### Eclipse Workspace Patch 1.0
#P classpath
Index: gnu/javax/swing/text/html/parser/htmlValidator.java
===================================================================
RCS file: /sources/classpath/classpath/gnu/javax/swing/text/html/parser/htmlValidator.java,v
retrieving revision 1.3
diff -u -r1.3 htmlValidator.java
--- gnu/javax/swing/text/html/parser/htmlValidator.java	2 Jul 2005 20:32:15 -0000	1.3
+++ gnu/javax/swing/text/html/parser/htmlValidator.java	16 Jul 2006 14:30:40 -0000
@@ -233,7 +233,9 @@
                 Element fe = (Element) v;
 
                 // notify the content model that we add the proposed tag
-                getCurrentContentModel().show(fe);
+                node ccm = getCurrentContentModel();
+                if (ccm != null)
+                  ccm.show(fe);
                 openFictionalTag(fe);
 
                 Object vv = tagIsValidForContext(tElement);
@@ -321,7 +323,7 @@
 
     // Check exclusions and inclusions.
     ListIterator iter = stack.listIterator(stack.size());
-    hTag t;
+    hTag t = null;
     final int idx = tElement.getElement().index;
 
     // Check only known tags.
@@ -343,7 +345,19 @@
               }
           }
         if (!inclusions.get(idx))
-          return Boolean.FALSE;
+          {
+            // If we need to insert the text, and cannot do this, but
+            // it is allowed to insert the paragraph here, insert the 
+            // paragraph.
+            if (tElement.getElement().getName().
+                equalsIgnoreCase(HTML_401F.PCDATA))
+              {
+                Element P = dtd.getElement(HTML_401F.P); 
+                if (inclusions.get(P.index))
+                  return P;
+              }
+            return Boolean.FALSE;
+          }
       }
     return Boolean.TRUE;
   }
Index: gnu/javax/swing/text/html/parser/HTML_401F.java
===================================================================
RCS file: /sources/classpath/classpath/gnu/javax/swing/text/html/parser/HTML_401F.java,v
retrieving revision 1.3
diff -u -r1.3 HTML_401F.java
--- gnu/javax/swing/text/html/parser/HTML_401F.java	2 Jul 2005 20:32:15 -0000	1.3
+++ gnu/javax/swing/text/html/parser/HTML_401F.java	16 Jul 2006 14:30:38 -0000
@@ -759,23 +759,8 @@
       defElement(BODY, 0, true, true, null,
       NONE
       ,
-      new String[] {
-        PCDATA, A, ABBR, ACRONYM,
-        APPLET, B, BASEFONT, BDO, BIG,
-        BR, BUTTON, CITE, CODE, DFN,
-        EM, FONT, I, IFRAME, IMG,
-        INPUT, KBD, LABEL, MAP, OBJECT,
-        Q, S, SAMP, SCRIPT, SELECT,
-        SMALL, SPAN, STRIKE, STRONG, SUB,
-        SUP, TEXTAREA, TT, U, VAR,
-        ADDRESS, BLOCKQUOTE, CENTER, DEL, DIR,
-        DIV, DL, FIELDSET, FORM, H1,
-        H2, H3, H4, H5, H6,
-        HR, INS, ISINDEX, MENU, NOFRAMES,
-        NOSCRIPT, OL, P, PRE, TABLE,
-        UL
-      }
-    ,
+      getBodyElements()
+      ,
       new AttributeList[] {
         attr(sID, null, null, ID, IMPLIED),
         attr(CLASS, null, null, 0, IMPLIED),
@@ -3634,7 +3619,7 @@
    * Crate a content model, consisting of the single
    * element, specified by name.
    */
-  private ContentModel model(String element)
+  protected ContentModel model(String element)
   {
     return new ContentModel(getElement(element));
   }
@@ -3653,7 +3638,7 @@
 
   /**
    * Create the model HEAD, BODY
-   * @return
+   * @return the HTML content model of the whole document
    */
   protected ContentModel createHtmlContentModel()
   {
@@ -3725,5 +3710,27 @@
     li.type = ul.type = ol.type = '|';
     return li;
   }
-
+  
+  /**
+   * Get elements that are allowed in the document body, at the top level.
+   */
+  protected String[] getBodyElements()
+  {
+    return new String[] {
+        PCDATA, A, ABBR, ACRONYM,
+        APPLET, B, BASEFONT, BDO, BIG,
+        BR, BUTTON, CITE, CODE, DFN,
+        EM, FONT, I, IFRAME, IMG,
+        INPUT, KBD, LABEL, MAP, OBJECT,
+        Q, S, SAMP, SCRIPT, SELECT,
+        SMALL, SPAN, STRIKE, STRONG, SUB,
+        SUP, TEXTAREA, TT, U, VAR,
+        ADDRESS, BLOCKQUOTE, CENTER, DEL, DIR,
+        DIV, DL, FIELDSET, FORM, H1,
+        H2, H3, H4, H5, H6,
+        HR, INS, ISINDEX, MENU, NOFRAMES,
+        NOSCRIPT, OL, P, PRE, TABLE,
+        UL
+      };
+  }
 }
Index: examples/gnu/classpath/examples/swing/HtmlDemo.java
===================================================================
RCS file: /sources/classpath/classpath/examples/gnu/classpath/examples/swing/HtmlDemo.java,v
retrieving revision 1.3
diff -u -r1.3 HtmlDemo.java
--- examples/gnu/classpath/examples/swing/HtmlDemo.java	13 Jul 2006 12:48:58 -0000	1.3
+++ examples/gnu/classpath/examples/swing/HtmlDemo.java	16 Jul 2006 14:30:30 -0000
@@ -65,7 +65,7 @@
   
   JTextPane html = new JTextPane();
 
-  JTextArea text = new JTextArea("<html><body><p>" +
+  JTextArea text = new JTextArea("<html><body>" +
     "123456789HR!<hr>987654321"+
     "123456789BR!<br>987654321"+
     "<p id='insertHere'>Insertion target</p><p>"+
Index: javax/swing/text/html/HTMLEditorKit.java
===================================================================
RCS file: /sources/classpath/classpath/javax/swing/text/html/HTMLEditorKit.java,v
retrieving revision 1.30
diff -u -r1.30 HTMLEditorKit.java
--- javax/swing/text/html/HTMLEditorKit.java	6 Jul 2006 10:55:29 -0000	1.30
+++ javax/swing/text/html/HTMLEditorKit.java	16 Jul 2006 14:30:45 -0000
@@ -40,6 +40,8 @@
 
 
 import gnu.classpath.NotImplementedException;
+import gnu.javax.swing.text.html.parser.GnuParserDelegator;
+import gnu.javax.swing.text.html.parser.HTML_401Swing;
 
 import java.awt.event.ActionEvent;
 import java.awt.event.MouseAdapter;
@@ -886,7 +888,9 @@
   protected Parser getParser()
   {
     if (parser == null)
-      parser = new ParserDelegator();
+      {
+        parser = new GnuParserDelegator(HTML_401Swing.getInstance());
+      }
     return parser;
   }
   
Index: javax/swing/text/html/HTMLDocument.java
===================================================================
RCS file: /sources/classpath/classpath/javax/swing/text/html/HTMLDocument.java,v
retrieving revision 1.37
diff -u -r1.37 HTMLDocument.java
--- javax/swing/text/html/HTMLDocument.java	13 Jul 2006 14:00:13 -0000	1.37
+++ javax/swing/text/html/HTMLDocument.java	16 Jul 2006 14:30:43 -0000
@@ -1317,16 +1317,6 @@
       printBuffer();
       DefaultStyledDocument.ElementSpec element;
 
-      // If the previous tag is content and the parent is p-implied, then
-      // we must also close the p-implied.
-      if (parseStack.size() > 0 && parseStack.peek() == HTML.Tag.IMPLIED)
-        {
-          element = new DefaultStyledDocument.ElementSpec(null,
-                                    DefaultStyledDocument.ElementSpec.EndTagType);
-          parseBuffer.addElement(element);
-          parseStack.pop();
-        }
-
       parseStack.push(t);
       AbstractDocument.AttributeContext ctx = getAttributeContext();
       AttributeSet copy = attr.copyAttributes();
@@ -1364,16 +1354,6 @@
                                     new char[0], 0, 0);
           parseBuffer.add(element);
         }
-      // If the previous tag is content and the parent is p-implied, then
-      // we must also close the p-implied.
-      else if (!parseStack.isEmpty() && parseStack.peek() == HTML.Tag.IMPLIED)
-        {
-          element = new DefaultStyledDocument.ElementSpec(null,
-                                 DefaultStyledDocument.ElementSpec.EndTagType);
-          parseBuffer.addElement(element);
-          if (parseStack.size() > 0)
-            parseStack.pop();
-        }
 
       element = new DefaultStyledDocument.ElementSpec(null,
 				DefaultStyledDocument.ElementSpec.EndTagType);
@@ -1413,27 +1393,6 @@
       DefaultStyledDocument.ElementSpec element;
       AttributeSet attributes = null;
 
-      // Content must always be embedded inside a paragraph element,
-      // so we create this if the previous element is not one of
-      // <p>, <h1> .. <h6>.
-      boolean createImpliedParagraph = false;
-      HTML.Tag parent = (HTML.Tag) parseStack.peek();
-      if (parent != HTML.Tag.P && parent != HTML.Tag.H1
-          && parent != HTML.Tag.H2
-          && parent != HTML.Tag.H3 && parent != HTML.Tag.H4
-          && parent != HTML.Tag.H5 && parent != HTML.Tag.H6
-          && parent != HTML.Tag.TD)
-        {
-          attributes = ctx.getEmptySet();
-          attributes = ctx.addAttribute(attributes,
-                                        StyleConstants.NameAttribute,
-                                        HTML.Tag.IMPLIED);
-          element = new DefaultStyledDocument.ElementSpec(attributes,
-                       DefaultStyledDocument.ElementSpec.StartTagType);
-          parseBuffer.add(element);
-          parseStack.push(HTML.Tag.IMPLIED);
-        }
-
       // Copy the attribute set, don't use the same object because 
       // it may change
       if (charAttr != null)
Index: javax/swing/text/html/parser/ParserDelegator.java
===================================================================
RCS file: /sources/classpath/classpath/javax/swing/text/html/parser/ParserDelegator.java,v
retrieving revision 1.8
diff -u -r1.8 ParserDelegator.java
--- javax/swing/text/html/parser/ParserDelegator.java	27 Jul 2005 08:09:37 -0000	1.8
+++ javax/swing/text/html/parser/ParserDelegator.java	16 Jul 2006 14:30:45 -0000
@@ -52,9 +52,6 @@
  * This class instantiates and starts the working instance of
  * html parser, being responsible for providing the default DTD.
  *
- * TODO Later this class must be derived from the totally abstract class
- * HTMLEditorKit.Parser. HTMLEditorKit that does not yet exist.
- *
  * @author Audrius Meskauskas ([EMAIL PROTECTED])
  */
 public class ParserDelegator
Index: gnu/javax/swing/text/html/parser/GnuParserDelegator.java
===================================================================
RCS file: gnu/javax/swing/text/html/parser/GnuParserDelegator.java
diff -N gnu/javax/swing/text/html/parser/GnuParserDelegator.java
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ gnu/javax/swing/text/html/parser/GnuParserDelegator.java	1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,178 @@
+/* GnuParserDelegator.java -- The parser delegator which uses Swing DTD
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Classpath.
+
+GNU Classpath is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GNU Classpath is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Classpath; see the file COPYING.  If not, write to the
+Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA.
+
+Linking this library statically or dynamically with other modules is
+making a combined work based on this library.  Thus, the terms and
+conditions of the GNU General Public License cover the whole
+combination.
+
+As a special exception, the copyright holders of this library give you
+permission to link this library with independent modules to produce an
+executable, regardless of the license terms of these independent
+modules, and to copy and distribute the resulting executable under
+terms of your choice, provided that you also meet, for each linked
+independent module, the terms and conditions of the license of that
+module.  An independent module is a module which is not derived from
+or based on this library.  If you modify this library, you may extend
+this exception to your version of the library, but you are not
+obligated to do so.  If you do not wish to do so, delete this
+exception statement from your version. */
+
+
+package gnu.javax.swing.text.html.parser;
+
+import java.io.IOException;
+import java.io.Reader;
+import java.io.Serializable;
+
+import javax.swing.text.BadLocationException;
+import javax.swing.text.html.HTMLEditorKit;
+import javax.swing.text.html.HTMLEditorKit.ParserCallback;
+import javax.swing.text.html.parser.DTD;
+import javax.swing.text.html.parser.ParserDelegator;
+import javax.swing.text.html.parser.TagElement;
+
+/**
+ * This parser delegator uses a different DTD ({@link HTML_401Swing}).
+ * It extends ParserDelegator for compatibility reasons.
+ * 
+ * @author Audrius Meskauskas ([EMAIL PROTECTED]) 
+ */
+public class GnuParserDelegator extends ParserDelegator implements Serializable
+{
+  class gnuParser
+    extends gnu.javax.swing.text.html.parser.support.Parser
+  {
+    private static final long serialVersionUID = 1;
+
+    gnuParser(DTD d)
+    {
+      super(d);
+    }
+
+    protected final void handleComment(char[] comment)
+    {
+      callBack.handleComment(comment, hTag.where.startPosition);
+    }
+
+    protected final void handleEmptyTag(TagElement tag)
+      throws javax.swing.text.ChangedCharSetException
+    {
+      callBack.handleSimpleTag(tag.getHTMLTag(), getAttributes(),
+                               hTag.where.startPosition
+                              );
+    }
+
+    protected final void handleEndTag(TagElement tag)
+    {
+      callBack.handleEndTag(tag.getHTMLTag(), hTag.where.startPosition);
+    }
+
+    protected final void handleError(int line, String message)
+    {
+      callBack.handleError(message, hTag.where.startPosition);
+    }
+
+    protected final void handleStartTag(TagElement tag)
+    {
+      htmlAttributeSet attributes = gnu.getAttributes();
+
+      if (tag.fictional())
+        attributes.addAttribute(ParserCallback.IMPLIED, Boolean.TRUE);
+
+      callBack.handleStartTag(tag.getHTMLTag(), attributes,
+                              hTag.where.startPosition
+                             );
+    }
+
+    protected final void handleText(char[] text)
+    {
+      callBack.handleText(text, hTag.where.startPosition);
+    }
+
+    DTD getDTD()
+    {
+      // Accessing the inherited gnu.javax.swing.text.html.parser.support.Parser
+      // field. super. is a workaround, required to support JDK1.3's javac.
+      return super.dtd;
+    }
+  }
+
+  /**
+   * Use serialVersionUID for interoperability.
+   */
+  private static final long serialVersionUID = -1276686502624777206L;
+
+  private DTD theDtd; 
+
+  /**
+   * The callback.
+   * This is package-private to avoid an accessor method.
+   */
+  HTMLEditorKit.ParserCallback callBack;
+
+  /**
+   * The reference to the working class of HTML parser that is
+   * actually used to parse the document.
+   * This is package-private to avoid an accessor method.
+   */
+  gnuParser gnu;
+  
+  /**
+   * Create the parser that uses the given DTD to parse the document.
+   * 
+   * @param theDtd the DTD
+   */
+  public GnuParserDelegator(DTD theDtd)
+  {
+    this.theDtd = theDtd;
+    gnu = new gnuParser(theDtd);
+  }
+
+  /**
+   * Parses the HTML document, calling methods of the provided callback. This
+   * method must be thread-safe.
+   * 
+   * @param reader The reader to read the HTML document from
+   * @param a_callback The callback that is notified about the presence of HTML
+   *          elements in the document.
+   * @param ignoreCharSet If true, any charset changes during parsing are
+   *          ignored.
+   * @throws java.io.IOException if reading fails or the callback cannot flush
+   */
+  public void parse(Reader reader,
+                    HTMLEditorKit.ParserCallback a_callback,
+                    boolean ignoreCharSet) throws IOException
+  {
+    callBack = a_callback;
+    gnu.parse(reader);
+
+    callBack.handleEndOfLineString(gnu.getEndOfLineSequence());
+    try
+      {
+        callBack.flush();
+      }
+    catch (BadLocationException ex)
+      {
+        // Convert this into the supported type of exception.
+        throw new IOException(ex.getMessage());
+      }
+  }
+}
Index: gnu/javax/swing/text/html/parser/HTML_401Swing.java
===================================================================
RCS file: gnu/javax/swing/text/html/parser/HTML_401Swing.java
diff -N gnu/javax/swing/text/html/parser/HTML_401Swing.java
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ gnu/javax/swing/text/html/parser/HTML_401Swing.java	1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,98 @@
+/* HTML_401Swing.java -- The HTML 4.01 DTD, adapted for HTML rendering in Swing
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Classpath.
+
+GNU Classpath is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GNU Classpath is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Classpath; see the file COPYING.  If not, write to the
+Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA.
+
+Linking this library statically or dynamically with other modules is
+making a combined work based on this library.  Thus, the terms and
+conditions of the GNU General Public License cover the whole
+combination.
+
+As a special exception, the copyright holders of this library give you
+permission to link this library with independent modules to produce an
+executable, regardless of the license terms of these independent
+modules, and to copy and distribute the resulting executable under
+terms of your choice, provided that you also meet, for each linked
+independent module, the terms and conditions of the license of that
+module.  An independent module is a module which is not derived from
+or based on this library.  If you modify this library, you may extend
+this exception to your version of the library, but you are not
+obligated to do so.  If you do not wish to do so, delete this
+exception statement from your version. */
+
+
+package gnu.javax.swing.text.html.parser;
+
+import java.io.IOException;
+
+import javax.swing.text.html.parser.ContentModel;
+import javax.swing.text.html.parser.DTD;
+
+/**
+ * This class is necessary because the current implementation of GNU
+ * Classpath Swing always requires the text to be enclosed in paragraphs.
+ * 
+ * @author Audrius Meskauskas ([EMAIL PROTECTED])
+ */
+public class HTML_401Swing extends HTML_401F
+{
+  /**
+   * The singleton instance.
+   */
+  final static HTML_401Swing singleton = new HTML_401Swing();
+  
+  /**
+   * Returns the shared singleton instance of this DTD. The instance
+   * is created once, when this class is initialized, and is reused
+   * by every parser that requests it.
+   * @return The singleton DTD for parsing HTML 4.01 in Swing.
+   */
+  public static DTD getInstance()
+  {
+    // Return the shared instance (no debugging output).
+    return singleton;
+  }  
+  
+  /**
+   * Get elements that are allowed in the document body, at the top level.
+   * This list disallows text at this level (an implied P tag will be
+   * generated instead). It also disallows A, B, I, U, CITE and other similar
+   * inline elements that contain plain text. They will also be placed
+   * inside the generated implied P tags.
+   */
+  protected String[] getBodyElements()
+  {
+    return new String[] {
+        ABBR, ACRONYM,
+        APPLET, BASEFONT, BDO, 
+        BR, BUTTON, 
+        FONT, IFRAME, IMG,
+        INPUT, LABEL, MAP, OBJECT,
+        Q, S, SCRIPT, SELECT,
+        SPAN, STRIKE, SUB,
+        SUP, TEXTAREA, 
+        ADDRESS, BLOCKQUOTE, CENTER, DEL, DIR,
+        DIV, DL, FIELDSET, FORM, H1,
+        H2, H3, H4, H5, H6,
+        HR, INS, ISINDEX, MENU, NOFRAMES,
+        NOSCRIPT, OL, P, PRE, TABLE,
+        UL
+      };
+  }
+
+}
