https://issues.apache.org/bugzilla/show_bug.cgi?id=55383

            Bug ID: 55383
           Summary: Improve markup and design of Tomcat's HTML pages
           Product: Tomcat 8
           Version: trunk
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Documentation
          Assignee: dev@tomcat.apache.org
          Reporter: prei...@web.de

Hi,

I think there is room for improvement in the markup and style of Tomcat's HTML
pages (e.g. to meet the current HTML5 [1] standard and to avoid obsolete
features) in Tomcat 8.
The ROOT index.jsp has already been greatly improved by pidster.

There are some other occurrences of HTML in Tomcat's source that I think can be
improved:
• Tomcat's error pages
• Examples webapp
• Probably also the Tomcat website (http://tomcat.apache.org/)


Some things that I think could be improved are:

1) Always use "Full Standards Mode" instead of "Quirks Mode" or "Almost
Standards Mode". Quirks Mode is a relic that browsers implement in order to
display websites that were written for IE <= 5, which had serious layout
errors. Current websites should always use Standards Mode, as described in the
HTML5 spec [1].
  This means that for HTML documents ("text/html"), always use the recommended
doctype <!DOCTYPE html>. For XHTML documents ("application/xhtml+xml"), a
doctype is not needed (as there is only Full Standards Mode), but it should be
used when making polyglot documents [2].
  Note: Placing a DOCTYPE after an HTML comment (as is done in pidster's
proposal here: http://people.apache.org/~pidster/tomcat/site/) will force IE 9
and older to use Quirks Mode; however, IE 10 and newer will use Full Standards
Mode in that case. This can be tested using IE's F12 developer tools.
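  For illustration, a minimal sketch of a page that should end up in Full
Standards Mode under the rules above. The title and body text are placeholders,
not taken from any existing Tomcat page:

```html
<!DOCTYPE html>
<!-- Comments go after the doctype, not before it (see the note on IE 9 and older). -->
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Example page</title>
  </head>
  <body>
    <p>Rendered in Full Standards Mode.</p>
  </body>
</html>
```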

2) Don't use obsolete HTML elements or attributes, such as those that were used
purely for styling rather than markup and have been replaced by CSS (e.g. the
"bgcolor" or "align" attributes, the <font> element etc.).

3) A <table> element should only be used for real tabular data, not for layout
purposes [3], as is currently done on the Tomcat website.
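  Points 2) and 3) together might look like this in practice. This is only a
sketch: the "banner" class name, the colors and the heading text are made up for
the example and are not copied from the real pages.

```html
<!-- Before: layout table with presentational attributes and <font> -->
<table width="100%" bgcolor="#ffffff">
  <tr>
    <td align="center"><font color="#525d76" size="+2">Apache Tomcat</font></td>
  </tr>
</table>

<!-- After: plain markup, styling moved to CSS.
     In a real page the <style> block belongs in <head> or in an external stylesheet. -->
<style>
  .banner {
    background-color: #ffffff;
    color: #525d76;
    font-size: 1.5em;
    text-align: center;
  }
</style>
<h1 class="banner">Apache Tomcat</h1>
```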

4) Use the new HTML5 elements for structuring HTML content, like <nav> (marking
a navigation section), <header> and <footer>, or <time datetime="..."> for
marking a date in a machine-readable format, and so on.
  HTML5 also allows declaring microdata in HTML markup using some new
attributes like "itemscope", "itemtype" etc. [9]. E.g. one can use the
microdata vocabulary provided by http://schema.org/, which should be recognized
by search engines like Google, Bing, Yahoo and Yandex (but I don't know if this
will be useful to the Tomcat website).
  However, if we still want to support IE versions older than IE 9, then care
must be taken when using new HTML elements, because IE <= 8 parses unknown
elements differently from IE 9 and other browsers. E.g., if you have
  <myElement class="myHeader">Hi!</myElement>
  then the DOM in IE 9/10/11 and other browsers like Firefox represents exactly
that piece of HTML, so even if they don't know what a <myElement> element is,
they will still use the CSS defined for class "myHeader" to format it. However,
in IE 8 and older, the DOM will look like
  <myElement></myElement>Hi!</myElement><//myElement>
  (this is not valid HTML, but the DOM actually has an element with the name
"myElement" and one with the name "/myElement"), so the text will not be
formatted by the CSS defined for that element.
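  As a rough sketch of the structural elements and microdata attributes named
at the start of this point: the link targets, the date and the choice of the
schema.org type "SoftwareApplication" are assumptions made for the example, not
a proposal for the actual site.

```html
<header>
  <h1>Apache Tomcat</h1>
  <nav>
    <ul>
      <li><a href="index.html">Home</a></li>
      <li><a href="download.html">Download</a></li>
    </ul>
  </nav>
</header>

<!-- Microdata: the schema.org type is an assumption chosen for illustration -->
<div itemscope itemtype="http://schema.org/SoftwareApplication">
  <p><span itemprop="name">Apache Tomcat</span></p>
</div>

<footer>
  <!-- Placeholder date, not taken from the real site -->
  <p>Last updated: <time datetime="2013-01-01">1 January 2013</time></p>
</footer>
```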

5) Maybe historical practices like putting CSS or JavaScript into comments
("<style type=...><!-- .myClass {...} --></style>") can also be abandoned, as
that was only required for very old browsers that did not know the <style> or
<script> elements, to prevent the content of such elements from appearing in
the document as text. I cannot think of a still-supported browser that would
not recognize such elements.
  Note: For XHTML, this approach is actually wrong, because an XML parser will
treat a comment inside a style element as a comment, which means that the
browser only sees an empty style element. There are ways to make things work if
the document should be both an HTML and an XHTML document [4], but I don't
think it makes sense (see 7)).

6) Encodings can be set on HTML pages using <meta charset="UTF-8">, as an
alternative to <meta http-equiv="Content-Type" content="text/html;
charset=UTF-8">. I think the shorter form makes more sense because the
Content-Type can only be set externally (by a header or using a file extension)
before a browser begins parsing the document. It seems that even IE 7
supports the short variant.
  Notice that this is for HTML only; for XHTML, the encoding can be specified
in the XML declaration (<?xml version="1.0" encoding="UTF-8"?>).
  Note: For the sake of polyglot documents (see 7)), an XHTML document is
allowed to also include a <meta charset="UTF-8" />, but "UTF-8" is the only
permitted value in this case [8]. An XML declaration may then not be used, as
it is forbidden in HTML; this means an XML parser will determine the encoding
from the BOM, if present, and will otherwise use UTF-8 [7].
  Note: Even if the encoding is already set by a Content-Type header, I think
it is good practice to also include the encoding declaration in the document
itself (of course, matching the one set in the Content-Type header).
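  Side by side, the two HTML forms discussed in this point look as follows.
Only one of them should appear in a given page; they are shown together here
purely for comparison:

```html
<!-- Short form (HTML5) -->
<meta charset="UTF-8">

<!-- Long form, declaring the same encoding -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
```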

7) Some HTML pages contain elements in an XHTML-compatible syntax, e.g.
self-closing elements like "<br />", whereas others contain "<br>". Actually,
using "<br />" in a document that is sent as "text/html" (or with a ".html"
file extension) was wrong syntax according to previous HTML specifications,
as only the "<br>" syntax was allowed (even if a DOCTYPE states "XHTML 1.0",
the browser actually uses an HTML parser instead of an XML parser when the page
is sent as "text/html" instead of "application/xhtml+xml"). The new HTML5
specification, however, allows this syntax (<br />) in HTML documents.
  When a document is intended to be a polyglot document (i.e. a document that
is compatible with both HTML and XML parsing modes), there are a lot of other
concerns that one needs to take care of, e.g. not using entity references like
"&nbsp;", as XML parsers are not guaranteed to process an external DTD (which
would declare such entities), and the HTML5 spec does not define any such DTD
[5]. Furthermore, one must always write <input type="checkbox"
checked="checked" itemscope="itemscope" /> instead of <input type=checkbox
checked itemscope> and so on.
  I think it is easier to write documents that are either HTML only or XHTML
(XML) only. IE 9 is the first IE that supports XHTML, so I think an XHTML-only
document is currently not an option here (although I write XHTML-only documents
for my own websites, as I use an XML tool to generate the output and I don't
care about IE <= 8). In this case, I think it is easier to use HTML-only syntax
instead of creating a polyglot document.
  Actually, for a person who writes HTML by hand, I think it is easier to write
HTML-only syntax, whereas for documents generated programmatically by an XML
writer, the XHTML-only syntax would fit better.
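  To give an idea of the extra care a polyglot document needs, here is a small
sketch that should parse the same way as HTML and as XML. The title and text
are placeholders. Note the xmlns attribute, the paired lang/xml:lang
attributes, the numeric character reference in place of "&nbsp;", and the
fully written-out attribute values:

```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Polyglot example</title>
  </head>
  <body>
    <p>Non-breaking&#160;space via a numeric character reference.<br />
    <input type="checkbox" checked="checked" /></p>
  </body>
</html>
```

  The HTML-only equivalent could simply use <br>, &nbsp; and <input
type=checkbox checked>, which is why I consider it easier to write by hand.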


Please add anything I forgot.


Now, if one is going to improve the markup of a document, I think it is a good
idea to also improve the design/layout.

When I have time (and if you agree) I can start working on the things mentioned
above, unless it is already done by another person ;-)

I see in the archives that Pidster already proposed a new layout for the Tomcat
website [6] (which was the same layout as the ROOT index.jsp), which has not
been accepted yet. I don't have another proposal for a new design of the Tomcat
website atm, but I think the HTML markup can be improved (the points mentioned
above) and the style can be tweaked a bit so that the site doesn't look dated.

What do you think?


[1] http://www.w3.org/TR/html5/
[2] http://wiki.whatwg.org/wiki/HTML_vs._XHTML
[3] http://www.w3.org/TR/html5/tabular-data.html#the-table-element
[4] http://hixie.ch/advocacy/xhtml
[5] http://www.w3.org/TR/html5/the-xhtml-syntax.html#writing-xhtml-documents
[6] http://markmail.org/message/og235cbvrdluiejg
[7] http://www.w3.org/TR/xml/#sec-guessing
[8] http://www.w3.org/TR/html5/document-metadata.html#attr-meta-charset
[9] http://en.wikipedia.org/wiki/Microdata_%28HTML%29
