[PHP-DOC] Suggested text for the Tidy example

Keryx Web Wed, 05 Dec 2007 06:47:38 -0800

Page in question:

http://docs.php.net/manual/en/tidy.examples.php


Suggestions (summary):

1. Change the example to use a strict doctype, and include the output from the script.


2. Explain what has happened

3. Give another example where a different type snippet is corrected

4. Explain what has happened

5. Give a third example where valid, but un-semantic, code will not be made semantic, to show how one still needs to consider proper usage, even though Tidy is being used. (The chapter on Tidy in PHP Cookbook by O'Reilly gives a really faulty explanation in this regard.)


-------------------
Suggested text:
-------------------

The script above will output:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
    a html document
  </body>
</html>

Notice how the missing doctype and the missing XHTML tags have been added: in this particular example the head, title and body tags. The result is well-formed XHTML. Indentation has also been added.

However, in this example Tidy still fails to produce entirely <dfn>valid</dfn> XHTML 1.0 Strict, since any text in the body must be inside a block-level element, and Tidy will not guess which block-level element it should use. But Tidy can give warnings about issues like these. Adding the line echo $tidy->errorBuffer; would produce the following warnings:


line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 7 - Warning: plain text isn't allowed in <head> elements
line 1 column 7 - Warning: inserting missing 'title' element
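
For reference, a minimal sketch of the first example with the errorBuffer line added. It assumes the tidy extension is loaded and keeps the markup in a plain string instead of an output buffer; the variable names $output and $warnings are only illustrative. The exact wording of the warnings may differ between Tidy versions.

```php
<?php
// Assumption: the tidy extension is available in this PHP build.
$html = '<html>a html document</html>';

// Same kind of settings as the manual example: indented XHTML output.
$config = array(
           'indent'       => true,
           'output-xhtml' => true,
           'wrap'         => 200);

$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// The repaired document.
$output = tidy_get_output($tidy);
echo $output;

// The warnings Tidy collected while parsing and repairing.
$warnings = $tidy->errorBuffer;
echo $warnings;
```

Keeping the warnings in a variable makes it easy to log them rather than print them into the page.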

Example 2

[Code to be cleaned follows, same script in PHP]

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A second example</title>
  </head>
  <body>
    <p>
      Badly <span>nested <em class=foo>and</span> missing tags.</p>
  </body>
</html>

This script will output:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      A second example
    </title>
  </head>
  <body>
    <p>
      Badly <span>nested <em class="foo">and</em> missing tags.</span>
    </p>
  </body>
</html>

Notice that the span and em tags are now correctly nested, and that quotation marks have been added to the attribute value. The missing closing tag for the em element has been added as well. This code will now validate.
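
A possible version of the script for this second example, as a sketch only. It assumes the tidy extension is loaded and that the same settings as in the first example are used; the markup is held in a nowdoc string rather than an output buffer, and $output is an illustrative variable name. How Tidy resolves the bad nesting may vary slightly between versions.

```php
<?php
// Assumption: the tidy extension is available in this PHP build.
$html = <<<'HTML'
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A second example</title>
  </head>
  <body>
    <p>
      Badly <span>nested <em class=foo>and</span> missing tags.</p>
  </body>
</html>
HTML;

// Assumed settings: indented XHTML output, as in the first example.
$config = array(
           'indent'       => true,
           'output-xhtml' => true,
           'wrap'         => 200);

$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Repaired markup: the attribute value is quoted and the em element is closed.
$output = tidy_get_output($tidy);
echo $output;
```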


Example 3

<?php
ob_start();
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A third example</title>
  </head>
  <body>
    <p><abbr>Valid</abbr>, but <cite>misused</cite> use of HTML</p>
    <center>Well formed, but <font face="Verdana">unsemantic</font>
       HTML</center>
  </body>
</html>
<?php
$html = ob_get_clean();

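// Specify configuration
$config = array(
           'indent'         => true,      // indent the output
           'output-xhtml'   => false,     // emit HTML, not XHTML
           'doctype'        => 'strict',  // use the strict doctype
           'drop-font-tags' => true,      // discard font and center tags
           'wrap'           => 200);      // wrap lines at 200 characters

// Tidy
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Output
echo $tidy;
?>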

This code will output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <title>
      A third example
    </title>
  </head>
  <body>
    <p>
      <abbr>Valid</abbr>, but <cite>misused</cite> use of HTML
    </p>Well formed, but unsemantic HTML<br>
  </body>
</html>

In this example we converted XHTML to HTML 4.01. Tidy can be used for conversions in either direction. The namespace attribute was removed. The elements center and font, now obsolete and better replaced with CSS, have been removed from the markup, thanks to the setting "drop-font-tags". Notice that Tidy will not drop these elements unless this has been explicitly requested in the settings, even though they have been completely removed from HTML 4.01 Strict and from XHTML 1.0 Strict and later. Tidy will not deduce such behavior from the doctype chosen. Tidy can remove some unsemantic markup, but it cannot check that an author has used (X)HTML elements properly from a semantic point of view.

Even so, this extension is a powerful tool for detecting most markup errors and correcting some of them.
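
The conversion in the other direction, from HTML 4.01 to XHTML, could be sketched like this. The input document and the variable name $output are made up for illustration, and the tidy extension is assumed to be loaded; only the 'output-xhtml' setting changes compared with the third example.

```php
<?php
// Assumption: the tidy extension is available in this PHP build.
// Hypothetical HTML 4.01 input with an unclosed <br>.
$html = <<<'HTML'
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <title>Back to XHTML</title>
  </head>
  <body>
    <p>First line<br>second line</p>
  </body>
</html>
HTML;

$config = array(
           'indent'       => true,
           'output-xhtml' => true,   // ask for XHTML instead of HTML
           'wrap'         => 200);

$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// The html element gains the XHTML namespace and <br> is written as an empty element.
$output = tidy_get_output($tidy);
echo $output;
```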


----------

English question: Is it "A HTML doc" or "AN HTML doc"? (One speaks HTML with a vowel sound first...)
