Re: Conversion of raw html stops in mid-file w/o error message

2005-10-23 Thread David Crossley
Moshe Yudkowsky wrote:
> I've got a raw html file that is being auto-converted -- decorated --
> by forrest.

Then it is not raw. Raw files get no decoration.

> Although the conversion goes well for the initial sections, at one point
> the conversion stops, and the rest of the file does not appear. There
> are no error messages or warnings.

This is probably the h2, h3, h4 issue that Brian mentioned.

> I have validated the document using the W3C validator, and it passes
> whether I use it as 4.01 loose or XHTML strict. (The meta tags have to
> be modified, depending on the format, but the rest of the document is
> unchanged.)
>
> Problem 1: no conversion of XHTML strict
>
> If the document is XHTML strict, then forrest does not convert any of
> the body text whatsoever!

Correct, because Forrest is expecting HTML input,
not XHTML.

You need to add a SourceTypeAction for XHTML.
http://forrest.apache.org/docs_0_70/cap.html

-David


Re: Conversion of raw html stops in mid-file w/o error message

2005-10-23 Thread Moshe Yudkowsky

All,

Thanks for the information about how to accomplish this conversion.  My 
(current) problem is solved: h2 cannot be followed directly by h4 in 
forrest.


I do have a comment:

* The W3C HTML validator says that h2 followed by h4 is valid HTML
and valid XHTML. From the W3 spec
http://www.w3.org/TR/REC-html40/struct/global.html#edef-H2 itself:
"Some people consider skipping heading levels to be bad practice. They
accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading
level H2 is skipped." Forrest is going beyond the spec by enforcing this
restriction.
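
Since a skipped heading level validates cleanly but trips the conversion,
it can help to scan the source HTML for such jumps before running forrest.
The following is a minimal sketch using only the Python standard library;
the names HeadingCollector and find_skipped_levels are made up for this
illustration and are not part of Forrest.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record (level, line number) for every h1..h6 start tag."""

    def __init__(self):
        super().__init__()
        self.headings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            line, _column = self.getpos()
            self.headings.append((int(tag[1]), line))


def find_skipped_levels(html_text):
    """Return (previous_level, level, line) for each heading that goes
    more than one level deeper than the heading before it."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    skips = []
    previous = None
    for level, line in collector.headings:
        if previous is not None and level > previous + 1:
            skips.append((previous, level, line))
        previous = level
    return skips


sample = "<h1>A</h1>\n<h2>B</h2>\n<h4>C</h4>\n<h2>D</h2>\n"
print(find_skipped_levels(sample))  # prints [(2, 4, 3)]
```

Each reported tuple gives the previous level, the offending level, and the
line of the offending heading, so the h2-then-h4 case shows up as (2, 4, line).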


And now, for some other cleanup work (&#151; to &mdash;, for example).
With any luck I'll find some time to submit doc patches.
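
For the &#151; cleanup, here is a rough sketch of one way to batch it. It
assumes the numeric references in the 128-159 range were meant as
Windows-1252 characters (which is how &#151; ends up rendered as an em
dash) and rewrites them as named entities. The function name
fix_cp1252_refs is hypothetical; references outside that range are left
untouched.

```python
import re
from html.entities import codepoint2name


def fix_cp1252_refs(text):
    """Rewrite &#128;..&#159; as named entities, reading the number as a
    Windows-1252 byte. Other numeric references are returned unchanged."""

    def replace(match):
        number = int(match.group(1))
        if not 128 <= number <= 159:
            return match.group(0)
        try:
            char = bytes([number]).decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes in this range (129, for instance) are unassigned.
            return match.group(0)
        name = codepoint2name.get(ord(char))
        # Fall back to the real Unicode code point if no named entity exists.
        return f"&{name};" if name else f"&#{ord(char)};"

    return re.sub(r"&#(\d+);", replace, text)


print(fix_cp1252_refs("before &#151; after"))  # prints before &mdash; after
```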


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Don't try to outweird me, three-eyes.  I get stranger things than you free
with my breakfast cereal.
- Zaphod Beeblebrox in The Hitchhiker's Guide to the Galaxy


Re: Conversion of raw html stops in mid-file w/o error message

2005-10-23 Thread David Crossley
Moshe Yudkowsky wrote:
 
> Thanks for the information about how to accomplish this conversion.  My
> (current) problem is solved: h2 cannot be followed directly by h4 in
> forrest.
>
> I do have a comment:
>
> * The W3C HTML validator says that h2 followed by h4 is valid HTML
> and valid XHTML. From the W3 spec
> http://www.w3.org/TR/REC-html40/struct/global.html#edef-H2 itself:
> "Some people consider skipping heading levels to be bad practice. They
> accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading
> level H2 is skipped." Forrest is going beyond the spec by enforcing this
> restriction.

It is not that we are deliberately enforcing that.
If you can devise a method to handle such tag soup,
then we will gladly apply the patch.

-David
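
One possible approach to the flat-headings problem, sketched in Python
rather than XSLT: nest each heading under the nearest earlier heading with
a smaller level number, so the nesting depth comes from the order of the
headings and not from the digit in the tag. A jump from h2 to h4 then
simply becomes one level of nesting. This is an illustration of the idea
only, not a patch and not how Forrest does it; nest_headings and render
are hypothetical names, and the input is assumed to be a list of
(level, title) pairs already extracted from the HTML.

```python
from html import escape


def nest_headings(headings):
    """Turn a flat list of (level, title) pairs into nested sections.

    Each section is a dict with "title" and "children". A heading becomes
    a child of the closest preceding heading with a smaller level number.
    """
    root = {"title": None, "children": []}
    stack = [(0, root)]  # (heading level, section); level 0 is the root
    for level, title in headings:
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = {"title": title, "children": []}
        stack[-1][1]["children"].append(section)
        stack.append((level, section))
    return root["children"]


def render(sections, indent=0):
    """Render nested sections as indented section/title markup lines."""
    pad = "  " * indent
    lines = []
    for section in sections:
        lines.append(f"{pad}<section>")
        lines.append(f"{pad}  <title>{escape(section['title'])}</title>")
        lines.extend(render(section["children"], indent + 1))
        lines.append(f"{pad}</section>")
    return lines


flat = [(2, "Overview"), (4, "Fine print"), (2, "Details")]
print("\n".join(render(nest_headings(flat))))
```

With the sample input, "Fine print" ends up nested directly inside
"Overview" even though h3 was skipped, and "Details" starts a new sibling
section.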


Re: Conversion of raw html stops in mid-file w/o error message

2005-10-23 Thread Gav....

- Original Message - 
From: Ross Gardler [EMAIL PROTECTED]
To: user@forrest.apache.org
Sent: Sunday, October 23, 2005 8:33 PM
Subject: Re: Conversion of raw html stops in mid-file w/o error message


| Moshe Yudkowsky wrote:
| > All,
| >
| > Thanks for the information about how to accomplish this conversion.  My
| > (current) problem is solved: h2 cannot be followed directly by h4 in
| > forrest.
| >
| > I do have a comment:
| >
| > * The W3C HTML validator says that h2 followed by h4 is valid HTML
| > and valid XHTML. From the W3 spec
| > http://www.w3.org/TR/REC-html40/struct/global.html#edef-H2 itself:
| > "Some people consider skipping heading levels to be bad practice. They
| > accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading
| > level H2 is skipped." Forrest is going beyond the spec by enforcing this
| > restriction.
|
| Like we said, if it is a problem to you then feel free to provide a patch
| to the html2document.xsl stylesheet.
|
| We don't use html as an internal format because it lacks some needed
| structure for other processing. Our internal structure is much closer
| to XHTML2 (in fact we will be moving to a subset of XHTML2 in some future
| release).
|
| In XHTML2 headings do not have levels assigned to them, instead you have:
|
| <section>
|   <title>
| <section>
|   <title>
|
| Ross

And no more <h1> to <h4>, it's just <h>:

<body>
<h>This is a top level heading</h>
<p></p>
<section>
<p></p>
<h>This is a second-level heading</h>
<p></p>
<h>This is another second-level heading</h>
<p></p>
</section>

Gav...




Re: Conversion of raw html stops in mid-file w/o error message

2005-10-23 Thread Moshe Yudkowsky

Ross notes:

> Like we said, if it is a problem to you then feel free to provide a patch
> to the html2document.xsl stylesheet.

David comments:

> It is not that we are deliberately enforcing that.
> If you can devise a method to handle such tag soup,
> then we will gladly apply the patch.

Thanks for the information.

I will try to look at this issue, I think.

Actually, a higher-priority item for me would be the silent failure. I
didn't get any error messages; instead, the page was simply not
rendered, and if I hadn't been checking the pages to see what happened
under 0.7 I would never have noticed. I've now checked the rest of the
site and I've found other silent failures!
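
Until the conversion reports such problems itself, one way to catch a
truncated page is to compare the source with the generated output. The
sketch below is a rough check under the assumption that heading text
survives the conversion unchanged even if heading levels are shifted; it
lists source headings whose text does not appear as a heading in the
output. HeadingText and missing_headings are hypothetical names, and
reading the two files from disk is left to the caller.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingText(HTMLParser):
    """Collect the text of every h1..h6 element, whitespace-normalized."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # list of text chunks while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._buffer is not None:
            self.titles.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


def heading_titles(html_text):
    parser = HeadingText()
    parser.feed(html_text)
    parser.close()
    return parser.titles


def missing_headings(source_html, output_html):
    """Return source heading texts that never show up in the output."""
    rendered = set(heading_titles(output_html))
    return [title for title in heading_titles(source_html) if title not in rendered]


source = "<h2>Intro</h2><p>x</p><h2>Details</h2><h4>Deep</h4><p>y</p>"
output = "<h1>Site</h1><h3>Intro</h3><h3>Details</h3>"
print(missing_headings(source, output))  # prints ['Deep']
```

A non-empty result for any page is a hint that the output stopped short
and is worth inspecting by hand.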


> We don't use html as an internal format because it lacks some needed
> structure for other processing. Our internal structure is much closer
> to XHTML2 (in fact we will be moving to a subset of XHTML2 in some future
> release).


Thanks, I will be on the lookout.