Re: Conversion of raw html stops in mid-file w/o error message
Moshe Yudkowsky wrote:
> I've got a raw html file that is being auto-converted -- decorated --
> by forrest.

Then it is not raw. Raw files get no decoration.

> Although the conversion goes well for the initial sections, at one
> point the conversion stops, and the rest of the file does not appear.
> There are no error messages or warnings.

This is probably the h2, h3, h4 issue that Brian mentioned.

> I have validated the document using the W3C validator, and it passes
> whether I use it as 4.01 loose or XHTML strict. (The meta tags have to
> be modified, depending on the format, but the rest of the document is
> unchanged.)
>
> Problem 1: no conversion of XHTML strict
>
> If the document is XHTML strict, then forrest does not convert any of
> the body text whatsoever!

Correct, because Forrest is expecting HTML input, not XHTML. You need to add a SourceTypeAction for XHTML.
http://forrest.apache.org/docs_0_70/cap.html

-David
Re: Conversion of raw html stops in mid-file w/o error message
All,

Thanks for the information about how to accomplish this conversion. My (current) problem is solved: h2 cannot be followed directly by h4 in forrest.

I do have a comment:

* The W3C HTML validator says that h2 followed by h4 is valid HTML and valid XHTML. From the W3 spec itself (http://www.w3.org/TR/REC-html40/struct/global.html#edef-H2): "Some people consider skipping heading levels to be bad practice. They accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading level H2 is skipped." Forrest is going beyond the spec by enforcing this restriction.

And now, for some other cleanup work (&#151; to &mdash;, for example). With any luck I'll find some time to submit doc patches.

--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
"Don't try to outweird me, three-eyes. I get stranger things than you free with my breakfast cereal."
 - Zaphod Beeblebrox in The Hitchhiker's Guide to the Galaxy
Re: Conversion of raw html stops in mid-file w/o error message
Moshe Yudkowsky wrote:
> Thanks for the information about how to accomplish this conversion. My
> (current) problem is solved: h2 cannot be followed directly by h4 in
> forrest.
>
> I do have a comment:
>
> * The W3C HTML validator says that h2 followed by h4 is valid HTML
> and valid XHTML. From the W3 spec
> http://www.w3.org/TR/REC-html40/struct/global.html#edef-H2 itself,
> "Some people consider skipping heading levels to be bad practice. They
> accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading
> level H2 is skipped." Forrest is going beyond the spec by enforcing this
> restriction.

It is not that we are deliberately enforcing that. If you can devise a method to handle such tag soup, then we will gladly apply the patch.

-David
Re: Conversion of raw html stops in mid-file w/o error message
----- Original Message -----
From: Ross Gardler [EMAIL PROTECTED]
To: user@forrest.apache.org
Sent: Sunday, October 23, 2005 8:33 PM
Subject: Re: Conversion of raw html stops in mid-file w/o error message

| Moshe Yudkowsky wrote:
| > All,
| >
| > Thanks for the information about how to accomplish this conversion. My
| > (current) problem is solved: h2 cannot be followed directly by h4 in
| > forrest.
| >
| > I do have a comment:
| >
| > * The W3C HTML validator says that h2 followed by h4 is valid HTML
| > and valid XHTML. From the W3 spec
| > http://www.w3.org/TR/REC-html40/struct/global.html#edef-H2 itself,
| > "Some people consider skipping heading levels to be bad practice. They
| > accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading
| > level H2 is skipped." Forrest is going beyond the spec by enforcing this
| > restriction.
|
| Like we said, if it is a problem to you then feel free to provide a patch
| to the html2document.xsl stylesheet.
|
| We don't use html as an internal format because it lacks some needed
| structure for other processing. Our internal structure is much closer
| to XHTML2 (in fact we will be moving to a subset of XHTML2 in some future
| release).
|
| In XHTML2 headings do not have levels assigned to them, instead you have:
|
| <section>
|   <title>
|   <section>
|     <title>
|
| Ross

And no more <h1>-<h4>, it's just <h>:

<body>
<h>This is a top level heading</h>
<p></p>
<section>
  <p></p>
  <h>This is a second-level heading</h>
  <p></p>
  <h>This is another second-level heading</h>
  <p></p>
</section>

Gav...
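For anyone who wants to experiment with the "devise a method to handle such tag soup" request, here is a minimal sketch of one possible approach. It is written in Python rather than XSLT and is not taken from Forrest's html2document.xsl. It takes a flat list of (level, title) pairs and nests each heading under the nearest preceding heading with a smaller level, so an h4 directly after an h2 simply becomes a child of that h2. The names `Section`, `nest_headings`, and `render` are made up for illustration, and the output only mimics the section/title shape Ross describes above.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Section:
    """One nested section: a title, its original heading level, and children."""
    title: str
    level: int
    children: list = field(default_factory=list)


def nest_headings(headings):
    """Turn a flat list of (level, title) pairs into a tree of sections.

    Each heading is attached to the nearest preceding heading whose level
    is strictly smaller, so skipped levels (h2 followed by h4) are tolerated.
    """
    root = Section(title="", level=0)  # sentinel; real levels start at 1
    stack = [root]
    for level, title in headings:
        # Pop until the top of the stack is a strictly shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        node = Section(title=title, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


def render(sections, indent=0):
    """Render the tree as indented <section>/<title> lines (illustrative only)."""
    pad = "  " * indent
    lines = []
    for section in sections:
        lines.append(f"{pad}<section>")
        lines.append(f"{pad}  <title>{escape(section.title)}</title>")
        lines.extend(render(section.children, indent + 1))
        lines.append(f"{pad}</section>")
    return lines


if __name__ == "__main__":
    flat = [(2, "Intro"), (4, "Detail"), (3, "Notes"), (2, "Summary")]
    print("\n".join(render(nest_headings(flat))))
```

The design choice is to derive nesting from relative order, not from absolute level numbers, which is what makes a level-free structure like XHTML2's tolerant of skipped levels. A real patch would also have to carry the body content between headings, which this sketch ignores.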
Re: Conversion of raw html stops in mid-file w/o error message
Ross notes:
> Like we said, if it is a problem to you then feel free to provide a
> patch to the html2document.xsl stylesheet.

David comments:
> It is not that we are deliberately enforcing that. If you can devise a
> method to handle such tag soup, then we will gladly apply the patch.

Thanks for the information. I will try to look at this issue, I think.

Actually, a higher-priority item for me would be the silent failure. I didn't get any error messages; instead, the page was simply not rendered, and if I hadn't been checking the pages to see what happened under 0.7 I would never have noticed. I've now checked the rest of the site and I've found other silent failures!

> We don't use html as an internal format because it lacks some needed
> structure for other processing. Our internal structure is much closer
> to XHTML2 (in fact we will be moving to a subset of XHTML2 in some
> future release).

Thanks, I will be on the lookout.
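Until the silent failure is addressed, one workaround is to scan the source HTML for skipped heading levels before running the build. The following is a rough sketch using only Python's standard `html.parser`. It assumes the trigger is a heading that is more than one level deeper than the heading before it (h2 followed by h4, as in this thread); the exact rule the stylesheet applies is not stated here, so treat that as an assumption. `HeadingLevelChecker` and `find_skipped_headings` are hypothetical names, not part of Forrest.

```python
from html.parser import HTMLParser


class HeadingLevelChecker(HTMLParser):
    """Records places where a heading jumps down more than one level."""

    def __init__(self):
        super().__init__()
        self.previous_level = None
        self.problems = []  # tuples of (line, previous_level, level)

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            # Assumption: only a jump of more than one level deeper is a problem.
            if self.previous_level is not None and level > self.previous_level + 1:
                line, _column = self.getpos()
                self.problems.append((line, self.previous_level, level))
            self.previous_level = level


def find_skipped_headings(html_text):
    """Return a list of (line, previous_level, level) for each skipped level."""
    checker = HeadingLevelChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line, prev, level in find_skipped_headings(handle.read()):
                print(f"{path}:{line}: h{prev} followed by h{level}")
```

Run it with the HTML files as arguments; any line it prints is a candidate for a page that stops rendering part-way through. It does not flag a document whose first heading is not h1.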