I just tried it. Each line of the extracted text is wrapped in a
paragraph tag, and each page is wrapped in div tags. That's all. No
other HTML tags are used.
<p>Installing Java DB
</p>
<p>Java DB is installed automatically as part of the Java SE Development
Kit (JDK).
</p>
<p>To obtain the JDK, navigate your web browser to
</p>
<p>http://www.oracle.com/technetwork/java/javase/downloads/ and click
the Download JDK
</p>
<p>button. Follow the instructions on subsequent pages.
</p>
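
For illustration only, here is a minimal Java sketch of the wrapping described above: one paragraph tag per extracted line, one div per page, nothing else. This is not PDFBox's code. The names ExtractedHtmlSketch, toHtml, and escape are hypothetical, and the escaping of &, <, and > is an assumption that the output above neither confirms nor rules out.

```java
import java.util.List;

class ExtractedHtmlSketch {

    // Hypothetical helper: each inner list holds the text lines of one page.
    // Every line becomes its own <p> element and every page its own <div>.
    static String toHtml(List<List<String>> pages) {
        StringBuilder sb = new StringBuilder();
        for (List<String> page : pages) {
            sb.append("<div>\n");
            for (String line : page) {
                // The closing tag goes on its own line, as in the sample output.
                sb.append("<p>").append(escape(line)).append("\n</p>\n");
            }
            sb.append("</div>\n");
        }
        return sb.toString();
    }

    // Assumption: markup characters in the text are escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
            List.of("Installing Java DB",
                    "button. Follow the instructions on subsequent pages."));
        System.out.print(toHtml(pages));
    }
}
```

Since no heading or list markup is produced, any document structure beyond lines and pages would have to be added in a later step.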
Kim
On 01/10/12 12:15 PM, Andrew McIntyre wrote:
On Tue, Jan 10, 2012 at 5:46 AM, Rick Hillegas wrote:
I ran a quick experiment: I removed fo2html.xsl and verified that I could
build the frames html docs. Here are some solutions listed in declining
order of effort:
<snip options>
Thanks,
-Rick
Another option would be to use PDFBox's ExtractText utility to convert
the PDFs generated by the FOP into HTML:
http://pdfbox.apache.org/commandlineutilities/ExtractText.html
I haven't tried it yet, so I can't speak to its accuracy or
presentation, but it would be another easy solution, and it's
definitely licensed under the Apache License. :-)
- andrew