Luke,
What you are looking to do comes up pretty often on this list, so you will probably get quite a bit of help.
Here's mine....
What you need to do is go from HTML to XML and then to FO. Once you have FO, FOP can render it into a PDF quite quickly, and your browser can even serve as the delivery mechanism.
I wrote a Java servlet that is invoked from a link on an HTML page; the link passes the necessary parameters. In your case, that would be a reference to the original HTML file.
The next step is not obvious, hence this e-mail. Not all HTML is XML-ready: humans make mistakes, such as unbalanced or missing tags, that most browsers silently correct. Some tags also need doctoring; <BR> and <HR> come to mind, since they have no closing tags. What I did here was use the Tidy engine/library to fix up my HTML into well-formed XML.
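Tidy does far more than this, but the simplest part of the problem, the void tags, can be illustrated with the JDK alone. The sketch below is a crude regex stand-in, not Tidy's API: the class and method names (`VoidTagFixer`, `closeVoidTags`, `isWellFormed`) are hypothetical, it only rewrites `<BR>`/`<HR>` into self-closing form, and it uses the JDK's DOM parser to check whether the result is well-formed. Unbalanced or missing tags still need a real cleaner such as Tidy.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: closes BR/HR tags and checks well-formedness. Not a Tidy replacement. */
public class VoidTagFixer {
    // Matches <br>, <BR>, <br/>, <br />, <hr class="x"> and so on, in any case
    private static final Pattern VOID_TAG =
        Pattern.compile("(?i)<(br|hr)\\b([^>]*?)\\s*/?>");

    /** Rewrites every BR/HR tag in XML's self-closing form, e.g. <BR> becomes <BR/>. */
    public static String closeVoidTags(String html) {
        return VOID_TAG.matcher(html).replaceAll("<$1$2/>");
    }

    /** Returns true if the text parses as a well-formed XML document. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler throws on fatal errors and, unlike the default handler, prints nothing
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String raw = "<p>first line<BR>second line</p>";
        System.out.println(isWellFormed(raw));   // false: <BR> is never closed
        String fixed = closeVoidTags(raw);
        System.out.println(fixed);               // <p>first line<BR/>second line</p>
        System.out.println(isWellFormed(fixed)); // true
    }
}
```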
Now the job gets pretty easy...
The next step is to develop an XSL transform that takes HTML tags and creates FO XML. I have some transforms that I am very happy to share with you, as, I'm sure, will others. Nobody has a complete HTML-to-FO implementation, as it would be huge, but you can get most of the transform working quickly and then add to it as needed.
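To give a flavour of such a transform, here is a minimal sketch using the JAXP classes that ship with the JDK (`javax.xml.transform`). The stylesheet is hypothetical and deliberately tiny: it maps only `html`, `p`, `b`, `i` and `br` onto a single A4 page sequence, and it assumes the cleaned-up input has no XHTML namespace (if Tidy emits `xmlns="http://www.w3.org/1999/xhtml"`, the match patterns would need that namespace as well). `HtmlToFo` and `toFo` are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlToFo {
    // Hypothetical stylesheet covering only html, p, b, i and br; anything else falls
    // through to XSLT's built-in templates, which just copy the text content.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/html'>"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm' margin='2cm'>"
      + "<fo:region-body/></fo:simple-page-master></fo:layout-master-set>"
      + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
      + "<xsl:apply-templates select='body'/>"
      + "</fo:flow></fo:page-sequence></fo:root>"
      + "</xsl:template>"
      + "<xsl:template match='p'><fo:block space-after='6pt'><xsl:apply-templates/></fo:block></xsl:template>"
      + "<xsl:template match='b'><fo:inline font-weight='bold'><xsl:apply-templates/></fo:inline></xsl:template>"
      + "<xsl:template match='i'><fo:inline font-style='italic'><xsl:apply-templates/></fo:inline></xsl:template>"
      + "<xsl:template match='br'><fo:block/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms well-formed, namespace-free HTML into FO XML using the stylesheet above. */
    public static String toFo(String xhtml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toFo("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

The string returned by `toFo` is the FO XML that would then go to FOP. Extending coverage is a matter of adding one `xsl:template` per HTML tag you care about (tables, lists, headings) as the need arises.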
Once you have the FO XML -- BOOM, a few lines of code later and you've got your PDF.
The servlet I wrote actually communicates back to the browser every second and fakes an elapsed-time progress timer. We had to do this because we were originally running on slow hardware and have very impatient users. On our current hardware the transform and PDF generation run so quickly that the interaction is more of a nuisance than an aid. But at the time that is what the boss wanted, so I wrote it.
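If you want something similar, one way to fake a progress indicator is to run the slow job on the request thread while a scheduled task writes the elapsed time to the output every period. This is a sketch under assumptions, not the servlet described above: `ProgressTicker` and `runWithTicker` are hypothetical names, a `PrintWriter` over a `StringWriter` stands in for the servlet response writer, and only `java.util.concurrent` is used.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ProgressTicker {
    /** Runs job on the calling thread while a timer writes the elapsed time to out every periodMillis. */
    public static <T> T runWithTicker(Callable<T> job, PrintWriter out, long periodMillis) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.currentTimeMillis();
        ScheduledFuture<?> tick = timer.scheduleAtFixedRate(() -> {
            synchronized (out) {
                out.println("elapsed " + (System.currentTimeMillis() - start) + " ms");
                out.flush(); // with a real response writer, flushing pushes the update to the browser
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return job.call(); // e.g. the HTML -> FO -> PDF work
        } finally {
            tick.cancel(false);                         // stop scheduling further ticks
            timer.shutdown();
            timer.awaitTermination(1, TimeUnit.SECONDS); // let an in-progress tick finish
        }
    }

    public static void main(String[] args) throws Exception {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        // Simulated 2.5-second job with a one-second tick
        String result = runWithTicker(() -> { Thread.sleep(2500); return "done"; }, out, 1000);
        System.out.print(buffer);
        System.out.println(result);
    }
}
```

Flushing after each tick is the important design choice: without it, the servlet container may buffer the updates until the whole response is complete, and the browser would see nothing until the PDF is ready.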
--will
