Re: [SLUG] howto convert html to pdf?
I tried it as well, and it looks to me like it's a cross-language dependency problem.

The one downside to the CPAN installer (and language-specific source repository installations in general) is that it isn't able to cross language boundaries.

It looks like HTML::Tidy needs something called libtidy, and is freaking out when it can't find it (hence the tidy.h and buffio.h errors below).

On another note though, I hadn't heard of mozilla2ps before, but I think that's almost certainly the best approach. Using a full-blown rendering engine is much more likely to produce good results.

So you can add my support to the mozilla2ps -> ps2pdf approach as well, despite the slowness.

Adam K

Sonia Hamilton wrote:
> * On Tue, Nov 14, 2006 at 06:04:11PM +1100, Adam Kennedy wrote:
>> I just realized that my previous response didn't make it to the list.
>>
>> Something like this should work for you.
>>
>> http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl
>>
>> Should be just a case of...
>>
>> [EMAIL PROTECTED]:~$ cpan -i PDF::FromHTML
>> [EMAIL PROTECTED]:~$ html2pdf.pl source.html > target.pdf
>
> Thanks Adam. xulrunner + mozilla2ps gave me good results, so I used
> that. A bit slow, but I scripted it and left it to chug away...
>
> Out of interest, 'cpan -i PDF::FromHTML' failed, due to 'cpan -i
> HTML::Tidy' failing, with errors below. Not sure how to fix this - any
> ideas? I tried googling on the error messages, installing tidy (in case
> it's a missing library). I don't use Perl much, so don't know where to
> start with these sorts of problems.
>
> Errors:
>
> ...
> ...
> cp lib/HTML/Tidy.pm blib/lib/HTML/Tidy.pm
> /usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp -typemap /usr/share/perl/5.8/ExtUtils/typemap Tidy.xs > Tidy.xsc && mv Tidy.xsc Tidy.c
> Please specify prototyping behavior for Tidy.xs (see perlxs manual)
> cc -c -I. -I/usr/include/tidy -I/usr/local/include/tidy -I/sw/include/tidy -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -DVERSION=\"1.06\" -DXS_VERSION=\"1.06\" -fPIC "-I/usr/lib/perl/5.8/CORE" Tidy.c
> Tidy.xs:5:18: error: tidy.h: No such file or directory
> Tidy.xs:6:20: error: buffio.h: No such file or directory
> ...
> ...
> make had returned bad status, install seems impossible
>
> --
> Sonia Hamilton. GPG key A8B77238.
> .
> One OS to rule them all, One OS to find them.
> One OS to call them all, And in salvation bind them.
> In the bright land of Linux, Where the hackers play.

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
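The failing cc command in the quoted log looks for tidy.h under /usr/include/tidy, /usr/local/include/tidy and /sw/include/tidy, so a quick check along the following lines may confirm whether the libtidy development headers are actually installed. This is only a sketch: find_tidy_header is a made-up helper name, and which package supplies the headers depends on the distribution (on Debian and Ubuntu it is, as far as I know, libtidy-dev, but treat that as an assumption).

```shell
#!/bin/sh
# find_tidy_header is a hypothetical helper, not part of HTML::Tidy or CPAN.
# It prints every directory among its arguments that contains tidy.h and
# returns non-zero if none of them do.
find_tidy_header() {
    found=1
    for dir in "$@"; do
        if [ -f "$dir/tidy.h" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    return "$found"
}

# Usage, with the include directories taken from the failing cc command:
#   find_tidy_header /usr/include/tidy /usr/local/include/tidy /sw/include/tidy \
#       || echo "tidy.h not found; the libtidy headers are probably missing"
```

If the check finds nothing, installing the tidy command-line program alone would not help, since the XS build needs the C headers rather than the binary.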
Re: [SLUG] howto convert html to pdf?
* On Tue, Nov 14, 2006 at 06:04:11PM +1100, Adam Kennedy wrote:
> I just realized that my previous response didn't make it to the list.
>
> Something like this should work for you.
>
> http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl
>
> Should be just a case of...
>
> [EMAIL PROTECTED]:~$ cpan -i PDF::FromHTML
> [EMAIL PROTECTED]:~$ html2pdf.pl source.html > target.pdf

Thanks Adam. xulrunner + mozilla2ps gave me good results, so I used that. A bit slow, but I scripted it and left it to chug away...

Out of interest, 'cpan -i PDF::FromHTML' failed, due to 'cpan -i HTML::Tidy' failing, with errors below. Not sure how to fix this - any ideas? I tried googling on the error messages, installing tidy (in case it's a missing library). I don't use Perl much, so don't know where to start with these sorts of problems.

Errors:

...
...
cp lib/HTML/Tidy.pm blib/lib/HTML/Tidy.pm
/usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp -typemap /usr/share/perl/5.8/ExtUtils/typemap Tidy.xs > Tidy.xsc && mv Tidy.xsc Tidy.c
Please specify prototyping behavior for Tidy.xs (see perlxs manual)
cc -c -I. -I/usr/include/tidy -I/usr/local/include/tidy -I/sw/include/tidy -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -DVERSION=\"1.06\" -DXS_VERSION=\"1.06\" -fPIC "-I/usr/lib/perl/5.8/CORE" Tidy.c
Tidy.xs:5:18: error: tidy.h: No such file or directory
Tidy.xs:6:20: error: buffio.h: No such file or directory
...
...
make had returned bad status, install seems impossible

--
Sonia Hamilton. GPG key A8B77238.
Re: [SLUG] howto convert html to pdf?
On 11/14/06, Sonia Hamilton <[EMAIL PROTECTED]> wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?

We're using Apache FOP for html -> pdf conversion. It might be slightly more involved than the other tools people have suggested, however it will give you a lot of control over the final pdf (via xslt).

We looked at some of the single-shot command line tools and found that fop produced nicer documents, albeit with a few more steps. It's Java though, which may or may not be an issue for you.

Link below gives some examples of how to use it from the command line, as well as in code.

http://www.javaworld.com/javaworld/jw-04-2006/jw-0410-html.html?page=1

Home page is at http://xmlgraphics.apache.org/

If you do use it, grab the latest version; it handles tables etc. much better.

Cheers.
Steve
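For readers who want to see roughly what the FOP step looks like from the shell, here is a minimal sketch. It assumes the fop launcher script is on the PATH, that the HTML has already been cleaned up into well-formed XHTML, and that you have an XSLT stylesheet that turns XHTML into XSL-FO. The function name fop_render and the file names in the usage comment are placeholders, not anything from Steve's setup.

```shell
#!/bin/sh
# fop_render is a hypothetical wrapper around the fop command-line launcher.
#   $1 = well-formed XHTML input
#   $2 = XSLT stylesheet that produces XSL-FO
#   $3 = PDF file to write
fop_render() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "fop_render: cannot read input or stylesheet" >&2
        return 2
    fi
    # fop applies the stylesheet to the XML input and renders the result as PDF.
    fop -xml "$1" -xsl "$2" -pdf "$3"
}

# Usage (xhtml2fo.xsl stands in for whatever stylesheet you use):
#   fop_render page.xhtml xhtml2fo.xsl page.pdf
```

The stylesheet is where the control over the final PDF mentioned above comes from, which is also why this route takes more setup than the single-shot tools.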
Re: [SLUG] howto convert html to pdf?
* On Tue, Nov 14, 2006 at 06:01:03PM +1100, Michael Lake wrote:
> Well it depends on how well formed the HTML is. If its crap it could be
> hard to create neat, good looking PDF. Here are possibilities:

* On Tue, Nov 14, 2006 at 06:04:11PM +1100, Adam Kennedy wrote:
> Something like this should work for you.
> http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl

* On Tue, Nov 14, 2006 at 06:17:22PM +1100, Craige McWhirter wrote:
> One solution would be to use html2ps then ps2pdf. I at least use ps2pdf

* On Tue, Nov 14, 2006 at 06:48:22PM +1100, David Kempe wrote:
> Funnily enough we have had a project just recently with the exact same

* On Tue, Nov 14, 2006 at 06:55:32PM +1100, Lindsay Holmwood wrote:
> I've used mozilla2ps before to do this, then hooked it into ps2pdf for

Thanks for all your suggestions everyone - I'll now get down and RTFM and have a bit of a play :-)

Funny thing is that we're actually scraping from my company's intranet, but for various reasons (i.e. politics...) it's easier to scrape the stuff rather than trying to get it out of the central IT Department...

--
Sonia Hamilton. GPG key A8B77238.
Re: [SLUG] howto convert html to pdf?
On Tue, Nov 14, 2006 at 05:42:59PM +1100, Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
>
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
>
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
>
> [1] http://directory.fsf.org/print/misc/html2pdf.html
>

I've used mozilla2ps before to do this, then hooked it into ps2pdf for the final output.

html2ps doesn't really cut it, with the output looking a bit wishy-washy if you're doing anything complicated (tables, divs, etc).

If you're looking for good quality output, mozilla2ps is the only choice; however, be prepared to wait around a bit for it to generate the pages (it starts up the entire Gecko engine before loading the page, then it prints it).

You'll need xulrunner underneath, and the moz2ps website[0] explains how to set up everything. The syntax of xulrunner/moz2ps is a bit finicky, but if you follow the configuration docs to the letter you should be ok.

Good luck!

Lindsay

[0] http://michele.pupazzo.org/mozilla2ps/

--
http://slug.org.au/ (Sydney Linux Users Group)
http://lca2007.linux.org.au/ (linux.conf.au 2007)
http://holmwood.id.au/~lindsay/ (me)
Re: [SLUG] howto convert html to pdf?
Funnily enough we have had a project just recently with the exact same problem.

http://htmldoc.org/

htmldoc is by the people who made CUPS and will probably do unless you have CSS up the wazoo or something more complex.

For the ultimate in rendering ability, use the embedded Mozilla engine via http://michele.pupazzo.org/mozilla2ps/ and then run the output through ps2pdf. moz2ps doesn't seem to work with stdin though (you should be ok).

Those two choices are the easiest to use, though moz2ps seems to like a GUI (GTK libraries) and sometimes it likes to run as root (tarball version). Get Ubuntu Edgy or a recent Debian and you should be ok with that though.

dave

Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
>
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
>
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
>
> [1] http://directory.fsf.org/print/misc/html2pdf.html
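Since the original question was about several hundred pages, the htmldoc suggestion could be scripted roughly as follows. This is a sketch rather than anything Dave posted: htmldoc_dir is a made-up function name, and it assumes the htmldoc binary is installed and that each page should become its own PDF (hence --webpage rather than book mode).

```shell
#!/bin/sh
# htmldoc_dir is a hypothetical helper. It converts each *.html file in
# directory $1 to a PDF with the same base name, reports any file that
# htmldoc failed on, and returns non-zero if at least one failed.
htmldoc_dir() {
    status=0
    for src in "$1"/*.html; do
        [ -e "$src" ] || continue          # the glob matched nothing
        if ! htmldoc --webpage -f "${src%.html}.pdf" "$src"; then
            echo "failed: $src" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   htmldoc_dir ./pages
```

Carrying on after a failure, instead of stopping at the first bad page, seems the more useful behaviour for a long unattended run, but that is a design choice you may want to reverse.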
Re: [SLUG] howto convert html to pdf?
On Tue, 2006-11-14 at 17:42 +1100, Sonia Hamilton wrote:
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.

One solution would be to use html2ps then ps2pdf. I at least use ps2pdf daily and it's a good tool. Hopefully html2ps is too.

--
Cheers,
Craige.
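The two-step html2ps then ps2pdf route can be sketched as a small batch function. The name html_to_pdf_dir is invented for this example, and the sketch assumes both tools are on the PATH, that html2ps writes PostScript to standard output when no output file is given, and that the pages are local files.

```shell
#!/bin/sh
# html_to_pdf_dir is a hypothetical helper for illustration.
# For every *.html file in directory $1 it runs html2ps, then ps2pdf,
# writing foo.pdf next to foo.html. It stops at the first failure.
html_to_pdf_dir() {
    for src in "$1"/*.html; do
        [ -e "$src" ] || continue            # the glob matched nothing
        base=${src%.html}
        html2ps "$src" > "$base.ps" || return 1
        ps2pdf "$base.ps" "$base.pdf" || return 1
        rm -f "$base.ps"                     # drop the intermediate PostScript
    done
}

# Usage:
#   html_to_pdf_dir ./pages
```

Keeping the intermediate .ps file until ps2pdf has succeeded means a failed run leaves something to inspect.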
Re: [SLUG] howto convert html to pdf?
I just realized that my previous response didn't make it to the list.

Something like this should work for you.

http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl

Should be just a case of...

[EMAIL PROTECTED]:~$ cpan -i PDF::FromHTML
[EMAIL PROTECTED]:~$ html2pdf.pl source.html > target.pdf

Adam K

Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
>
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
>
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
>
> [1] http://directory.fsf.org/print/misc/html2pdf.html
Re: [SLUG] howto convert html to pdf?
Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
>
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
>
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
>
> [1] http://directory.fsf.org/print/misc/html2pdf.html

Well it depends on how well formed the HTML is. If it's crap it could be hard to create neat, good looking PDF. Here are possibilities:

1. If it's got few tags and is very neat, maybe use sed/bash scripting to convert it to LaTeX and run pdflatex on it. This is easily scriptable to do several hundred files.

2. Script OpenOffice. OpenOffice can take in HTML and produce PDF.

Mike
--
Michael Lake
Computational Research Support Unit
Science Faculty, UTS
Ph: 9514 2238
[SLUG] howto convert html to pdf?
I (well my boss actually) want to convert several hundred html pages to pdf - what's the easiest way to do this? Any pointers, ideas?

I guess I'm looking for a tool like pdf2html (but going in the reverse direction).

I've found a php module called html2pdf [1] - just wondering if there's a stand alone tool callable from the shell.

[1] http://directory.fsf.org/print/misc/html2pdf.html

--
Sonia Hamilton. GPG key A8B77238.