Re: [SLUG] howto convert html to pdf?

2006-11-15 Thread Adam Kennedy
I tried it as well, and it looks to me like it's a cross-language 
dependency problem.


The one downside to the CPAN installer (and language-specific source 
repository installations in general) is that it isn't able to cross 
language boundaries.


It looks like HTML::Tidy needs a C library called libtidy, and the build
fails when it can't find the library's headers (tidy.h and buffio.h in
your errors).
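
A rough way to check for the missing headers is sketched below. It looks
for tidy.h in the include directories named on the failing cc command line,
plus the two usual defaults. The package name libtidy-dev is an assumption
that applies to Debian-style systems; other distributions name it
differently.

```shell
#!/bin/sh
# Print the first tidy.h found in the given include directories.
# Returns non-zero if none of them contains the header.
find_tidy_header() {
    for dir in "$@"; do
        if [ -f "$dir/tidy.h" ]; then
            echo "$dir/tidy.h"
            return 0
        fi
    done
    return 1
}

# Directories taken from the -I flags in the error output, plus defaults.
if find_tidy_header /usr/include/tidy /usr/local/include/tidy \
        /sw/include/tidy /usr/include /usr/local/include; then
    echo "tidy.h found; retry: cpan -i HTML::Tidy"
else
    # Assumption: on Debian/Ubuntu the headers ship in libtidy-dev.
    echo "tidy.h missing; try: sudo apt-get install libtidy-dev"
fi
```

Installing the tidy command-line tool alone would not be expected to help,
since a binary package normally carries no C headers.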


On another note though, I hadn't heard of mozilla2ps before, but I think 
that's almost certainly the best approach. Using a full-blown rendering 
engine is much more likely to produce good results.


So you can add my support to the mozilla2ps -> ps2pdf approach as well, 
despite the slowness.


Adam K

Sonia Hamilton wrote:
> * On Tue, Nov 14, 2006 at 06:04:11PM +1100, Adam Kennedy wrote:
> > I just realized that my previous response didn't make it to the list.
> >
> > Something like this should work for you.
> >
> > http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl
> >
> > Should be just a case of...
> >
> >   [EMAIL PROTECTED]:~$ cpan -i PDF::FromHTML
> >
> >   [EMAIL PROTECTED]:~$ html2pdf.pl source.html > target.pdf
>
> Thanks Adam. xulrunner + mozilla2ps gave me good results, so I used
> that. A bit slow, but I scripted it and left it to chug away...
>
> Out of interest, 'cpan -i PDF::FromHTML' failed, due to 'cpan -i
> HTML::Tidy' failing, with errors below. Not sure how to fix this - any
> ideas? I tried googling on the error messages, installing tidy (in case
> it's a missing library). I don't use Perl much, so don't know where to
> start with these sort of problems.
>
> Errors:
>
> ...
> ...
> cp lib/HTML/Tidy.pm blib/lib/HTML/Tidy.pm
> /usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp  -typemap
> /usr/share/perl/5.8/ExtUtils/typemap  Tidy.xs > Tidy.xsc && mv Tidy.xsc
> Tidy.c
> Please specify prototyping behavior for Tidy.xs (see perlxs manual)
> cc -c  -I. -I/usr/include/tidy -I/usr/local/include/tidy
> -I/sw/include/tidy -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
> -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2   -DVERSION=\"1.06\"
> -DXS_VERSION=\"1.06\" -fPIC "-I/usr/lib/perl/5.8/CORE"   Tidy.c
> Tidy.xs:5:18: error: tidy.h: No such file or directory
> Tidy.xs:6:20: error: buffio.h: No such file or directory
> ...
> ...
>   make had returned bad status, install seems impossible
>
> --
> Sonia Hamilton. GPG key A8B77238.
> .
> One OS to rule them all, One OS to find them.
> One OS to call them all, And in salvation bind them.
> In the bright land of Linux, Where the hackers play.

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] howto convert html to pdf?

2006-11-15 Thread Sonia Hamilton
* On Tue, Nov 14, 2006 at 06:04:11PM +1100, Adam Kennedy wrote:
> I just realized that my previous response didn't make it to the list.
> 
> Something like this should work for you.
> 
> http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl
> 
> Should be just a case of...
> 
>   [EMAIL PROTECTED]:~$ cpan -i PDF::FromHTML
> 
>   [EMAIL PROTECTED]:~$ html2pdf.pl source.html > target.pdf

Thanks Adam. xulrunner + mozilla2ps gave me good results, so I used
that. A bit slow, but I scripted it and left it to chug away...

Out of interest, 'cpan -i PDF::FromHTML' failed, due to 'cpan -i
HTML::Tidy' failing, with errors below. Not sure how to fix this - any
ideas? I tried googling on the error messages, installing tidy (in case
it's a missing library). I don't use Perl much, so don't know where to
start with these sort of problems.

Errors:

...
...
cp lib/HTML/Tidy.pm blib/lib/HTML/Tidy.pm
/usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp  -typemap
/usr/share/perl/5.8/ExtUtils/typemap  Tidy.xs > Tidy.xsc && mv Tidy.xsc
Tidy.c
Please specify prototyping behavior for Tidy.xs (see perlxs manual)
cc -c  -I. -I/usr/include/tidy -I/usr/local/include/tidy
-I/sw/include/tidy -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2   -DVERSION=\"1.06\"
-DXS_VERSION=\"1.06\" -fPIC "-I/usr/lib/perl/5.8/CORE"   Tidy.c
Tidy.xs:5:18: error: tidy.h: No such file or directory
Tidy.xs:6:20: error: buffio.h: No such file or directory
...
...
  make had returned bad status, install seems impossible

--
Sonia Hamilton. GPG key A8B77238.
.
One OS to rule them all, One OS to find them.
One OS to call them all, And in salvation bind them.
In the bright land of Linux, Where the hackers play. 


Re: [SLUG] howto convert html to pdf?

2006-11-14 Thread Steve Lindsay

On 11/14/06, Sonia Hamilton <[EMAIL PROTECTED]> wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?

We're using Apache FOP for html -> pdf conversion. It might be
slightly more involved than the other tools people have suggested,
however it will give you a lot of control over the final pdf (via
xslt). We looked at some of the single-shot command line tools and
found that fop produced nicer documents, albeit with a few more steps.
It's Java though, which may or may not be an issue for you.
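
As a rough sketch of what such a pipeline might look like from the shell:
FOP reads XML, so the HTML has to be made well-formed first. The tidy
cleanup step and the stylesheet name xhtml2fo.xsl are assumptions for
illustration, not necessarily what we run; you would supply your own
XHTML-to-XSL-FO stylesheet.

```shell
#!/bin/sh
# tidy exits 0 on success, 1 on warnings and 2 on errors;
# treat 0 and 1 as usable output.
tidy_ok() {
    [ "$1" -le 1 ]
}

xsl=${XSL:-xhtml2fo.xsl}   # hypothetical XHTML-to-XSL-FO stylesheet

for src in ./*.html; do
    [ -e "$src" ] || continue      # no matches: the glob stays literal
    base=${src%.html}
    # Step 1: clean the HTML into well-formed XHTML with numeric entities.
    tidy -q -n -asxhtml -o "$base.xhtml" "$src"
    tidy_ok $? || { echo "tidy failed on $src" >&2; continue; }
    # Step 2: apply the stylesheet and render the result to PDF.
    fop -xml "$base.xhtml" -xsl "$xsl" -pdf "$base.pdf"
done
```

The extra steps are where the control comes from: the look of the PDF is
decided in the stylesheet rather than by the converter.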

Link below gives some examples of how to use it from the command line,
as well as in code.

http://www.javaworld.com/javaworld/jw-04-2006/jw-0410-html.html?page=1

Home page is at http://xmlgraphics.apache.org/

If you do use it, grab the latest version; it handles tables etc. much better.

Cheers.
Steve


Re: [SLUG] howto convert html to pdf?

2006-11-14 Thread Sonia Hamilton
* On Tue, Nov 14, 2006 at 06:01:03PM +1100, Michael Lake wrote:
> Well it depends on how well formed the HTML is. If it's crap it could be 
> hard to create neat, good looking PDF. Here are possibilities:

* On Tue, Nov 14, 2006 at 06:04:11PM +1100, Adam Kennedy wrote:
> Something like this should work for you.
> http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl

* On Tue, Nov 14, 2006 at 06:17:22PM +1100, Craige McWhirter wrote:
> One solution would be to use html2ps then ps2pdf. I at least use ps2pdf

* On Tue, Nov 14, 2006 at 06:48:22PM +1100, David Kempe wrote:
> Funnily enough we have had a project just recently with the exact same 

* On Tue, Nov 14, 2006 at 06:55:32PM +1100, Lindsay Holmwood wrote:
> I've used mozilla2ps before to do this, then hooked it into ps2pdf for

Thanks for all your suggestions everyone - I'll now get down and RTFM and
have a bit of a play :-)

Funny thing is that we're actually scraping from my company's intranet,
but for various reasons (i.e. politics...) it's easier to scrape the stuff
than to try to get it out of the central IT Department...

-- 
Sonia Hamilton. GPG key A8B77238.
.
One OS to rule them all, One OS to find them.
One OS to call them all, And in salvation bind them.
In the bright land of Linux, Where the hackers play. 


Re: [SLUG] howto convert html to pdf?

2006-11-13 Thread Lindsay Holmwood
On Tue, Nov 14, 2006 at 05:42:59PM +1100, Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
> 
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
> 
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
> 
> [1] http://directory.fsf.org/print/misc/html2pdf.html
> 

I've used mozilla2ps before to do this, then hooked it into ps2pdf for
the final output. html2ps doesn't really cut it, with the output looking
a bit wishy-washy if you're doing anything complicated (tables, divs,
etc). 

If you're looking for good quality output, mozilla2ps is the only choice;
however, be prepared to wait around a bit for it to generate the pages (it
starts up the entire gecko engine before loading the page, then prints it).

You'll need xulrunner underneath, and the moz2ps website[0] explains
how to set up everything.

The syntax of xulrunner/moz2ps is a bit finicky, but if you follow the
configuration docs to the letter you should be ok.

Good luck!
Lindsay

[0] http://michele.pupazzo.org/mozilla2ps/


-- 
http://slug.org.au/ (Sydney Linux Users Group)
http://lca2007.linux.org.au/ (linux.conf.au 2007)
http://holmwood.id.au/~lindsay/ (me)


Re: [SLUG] howto convert html to pdf?

2006-11-13 Thread David Kempe
Funnily enough we have had a project just recently with the exact same 
problem.

http://htmldoc.org/
htmldoc is by the people who made CUPS, and will probably do unless you
have css up the wazoo or something more complex.
For the ultimate in rendering ability, use the embedded mozilla engine via
http://michele.pupazzo.org/mozilla2ps/

then run the output through ps2pdf.
moz2ps doesn't seem to work with stdin though (you should be ok).

Those two choices are the easiest to use, though moz2ps seems to like a
gui (gtk libraries) and sometimes it likes to run as root (tarball
version). Get ubuntu edgy or a recent debian and you should be ok with that
though.
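
For the htmldoc route, a batch run over several hundred pages could look
roughly like this. The output directory name and the helper function are
made up for illustration; --webpage, -t and -f are standard htmldoc
options, but check them against your installed version.

```shell
#!/bin/sh
# Map an input HTML path to an output PDF path inside a target directory,
# e.g. ./intranet/page.html -> pdf/page.pdf
out_path() {
    printf '%s/%s.pdf\n' "$1" "$(basename "$2" .html)"
}

outdir=${OUTDIR:-pdf}   # hypothetical output directory name

for src in ./*.html; do
    [ -e "$src" ] || continue      # no matches: the glob stays literal
    mkdir -p "$outdir"
    # --webpage treats the input as an unstructured web page,
    # -t selects the output format, -f names the output file.
    htmldoc --webpage -t pdf -f "$(out_path "$outdir" "$src")" "$src"
done
```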


dave

Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
>
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
>
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
>
> [1] http://directory.fsf.org/print/misc/html2pdf.html



Re: [SLUG] howto convert html to pdf?

2006-11-13 Thread Craige McWhirter
On Tue, 2006-11-14 at 17:42 +1100, Sonia Hamilton wrote:

> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.

One solution would be to use html2ps then ps2pdf. I at least use ps2pdf
daily and it's a good tool. Hopefully html2ps is too. 
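
A minimal sketch of that two-step pipeline as a batch loop, assuming
html2ps and ps2pdf are both on the PATH. html2ps writes PostScript to
standard output by default, and ps2pdf accepts "-" to read from standard
input. The helper function name is made up for illustration.

```shell
#!/bin/sh
# Derive the PDF file name for a given HTML file (page.html -> page.pdf).
pdf_name() {
    printf '%s\n' "${1%.html}.pdf"
}

# Convert every .html file in the current directory.
for src in ./*.html; do
    [ -e "$src" ] || continue      # no matches: the glob stays literal
    html2ps "$src" | ps2pdf - "$(pdf_name "$src")"
done
```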

--
Cheers,
  Craige.



Re: [SLUG] howto convert html to pdf?

2006-11-13 Thread Adam Kennedy

I just realized that my previous response didn't make it to the list.

Something like this should work for you.

http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/script/html2pdf.pl

Should be just a case of...

  [EMAIL PROTECTED]:~$ cpan -i PDF::FromHTML

  [EMAIL PROTECTED]:~$ html2pdf.pl source.html > target.pdf

Adam K

Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
>
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
>
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
>
> [1] http://directory.fsf.org/print/misc/html2pdf.html




Re: [SLUG] howto convert html to pdf?

2006-11-13 Thread Michael Lake

Sonia Hamilton wrote:
> I (well my boss actually) want to convert several hundred html pages to
> pdf - what's the easiest way to do this? Any pointers, ideas?
>
> I guess I'm looking for a tool like pdf2html (but going in the reverse
> direction).
>
> I've found a php module called html2pdf [1] - just wondering if there's
> a stand alone tool callable from the shell.
>
> [1] http://directory.fsf.org/print/misc/html2pdf.html


Well, it depends on how well formed the HTML is. If it's crap it could be
hard to create a neat, good-looking PDF. Here are some possibilities:

1. If it's got few tags and is very neat, maybe use sed/bash scripting to
convert it to latex and run pdflatex on it. This is easily scriptable to
do several hundred files.

2. Script OpenOffice. OpenOffice can take in HTML and produce PDF.

Mike
--
Michael Lake
Computational Research Support Unit
Science Faculty, UTS
Ph: 9514 2238





[SLUG] howto convert html to pdf?

2006-11-13 Thread Sonia Hamilton
I (well my boss actually) want to convert several hundred html pages to
pdf - what's the easiest way to do this? Any pointers, ideas?

I guess I'm looking for a tool like pdf2html (but going in the reverse
direction).

I've found a php module called html2pdf [1] - just wondering if there's
a stand alone tool callable from the shell.

[1] http://directory.fsf.org/print/misc/html2pdf.html

-- 
Sonia Hamilton. GPG key A8B77238.
.
One OS to rule them all, One OS to find them.
One OS to call them all, And in salvation bind them.
In the bright land of Linux, Where the hackers play. 