-----Original Message-----
From: Jim <[EMAIL PROTECTED]>
To: Steve Yeazel <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Date: Thu, 15 Apr 2004 22:21:48 -0600 (MDT)
Subject: Re: [htdig] please clue me in!  doc2html.pl

> On Thu, 15 Apr 2004, Steve Yeazel wrote:
> 
> > Deleted, no excerpt: 50/http://www.domain.com/download/xxxx.pdf
> ...
> > Read 8192 from document
> > Read 1903 from document
> > Read a total of 51055 bytes
> >   size = 51055
> > 
> > I've confirmed that pdf2html.pl and pdftotext both work from the
> > command line.
> 
> Have you confirmed this using the PDFs that are giving you problems
> when indexing?
 
Yes, absolutely.

> 
> > doc2html.pl just spits out garbage in between the html tags when I
> > try to convert a pdf on the command line with it.
> 
> Some PDFs serve as little more than a wrapper around a set of images.
> If you haven't confirmed that this is not the case, that should
> probably be the next thing that you check.
 
The PDFs in question are ones that I created myself by printing MS .doc
files to Acrobat Distiller.  I would bet my life that they are not a set
of images.  After doc2html.pl parses a PDF file, should there be a
physical HTML file somewhere that is the PDF in HTML form?  I'm confused...
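
For anyone following the thread, here is a minimal sketch of the check Jim
suggests: run pdftotext on the problem PDF and see whether what comes back
is mostly readable text.  It assumes pdftotext is on the PATH and that
"pdftotext FILE -" writes the text to standard output.  The function names
and the 0.85 threshold are made up for illustration and are not part of
ht://Dig or doc2html.pl.  It is written in Python only because the thread
has no code of its own.

```python
import subprocess


def extract_text(pdf_path: str) -> bytes:
    """Run pdftotext on pdf_path and return the raw bytes it prints.

    Assumes pdftotext is installed; "-" asks it to write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 0.0
    # Tab, newline, form feed (pdftotext page break), carriage return.
    whitespace = (9, 10, 12, 13)
    printable = sum(1 for b in data if 32 <= b < 127 or b in whitespace)
    return printable / len(data)


def looks_like_text(data: bytes, threshold: float = 0.85) -> bool:
    """Crude heuristic: True when most of the bytes are readable.

    The threshold is an arbitrary assumption, not a documented value.
    """
    return printable_ratio(data) >= threshold
```

Calling looks_like_text(extract_text("xxxx.pdf")) on one of the failing
files would give a rough yes/no.  An empty or mostly unreadable result
points at the PDF itself (image-only pages, unusual font encoding), while
a readable result points back at the converter setup.  The heuristic
counts ASCII only, so text with many accented characters would score
lower than it should.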

> 
> Jim
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by: IBM Linux Tutorials
> Free Linux tutorial presented by Daniel Robbins, President and CEO of
> GenToo technologies. Learn everything from fundamentals to system
> administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general



