On Thu, 15 Apr 2004, Steve Yeazel wrote:

> Deleted, no excerpt: 50/http://www.domain.com/download/xxxx.pdf
...
> Read 8192 from document
> Read 1903 from document
> Read a total of 51055 bytes
>   size = 51055
> 
> I've confirmed that pdf2html.pl and pdftotext both work from the command line.

Have you confirmed this using the PDFs that are giving you problems when
indexing?

> doc2html.pl just spits out garbage in between the html tags when I try to 
> convert a pdf on the command line with it.

Some PDFs are little more than a wrapper around a set of images and
contain no extractable text. If you haven't ruled that out for these
files, that should probably be the next thing that you check.
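
One rough way to check a batch of files is sketched below. It assumes
pdftotext is on your PATH and relies on its convention that "-" as the
output file means stdout. The function names and the 20-word threshold
are my own, purely for illustration; none of this is part of ht://Dig or
its parser scripts.

```python
import subprocess
import sys


def extract_text(pdf_path):
    """Run pdftotext on one file and return the text it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def looks_like_text(text, min_words=20):
    """Heuristic: does the text hold at least min_words plain words?

    min_words is an arbitrary threshold chosen for this sketch.
    """
    words = []
    for token in text.split():
        token = token.strip(".,;:!?()\"'")
        if len(token) >= 3 and token.isalpha():
            words.append(token)
    return len(words) >= min_words


if __name__ == "__main__":
    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        if looks_like_text(extract_text(path)):
            verdict = "has text"
        else:
            verdict = "little or no text (possibly image-only)"
        print(path + ": " + verdict)
```

A file reported as having little or no text is only a candidate for
being image-only; it is still worth opening it in a viewer and trying to
select some text to be sure.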

Jim


