On Thu, 15 Apr 2004, Steve Yeazel wrote:

> Deleted, no excerpt: 50/http://www.domain.com/download/xxxx.pdf
...
> Read 8192 from document
> Read 1903 from document
> Read a total of 51055 bytes
> size = 51055
>
> I've confirmed that pdf2html.pl and pdftotext both work from the command line.

Have you confirmed this using the PDFs that are giving you problems when indexing?

> doc2html.pl just spits out garbage in between the html tags when I try to
> convert a pdf on the command line with it.

Some PDFs serve as little more than a wrapper around a set of images. If you haven't confirmed that this is not the case, that should probably be the next thing that you check.

Jim

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

