-----Original Message-----
From: Jim <[EMAIL PROTECTED]>
To: Steve Yeazel <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Date: Thu, 15 Apr 2004 22:21:48 -0600 (MDT)
Subject: Re: [htdig] please clue me in! doc2html.pl
> On Thu, 15 Apr 2004, Steve Yeazel wrote:
>
> > Deleted, no excerpt: 50/http://www.domain.com/download/xxxx.pdf
> ...
> > Read 8192 from document
> > Read 1903 from document
> > Read a total of 51055 bytes
> > size = 51055
> >
> > I've confirmed that pdf2html.pl and pdftotext both work from the
> > command line.
>
> Have you confirmed this using the PDFs that are giving you problems
> when indexing?

Yes, absolutely.

> > doc2html.pl just spits out garbage in between the html tags when I
> > try to convert a pdf on the command line with it.
>
> Some PDFs serve as little more than a wrapper around a set of images.
> If you haven't confirmed that this is not the case, that should
> probably be the next thing that you check.

The PDFs in question are ones that I created myself by printing MS .doc
files to the Acrobat Distiller. I would bet my life on the fact that
they are not a set of images.

After doc2html.pl parses PDF files, should there be a physical HTML file
somewhere that is the PDF in HTML form? I'm confused...

> Jim
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: IBM Linux Tutorials
> Free Linux tutorial presented by Daniel Robbins, President and CEO of
> GenToo technologies. Learn everything from fundamentals to system
> administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general

