According to [EMAIL PROTECTED]:
> Dear Colleagues,
> I used the htdig under potato linux. In August I upgraded it to woody.
> Its version now: htdig 3.1.6-3
>
> I set the binaries in the doc2html.pl this way:
>
> word2x (I got a lot of errors when using catdoc)
> pstotext
> pdf2html.pl
> ppthtml (this makes huge memory leakage errors...)
>
> The characteristic line inside the doc2html.pl is:
> my $PDF2HTML = '/usr/share/htdig/pdf2html.pl';
> -------------
> A strange phenomenon is that my htdig changes the "pdf" extension for
> "doc". One example:
>
> Microsoft Word - Activity Report July2002.doc * *
>
> The real name of the file is: "Activity Report July2002.pdf" and this is
> correct (the name consists of spaces).

I assume you mean "Microsoft Word - Activity Report July2002.doc" shows up as the title in search results. This is a common thing with PDF files. We have a lot that are made from WordPerfect documents, and we'd get things like "D:\......\foo.wpd" showing up as titles.

This is because when the PDF is generated from a word processing document, the PDF's title field is set to the original document file name. It can easily be changed afterward in Acrobat Exchange, but this step is often forgotten.

pdf2html.pl extracts the PDF's title field using the pdfinfo utility, but it's only giving you what's already in the PDF. It's only if the title field is empty that pdf2html.pl will use the PDF file name as the title.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
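
To make the title fallback concrete, here is a rough Python sketch of the behaviour described above. It is not the actual pdf2html.pl code (which is Perl); the function names title_from_pdfinfo and pdf_title are made up for illustration, and the only thing assumed about pdfinfo is that it prints a "Title:" line when the PDF has a title field.

```python
import os
import subprocess


def title_from_pdfinfo(pdfinfo_output: str, pdf_path: str) -> str:
    """Pick a title: the PDF's Title field if set, else the PDF file name.

    Hypothetical helper; it mirrors the fallback described above,
    not the real pdf2html.pl source.
    """
    for line in pdfinfo_output.splitlines():
        if line.startswith("Title:"):
            title = line[len("Title:"):].strip()
            if title:
                # Whatever the authoring tool stored, e.g. the original
                # word processor file name, is returned unchanged.
                return title
            break
    # No Title line, or an empty one: fall back to the file name.
    return os.path.basename(pdf_path)


def pdf_title(pdf_path: str) -> str:
    """Run pdfinfo on the file and choose the title from its output."""
    result = subprocess.run(
        ["pdfinfo", pdf_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return title_from_pdfinfo(result.stdout, pdf_path)
```

With a Title field of "Microsoft Word - Activity Report July2002.doc" stored in the PDF, the sketch returns that string as-is. Only clearing or correcting the field in the PDF itself would make the real file name, or a proper title, appear instead.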

