According to Jeff Johnson:
> I have changed excerpt_show_top to yes. If the page with the found
> word has a footnote, the footnote text is displayed on the search
> results page (the searched-for word is not contained in the footnote
> text). Pages that contain the found word and no footnote display the
> top of the page properly. We are searching PDF files. My
> external_parsers line is:
> application/pdf->text/html /usr/local/bin/conv_doc.pl
> Is this a problem with the parser, or is there another configuration
> setting that I have overlooked or that is undocumented? Thanks in
> advance for your help.
My first guess would be that when conv_doc.pl dumps the text out of
PDF files that have footnotes, it encounters the footnote text first.
Because excerpt_show_top displays the beginning of the extracted text,
the footnote is what ends up in the excerpt. PDF files are essentially
glorified print files, and there are no rules about what part of the
page gets printed first. If the application that made the PDF file
printed the footnote first, that's what comes out first.
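To illustrate the effect, here is a toy Python model. It is not conv_doc.pl or pdftotext themselves. Text blocks are emitted in the order the producing application drew them, and an excerpt that always shows the top of the extracted text then starts with the footnote. The names Block, extract_raw and excerpt_top, the max_len stand-in for the configured excerpt length, and the sample positions and text are all hypothetical.

```python
# Toy model (hypothetical, not pdftotext's real algorithm): each text block
# has a position on the page; the list order is the order in which the
# producing application happened to draw the blocks.
from dataclasses import dataclass


@dataclass
class Block:
    y: float   # distance from the top of the page
    x: float   # distance from the left edge
    text: str


# The producing application drew the footnote first, then the body text.
stream_order = [
    Block(y=750, x=72, text="1. See the appendix for details."),
    Block(y=72, x=72, text="Chapter 3: Results"),
    Block(y=100, x=72, text="The survey found that ..."),
]


def extract_raw(blocks):
    """Mimic raw extraction: emit blocks in drawing (content-stream) order."""
    return " ".join(b.text for b in blocks)


def excerpt_top(text, max_len=40):
    """Mimic an excerpt that always shows the first max_len characters."""
    return text[:max_len]


print(excerpt_top(extract_raw(stream_order)))
```

The excerpt begins with the footnote even though the heading sits at the top of the page, which matches the symptom in the question.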
You can change this behaviour by removing the "-raw" option from the
pdftotext command in conv_doc.pl, so that it uses its "coalescing"
feature to reorganise the text blocks into the order they appear on the
page, top to bottom. However, this isn't a good feature to use when
indexing multi-column PDFs, because the text gets indexed line by line
across the page, not column by column.
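The trade-off between the two modes can be sketched with another toy Python model, assuming that "coalescing" amounts to sorting text lines by their position on the page, top to bottom and then left to right. pdftotext's real layout analysis is more involved; Line and coalesce are hypothetical names, and the coordinates are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Line:
    y: float   # vertical position, 0 = top of page
    x: float   # horizontal position, 0 = left edge
    text: str


def coalesce(lines):
    """Toy reading-order pass: sort top to bottom, then left to right."""
    return [ln.text for ln in sorted(lines, key=lambda ln: (ln.y, ln.x))]


# Single-column page whose footnote was drawn first: sorting fixes it.
single = [
    Line(750, 72, "1. Footnote text."),
    Line(72, 72, "Heading"),
    Line(100, 72, "Body text."),
]

# Two-column page drawn column by column: sorting by (y, x) makes the
# output alternate between the left and right columns line by line.
two_col = [
    Line(72, 72, "Left col line 1"),
    Line(86, 72, "Left col line 2"),
    Line(72, 320, "Right col line 1"),
    Line(86, 320, "Right col line 2"),
]

print(coalesce(single))
print(coalesce(two_col))
```

On the single-column page the footnote moves to the end, but on the two-column page the output alternates between the columns, which is the multi-column indexing problem described above.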
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html