According to Alain Brion:
> At UNESCO, we use htdig to index all our WEB sites.
>
> In addition, we use BASIS to index the UNESCO documents. The documents are
> in PDF format in the UNESDOC database and are available on INTERNET :
> http://unesdoc.unesco.org/ulis
>
> UNESCO needs to be able to display the part of a document in which the
> requested term(s) or expression appear. At UNESCO, we call this function "to
> display the hits". You can see how it works on
> http://unesdoc.unesco.org/ulis : go to "all documents", type "little house"
> for "words from text", start the search, and select a result in a green box:
> you can see, after a small delay, that the PDF document opens at the page
> containing the words "little" or "house" and that these words are
> highlighted. If there are several pages containing these words, you can
> browse the highlighted words with the two buttons at the right of the
> command buttons.
>
> BASIS does not have this functionality, so I had to write the necessary
> code, using the indexing capabilities of BASIS.
Hmm. I did try this search on that site, and the one pdf I clicked on did
open up at a page other than the start, but the highlighted words were not
the ones in my query. Strange. (I'm using Acrobat Exchange 3.0, though,
and not Acrobat Reader 4.0, so that might have something to do with it.)
Anyway, I imagine that the key part of this functionality, which you had
to develop, is the CGI script that generates the XML code for Acrobat,
to tell it what to display. This should still work as before, I think,
but what will change is the way htdig's external parser or converter
for PDFs interfaces with your code. I imagine it will also require some
changes to htsearch too, to deal with the locations of the search words.
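
To make that concrete, here is a minimal sketch of what such a CGI script
might emit, assuming the hit locations are already known as (page, offset,
length) triples. The element and attribute names (XML, Body, Highlight, loc,
pg, pos, len) and the "#xml=" link suffix follow my understanding of Adobe's
highlight file format and should be checked against Adobe's documentation.
The function names are hypothetical and are not part of htdig or BASIS.

```python
from typing import Iterable, Tuple

# One hit: (zero-based page index, character offset on that page, length).
Hit = Tuple[int, int, int]


def build_highlight_xml(hits: Iterable[Hit]) -> str:
    """Return a highlight document with one <loc> entry per hit.

    Assumption: the unquoted-attribute layout below mirrors Adobe's
    highlight file format; verify it before relying on it.
    """
    lines = [
        "<XML>",
        "<Body units=characters color=#ff00ff mode=active version=2>",
        "<Highlight>",
    ]
    # Sort so the viewer steps through the hits in reading order.
    for page, pos, length in sorted(hits):
        lines.append(f"<loc pg={page} pos={pos} len={length}>")
    lines += ["</Highlight>", "</Body>", "</XML>"]
    return "\n".join(lines) + "\n"


def cgi_response(hits: Iterable[Hit]) -> str:
    """Wrap the highlight document in a minimal CGI response."""
    return "Content-Type: text/xml\r\n\r\n" + build_highlight_xml(hits)


def pdf_link(pdf_url: str, hits_url: str) -> str:
    """Build a link asking the viewer to fetch hits from hits_url (assumed syntax)."""
    return f"{pdf_url}#xml={hits_url}"
```

The part that would be new for htdig is producing the hit triples in the
first place, which is why the parser/converter and htsearch would both
need changes.
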
> Now, UNESCO is interested in the same functionality in htdig for the PDF
> documents which are not in UNESDOC. UNESCO could possibly finance the
> development that I could do. Then, the result could be put in the public
> domain as an improvement of htdig. I think that UNESCO needs to be sure that
> this functionality will be fully integrated in the htdig project and will be
> maintained and available in future versions of htdig.
Yes, as long as it's implemented in a way that would be generally useful,
and portable, then we certainly would consider integrating it into the
main distribution. It would also have to be selectable as a configuration
option of some sort, so it's not forced onto anyone.
> I want to know if such a development is of interest for the htdig community,
> or if such a functionality is already under development. In any case, I
> cannot begin to work on the subject before April or May 2001.
Any development you do should be on the 3.2.0 code tree, and not 3.1.5,
as we don't plan to take 3.1.x any further. Geoff is hoping to release
3.2.0b3 in a few weeks, I think. In any case, I expect we'll be at least
at 3.2.0b4 by April, but at the rate things have been moving lately, who
knows?
I don't know of any such functionality currently under development,
but there is a stripped-down variation of it that a few people have
been using. It brings up the first page in the PDF that has a match,
but doesn't actually attempt to highlight the words. It can be found
at http://po.gaillard.free.fr/ but I wouldn't recommend using the
pdftodig.py script as a starting point, as it's pretty minimalist as an
external parser. It would be better to add the anchoring capability to
an external converter such as doc2html.pl, which comes with the contrib
code in the ht://Dig distribution.
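
As a rough illustration of the "first page with a match" idea (not the
code from the site above, and not doc2html.pl), the sketch below assumes
the PDF text has already been extracted with form-feed characters between
pages, as pdftotext does, and that the viewer honours a "#page=N" suffix
on the URL. The function names are hypothetical.

```python
import re
from typing import List, Optional, Sequence


def split_pages(text: str) -> List[str]:
    """Split extracted text into pages on form-feed characters."""
    pages = text.split("\f")
    # A trailing form feed leaves an empty final chunk; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def first_matching_page(pages: Sequence[str], words: Sequence[str]) -> Optional[int]:
    """Return the 1-based number of the first page containing any word."""
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in words
    ]
    for number, page in enumerate(pages, start=1):
        if any(pattern.search(page) for pattern in patterns):
            return number
    return None


def url_for_first_hit(pdf_url: str, text: str, words: Sequence[str]) -> str:
    """Append #page=N for the first matching page, or return the URL unchanged."""
    page = first_matching_page(split_pages(text), words)
    return pdf_url if page is None else f"{pdf_url}#page={page}"
```

For example, with text "intro\fa little house\fend\f" and the query word
"house", the resulting link would end in "#page=2". This only gets the
reader to the right page; highlighting would still need the hit positions.
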
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930