Stefan Nehlsen <[EMAIL PROTECTED]> wrote:
> On Tue, Dec 17, 2002 at 12:30:42PM +0100, Nikolaus Rath wrote:
>> Hello!
>>
>> My problem is as follows:
>>
>> - I have a bunch of files (doc, pdf, html, txt, whatever) that need
>>   to be indexed.
>> - I can't modify the files.
>> - I want to enter additional keywords or comments for each file.
>> - A keyword or comment match should be ranked higher than
>>   a match in the file itself.
>> - The keywords should be usable as meta data.
>>
>
> [ not so good ideas deleted ]
>>
>> Is there a better way to achieve my goals?
>
> You already use a filter script (external_parsers) to handle non-html
> documents.
>
> Please read the documentation of both versions (3.1.x and 3.2.x) because
> they differ. The old style was to use a filter as a parser and the new
> style is to use it as a converter to html.
>
> It is also possible to use an external_parser on html.
>
> In this filter you may add code to edit the information you want.

Yes, that sounds like a good idea. But how can the script get the
metadata for each file it processes? I see two options:

1. Using the URL passed by htdig. That means the script has to talk to
   the HTTP server itself. Bad: the script doesn't know the
   username/password etc. that htdig does.

2. Encapsulated in the given file. That means the HTTP server would
   use a CGI script to deliver every file as (for example) a tgz
   containing the file itself and its metadata. The external_parser
   would unpack the package and process the input as normal.
   Problem: the MIME type of the file is unknown.

Comments?

   --Nikolaus
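
For illustration only, here is a rough Python sketch of the "add code in the filter" step discussed above, for the case where the filter acts as a converter to HTML. It leaves the open question of where the metadata comes from untouched: the keywords and comment are simply passed in. The function name inject_keywords and the choice of the "keywords" and "description" meta tags are assumptions made for this example, not something prescribed by ht://Dig, and whether such matches actually rank higher depends on the scoring configuration.

```python
import html
import re


def inject_keywords(document, keywords, comment=""):
    """Return the HTML text with keyword/comment <meta> tags added.

    Hypothetical helper: 'keywords' is a list of strings and 'comment'
    a free-text note; both are assumed to have been looked up elsewhere.
    """
    tags = []
    if keywords:
        content = html.escape(", ".join(keywords), quote=True)
        tags.append('<meta name="keywords" content="%s">' % content)
    if comment:
        content = html.escape(comment, quote=True)
        tags.append('<meta name="description" content="%s">' % content)
    if not tags:
        # Nothing to add: hand the document back unchanged.
        return document

    block = "\n".join(tags) + "\n"

    # Insert right after the opening <head> tag if the document has one.
    # The (\s[^>]*)? part avoids matching tags such as <header>.
    match = re.search(r"<head(\s[^>]*)?>", document, flags=re.IGNORECASE)
    if match:
        pos = match.end()
        return document[:pos] + "\n" + block + document[pos:]

    # No <head> at all: prepend a minimal one so the tags are still present.
    return "<head>\n" + block + "</head>\n" + document
```

The converter would call this on the HTML it is about to write to standard output, so the original files stay unmodified and only the indexed copy carries the extra keywords.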

