Re: [htdig] Files and Metadate in other files

Nikolaus Rath Tue, 17 Dec 2002 07:42:08 -0800

Stefan Nehlsen <[EMAIL PROTECTED]> wrote:
> On Tue, Dec 17, 2002 at 12:30:42PM +0100, Nikolaus Rath wrote:
>> Hello!
>> 
>> My problem is as follows:
>> 
>>  - I have a bunch of files (doc, pdf, html, txt, whatever) that need
>>    to be indexed. 
>>  - I can't modify the files. 
>>  - I want to enter additional keywords or comments for each file.
>>  - A keyword or comment match should be ranked higher than
>>    a match in the file itself
>>  - The keywords should be usable as meta data
>> 
> 
> [ not so good ideas deleted ]
>> 
>> Is there a better way to achieve my goals?
> 
> You already use a filter script (external_parsers) to handle non-html
> documents. 
> 
> Please read the documentation of both versions (3.1.x and 3.2.x) because
> they differ. The old style was to use a filter as a parser and the new
> style is to use it as a converter to html.
> 
> It is also possible to use an external_parser on html.
> 
> In this filter you may add code to edit the information you want.


Yes, that sounds like a good idea. But how can the script get the
metadata for each file it processes? I see two options:

1. Using the URL passed by htdig. That means the script has to
   talk to the HTTP server itself, which is bad: unlike htdig, the
   script doesn't know the username, password, etc.

2. Encapsulated in the given file. That means a CGI script on the
   HTTP server would deliver every file as (for example) a tgz
   containing the file itself and its metadata. The external_parser
   would unpack the package and process the contents as normal.
   Problem: the MIME type of the original file is then unknown.
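
A rough sketch of what the unpacking half of option 2 might look like, in Python. Everything here is an assumption for illustration: the member name metadata.txt, the one-keyword-per-line format, and the function names are all hypothetical, and nothing in it is part of htdig. It also shows one possible way around the MIME type problem, namely guessing the type from the file name stored inside the package.

```python
import io
import mimetypes
import tarfile
from html import escape

# Hypothetical convention: the package holds exactly one payload file plus
# a metadata file with this name, containing one keyword per line.
METADATA_NAME = "metadata.txt"


def unpack_bundle(bundle_bytes):
    """Split a tgz package into (name, payload, keywords, mime_type)."""
    payload_name, payload, keywords = None, b"", []
    with tarfile.open(fileobj=io.BytesIO(bundle_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            if member.name == METADATA_NAME:
                lines = data.decode("utf-8").splitlines()
                keywords = [line.strip() for line in lines if line.strip()]
            else:
                payload_name, payload = member.name, data
    if payload_name is None:
        raise ValueError("package contains no payload file")
    # The package itself hides the original Content-Type, so guess it
    # from the payload's file extension and fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(payload_name)
    return payload_name, payload, keywords, mime_type or "application/octet-stream"


def keywords_as_meta(keywords):
    """Render the keywords as an HTML meta tag for the converted document."""
    content = escape(", ".join(keywords), quote=True)
    return '<meta name="keywords" content="%s">' % content
```

The filter would then hand the payload to the usual converter for the guessed type and put the meta tag into the head of the generated HTML. If I read the documentation correctly, the weight of such keyword matches could then be raised with the keywords_factor attribute, but I have not tried that.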

Comments?

   --Nikolaus



