David Adams <[EMAIL PROTECTED]> wrote:
>> My problem is as follows:
>>
>>  - I have a bunch of files (doc, pdf, html, txt, whatever) that need
>>    to be indexed.
>>  - I can't modify the files.
>>  - I want to enter additional keywords or comments for each file.
>>  - A keyword or comment match should be ranked higher than
>>    a match in the file itself
>>  - The keywords should be usable as meta data
>>
>> I first tried to write comments and keywords into files with the same
>> name and an added suffix (indexing starts with the auto-generated
>> directory index). If a comment match occurs, I just have to strip the
>> suffix to know the correct file. But this solution is not very good:
>>  - Metadata and data are treated separately. That means:
>>    * One document will probably generate two results (file + file with
>>      metadata)
>>    * A match in both the metadata file and the file doesn't rank higher
>>      than a match in only one of them
>>    etc.
>>  - A comment match does not count more than a regular match
>>  - The keywords are not available as meta data
>>
>> Another approach was to present every file using a CGI script that reads
>> the data from the metafile and adds it as meta tags. But this means
>> that the CGI script has to convert every file to HTML, so I would have to
>> duplicate the entire existing filter functionality. Bad. And if I
>> only link to the file, it and its metadata would be treated separately.
>>
>> Is there a better way to achieve my goals?
>
> I can suggest two possible solutions, either of which requires a fair
> amount of work:
> 
> 1)    Create an .html file for each file you need to index, each file to
> contain metadata for indexing and a link to the file it refers to.  Then
> index these files.  Not a very good solution as far as end-users are
> concerned, but simple to do.

Ok. That's fairly similar to my first idea. I will use this solution
only if there is really no alternative.
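
For reference, generating such stub pages would look roughly like the
sketch below. It is only an illustration under my own assumptions: the
`.meta` suffix, the sidecar layout (first line holds comma-separated
keywords, the rest is the comment) and the function names are all made
up for this example.

```python
import html
from pathlib import Path
from urllib.parse import quote

META_SUFFIX = ".meta"  # hypothetical suffix of the keyword/comment files


def build_stub(doc_name: str, keywords: list[str], comment: str) -> str:
    """Return a small HTML page carrying the metadata plus a link to the document."""
    kw = html.escape(", ".join(keywords), quote=True)
    desc = html.escape(comment, quote=True)
    title = html.escape(doc_name)
    href = html.escape(quote(doc_name), quote=True)
    return (
        "<html><head>\n"
        f"<title>{title}</title>\n"
        f'<meta name="keywords" content="{kw}">\n'
        f'<meta name="description" content="{desc}">\n'
        "</head><body>\n"
        f"<p>{desc}</p>\n"
        f'<a href="{href}">{title}</a>\n'
        "</body></html>\n"
    )


def write_stubs(directory: Path) -> list[Path]:
    """Write one <name>.html stub next to every <name>.meta file in directory."""
    written = []
    for meta in sorted(directory.glob("*" + META_SUFFIX)):
        lines = meta.read_text(encoding="utf-8").splitlines()
        # Assumed layout: line 1 = keywords, remaining lines = free-text comment.
        keywords = [k.strip() for k in (lines[0] if lines else "").split(",") if k.strip()]
        comment = " ".join(line.strip() for line in lines[1:]).strip()
        doc_name = meta.name[: -len(META_SUFFIX)]
        stub = directory / (doc_name + ".html")
        stub.write_text(build_stub(doc_name, keywords, comment), encoding="utf-8")
        written.append(stub)
    return written
```

Only the stubs would then be indexed, so each document gives one result,
but users land on the stub page first instead of the file itself.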

> 2) Write a converter script which will add the appropriate metadata
> to the output from doc2html.pl or whatever conversion script you are
> using, during the indexing process. A bit of a challenge to write, but
> it should give you exactly what you want and will be transparent to
> end users.

Yes. But the problem is: how does the script get the metadata? I
would write a wrapper for doc2html.pl, but doesn't that mean I have
no way to detect which file htdig requested from the HTTP server?
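
To make the question concrete, here is a rough sketch of the wrapper I
have in mind. It assumes the wrapper somehow receives the document's
URL (here as its third command-line argument); whether htdig really
hands that over is exactly what I need to find out. The document root,
the `.meta` suffix, the argument order and all function names are
assumptions for illustration only.

```python
import html
import re
import subprocess
import sys
from pathlib import Path
from urllib.parse import unquote, urlparse

DOC_ROOT = Path("/var/www/html")  # hypothetical document root of the server
META_SUFFIX = ".meta"             # hypothetical suffix of the metadata files
CONVERTER = "doc2html.pl"         # the existing converter being wrapped


def meta_path_for(url: str) -> Path:
    """Map a document URL to its metadata file below DOC_ROOT."""
    rel = unquote(urlparse(url).path).lstrip("/")
    return DOC_ROOT / (rel + META_SUFFIX)


def inject_keywords(page: str, keywords: str) -> str:
    """Insert a keywords meta tag right after <head>, or prepend a head if none."""
    tag = f'<meta name="keywords" content="{html.escape(keywords, quote=True)}">\n'
    match = re.search(r"<head(\s[^>]*)?>", page, flags=re.IGNORECASE)
    if match is None:
        return "<head>\n" + tag + "</head>\n" + page
    return page[: match.end()] + "\n" + tag + page[match.end():]


def main(argv: list[str]) -> int:
    # Assumed calling convention: wrapper <input file> <content type> <url> ...
    url = argv[3]
    # Pass all arguments through unchanged and capture the converter's HTML.
    page = subprocess.run(
        [CONVERTER, *argv[1:]], capture_output=True, text=True, check=True
    ).stdout
    sidecar = meta_path_for(url)
    if sidecar.is_file():
        page = inject_keywords(page, sidecar.read_text(encoding="utf-8").strip())
    sys.stdout.write(page)
    return 0

# When installed as the converter, the script would end with:
#     sys.exit(main(sys.argv))
```

If the URL is not available to the wrapper, `meta_path_for` has nothing
to work with and this approach falls apart, hence my question.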


   --Nikolaus



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
