David Adams <[EMAIL PROTECTED]> wrote:
>> My problem is as follows:
>>
>> - I have a bunch of files (doc, pdf, html, txt, whatever) that need
>>   to be indexed.
>> - I can't modify the files.
>> - I want to enter additional keywords or comments for each file.
>> - A keyword or comment match should be ranked higher than
>>   a match in the file itself.
>> - The keywords should be usable as metadata.
>>
>> I first tried to write comments and keywords into files with the same
>> name and an added suffix (indexing starts with the auto-generated
>> directory index). If a comment match occurs, I just have to strip the
>> suffix to know the correct file. But this solution is not very good:
>> - Metadata and data are treated separately. That means:
>>   * One document will probably generate two results (file + file with
>>     metadata)
>>   * A match in both the metadata file and the file doesn't rank higher
>>     than a match in one of them
>>   etc.
>> - A comment match does not count more than a regular match
>> - The keywords are not available as metadata
>>
>> Another approach was to present every file using a CGI script that reads
>> the data from the metafile and adds it into meta tags. But this means
>> that the CGI script has to convert every file to HTML; I would have to
>> duplicate the entire existing filter functionality. Bad. And when I
>> only link to the file, it and its metadata would be treated separately.
>>
>> Is there a better way to achieve my goals?
>
> I can suggest two possible solutions; either requires a fair amount of work:
>
> 1) Create an .html file for each file you need to index, each file to
> contain metadata for indexing and a link to the file it refers to. Then
> index these files. Not a very good solution as far as end users are
> concerned, but simple to do.

OK. That's fairly similar to my first idea. I will use this solution only
if there is really no alternative.

> 2) Write a converter script which will add the appropriate metadata
> to the output from doc2html.pl or whatever conversion script you are
> using, during the indexing process. A bit of a challenge to write, but
> it should give you exactly what you want and will be transparent to
> end users.

Yes. But the problem is: how does the script get the metadata? I would
write a wrapper for doc2html.pl, which means I have no way to detect
which file htdig requested from the HTTP server, do I?

--Nikolaus
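
For illustration, the metadata-injecting step of suggestion 2 could look
roughly like the sketch below. It is only a sketch under stated assumptions:
it is written in Python rather than as a patch to doc2html.pl, the `.meta`
sidecar suffix and the `name: value` file format are made up for this
example, and it assumes the wrapper already knows the path of the original
document. How the wrapper learns that path is exactly the open question
above; the ht://Dig documentation for external parsers describes a URL
argument passed to the parser, which might be mapped back to a local path,
but that should be checked against the installed version.

```python
import html
import re
from pathlib import Path

# Hypothetical convention: metadata for report.pdf lives in report.pdf.meta
META_SUFFIX = ".meta"


def load_metadata(doc_path):
    """Read 'name: value' lines from the sidecar file next to doc_path.

    Returns an empty dict when no sidecar file exists.
    """
    sidecar = Path(str(doc_path) + META_SUFFIX)
    meta = {}
    if sidecar.is_file():
        for line in sidecar.read_text(encoding="utf-8").splitlines():
            name, sep, value = line.partition(":")
            if sep and name.strip():
                meta[name.strip().lower()] = value.strip()
    return meta


def inject_meta(html_text, meta):
    """Insert one <meta> tag per entry right after the opening <head> tag."""
    tags = "".join(
        '<meta name="%s" content="%s">\n'
        % (html.escape(name, quote=True), html.escape(value, quote=True))
        for name, value in sorted(meta.items())
    )
    if not tags:
        return html_text
    # Match <head> or <head attr=...>, but not <header>
    match = re.search(r"<head(\s[^>]*)?>", html_text, flags=re.IGNORECASE)
    if match:
        end = match.end()
        return html_text[:end] + "\n" + tags + html_text[end:]
    # Converter output had no <head>: prepend a minimal one
    return "<head>\n" + tags + "</head>\n" + html_text
```

A wrapper would run the existing converter, pass its HTML output through
`inject_meta(output, load_metadata(path))`, and print the result, so the
keywords reach the indexer as ordinary meta tags of the same document.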

