I can suggest two possible solutions, either of which requires a fair amount of work:

1)    Create an .html file for each file you need to index, each one
containing the metadata for indexing and a link to the file it refers to.
Then index these .html files.  Not a very good solution as far as end users
are concerned, but simple to do.

2)    Write a converter script which will add the appropriate metadata to
the output from doc2html.pl or whatever conversion script you are using,
during the indexing process.  A bit of a challenge to write, but it should
give you exactly what you want and will be transparent to end users.
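
To make the two options concrete, here is a rough Python sketch, not a tested ht://Dig setup. The function names (meta_tags, make_stub_page, inject_meta) are made up for illustration, and it assumes the extra keywords and comments should end up in standard "keywords" and "description" meta tags. make_stub_page corresponds to option 1 and inject_meta to option 2, applied to whatever HTML your conversion script prints.

```python
import html
import re


def meta_tags(keywords, description):
    """Build <meta> tags for the extra keywords and comment (hypothetical helper)."""
    tags = []
    if keywords:
        content = html.escape(", ".join(keywords), quote=True)
        tags.append('<meta name="keywords" content="%s">' % content)
    if description:
        content = html.escape(description, quote=True)
        tags.append('<meta name="description" content="%s">' % content)
    return "\n".join(tags)


def make_stub_page(target_url, title, keywords, description):
    """Option 1: a small .html page holding the metadata and a link to the real file."""
    safe_title = html.escape(title)
    safe_url = html.escape(target_url, quote=True)
    return (
        "<html>\n<head>\n<title>%s</title>\n%s\n</head>\n"
        '<body>\n<a href="%s">%s</a>\n</body>\n</html>\n'
        % (safe_title, meta_tags(keywords, description), safe_url, safe_title)
    )


def inject_meta(converted_html, keywords, description):
    """Option 2: add the metadata to HTML produced by a conversion script."""
    tags = meta_tags(keywords, description)
    # Match <head> or <head ...>, but not <header>.
    match = re.search(r"<head(\s[^>]*)?>", converted_html, flags=re.IGNORECASE)
    if match is None:
        # Assumption: if the converter emits no <head>, prepend a minimal one.
        return "<head>\n" + tags + "\n</head>\n" + converted_html
    end = match.end()
    return converted_html[:end] + "\n" + tags + converted_html[end:]
```

For option 2, a wrapper script would run the existing converter, look up the keywords for the file being indexed (from wherever you keep them), and print the result of inject_meta instead of the raw converter output.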

David Adams
Southampton University

----- Original Message -----
From: "Nikolaus Rath" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, December 17, 2002 11:30 AM
Subject: [htdig] Files and Metadate in other files


> Hello!
>
> My problem is as follows:
>
>  - I have a bunch of files (doc, pdf, html, txt, whatever) that need
>    to be indexed.
>  - I can't modify the files.
>  - I want to enter additional keywords or comments for each file.
>  - A keyword or comment match should be ranked higher than
>    a match in the file itself
>  - The keywords should be usable as meta data
>
> I first tried to write comments and keywords into files with the same
> name and an added suffix (indexing starts with the auto-generated
> directory index). If a comment match occurs, I just have to strip the
> suffix to know the correct file. But this solution is not very good:
>  - Metadata and data are treated separately. That means:
>    * One document will probably generate two results (file + file with
>      metadata)
>    * A match in both the metadata file and the file doesn't rank higher
>      than a match in only one of them
>    etc.
>  - A comment match does not count more than a regular match
>  - The keywords are not available as meta data
>
> Another approach was to present every file using a CGI script that reads
> the data from the metafile and adds it into meta tags. But this means
> that the CGI script has to convert every file to HTML, so I would have to
> duplicate the entire existing filter functionality. Bad. And if I
> only link to the file, it and its metadata would be treated separately.
>
> Is there a better way to achieve my goals?
>
> Thanks for all hints,
>
>  - Nikolaus
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:
> With Great Power, Comes Great Responsibility
> Learn to use your power at OSDN's High Performance Computing Channel
> http://hpc.devchannel.org/
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to
> <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
>
>


