Hi,
I was just scanning through some log files and wondered if there
is already some way to configure ht://Dig to recognize abbreviations.
The synonym fuzzy does something similar, but is not exactly the same,
since the abbreviation can translate into a phrase.
I can think of two ways of implementing this, and I feel that both of
them should be implemented:
1) Recognizing HTML 4.0 "title" attributes in abbreviations and acronyms:
This might get tricky (I haven't studied the database stuff so far),
because there is more than one expression that will trigger a positive
query result.
I somehow feel that this could be of great help for improving search
results on pages with scientific or legal contents, but will require
some work by the authors, who would have to include the "title"
attributes correctly.
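To make idea 1 a bit more concrete, here is a minimal sketch of how the
"title" attributes could be harvested from a page. It is written in Python
with the standard html.parser module purely for illustration; it is not
ht://Dig code, and the names AbbrTitleParser and extract_abbreviations are
made up for this example. It also assumes that short forms should be
matched case-insensitively.

```python
from html.parser import HTMLParser


class AbbrTitleParser(HTMLParser):
    """Collect short form -> expansion pairs from <abbr>/<acronym> tags.

    Hypothetical helper for illustration only; not part of ht://Dig.
    """

    def __init__(self):
        super().__init__()
        self.pairs = {}      # lowercased short form -> full text
        self._title = None   # "title" of the element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag in ("abbr", "acronym"):
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._title is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("abbr", "acronym") and self._title is not None:
            short = "".join(self._text).strip()
            if short:
                # Assumption: short forms are compared case-insensitively.
                self.pairs[short.lower()] = self._title
            self._title = None


def extract_abbreviations(html):
    """Return the abbreviation table found in one HTML document."""
    parser = AbbrTitleParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Elements without a "title" attribute are simply skipped, so pages whose
authors did not mark up their abbreviations contribute nothing.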
2) A fuzzy database similar (but not identical) to the synonym fuzzy:
The source layout for the input text file is trivial (acronym
or abbreviation followed by its corresponding full text on each
line).
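As a rough illustration of idea 2, the input file could be read like this.
This is only a sketch in Python, not ht://Dig code. The function names
load_abbreviations and expand_term are hypothetical, and I am assuming
that the short form is the first whitespace-delimited token on a line,
that the rest of the line is the full text, and that blank lines and lines
starting with "#" are ignored (none of which is decided yet).

```python
def load_abbreviations(lines):
    """Build a lookup table from lines of the form '<short form> <full text>'.

    Assumed format: the first token is the abbreviation, the remainder of
    the line is its expansion. Blank lines and '#' comments are skipped.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue                 # no expansion given, ignore the line
        short, full = parts
        table[short.lower()] = full
    return table


def expand_term(term, table):
    """Return the search term plus its expansion phrase, if one is known."""
    full = table.get(term.lower())
    return [term] if full is None else [term, full]
```

The expansion comes back as a single string because it may be several
words long, which is exactly where phrase searching would be needed.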
If implementing both, there could be a global database for those
documents that don't care about "title" attributes plus a local
(maybe dynamically extendible in a way that an acronym or abbreviation
needs only to be explained once?) database for the stuff defined
within the document.
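The global-plus-local arrangement could be sketched as a layered lookup.
Again this is only an illustration in Python, using collections.ChainMap
from the standard library. The function name make_lookup is made up, and
I am assuming that a definition found in the document should take
precedence over the global entry for the same short form.

```python
from collections import ChainMap


def make_lookup(global_db):
    """Return (local_db, lookup) for one document.

    local_db starts empty and can be extended while the document is
    parsed; lookup consults local_db first and falls back to global_db.
    """
    local_db = {}
    return local_db, ChainMap(local_db, global_db)
```

Since the local table is filled in as the document is read, an
abbreviation would only need to be explained once; every later occurrence
in the same document resolves through the same lookup.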
Sadly enough, phrase search is necessary to get this going, but
maybe it is a nice way of further extending the usefulness of
phrase searching in upcoming releases?
Comments, anyone?
Torsten
--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14 Tel: +49-4101-403605
D-25474 Ellerbek Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED] Internet: http://www.inwise.de