> According to David Adams:
> > I have been looking at synonym matching with htsearch and have been
> > customising the list of synonyms to fit my ideas.  I modified the text
> > file common/synonym and ran htfuzzy to build a new synonym.db
> >
> > I found that not only were the new synonyms that I had added being acted
> > on, but so were the old ones I had deleted!  So I offer this tip to those
> > using synonym matching:
> >
> >     Delete the file common/synonym.db before you run htfuzzy synonyms.
> >
> > It seems that htfuzzy *adds* to the database file, rather than
> > re-writing it as I assumed.
>
> I didn't know that.  Thanks for the heads-up!  Maybe the code should
> delete the old database, the way an htdig -i does.
>
> > I want to use synonym matching purely for words with two or more
> > 'correct' spellings (eg colour & color, medieval & mediaeval, etc.) of
> > English words, and would like to hear from anyone with an alternative
> > list of such synonyms.
> >
> > If anyone wants to see how I propose to handle mis-spellings, then go to
> > http://www.soton.ac.uk/ and try searching for "accomodation" or
> > "libary".
>
> Cool!  I assume these are static pages you put in manually for common
> misspellings.  Did you need any special tweaks to the code to get them to
> come up first, or do you just use a very high keyword factor?  Is there
> a trick to get these pages not to show up when the word is spelled
> correctly, or are they just given a low score and show up towards the end
> of the results?  It's interesting that accomodation still has 79 other
> hits, apart from the search assistance page - pretty common misspelling.
>
> --
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
>

They are manually created static pages, but with SSI so that the exact
wording you get depends on the link you followed to see them.

I spent some time looking at our spell checker log and also at the
mis-spellings in the htsearch log.  The vast majority of mis-spellings we
get are for just a few words:

Accommodation
Archaeology
Business
Calendar
Library
Management
Pharmacology
Prospectus
Psychology

I shall continue to monitor the logs and add more words as necessary.  I
think there is a seasonal factor and I will need to observe over an entire
year.
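
The kind of log tally described above could be sketched roughly as follows.
This is only an illustration: it assumes the search queries have already
been extracted from the log as plain strings (the real htsearch log format
is not shown here), and the MISSPELLINGS table and count_misspellings
function are hypothetical names, seeded with the two mis-spellings
mentioned in this thread.

```python
from collections import Counter

# Hypothetical table mapping an observed mis-spelling to the intended word.
# Only these two mis-spellings appear in this thread; extend as the logs show.
MISSPELLINGS = {
    "accomodation": "accommodation",
    "libary": "library",
}


def count_misspellings(queries, known=MISSPELLINGS):
    """Count, per intended word, how often a known mis-spelling was searched.

    `queries` is any iterable of query strings, assumed to have been pulled
    out of the search log beforehand.
    """
    counts = Counter()
    for query in queries:
        for word in query.lower().split():
            if word in known:
                counts[known[word]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "accomodation",
        "Libary opening hours",
        "library",
        "student accomodation",
    ]
    for word, n in count_misspellings(sample).most_common():
        print(word, n)
```

Run monthly over the extracted queries, a tally like this would also show
whether there is a seasonal pattern.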

We have used a keyword factor of 200 ever since we adopted Ht://Dig, and the
mis-spellings are all in the keywords meta tag to give them a high score, but
not in the visible text where they might cause confusion.  The correct
spellings are given a low score by double spacing in the titles and
headings, eg: L&nbsp;i&nbsp;b&nbsp;r&nbsp;a&nbsp;r&nbsp;y, and by soft
hyphenation, eg:

Interested in <b>Arch&shy;ae&shy;ol&shy;ogy</b>?

where possible in the text.
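
For anyone wanting to generate this markup rather than type it by hand, a
minimal sketch might look like the one below.  The helper names are
hypothetical, the syllable breaks have to be supplied by hand, and the
comma-separated keywords list is an assumption about how the meta tag is
laid out; none of this is part of ht://Dig itself.

```python
from html import escape


def space_letters(word):
    """Separate the letters with non-breaking spaces, for titles and headings."""
    return "&nbsp;".join(word)


def soft_hyphenate(parts):
    """Join hand-chosen word fragments with soft hyphens, for body text."""
    return "&shy;".join(parts)


def keywords_meta(words):
    """Build a keywords meta tag carrying the mis-spellings (not visible text)."""
    content = ", ".join(words)
    return '<meta name="keywords" content="%s">' % escape(content, quote=True)


if __name__ == "__main__":
    print(keywords_meta(["accomodation", "libary"]))
    print("<h1>%s</h1>" % space_letters("Library"))
    print("Interested in <b>%s</b>?" % soft_hyphenate(["Arch", "ae", "ol", "ogy"]))
```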

My initial impulse for setting this up was noticing that at our site you
could search for "accomodation" and get enough hits not to realise you were
missing the important pages.  In a University we have very little control
over the contents of pages, and nor would we want it.  Getting people to
correct their mistakes is close to impossible.
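
Returning to the synonym.db tip quoted at the top, the delete-then-rebuild
step could be wrapped up roughly as follows.  This is a sketch only: the
rebuild_synonyms name is hypothetical, the path common/synonym.db is the
one from the original message and will differ between installations, and
the bare "htfuzzy synonyms" invocation is taken from this thread without
any extra options a given setup may need.

```python
import os
import subprocess


def rebuild_synonyms(db_path, command=("htfuzzy", "synonyms")):
    """Delete the old synonym database, then run the rebuild command.

    Removing the file first avoids deleted synonyms lingering, since the
    rebuild appears to add to an existing database rather than replace it.
    """
    try:
        os.remove(db_path)
    except FileNotFoundError:
        pass  # nothing to delete on a first build
    # check=True raises CalledProcessError if the rebuild command fails
    subprocess.run(list(command), check=True)


if __name__ == "__main__":
    # Path as given in the original message; adjust for the local install.
    rebuild_synonyms("common/synonym.db")
```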

David Adams
University of Southampton


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
Information: http://lists.sourceforge.net/lists/listinfo/htdig-general
FAQ: http://htdig.sourceforge.net/FAQ.html
