According to Michael Olds:
> This is the next question: where can I get more information about how to
> accomplish what I need to do with this feature?
> 
> As described previously: I have a special font, where I have substituted
> certain characters for characters with diacritical marks. I need to be able
> to say:
> ¡ = A; ¢ = a; £ = A; ¤ = a; ¥ = E; ¦ = e; § = I; ¨ = i; © = O; ª = o; « = U;
> ¬ = u; ¿ = D; À = d; ¯ = H; ° = h; ± = L; ² = l; ³ = m; ´ = M; µ = m; ¶ = M;
> · = m; ¸ = n; ¹ = N; º = n; » = N; ¼ = n; ½ = n; Á = T; Â = t
> 
> My first reading of this attribute is that this only tells the program to
> allow a certain character to be recognized. I need to see an example of how
> it can be used to show that a certain character should be recognized as
> another character.

The extra_word_characters attribute was really designed to deal with
punctuation characters that you want to remain as part of a word, and
isn't really useful for dealing with letters. What htdig really needs is the
addition of two new config attributes, such as the following:

extra_word_casemap
        This would allow you to define a set of upper-case letters and
        their corresponding lower-case equivalents, to define character
        sets not supported by any locale on your system. ht://Dig would
        treat all these characters just like standard letters, and would
        know how to map them to lower-case for true case-insensitive
        matching.

accents_map
        This would allow you to define a set of accented letters and
        their corresponding unaccented equivalents, to define character
        sets other than the ISO-8859-1 set of accented letters currently
        hardcoded in the accents fuzzy algorithm. This map would override
        the hardcoded table. Ideally, it would also allow one-to-many
        and many-to-one equivalences, to deal with the typographic
        conventions of some languages.  It might also make sense to
        extend the soundex and metaphone algorithms to use this accent
        map to deal sensibly with accented letters.
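
To make these two proposals a bit more concrete, here is a rough sketch in
Python of how such maps might be parsed and applied to a word. Neither
attribute exists in ht://Dig today. The value syntax, the function names,
and the pairing of upper- and lower-case glyphs are all assumptions made up
for illustration; the glyphs themselves come from Michael's list above.

```python
# Hypothetical attribute values: ht://Dig has no such attributes, and the
# syntax below is invented for this sketch.
# Casemap: whitespace-separated pairs, upper-case glyph first (assumed pairing).
EXTRA_WORD_CASEMAP = "¡¢ £¤ ¥¦ §¨ ©ª «¬"
# Accents map: "accented = unaccented" entries separated by semicolons.
ACCENTS_MAP = ("¡ = A; ¢ = a; £ = A; ¤ = a; ¥ = E; ¦ = e; "
               "§ = I; ¨ = i; © = O; ª = o; « = U; ¬ = u")


def parse_casemap(spec):
    """Turn 'Xx Yy' pairs into a dict mapping upper-case to lower-case."""
    casemap = {}
    for pair in spec.split():
        if len(pair) != 2:
            raise ValueError("expected a two-character pair, got %r" % pair)
        upper, lower = pair
        casemap[upper] = lower
    return casemap


def parse_accents_map(spec):
    """Turn 'x = y; ...' entries into a dict. The value is a string, so one
    glyph may map to several letters, and several glyphs to the same one."""
    accents = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        source, sep, target = entry.partition("=")
        if not sep:
            raise ValueError("missing '=' in entry %r" % entry)
        accents[source.strip()] = target.strip()
    return accents


def to_lower(word, casemap):
    """Lower-case a word: mapped glyphs use the casemap, ASCII letters use
    the normal rule, and everything else is left untouched."""
    return "".join(
        casemap.get(ch, ch.lower() if ch.isascii() else ch) for ch in word
    )


def strip_accents(word, accents):
    """Replace each mapped glyph with its unaccented equivalent."""
    return "".join(accents.get(ch, ch) for ch in word)


casemap = parse_casemap(EXTRA_WORD_CASEMAP)
accents = parse_accents_map(ACCENTS_MAP)
```

With these maps, to_lower("P¡LI", casemap) gives "p¢li", and
strip_accents("p¢li", accents) gives "pali", so a search for "pali" could
match a document containing "P¡LI". A real implementation would of course
have to live in htdig's C++ word handling and fuzzy code, not in a script.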

These are ideas I've had in the back of my mind for quite some time,
but I've never found the time to implement them. If anyone wants to take
these ideas and run with them, I'm sure a lot of users dealing with broken
locale support or non-Western-European languages would be very grateful.

In the meantime, though, I think the only way to define a new set of
accented characters for non-ISO-8859-1 character sets is to edit the
table in htfuzzy/Accents.cc (which requires the accents.5 patch for
htdig 3.1.5). Michael, as you don't have the source for htdig on your
ISP's site, you're probably out of luck.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
