According to Michael Olds:
> This is the next question: where can I get more information about how to
> accomplish what I need to do with this feature?
>
> As described previously: I have a special font, where I have substituted
> certain characters for characters with diacritical marks. I need to be able
> to say:
> ¡ = A; ¢ = a; £ = A; ¤ = a; ¥ = E; ¦ = e; § = I; ¨ = i; © = O; ª = o; « = U;
> ¬ = u; ¿ = D; À = d; ¯ = H; ° = h; ± = L; ² = l; ³ = m; ´ = M; µ = m; ¶ = M;
> · = m; ¸ = n; ¹ = N; º = n; » = N; ¼ = n; ½ = n; Á = T; Â = t
>
> My first reading of this attribute is that this only tells the program to
> allow a certain character to be recognized. I need to see an example of how
> it can be used to show that a certain character should be recognized as
> another character.

The extra_word_characters attribute was really designed to deal with
punctuation characters that you want to remain as part of a word, and
isn't really that useful for dealing with letters. What htdig really
needs is the addition of two new config attributes, such as the following:

  extra_word_casemap
      This would allow you to define a set of upper-case letters and
      their corresponding lower-case equivalents, to define character
      sets not supported by any locale on your system. ht://Dig would
      treat all these characters just like standard letters, and would
      know how to map them to lower-case for true case-insensitive
      matching.

  accents_map
      This would allow you to define a set of accented letters and
      their corresponding unaccented equivalents, to define character
      sets other than the ISO-8859-1 set of accented letters currently
      hardcoded in the accents fuzzy algorithm. This map would override
      the hardcoded table. Ideally, it would also allow one-to-many
      and many-to-one equivalences, to deal with the typographic
      conventions of some languages. It might also make sense to
      extend the soundex and metaphone algorithms to use this accent
      map to deal sensibly with accented letters.
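
To make the idea concrete, here is a rough sketch in Python of how two such
maps could behave. Nothing here exists in ht://Dig: the dictionary names
simply mirror the proposed attributes, the function names are made up for
illustration, and the upper/lower pairings are guessed from the order of
the characters in Michael's list.

```python
# Hypothetical sketch only: neither attribute exists in ht://Dig, and all
# names below are invented for illustration.

# Upper-case -> lower-case pairs for the special font. The pairing is an
# assumption based on adjacent entries in the question (e.g. "£ = A; ¤ = a").
EXTRA_WORD_CASEMAP = {
    "¡": "¢", "£": "¤", "¥": "¦", "§": "¨",
    "©": "ª", "«": "¬", "¿": "À", "Á": "Â",
}

# Lower-case special characters -> the plain letters given in the question.
ACCENTS_MAP = {
    "¢": "a", "¤": "a", "¦": "e", "¨": "i",
    "ª": "o", "¬": "u", "À": "d", "Â": "t",
}


def fold_case(word, casemap):
    """Lower-case a word using the custom map first, then plain ASCII rules.

    Characters outside ASCII that are not in the map are left alone, so a
    locale's own idea of case (which is wrong for this font) never applies.
    """
    out = []
    for ch in word:
        if ch in casemap:
            out.append(casemap[ch])
        elif ch.isascii():
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def strip_accents(word, accents_map):
    """Replace each mapped character by its unaccented equivalent.

    The replacement may be longer than one character, which covers the
    one-to-many case; several keys sharing a value covers many-to-one.
    """
    return "".join(accents_map.get(ch, ch) for ch in word)


def search_key(word):
    """Key under which a word would be matched by an accents-style lookup."""
    return strip_accents(fold_case(word, EXTRA_WORD_CASEMAP), ACCENTS_MAP)


print(search_key("£nanda"))   # ananda
print(search_key("¤NANDA"))   # ananda
print(search_key("Ananda"))   # ananda
```

All three spellings end up under the same key, which is the effect Michael is
after. Note that "À" must not be passed through an ordinary lower-casing
routine, since in this font it is already a lower-case letter; that is why
the custom map takes priority over the standard rules in this sketch.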

I've had these ideas in the back of my mind for quite some time, but
never found the time to implement them. If anyone wants to take these
ideas and run with them, I'm sure a lot of users dealing with broken
locale support or non-Western-European languages would be very grateful.

In the meantime, though, I think the only way to define a new set of
accented characters for non-ISO-8859-1 character sets is to edit the
table in htfuzzy/Accents.cc (which requires the accents.5 patch for
htdig 3.1.5). Michael, as you don't have the source for htdig on your
ISP's site, you're probably out of luck.
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930