At 14:56 29/10/99 -0500, you wrote:
>Yikes! I have a hard time believing that your patch_accents program would
>not start clobbering all sorts of data in db.docdb that it shouldn't.
>I'm assuming the whole point of this is to strip out the accents from
>the document excerpts, so that excerpt highlighting works for unaccented
>search words.
>If so, why not just strip out the accents on the fly in
>htsearch/Display.cc, before doing any searches on the excerpt, or
>better yet, just poke in some entries in the translate table, set in
>StringMatch::IgnoreCase() (in htlib/StringMatch.cc), to map accented
>letters to equivalent lower-case unaccented letters? The letter mapping
>in String.cc could also be done much more efficiently with a mapping
>table.
>The best approach, though, would be to define a new "accent" fuzzy match
>algorithm, which, when given a word, would search the word database
>for all accented and unaccented equivalents. The main engine of this
>would be very much like the current htfuzzy/Substring.cc algorithm.
>It would be more work, but you'd have something that would be selectable
>by the search_algorithm config attribute, and would fit in well with
>the existing code.
Gilles,
I agree with all of your remarks.
I was also amazed that my patch_accent did not totally corrupt
the db file ;)
It looks like the character codes it modifies are not used as separators
or attributes.
Please note that I took care to modify only bytes that were inside an
ASCII string :)
In fact, I wrote this patch just for my own purposes.
I made it public because, after searching for "accents french"
on the htdig site, I found a huge number of people trying
to find a solution ...
Don't get me wrong: this patch is not an academic one,
it is a dirty and straightforward one (as I said on my page).
In my view, a *good* patch would be something like a conf file, let's
call it transcode.conf, which would contain character equivalences.
This file would be used by htsearch and htfuzzy.
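
To make the idea more concrete, here is a rough sketch of how such a
file could be read into a 256-entry mapping table. Everything in it is
an assumption for illustration only: the line format ("<from-byte>
<to-byte>", decimal or 0x-prefixed hex, e.g. "0xE9 0x65" to map Latin-1
e-acute to plain e), and the names TranscodeTable, load_transcode and
fold are hypothetical and do not exist in the htdig sources.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical byte map: table[b] is the replacement for byte value b.
using TranscodeTable = std::array<unsigned char, 256>;

// Read an assumed transcode.conf format: one "<from-byte> <to-byte>" pair
// per line, e.g. "0xE9 0x65". Blank lines, lines starting with '#', and
// malformed lines are skipped. Unlisted bytes map to themselves.
TranscodeTable load_transcode(std::istream& in) {
    TranscodeTable table;
    for (int i = 0; i < 256; ++i) {
        table[i] = static_cast<unsigned char>(i);
    }

    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;

        std::istringstream fields(line);
        std::string from_text, to_text;
        if (!(fields >> from_text >> to_text)) continue;

        try {
            // Base 0 lets std::stoi accept both decimal and 0x-prefixed hex.
            int from = std::stoi(from_text, nullptr, 0);
            int to = std::stoi(to_text, nullptr, 0);
            if (from >= 0 && from <= 255 && to >= 0 && to <= 255) {
                table[from] = static_cast<unsigned char>(to);
            }
        } catch (const std::exception&) {
            continue;  // not a number: ignore this line
        }
    }
    return table;
}

// Return a copy of text with every byte replaced through the table.
std::string fold(const std::string& text, const TranscodeTable& table) {
    std::string out = text;
    for (char& c : out) {
        c = static_cast<char>(table[static_cast<unsigned char>(c)]);
    }
    return out;
}
```

With a table loaded once at startup, both programs could pass search
words and excerpts through fold() before comparing them, so the db file
itself would never have to be modified.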
Best regards,
Salim
***********************************************
Salim Gasmi <http://www.gasmi.net>
System and network administrator.
SdV Plurimedia <http://www.sdv.fr>
PGP Key: http://www.gasmi.net/pgp.txt
***********************************************