On Tue, 12 Feb 2002, Alois Treindl wrote:
> When I look into db.wordlist while it is generated by 'rundig',
> it mutilates all words containing an Umlaut or other accented
> character.
> 
> It treats an accented character as a word-splitting character, instead
> of mapping it to a non accented equivalent.
> 
> Example: German word 'ungemütlich' (uncomfortable)
> creates wordlist entries
> ungem   i:1490  l:327   w:673   a:1
> tlich   i:1433  l:826   w:174   a:50
> 
> instead of mapping the umlaut to 'u', as in 'ungemutlich'.
> 
> Searching for 'ungemütlich' gives no hits at all.

I have solved PART of the problem myself in the meantime.
On my HP-UX system, I recompiled htdig with the environment set to
LANG=de_DE.iso88591
and added 'locale: de_DE.iso88591' to htdig.conf.

Now the 8-bit characters are kept in db.wordlist, e.g.
ungemütlich     i:310   l:327   w:673   a:1

and searching for 'ungemütlich' works as well.
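As far as I can tell, the difference comes down to which bytes count as
word characters. Here is a rough Python model of that. It is only my
assumption about the mechanism, not htdig's actual tokenizer, and
ASCII_WORD, LATIN1_WORD and words() are made-up names. With only ASCII
letters as word characters, the byte 0xFC for 'ü' splits the word exactly
as in my first wordlist; once Latin-1 letters count too, the word stays
intact.

```python
import re

# Hypothetical model (not htdig code): a tokenizer that decides per byte
# whether it is a word character.
# ASCII letters/digits only, roughly like isalnum() in the "C" locale:
ASCII_WORD = re.compile(rb"[A-Za-z0-9]+")
# Additionally the ISO-8859-1 letters (skipping 0xD7 and 0xF7, the
# multiplication and division signs), roughly like isalnum() in a
# Latin-1 locale such as de_DE.iso88591:
LATIN1_WORD = re.compile(rb"[A-Za-z0-9\xc0-\xd6\xd8-\xf6\xf8-\xff]+")


def words(data: bytes, pattern: re.Pattern) -> list:
    """Split ISO-8859-1 encoded bytes into words using the given word class."""
    return [m.decode("latin-1") for m in pattern.findall(data)]


text = "ungemütlich".encode("latin-1")  # 'ü' is the single byte 0xFC
print(words(text, ASCII_WORD))   # ['ungem', 'tlich']
print(words(text, LATIN1_WORD))  # ['ungemütlich']
```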

This is better, but it is not exactly what I want.
I would prefer that both htdig and htsearch map the word to
'ungemutlich'.

On a US ASCII keyboard, for instance, it is difficult to enter accented
characters into a search form, so users with such keyboards would
benefit from having the mapping enabled.
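Roughly, the mapping I have in mind could look like this Python sketch.
fold_accents() and the dict standing in for the word index are
hypothetical, not anything in htdig. Each word is decomposed with Unicode
NFKD, the combining marks are dropped, and the same function is applied
at index time and at search time, so a query typed without the umlaut
finds the same entry.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Hypothetical helper: map accented letters to their base letters.

    NFKD splits 'ü' into 'u' + U+0308 (combining diaeresis); dropping
    the combining marks leaves the plain base letter.
    """
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Stand-in for the word index: keys are folded at indexing time ...
index = {fold_accents(w): "doc1" for w in ["ungemütlich"]}

# ... and queries are folded the same way at search time.
for query in ["ungemütlich", "ungemutlich"]:
    print(query, "->", index.get(fold_accents(query)))
# ungemütlich -> doc1
# ungemutlich -> doc1
```

Note that this folds 'ü' to 'u' rather than to the German transliteration
'ue', and it leaves 'ß' unchanged, since NFKD does not decompose it.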

Can I get the mapping activated somehow?

Alois


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
