I have installed ht://Dig 3.1.6 on HP-UX 10.20.
My shell environment LANG setting is (and was during compilation)
LANG=C.iso88591

ht://Dig does not process accented German and French characters
correctly.

I am trying to index a multi-lingual website which contains a mixture
of English, German, French and other languages.

When I look into db.wordlist while it is being generated by 'rundig',
I see that all words containing an umlaut or other accented
character are mutilated.

It treats an accented character as a word-splitting character, instead
of mapping it to a non-accented equivalent.

Example: German word 'ungemütlich' (uncomfortable)
creates wordlist entries
ungem   i:1490  l:327   w:673   a:1
tlich   i:1433  l:826   w:174   a:50

instead of mapping the umlaut to 'u', as in 'ungemutlich'.
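
To make the symptom concrete: the wordlist looks as if only plain ASCII
letters and digits were accepted as word characters. The following Python
sketch reproduces that effect. It is only an illustration of what I observe,
not ht://Dig's actual code, and the function name ascii_only_tokens is made
up for this example.

```python
import re

def ascii_only_tokens(text):
    # Treat every character outside A-Z, a-z and 0-9 as a separator.
    # This is an assumption about what the indexer appears to do,
    # inferred only from the wordlist entries.
    parts = re.split(r"[^A-Za-z0-9]+", text)
    return [p.lower() for p in parts if p]

# "ungem\u00fctlich" is the German word with a precomposed u-umlaut.
print(ascii_only_tokens("ungem\u00fctlich"))
```

With the umlaut treated as a separator, the word falls apart into the same
two fragments that show up in db.wordlist.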

Searching for 'ungemütlich' results in no hit at all.

What do I have to do to get the correct behaviour, so that
both rundig/htdig do the accented character mapping,
and searching does the same mapping on the search terms before
it searches?

I have noticed that there is a description to solve this
problem in French, but my French is insufficient to understand it.
Also, I got the impression that this author was dealing with a
French-only website.

I am dealing with a multi-language site, and I do not mind if
I get a few more search hits due to the mapping, e.g. both
'Zürich' and 'Zurich' are mapped to 'zurich', and a search
for 'Zürich' also returns those pages containing the English
spelling 'Zurich'.
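
The behaviour I am hoping for can be sketched in Python as follows. This is
a minimal illustration under my own assumptions, not a description of how
ht://Dig works internally: the names fold_accents, index and search, and the
page names de.html and en.html, are hypothetical. The same folding is applied
when a word is indexed and when a search term is looked up.

```python
import unicodedata

def fold_accents(word):
    # Decompose accented characters (u-umlaut becomes 'u' plus a
    # combining diaeresis), drop the combining marks, then lowercase.
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

# Toy index: folded word -> set of pages containing it.
pages = {"de.html": ["Z\u00fcrich"], "en.html": ["Zurich"]}
index = {}
for page, words in pages.items():
    for w in words:
        index.setdefault(fold_accents(w), set()).add(page)

def search(term):
    # Fold the search term the same way before the lookup.
    return sorted(index.get(fold_accents(term), set()))

print(search("Z\u00fcrich"))
```

Both spellings end up under the key 'zurich', so either search term finds
both pages. Characters without a Unicode decomposition, such as the German
sharp s, would need their own rule.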

I hope that there is a how-to description somewhere, and that I
just did not find it.

-- 
|| Alois Treindl,  Astrodienst AG,  mailto:[EMAIL PROTECTED]
|| Zollikon/Zurich, Switzerland

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
