According to Alexander I. Lebedev:
> I have been using HTDig for many years and I've just switched from 3.2.0b4 to
> 3.2.0b5.  I don't know why, but I discovered a lot of locale-related
> problems (I guess they may be due to the upgrade of gcc).
> 
> I'm running Slackware-9.1 with gcc 3.2.3 and the system locale=ru_RU.KOI8-R
> (LANG=ru_RU.KOI8-R).  The HTDig config file's locale is ru_RU.KOI8-R, too.
> The system locale files were generated from scratch (the i18 library).
> 
> It turned out that many basic C calls in the HTDig code (like tolower, toupper,
> isalpha, etc.) give wrong results.  For example, tolower doesn't transform
> capital Russian letters to lowercase.  As a result, I can't find any word
> in the Russian text that starts with a capital letter (English capitals
> are OK because the locale doesn't affect ASCII codes 32-127 in the C
> functions).  In addition, for every Russian word in the stop list I get the
> error message "ignored because is NOALPHA".
> 
> Simple C test programs convinced me that the C functions in gcc 3.2.3
> work properly with the Russian locale, so the problems come from bugs
> in the HTDig code.  I've solved some of the problems by adding a
> setlocale(LC_ALL, "") call in two places in the htlib/String.cc file, but
> there are still a lot of other problems.  Could the authors correct
> these bugs and release the corrected code?

I'm not convinced that this is strictly a problem with htdig rather than
with Slackware.  I know that locale support does work well in htdig (3.1.6
and 3.2.0b5) on many Linux distributions, as long as you stick to 8-bit
characters (i.e. not UTF-8 locales).  I also know that there were a
lot of locale problems on Linux distributions that didn't adopt glibc,
and that even years after most had adopted it, Slackware still hadn't.
I don't know what the status of that is now, though.

There are a few tricks that have worked on some other OSes, which may
help you too, if you want to give these a try:

1) Change the LC_ALL in the setlocale call on line 188 in
htlib/Configuration.cc (in Configuration::AddParsed()) to LC_CTYPE.

2) Remove or comment out the setlocale(LC_TIME, "C"); call on line 196.

I don't expect that adding setlocale(LC_ALL, "") here and there in the
code will help, and indeed this is more likely to cause the code to
ignore/override the locale attribute in your config file.

See also http://www.htdig.org/FAQ.html#q5.8 (esp. the last paragraph).
Note also that htdig pays no attention to the LANG environment variable,
nor any other locale-related variable - only to the locale attribute in
your config file.

> Best regards,
> - Alexander
> 
> P.S.  I guess there are serious problems for a user who wants to use HTDig
> for multilanguage support, because the system locale (LANG) can point to only
> one language (in addition to English).  Perhaps different settings of
> LC_CTYPE in the config file may help.

While the locale can point to only one language, htdig is really only
concerned with the LC_CTYPE definition for that language, which generally
will apply to all other languages that use the same character encoding.
So, htdig can indeed simultaneously index files in multiple languages,
as long as they share the same 8-bit encoding.  (See the 2nd paragraph
of FAQ 5.8.)

The only way to get htdig to handle different encodings simultaneously
would be to map them all to a common multibyte encoding (e.g. UTF-8),
but it would take a huge effort to revise htdig to do this.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
