Tom Tromey writes:
> I wasn't planning to do it at all. When using Utf-8, you can simply
> use the ordinary strcmp, strncmp, etc. unicode_strlen is special as
> it returns the number of characters (not bytes) in the string.
Yes, and that's a very good reason for choosing UTF-8 as an internal
charset. However, strndup, strncmp and, in general, any string
function that needs to move to the Nth character has a problem with
UTF-8, and alternative functions must be implemented. The same goes
for functions like strchr, since the char argument must become a
string to handle UTF-8 sequences that are longer than one byte. There
is a problem with printf too when you use the %.*s sequence, for
instance, because the precision counts bytes, not characters. There is
also an issue regarding case transformation for strcasecmp and others.
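To make the point concrete, here is a rough sketch of the kind of
replacement I have in mind. The names utf8_offset and utf8_strchr are
hypothetical (they are not libunicode functions), and the code assumes
the input is valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libunicode.
 * Returns the byte offset of the Nth character (0-based) of a UTF-8
 * string, or the byte length of the string if it holds fewer than n
 * characters. Assumes s is valid UTF-8. */
static size_t utf8_offset(const char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0' && n > 0) {
        i++;                              /* lead byte of the character */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;                          /* continuation bytes 10xxxxxx */
        n--;
    }
    return i;
}

/* Hypothetical strchr replacement: the "character" is passed as a
 * UTF-8 string. Since a valid UTF-8 sequence cannot match in the
 * middle of another one, strstr is enough. */
static const char *utf8_strchr(const char *s, const char *ch)
{
    assert(s != NULL && ch != NULL);
    return strstr(s, ch);
}
```

With such a helper, printf("%.*s", (int) utf8_offset(s, 2), s) prints
the first two characters of s instead of its first two bytes, and the
same offset can be used as the byte count for strncmp or strndup.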
> There is no mailing list. Hari tends to fix config/dist problems
> only. CC'ing Robert Brady and Jim Blandy would be good though.
Ok, I'll keep that in mind. I understand that the master CVS
site is gnome.org.
> Loic> What about regular expressions ? Is libunicode used in a regular
> Loic> expression engine that we could use (either C or C++ ?).
>
> Henry Spencer's latest regexp package will deal with Utf-8. This is
> what Tcl uses.
Good. I've not been able to find the canonical distribution of this
latest regexp package though. Does it exist?
Cheers,
--
Loic Dachary
ECILA
100 av. du Gal Leclerc
93500 Pantin - France
Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.