Tom Tromey writes:
 > I wasn't planning to do it at all.  When using Utf-8, you can simply
 > use the ordinary strcmp, strncmp, etc.  unicode_strlen is special as
 > it returns the number of characters (not bytes) in the string.
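
For illustration, here is a minimal sketch of a character-counting strlen. It assumes the input is valid, NUL-terminated UTF-8, and it counts every byte that is not a continuation byte (bit pattern 10xxxxxx). The name utf8_strlen is hypothetical. This is not libunicode's unicode_strlen.

```c
#include <stddef.h>

/* Hypothetical helper (not libunicode's API): counts characters
   (code points) in a NUL-terminated string, assuming valid UTF-8.
   Continuation bytes have the bit pattern 10xxxxxx and are skipped,
   so each multibyte sequence is counted once via its lead byte. */
static size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

With this sketch, utf8_strlen("caf\xC3\xA9") ("café") returns 4, while strlen returns 5.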

 Yes, and that's a very good reason for choosing UTF-8 as an internal
charset. However, functions like strndup or strncmp, and in general
string functions that need to move to the Nth character, have a
problem with UTF-8, so alternative functions must be
implemented. The same problem applies to functions like strchr, since the
character argument must be a string rather than a single char for UTF-8
sequences that are more than one byte long. There is a problem with printf
too, for instance when you use the %.*s sequence, because the precision
counts bytes, not characters. There is also an issue with case
transformation for strcasecmp and others.
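
Here is a rough sketch of the kind of replacement functions this implies. It assumes valid, NUL-terminated UTF-8 input. All names here (utf8_offset, utf8_strncmp, utf8_strchr) are hypothetical and are not part of libunicode.

* utf8_offset finds the byte offset of the Nth character. This is the building block for character-based strncmp and strndup.
* utf8_strchr encodes a code point and searches for the resulting byte sequence with strstr. This cannot match in the middle of a character, because UTF-8 lead bytes never look like continuation bytes.

Case folding for strcasecmp is not covered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: byte offset of the start of the n-th character
   (0-based) of a valid UTF-8 string, or its byte length if the
   string has fewer than n characters. */
static size_t utf8_offset(const char *s, size_t n)
{
    size_t i = 0;
    while (s[i] != '\0' && n > 0) {
        i++;
        /* skip continuation bytes (10xxxxxx) of the current character */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }
    return i;
}

/* Hypothetical: like strncmp, but n counts characters, not bytes.
   UTF-8 byte order matches code point order, so memcmp is enough. */
static int utf8_strncmp(const char *a, const char *b, size_t n)
{
    size_t la = utf8_offset(a, n);
    size_t lb = utf8_offset(b, n);
    int r = memcmp(a, b, la < lb ? la : lb);
    if (r != 0)
        return r;
    return (la > lb) - (la < lb); /* shorter prefix sorts first */
}

/* Hypothetical: like strchr, but for a code point cp in
   U+0001..U+10FFFF. The code point is encoded as a UTF-8 string
   and located with strstr. */
static const char *utf8_strchr(const char *s, unsigned long cp)
{
    char buf[5];
    if (cp < 0x80) {
        buf[0] = (char)cp;
        buf[1] = '\0';
    } else if (cp < 0x800) {
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        buf[2] = '\0';
    } else if (cp < 0x10000) {
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        buf[3] = '\0';
    } else {
        buf[0] = (char)(0xF0 | (cp >> 18));
        buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (char)(0x80 | (cp & 0x3F));
        buf[4] = '\0';
    }
    return strstr(s, buf);
}
```

The same offset also helps with printf. The call printf("%.*s", (int)utf8_offset(s, 3), s) prints the first three characters of s without cutting a multibyte sequence in half. By contrast, a plain precision of 3 counts bytes.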

 > There is no mailing list.  Hari tends to fix config/dist problems
 > only.  CC'ing Robert Brady and Jim Blandy would be good though.

 Ok, I'll keep that in mind. I understand that the master CVS 
site is gnome.org. 

 > Loic> What about regular expressions ? Is libunicode used in a regular
 > Loic> expression engine that we could use (either C or C++ ?).
 > 
 > Henry Spencer's latest regexp package will deal with Utf-8.  This is
 > what Tcl uses.

 Good. I haven't been able to find the canonical distribution of this
latest regexp package, though. Does it exist?

       Cheers,

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
