Loic>  Yes, and that's a very good reason for choosing UTF-8 as an
Loic> internal charset. However, functions like strndup or strncmp, and
Loic> in general string functions that need to move to the Nth
Loic> character, have a problem with UTF-8, and alternative functions
Loic> must be re-implemented.

The "N" argument to these functions refers to bytes, not characters.
You can still use these same functions; you just need a way to map the
character number to the byte number.

I wouldn't be averse to implementing a function like that.  In fact it
would be very useful.  What should it be called?
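
A minimal sketch of such a mapping, assuming the input is valid,
NUL-terminated UTF-8. The name utf8_char_to_byte_offset is only a
placeholder for illustration, not an existing function in any library.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the byte offset at which character
 * number `n` (0-based) starts in the UTF-8 string `s`.  If `s` holds
 * fewer than `n` characters, the byte length of `s` is returned.
 * Assumes `s` is valid UTF-8.
 */
size_t utf8_char_to_byte_offset(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t chars = 0;

    while (*p != '\0') {
        /* Bytes of the form 10xxxxxx continue a character;
           any other byte starts a new one. */
        if ((*p & 0xC0) != 0x80) {
            if (chars == n)
                break;
            chars++;
        }
        p++;
    }
    return (size_t)(p - (const unsigned char *)s);
}
```

With this, comparing the first n characters of a and b could be
written as strncmp(a, b, utf8_char_to_byte_offset(a, n)).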

Loic> Same problem for functions like strchr since the char argument
Loic> must be a string and not a char for UTF-8 sequences that are
Loic> more than one char.

That's true.  Ideally I suppose the char would be a unicode_char_t,
and internally we could convert and use strstr or something.
I'll add this to TODO.
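
A rough sketch of that idea, under some assumptions: unicode_char_t is
taken here to be an integer type wide enough for any code point (the
typedef below is a stand-in), and utf8_encode and utf8_strchr are
hypothetical names. The character is encoded to its UTF-8 byte
sequence and then searched for with strstr. This relies on the string
being valid UTF-8, in which case an encoded character cannot match
starting in the middle of another character.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in definition, wide enough for a code point. */
typedef unsigned long unicode_char_t;

/* Encode `c` as NUL-terminated UTF-8 into `buf` (5 bytes).
   Returns the number of bytes written, or 0 if `c` is a surrogate
   or lies beyond U+10FFFF. */
static size_t utf8_encode(unicode_char_t c, char buf[5])
{
    size_t len;

    if (c < 0x80) {
        buf[0] = (char)c;
        len = 1;
    } else if (c < 0x800) {
        buf[0] = (char)(0xC0 | (c >> 6));
        buf[1] = (char)(0x80 | (c & 0x3F));
        len = 2;
    } else if (c < 0x10000) {
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        buf[0] = (char)(0xE0 | (c >> 12));
        buf[1] = (char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (c & 0x3F));
        len = 3;
    } else if (c <= 0x10FFFF) {
        buf[0] = (char)(0xF0 | (c >> 18));
        buf[1] = (char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (char)(0x80 | (c & 0x3F));
        len = 4;
    } else {
        return 0;
    }
    buf[len] = '\0';
    return len;
}

/* Hypothetical strchr analogue: find the first occurrence of the
   character `c` in the UTF-8 string `s`, or return NULL. */
char *utf8_strchr(const char *s, unicode_char_t c)
{
    char buf[5];

    if (c == 0)                     /* like strchr, match the terminator */
        return (char *)s + strlen(s);
    if (utf8_encode(c, buf) == 0)
        return NULL;
    return strstr(s, buf);
}
```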

Loic> There also is an issue regarding case transformation for
Loic> strcasecmp and others.

I'm also adding that to TODO.

Loic>  Ok, I'll keep that in mind. I understand that the master CVS
Loic> site is gnome.org.

Yes.

Loic>  Good. I've not been able to find the canonical distribution of
Loic> this latest regexp package though. Does it exist?

It does exist but offhand I don't know where to get it.  A modified
version appears in Tcl; maybe it has a reference to the original.

Tom
