On Wed, Apr 06, 2005 at 12:36:26PM +0200, Leonard den Ottolander wrote:
> > > * the _size_ of a string (as well as for other objects) is the number of
> > >   bytes that is allocated for it. For arrays, it is the number of
> > >   entries of the array. For strings it is at least _length_ + 1.
> > >
> > > * the _length_ of a string is the number of characters in it, excluding
> > >   the terminating '\0'.
> >
> > It seems to me that this terminology is not yet multibyte-aware. Since UTF-8
> > becomes an everyday issue and AFAIR is planned for mainstream mc 4.7.0, IMHO
> > it is very important to create a clear terminology for this even if it's not
> > yet officially implemented now.
>
> It seems you haven't read Roland's post very well. He clearly
> differentiates between size (raw number of bytes) and length (number of
> characters represented on the screen). From discussions with him I know
> he writes this post explicitly with multibyte charsets in mind. "ecs" in
> ecssup.{c,h} stands for "extended charset".
>
> Or am I missing your point?

No, it seems that I missed Roland's point.

Roland says that size >= length + 1. Just to clarify things: I guess there
are two completely different reasons why size can be greater than (and not
just equal to) length + 1.

a) One can allocate a larger buffer than strlen+1. For example:
       x=malloc(10); strcpy(x, "asdf");
   In this example length is 4 and size is 10. Or is size==5 in this case?

b) Each multibyte character (e.g. any accented letter in UTF-8) counts as 1
   for length, but as at least 2 for size.

Am I right?

--
Egmont
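
The two cases can be sketched in C as follows. This is only an illustration under stated assumptions: `utf8_char_count` and `min_size` are hypothetical helpers, not functions from mc or ecssup.{c,h}, and the counting assumes valid UTF-8 input where every byte that is not a continuation byte starts one character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of mc): number of characters in a
 * UTF-8 string. Continuation bytes have the bit pattern 10xxxxxx, so
 * every other byte starts a new character. Assumes valid UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: the smallest buffer, in bytes, that can hold s
 * together with its terminating '\0'. The allocated size may be larger. */
static size_t min_size(const char *s)
{
    return strlen(s) + 1;
}
```

For case a), after `x=malloc(10); strcpy(x, "asdf");` the buffer has 10 bytes allocated, `min_size(x)` is 5 and `utf8_char_count(x)` is 4. Going by the quoted definition (bytes allocated), size would be 10 there, with 5 being only the lower bound. For case b), the string `"\xC3\xA1" "sdf"` (an accented a encoded as two bytes, followed by "sdf") has `strlen` 5, so `min_size` is 6, while `utf8_char_count` is still 4.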