On 21 Nov 2008, at 16:16, Michael Schnell wrote:

> So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would be 1.

Or 2, depending on whether it's precomposed or decomposed.

> I seem to remember that we discussed this some time ago and the result was that the composed (MAC style ?)

Decomposed and precomposed have nothing to do with Windows vs Mac OS X vs Linux vs whatever. They are both equally valid ways to represent UTF strings and both have their uses (on all platforms). All programs should also be prepared to deal with them, since you never know what kind of input you will get.
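To make the distinction concrete, here is a minimal sketch in Python, chosen only because its standard `unicodedata` module exposes the Unicode normalization forms. The helper names `point_length` and `element_length` are hypothetical stand-ins for the proposed UTF8PointLength and UTF8ElementLength; they are not existing FPC routines.

```python
import unicodedata


def point_length(s: str) -> int:
    """Count Unicode code points (hypothetical analogue of UTF8PointLength)."""
    return len(s)


def element_length(s: str) -> int:
    """Count UTF-8 bytes (hypothetical analogue of UTF8ElementLength)."""
    return len(s.encode("utf-8"))


# The same user-visible character in its two canonical forms.
precomposed = unicodedata.normalize("NFC", "\u00dc")  # U+00DC
decomposed = unicodedata.normalize("NFD", "\u00dc")   # U+0055 U+0308

print(point_length(precomposed), element_length(precomposed))  # 1 2
print(point_length(decomposed), element_length(decomposed))    # 2 3
```

Both strings render as the same character, yet they differ in code point count and in byte count, which is why a routine cannot assume either form.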

> characters in fact are a single code point (Unicode character) that consists of two (maybe more?) complete code points that are tied together by some special coding, so IMHO it can be considered as a single Unicode character in both cases. If this would result in a huge table of possibly composed characters, I think we would stick to the concept of providing decent functionality and restrict ourselves to those that are currently used by the "customers" we normally address (Mac in Europe and America).

I think you are talking about a different "we". Further, inventing our own definitions of "code point" or "Unicode character" is an extremely bad idea (you'd also have to rename the UTF*Point* routines to UTF*FPCLikeChar* so they properly indicate that they do not deal with code points). UTF by itself already has enough variations to deal with; we will not add our own.

> which does not make sense if UTF8PointLength(utfstring_1) is smaller than UTF8PointLength(utfstring_2).
It does not make any sense under any circumstances, because there is no way for "UTF8PointSetLength" to know how many bytes it has to allocate when you pass a value (any value, regardless of where it comes from) to it.
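A small sketch of why a code point count alone cannot determine an allocation size, written in Python with a hypothetical helper name (`utf8_bytes`): strings with the same number of code points can need anywhere from one to four bytes per code point in UTF-8.

```python
def utf8_bytes(s: str) -> int:
    """Number of bytes needed to store s in UTF-8 (hypothetical helper)."""
    return len(s.encode("utf-8"))


# Four strings, each exactly three code points long.
samples = [
    "aaa",             # U+0061: 1 byte each in UTF-8
    "\u00dc" * 3,      # U+00DC: 2 bytes each
    "\u20ac" * 3,      # U+20AC: 3 bytes each
    "\U0001d11e" * 3,  # U+1D11E: 4 bytes each
]

sizes = [utf8_bytes(s) for s in samples]
for s, size in zip(samples, sizes):
    print(len(s), "code points ->", size, "bytes")
```

A routine that is only told "3 code points" would have to guess somewhere between 3 and 12 bytes.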
> If UTF8PointLength(utfstring_1) is greater than UTF8PointLength(utfstring_2), no new bytes need to be allocated,
> but the function is just equivalent to
>
> utfstring1 := UTF8PointCopy(utfstring1, 1, UTF8PointLength(utfstring_2));
>
> To me this does not seem to pose any problem.

Except if the point is to reserve exactly enough space for utfstring1 and to overwrite its contents with something else afterwards (using move() or whatever). That's a very common use of setlength (at least in the FPC run time library, and I guess elsewhere as well). The fact that it also doesn't work if the string has to be made longer is basically the same problem.
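The reserve-then-overwrite problem can be sketched as follows, again in Python. `point_copy_prefix` is a hypothetical analogue of UTF8PointCopy(s, 1, count), and the byte strings stand in for the underlying string buffers.

```python
def point_copy_prefix(data: bytes, count: int) -> bytes:
    """Keep the first `count` code points of UTF-8 encoded `data`
    (hypothetical analogue of UTF8PointCopy(s, 1, count))."""
    return data.decode("utf-8")[:count].encode("utf-8")


old = "abcd".encode("utf-8")          # 4 code points, 4 bytes
new = ("\u00dc" * 2).encode("utf-8")  # 2 code points, 4 bytes

# "Set the length" of old to the code point count of new (2).
buffer = point_copy_prefix(old, 2)    # b"ab": 2 code points, only 2 bytes

# Overwriting buffer with new (as move() would) needs 4 bytes, not 2.
fits = len(new) <= len(buffer)
print(len(buffer), len(new), fits)
```

The buffer size ends up depending on the old contents rather than on what is about to be written into it.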

Your system just does not work, and the more examples you give, the more it falls down, as far as I can see. Please first write a wiki page explaining how to deal with all cases, or at least noting which cases will not work. Only then is it possible to decide whether it is both feasible and worthwhile to go through the trouble of implementing all this. Without it, I feel I am mainly wasting my time writing these mails, because it seems you haven't thought it through at all yet.


Jonas
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
