Re: [Lazarus] substr return wrong string with some utf8 char

Hans-Peter Diettrich Fri, 11 Feb 2011 06:16:18 -0800

José Mejuto wrote:

> If no checks of UTF-8 integrity are performed, they should not be
> "a lot slower", only a bit slower; at least utf8pos and utf8copy are
> certainly slower.

I see no need for integrity checks when the procedures are called with reasonable arguments. Before e.g. Copy can be called, the required parameters have to be determined, and *this* is where using the appropriate functions will automatically yield valid arguments.
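
A minimal sketch of this point, written in Python rather than Pascal and treating a UTF-8 string as a plain byte string (both are assumptions for illustration only): because UTF-8 is self-synchronizing, a byte-level search for a valid UTF-8 needle, the analogue of Pos, can only return an offset where a whole character starts. The helper `is_char_boundary` is hypothetical and not part of the LCL.

```python
def is_char_boundary(data: bytes, index: int) -> bool:
    """True if the byte offset starts a UTF-8 character or is the end of data."""
    if index == len(data):
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[index] & 0xC0) != 0x80


text = "Größe: 10 €".encode("utf-8")
needle = "€".encode("utf-8")

# Byte-level search, the analogue of Pos on a UTF-8 string.
start = text.find(needle)

# Both ends of the match fall on character boundaries. The asserts only
# demonstrate this; a byte-based Copy would not need to check it.
assert is_char_boundary(text, start)
assert is_char_boundary(text, start + len(needle))
print(start, text[start:start + len(needle)].decode("utf-8"))  # prints: 12 €
```

Only arguments computed some other way, e.g. by arithmetic on a byte index, could land inside a multi-byte character.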

> A different matter is that the current implementation is a bit
> overengineered, which adds some overhead.

> Is it logical/safe that the UTF-8 functions do not check UTF-8 integrity?
> I'm talking about utf8pos, utf8copy, etc.

There is no need for a utf8pos function for use with utf8copy when Pos already returns the correct start index for Copy. Only the count parameter needs different handling in utf8copy, where the byte count can be determined once, e.g. in a (UTF8)ByteCount function. Copy can then immediately allocate the requested number of bytes and move that same number of bytes. The ByteCount function is not needed at all when the end index is already known, e.g. from another Pos call.
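
The Pos/ByteCount/Copy split described above might look roughly like this, again as a Python sketch over byte strings. `utf8_byte_count` and `copy_chars` are hypothetical names standing in for the (UTF8)ByteCount function and a byte-based Copy; the code assumes valid UTF-8 and a start offset on a character boundary, so it inspects only lead bytes and performs no integrity checks.

```python
def utf8_byte_count(data: bytes, start: int, char_count: int) -> int:
    """Bytes occupied by char_count characters beginning at byte offset start.

    Assumes valid UTF-8 and that start is a character boundary, so only the
    lead byte of each character is inspected (no integrity checks).
    """
    pos = start
    for _ in range(char_count):
        if pos >= len(data):
            break  # fewer characters remain than requested
        lead = data[pos]
        if lead < 0x80:
            pos += 1  # 0xxxxxxx: ASCII
        elif lead >= 0xF0:
            pos += 4  # 11110xxx: 4-byte sequence
        elif lead >= 0xE0:
            pos += 3  # 1110xxxx: 3-byte sequence
        else:
            pos += 2  # 110xxxxx: 2-byte sequence
    return min(pos, len(data)) - start


def copy_chars(data: bytes, start: int, char_count: int) -> bytes:
    """Byte-based Copy: determine the byte count once, then take one slice."""
    return data[start:start + utf8_byte_count(data, start, char_count)]


text = "Größe: 10 €".encode("utf-8")

# Pos already yields a valid byte start index.
start = text.find("ö".encode("utf-8"))
print(copy_chars(text, start, 3).decode("utf-8"))  # prints: öße

# When the end index is known from another Pos call, no ByteCount is needed.
end = text.find(":".encode("utf-8"))
print(text[start:end].decode("utf-8"))  # prints: öße
```

The character-to-byte conversion happens in exactly one place, and the copy itself is a single allocation plus move of a known size.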

It would also help ensure text integrity if indexed access to bytes/chars in (MBCS/UTF) strings were simply dropped. Then either a different string type or different access methods would have to be used, at the coder's choice.
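
One possible "different access method" is sequential iteration over whole characters instead of indexing individual bytes. The Python sketch below is illustrative only: `iter_utf8_chars` is a hypothetical helper, it assumes valid UTF-8, and Python's own `str` type already provides this natively, so it merely shows the byte-level mechanics.

```python
from typing import Iterator


def iter_utf8_chars(data: bytes) -> Iterator[bytes]:
    """Yield each UTF-8 character as its own byte slice, in order.

    Assumes valid UTF-8: the sequence length is taken from the lead byte.
    """
    pos = 0
    while pos < len(data):
        lead = data[pos]
        if lead < 0x80:
            size = 1  # ASCII
        elif lead >= 0xF0:
            size = 4
        elif lead >= 0xE0:
            size = 3
        else:
            size = 2
        yield data[pos:pos + size]
        pos += size


text = "Größe €".encode("utf-8")
chars = [c.decode("utf-8") for c in iter_utf8_chars(text)]
print(chars)  # prints: ['G', 'r', 'ö', 'ß', 'e', ' ', '€']
```

Since the caller never sees a raw byte index, it cannot split a multi-byte character by accident.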


DoDi


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
