On 15.08.2017 10:34, Tony Whyman via Lazarus wrote:
> On 14/08/17 17:47, Sven Barth via Lazarus wrote:
>> The main problem of such a dynamic type would be the inability to do
>> fast indexing as the compiler would need to insert runtime checks for
>> the size of a character. I had already thought the same, but then had
>> to discard the idea due to this.
>
> Is this really a big problem? It is not as if it would be necessary to
> do a table lookup everytime you index a string as the indexing method
> could be an attribute of the string and updated with the character
> encoding attribute. Is it really that complicated for the compiler to
> generate code that jumps to an indexing method depending upon a data
> attribute?

In a tight loop where one accesses the string character by character
(take Pos() for example) this will lead to a significant slowdown, as
the compiler (without optimizations) will have to insert a call to the
lookup function for each access. While I generally don't consider
performance degradation a backwards compatibility issue, I do in this
case, due to the significant decrease in performance.

Take this evaluation example:

=== code begin ===

program tperf;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Stands in for the per-access indexing routine the compiler would
  have to call for a string with a dynamic character size. }
function lookup(const aStr: String; aIndex: SizeInt): Char;
begin
  Result := aStr[aIndex];
end;

var
  str: String;
  starttime, endtime: TDateTime;
  i, j: LongInt;
begin
  SetLength(str, 10000);

  { Baseline: index the string directly. }
  starttime := Now;
  for i := 0 to 10000 do
    for j := 1 to Length(str) do
      if str[j] <> '' then
        ;
  endtime := Now;
  Writeln('Direct: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));

  { Same loops, but every access goes through a function call. }
  starttime := Now;
  for i := 0 to 10000 do
    for j := 1 to Length(str) do
      if lookup(str, j) <> '' then
        ;
  endtime := Now;
  Writeln('Lookup: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));
end.

=== code end ===

=== output begin ===

Direct: 00:00:01.766
Lookup: 00:00:02.061

=== output end ===

While this example is of course artificial, it nevertheless shows the
slowdown.

> Is your problem really more about the result type as, depending on the
> character width, the result could be an AnsiChar or WideChar or a UTF8
> character for which I don't believe there is a defined char type (other
> than an arguable mis-use of UCS4Char)?

That is indeed also a problem. I might not have had that one in mind
with my mail above, but I did back then when I had brainstormed this.

> I can accept that a clear up of this area would also have to extend to
> the char types as well - but I would also argue that that is well
> overdue. On a quick count, I found 7 different char types in the system
> unit.

And most important of all: any solution that is developed *MUST* be
backwards compatible, which means that at the very least the type
aliases would remain anyway.

Regards,
Sven
-- 
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus